978 Migrate Local Users, Datasets, Folders and Files #1058

Open
wants to merge 47 commits into
base: release/v2.0-beta-3

tcnichol
Contributor

@tcnichol tcnichol commented May 20, 2024

To test:

If you have a V1 instance running locally, you can use it. See the notes below for how to set up the V1 instance properly.

Otherwise, use the demo instance if possible.

  • Create a .env file and place it in the migration folder
  • The .env file should look like this:
CLOWDER_V1=https://clowder.ncsa.illinois.edu/clowder
ADMIN_KEY_V1={v1 api key}

CLOWDER_V2=http://0.0.0.0:8000
ADMIN_KEY_V2={v2 api key}

Adjust the v1 users line for testing
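As a rough sketch of how the four settings above could be read and checked before a migration run, assuming nothing about the actual migration script: `parse_env_text`, `load_config`, and `MigrationConfig` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Mapping

# The four keys shown in the .env example above.
REQUIRED_KEYS = ("CLOWDER_V1", "ADMIN_KEY_V1", "CLOWDER_V2", "ADMIN_KEY_V2")


@dataclass(frozen=True)
class MigrationConfig:
    """Hypothetical container for the .env settings."""
    clowder_v1: str
    admin_key_v1: str
    clowder_v2: str
    admin_key_v2: str


def parse_env_text(text: str) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: Dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def load_config(env: Mapping[str, str]) -> MigrationConfig:
    """Fail early if any required setting is missing or empty."""
    missing = [key for key in REQUIRED_KEYS if not env.get(key)]
    if missing:
        raise ValueError("missing .env settings: " + ", ".join(missing))
    return MigrationConfig(
        clowder_v1=env["CLOWDER_V1"].rstrip("/"),
        admin_key_v1=env["ADMIN_KEY_V1"],
        clowder_v2=env["CLOWDER_V2"].rstrip("/"),
        admin_key_v2=env["ADMIN_KEY_V2"],
    )
```

Checking the settings up front means a missing key is reported before any users or datasets are touched.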


The first part of user migration. Here we will migrate users from a clowder v1 instance into a clowder v2 instance. This works for local users, datasets, folders, and files.

I have refactored this so that there is a 'migrate_user' method. This will probably be a useful pattern to follow since we might want to handle CILogon users differently. We also might want to make migration something a user can initiate from another instance, where there might be options they need to select. This might become important once we get into collection hierarchies, how to handle spaces and sharing, metadata, and then things like licenses.
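One way the per-user pattern could leave room for CILogon accounts later is a small dispatch table keyed by account type. This is only a sketch: the `kind`, `email`, and `name` fields and the `register` helper are assumptions for illustration, not the fields or helpers used in this PR.

```python
from typing import Callable, Dict

# Hypothetical registry: account type -> function that migrates that type.
UserMigrator = Callable[[dict], dict]
MIGRATORS: Dict[str, UserMigrator] = {}


def register(kind: str) -> Callable[[UserMigrator], UserMigrator]:
    """Decorator that records a migrator under the given account type."""
    def wrap(fn: UserMigrator) -> UserMigrator:
        MIGRATORS[kind] = fn
        return fn
    return wrap


@register("local")
def migrate_local_user(user_v1: dict) -> dict:
    # Placeholder: only builds a v2-style payload from assumed v1 fields.
    # A real migrator would create the user through the v2 API.
    return {"email": user_v1["email"], "name": user_v1.get("name", "")}


def migrate_user(user_v1: dict) -> dict:
    """Dispatch on the (assumed) 'kind' field, defaulting to local users."""
    kind = user_v1.get("kind", "local")
    if kind not in MIGRATORS:
        raise NotImplementedError(f"no migrator registered for {kind!r} users")
    return MIGRATORS[kind](user_v1)
```

With this shape, a CILogon migrator could be added with `@register("cilogon")` without changing `migrate_user` itself.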

A later pull request will address collection hierarchies, metadata, CILogon accounts, and other features.

To test this, you will want to run a clowder v1 instance. Since v1 and v2 each need a different version of MongoDB, change the mongo service in the v1 dependencies .yml file so that it runs on its own port:

  # database to hold metadata (required)
  mongo:
    image: mongo:3.6
    restart: unless-stopped
    networks:
      - clowder
    command: mongod --port 27018
    ports:
      - '27018:27018'
    volumes:
      - mongo:/data/db

Also change this in application.conf:

mongodbURI = "mongodb://127.0.0.1:27018/clowder"

You will also need to add this line to application.conf so that emails are printed in the terminal:

smtp.mock=true

It can be added anywhere.

To run v1, use

docker-compose up mongo -d

And then run clowder v1 using IntelliJ.
If you add entries to the v1 instance, try migrating them.

You will also need an admin user on the v1 instance, with that user's API key placed in the .env file.

This can also be tested with another instance. The migration is handled using the API so you can use an existing instance and try to migrate.
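Because the migration goes through the API, pointing it at a different instance is mostly a matter of changing the base URL and key. A minimal sketch of building an authenticated request is shown here; the `X-API-Key` header name and the `api/users` path are assumptions for illustration and should be checked against the instance being used. `build_request` is a hypothetical helper.

```python
from urllib.request import Request


def build_request(base_url: str, path: str, api_key: str) -> Request:
    """Build (but do not send) a GET request against a Clowder instance."""
    # Join with exactly one slash so a base URL with a path prefix
    # such as ".../clowder" is preserved.
    url = base_url.rstrip("/") + "/" + path.lstrip("/")
    headers = {
        "X-API-Key": api_key,  # assumed header name for the admin key
        "Accept": "application/json",
    }
    return Request(url, headers=headers)


# Example: the same helper works for a local or a remote v1 instance.
request = build_request(
    "https://clowder.ncsa.illinois.edu/clowder", "api/users", "{v1 api key}"
)
```

The request can then be sent with `urllib.request.urlopen(request)` or swapped for whichever HTTP client the migration script already uses.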

TODO:

@tcnichol tcnichol linked an issue May 20, 2024 that may be closed by this pull request
@tcnichol tcnichol requested a review from lmarini May 20, 2024 18:02
@tcnichol tcnichol changed the title 978 migrate users 978 Migrate Local Users, Datasets, Folders and Files Jun 3, 2024
@tcnichol tcnichol marked this pull request as ready for review June 3, 2024 18:45
this might need to be modified for CILogon, but this will work for local users
@tcnichol tcnichol changed the base branch from main to release/v2.0-beta-3 August 6, 2024 15:47
@longshuicy
Member

Move descriptions from other PRs over there

This pull request was created against @longshuicy's branch for the migration.
To migrate spaces to groups, I left the migration of datasets, files, etc. as before and added two dictionaries: one maps v1 user ids to v2 user ids, and the other maps v1 dataset ids to v2 dataset ids.
After the datasets and users are migrated, I go through each user's spaces in v1 and create a matching group in v2, add the right users to each group, and then share the group with the newly migrated v2 datasets. This way nothing is done with spaces until all the users and data have been moved, so nothing is missed.
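The two-dictionary approach described above can be sketched as follows. The space fields (`name`, `user_ids`, `dataset_ids`) and the returned group shape are assumptions for illustration; the real code would create groups and share datasets through the v2 API rather than return plain dictionaries.

```python
from typing import Dict, List


def spaces_to_groups(
    spaces_v1: List[dict],
    user_id_map: Dict[str, str],
    dataset_id_map: Dict[str, str],
) -> List[dict]:
    """Translate v1 spaces into v2 group descriptions.

    Must run only after every user and dataset has been migrated,
    so that both id maps are complete.
    """
    groups = []
    for space in spaces_v1:
        # Keep only ids that have a v2 counterpart; anything missing
        # was not migrated and cannot be referenced in v2.
        members = [user_id_map[u] for u in space["user_ids"] if u in user_id_map]
        datasets = [
            dataset_id_map[d] for d in space["dataset_ids"] if d in dataset_id_map
        ]
        groups.append(
            {"name": space["name"], "members": members, "shared_datasets": datasets}
        )
    return groups


# Phase 1 (not shown) fills the maps while migrating users and datasets.
user_id_map = {"v1-user-a": "v2-user-a", "v1-user-b": "v2-user-b"}
dataset_id_map = {"v1-ds-1": "v2-ds-1"}

# Phase 2: spaces are processed last.
groups = spaces_to_groups(
    [{"name": "Lab", "user_ids": ["v1-user-a", "v1-user-b"], "dataset_ids": ["v1-ds-1"]}],
    user_id_map,
    dataset_id_map,
)
```

Deferring the second phase is what avoids the ordering problem: a space can reference users and datasets that are migrated in any order, as long as all of them exist in v2 before groups are built.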

I am working on a branch that I created from this one for spaces.
Something I noticed: right now I create groups from a user's spaces while processing that user. This means that if a user created a space, all the datasets that user created are added properly.
But if a dataset in the space belongs to a user who hasn't been migrated yet, it won't end up in the group, and we cannot add a user who isn't in v2 yet.
So I'm thinking we should temporarily save the API keys for users and handle spaces after everything else. I'll be changing my branch to fit that.

This pull request is for migrating spaces. It's from this branch.
I tested it, and so far datasets, files, folders, and users are migrated correctly; this one will change spaces to groups. I had to handle the groups after all users and datasets were created, because otherwise the users and datasets that need to be in a v2 group might not exist yet.
#1186

@tcnichol
Contributor Author

On spaces: right now, if you want to test spaces, you'll need to do it locally with this branch:

clowder-framework/clowder#453

Also add the changes to v1 mentioned above.

Currently the v1 API does not return the creator of a space in space API calls. The above pull request adds that.

longshuicy and others added 2 commits August 29, 2024 09:12
* dataset metadata is working

* register migration extractor and successfully migrate machine metadata
* match other branch

* adding new collection metadata

* str not float

* getting collections for a dataset in v1, fixing metadata for collections

* posts collection name and id

* adding routes for getting collections

* making a method like the one in v1 for self and ancestors.
it will be easier to build a collection hierarchy from this

* sample json for mdata

* posts collection name and id

* building the data for collections

* something works now

* matching with other branch

* methods for migrating collections as metadata

* need to post it as metadata

* change name

* adding the metadata for collections

* adding context url and right endpoint

* getting spaces as well as collections

* change name

* remaning method

* created v2 license based on v1 license details (#1193)

Co-authored-by: Chen Wang <[email protected]>

* removing print statements

* better error logging

---------

Co-authored-by: Dipannita <[email protected]>
Co-authored-by: Chen Wang <[email protected]>
Development

Successfully merging this pull request may close these issues.

migrate local users, datasets, folders and files
3 participants