Support Studio datasets in Python #874

Open
shcheklein opened this issue Jan 31, 2025 · 2 comments · May be fixed by #901

@shcheklein
Member

Description

AFAIU, it is currently required to run dc pull (or clone? what is the difference, @ilongin?) before DataChain.from_dataset("name") can be used in code.

Let's avoid this manual step and automatically clone the dataset into the local DB if a token is set (and also clone updated versions). Consider remote datasets an extension of local ones.

That would simplify the workflow quite a lot.

Btw, how do we handle version collisions at the moment, @ilongin?

@shcheklein shcheklein added the enhancement New feature or request label Jan 31, 2025
@amritghimire amritghimire self-assigned this Feb 5, 2025
@ilongin
Contributor

ilongin commented Feb 6, 2025

To me it makes sense to instantiate automatically, but I think this needs to be communicated to the user with some message, as it will slow the process down a lot.
Currently, if a dataset with the same name/version exists locally but has a different uuid (i.e. it's not the same dataset), a message is shown to the user saying something like: "Local dataset dogs@v5 already exists with different uuid, please choose different local dataset name or version"
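
The collision rule described above could be sketched roughly as follows. This is a minimal, self-contained illustration and not the DataChain implementation: `DatasetVersion`, `DatasetCollisionError`, `check_collision`, and the dictionary used as the "local DB" are all hypothetical names introduced here. Only the message wording is taken from the comment.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class DatasetVersion:
    """Hypothetical record for one dataset version (not the DataChain model)."""

    name: str
    version: int
    uuid: str


class DatasetCollisionError(Exception):
    """Hypothetical error for a local name/version with a different uuid."""


def check_collision(
    local: Dict[Tuple[str, int], DatasetVersion], remote: DatasetVersion
) -> bool:
    """Return True if the remote version is already present locally.

    Returns False when nothing with that name/version exists locally.
    Raises DatasetCollisionError when the name/version exists locally
    under a different uuid, i.e. it is not the same dataset.
    """
    existing = local.get((remote.name, remote.version))
    if existing is None:
        return False  # nothing local with this name/version
    if existing.uuid != remote.uuid:
        raise DatasetCollisionError(
            f"Local dataset {remote.name}@v{remote.version} already exists "
            "with different uuid, please choose different local dataset "
            "name or version"
        )
    return True  # same uuid: the same dataset is already available locally
```

Under these assumptions, a caller would pull only when `check_collision` returns `False`, and would surface the error to the user otherwise.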

amritghimire added a commit that referenced this issue Feb 6, 2025
If the following conditions are met, this will pull the dataset from Studio:
- The user is logged in to Studio.
- The dataset or version doesn't exist locally.
- The user has not passed studio=False to from_dataset.

In that case, this will pull the dataset from Studio before continuing
further.

A test is added to check this behavior.

Closes #874
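
As a rough illustration of the three conditions listed in the commit message above, the decision could be expressed as a single predicate. This is only a sketch under stated assumptions: `should_pull_from_studio`, its parameters, and the set of `(name, version)` pairs standing in for the local DB are hypothetical, and the handling of an unspecified version (pull only if no version of the dataset exists locally) is a guess rather than something the commit message states.

```python
from typing import Optional, Set, Tuple


def should_pull_from_studio(
    name: str,
    version: Optional[int],
    local_versions: Set[Tuple[str, int]],
    studio_token: Optional[str],
    studio: bool = True,
) -> bool:
    """Decide whether a dataset should be pulled from Studio first.

    Hypothetical helper; mirrors the three conditions in the commit message.
    """
    if not studio:
        return False  # the caller opted out with studio=False
    if not studio_token:
        return False  # the user is not logged in to Studio
    if version is None:
        # Assumption: with no version requested, pull only when no
        # version of the dataset exists locally at all.
        return not any(local_name == name for local_name, _ in local_versions)
    # The requested version is missing locally.
    return (name, version) not in local_versions
```

For example, with `local_versions = {("dogs", 5)}` and a token set, requesting `dogs` at version 6 would trigger a pull, while requesting version 5 would not.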
@amritghimire amritghimire linked a pull request Feb 6, 2025 that will close this issue
@amritghimire
Contributor

Created a pull request at #901
