Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Upsert dataset #12

Merged

Conversation

abigailalexander
Copy link

We often upsert payloads into Scicat, especially when the datasets have been modified or appended. We are in the process of refactoring code to use Pyscicat and would make good use of an upsert functionality.

Here is our attempt, adding in upsert_dataset, upsert_raw_dataset and upsert_derived_dataset functions along with a test case. They utilise the already existing get_dataset function.

Copy link
Collaborator

@dylanmcreynolds dylanmcreynolds left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems like you might want to just support the single generic call that supports Raw or Derived, or the two separate calls for Raw and Derived. But perhaps I'm missing something.

pyscicat/client.py Outdated Show resolved Hide resolved
pyscicat/client.py Outdated Show resolved Hide resolved
@dylanmcreynolds
Copy link
Collaborator

I see that the build is failing on the pre-commit step. I like the idea of the pre-commit step, but it has been really buggy. This particular error was about a dependency of black that needs to be pinned.

I have removed the pre-commit step in gh actions in #15 . If you merge this into your branch, you can probably get the build running on it.

@abigailalexander
Copy link
Author

Thank you for your comments, I will implement changes based upon them. Regarding the generic versus raw/derived upsert functions, I included both because I wasn't sure which way would be preferred. Is there a preference in the existing Scicat code for one method over another?

@dylanmcreynolds
Copy link
Collaborator

Funny, I just noticed now that the upload follows the same pattern of three functions. I don't have a strong opinion. I think it's a little unfortunate that the backend API supports this same pattern as well. I think I personally prefer two separate methods.

@abigailalexander
Copy link
Author

Okay, I'll go ahead with the two separate methods then. That seems the best approach to me as well. I've made the modifications suggested r.e. allowing the upsert method to insert if no record exists already (which is what it should do, you're correct) and the two separate functions. It's likely these will fail on pre-commit again however, as I have yet to merge the removal of pre-commit, apologies.

@abigailalexander
Copy link
Author

With the pre-commit removal merged, it looks like it passes all the checks now. Hopefully I've implemented all the changes you suggested correctly, many thanks for your help.

@dylanmcreynolds
Copy link
Collaborator

An interesting note...there is an upcoming new version of the scicat backend (https://github.com/SciCatProject/scicat-backend-next) which will no longer have separate URLs for Raw vs. Derived datasets. Pyscicat still needs to for now, but in the future, we might remove separate upload functions for Raw and Derived to match that pattern.

@dylanmcreynolds dylanmcreynolds merged commit c1e5369 into SciCatProject:main Jun 1, 2022
@dylanmcreynolds
Copy link
Collaborator

I'll merge this now. Thanks for the PR and sorry it took so long! As a side note, I'm thinking about consolidating upsert and upload methods to match the new (but not yet released) version of the backend. See #18

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants