UI/UX improvements for zarr uploads (validation, feedback loop) #1866
Replies: 7 comments
-
I think the reason for the number-of-[bytes|files] inconsistency is how we track Zarr archives separately from all other file types. There are reasons for this, but it is clear that this decision has led to confusing things like this (if my attribution here is correct), which I would indeed consider a bug. Thanks for the detailed trail of activity you engaged in to reproduce the behaviors you're talking about. However, it's hard for me to be certain just which things you are reporting here as bugs or difficulties. I'd suggest that we meet if you want to show me interactively exactly where you ran into frustration, or update your issue description with a summary of incorrect behaviors and/or changes you'd like to see. To kick things off, I agree that the And, I believe you are reporting inconsistent behavior in steps 6 and 7 of your repro workflow. One by one:
What were you expecting in place of this situation? I assume it's something like: "if the zarr archive passed validation, why is it showing validation errors after upload?" But, I'm not sure why you wouldn't expect it to have been uploaded, given that the CLI passed it as valid. Still, I can sense the general contradiction here.
Connected to the above, what was your expectation here? Some background may be helpful (subject to what you expected to see): we currently don't allow publishing of Zarr-containing Dandisets, as you mentioned. One mechanism to enforce this is validation--in this case, a "fake" validation that simply indicates that the Dandiset contains any assets at all. Once we figure out a way to deal with the size and peculiarities of Zarrs, we intended to remove that ad-hoc validation step, thus clearing all otherwise-valid, Zarr-containing Dandisets to become publishable. Does that help to account for the oddities you're experiencing?
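To make the suspected mechanism concrete, here is a minimal sketch of how a summary that skips Zarr assets could trip a "must be non-zero" check. Everything in it is an assumption for illustration: `Asset`, `summarize`, and `validate_summary` are hypothetical names, not dandi-archive code, and the rule that Zarrs are excluded from the counts is only the attribution guessed at above.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    """Hypothetical stand-in for an uploaded asset (not the archive's model)."""
    path: str
    size: int
    is_zarr: bool = False


def summarize(assets):
    """Build an assetsSummary-like dict.

    ASSUMPTION: Zarr archives are tracked separately, so they are skipped here.
    """
    counted = [a for a in assets if not a.is_zarr]
    return {
        "numberOfBytes": sum(a.size for a in counted),
        "numberOfFiles": len(counted),
    }


def validate_summary(summary):
    """Return one error string per summary field that is not greater than zero."""
    errors = []
    for field in ("numberOfBytes", "numberOfFiles"):
        if summary.get(field, 0) <= 0:
            errors.append(f"assetsSummary.{field} must be greater than zero")
    return errors


zarr_only = [Asset("sub-01/image.zarr", 10_000, is_zarr=True)]
mixed = zarr_only + [Asset("sub-01/session.nwb", 2_048)]

print(validate_summary(summarize(zarr_only)))  # two errors: both fields are zero
print(validate_summary(summarize(mixed)))      # no errors: the NWB file is counted
```

Under these assumptions a Zarr-only Dandiset always fails the check even though its data uploaded fine, while a Dandiset that mixes Zarr and non-Zarr assets passes because the non-Zarr file alone pushes both counts above zero.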
-
Thanks for all the feedback @waxlamp -- it might be more effective to walk through interactively with @kabilar if possible to refine where we want to go with this in the short-term (and a sanity check for To your question of: are these bugs vs. confusion -- we would classify this as confusion (we also noticed in the handbook that there isn't much explanation for how/where handling of Apologies for the confusion about Steps 6 & 7 above in the description of the Issue -- to simplify, those steps signified "hey, although the UI/UX feedback displayed some 'errors', upload in all the right places worked 😄 " -- I think our hope is to convey that to our end users if their zarrs are valid. I'll start to draft some related Issues for
-
Something possibly obvious but worth reiterating -- you are most welcome to propose a PR.
-
@aaronkanzer, should we schedule a meeting to go through this stuff?
-
FWIW: apparently we have a good number of broken zarrs in the archive/000108 per with chatgpt we came up with this crude "checker of the structure": https://github.com/dandi/zarr-manifests/blob/master/validate_zarr.py which unfortunately doesn't trigger on that initial reported bad zarr unless we re-enable loading that "slice" ... so something remains to be figured out about that. But the point is that we might want to look into some relatively speedy validation to be done on zarrs in the archive to validate their internal integrity.
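As a rough idea of what a speedy structural check could look like, here is a minimal sketch that uses only the standard library and never decodes chunk data. It is not the `validate_zarr.py` script linked above. It assumes a Zarr v2 array stored as a plain directory, and `check_zarr_v2_array` is a hypothetical helper name. It flags files that are not chunk keys, chunk keys outside the grid implied by `shape` and `chunks`, and zero-byte chunk files; missing chunks are not flagged, since Zarr permits them.

```python
import json
import math
import tempfile
from pathlib import Path


def check_zarr_v2_array(array_dir):
    """Return a list of structural problems in one Zarr v2 array directory."""
    array_dir = Path(array_dir)
    meta_path = array_dir / ".zarray"
    if not meta_path.is_file():
        return [f"{array_dir.name}: missing .zarray"]
    try:
        meta = json.loads(meta_path.read_text())
        shape, chunks = meta["shape"], meta["chunks"]
        if len(shape) != len(chunks):
            raise ValueError("shape and chunks differ in length")
        # Number of chunks along each axis; a zero-dimensional array has one chunk "0".
        grid = [math.ceil(s / c) for s, c in zip(shape, chunks)] or [1]
    except (ValueError, KeyError, TypeError, ZeroDivisionError) as exc:
        return [f".zarray: unusable metadata ({exc!r})"]
    sep = meta.get("dimension_separator", ".")  # "." is the v2 default
    problems = []
    for path in sorted(array_dir.rglob("*")):
        if path.is_dir() or path.name.startswith("."):
            continue  # skip directories and metadata such as .zarray/.zattrs
        key = path.relative_to(array_dir).as_posix()
        parts = key.split(sep)
        if len(parts) != len(grid) or not all(p.isascii() and p.isdigit() for p in parts):
            problems.append(f"{key}: not a chunk key for this array")
        elif any(int(p) >= g for p, g in zip(parts, grid)):
            problems.append(f"{key}: outside the {grid} chunk grid")
        elif path.stat().st_size == 0:
            problems.append(f"{key}: empty chunk file")
    return problems


# Demo on a deliberately broken toy array (metadata trimmed to the fields checked).
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp) / "example.zarr"
    root.mkdir()
    (root / ".zarray").write_text(json.dumps({"shape": [4, 4], "chunks": [2, 2]}))
    (root / "0.0").write_bytes(b"\x00" * 16)  # fine
    (root / "1.1").write_bytes(b"")           # empty chunk file
    (root / "2.0").write_bytes(b"\x00" * 16)  # index 2 is outside a 2x2 grid
    problems = check_zarr_v2_array(root)
print(problems)
```

A check like this only walks directory listings and reads one small JSON file per array, so it should stay cheap even for large Zarrs. The trade-off is that it cannot detect a chunk that is the right size but fails to decompress.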
-
Other related references:
-
@aaronkanzer I converted this from an issue to a discussion, since I'm not sure exactly what actions are coming out of your observations yet. I'll reiterate my question from above: should we schedule a meeting to go over this stuff? (I think the answer is "yes" 🙂)
-
Summary
(I am a sample size of 1 opinion, so feel free to push back 😄; nevertheless, here are the details.)
As of now, dandisets with only zarr files remain in `draft` state due to their inability to be versioned. This is intended. However, during upload this isn't fully conveyed, and the validation feedback shown to the end user is misleading. Fortunately, the upload of the zarr files to S3 and Dandi works successfully (the data/files are there after the user invokes an upload).
The initial goal of this Issue would be to discuss the appropriate UI/UX for this scenario (Cc @kabilar @waxlamp @yarikoptic @satra). The issue was initially observed in `linc-archive` (a fork of `dandi-archive`) but then replicated in the `dandi-archive` production environment. @kabilar and I are curious to get the rest of the Dandi team's thoughts before proceeding further.
Details
This was discovered via the following workflow:
`dandi download <dandiset-url/draft>` to get the `dandiset.yaml` locally
`dandiset.yaml`
`dandi validate .` to confirm that the file is valid
`dandi upload` to upload the data to S3/dandi-archive
Files
The response during `dandi upload` is the following, where the dandiset.yaml is supposed to be edited online.
Visiting the UI after getting that response via `dandi upload`, it isn't 100% apparent where the user would go to edit the `dandiset.yaml`.
The next observation is a failed validation error due to the values of `assetsSummary.numberOfBytes` & `assetsSummary.numberOfFiles` remaining as zero even though data and files have been uploaded -- the validation error stems from this line of code.
This workflow/feedback loop seems like it could be improved for the end user -- I assume we want to provide certainty and confidence that their zarr uploads worked as intended.
As an aside, one of the inconsistencies we noticed was with dandisets that contain both `zarr` and non-zarr files. It seems that they create a false positive, with Asset Summary passing validation since both bytes and files are updated to be greater than zero.
Next Steps
@aaronkanzer to further investigate why `assetsSummary.numberOfBytes` & `assetsSummary.numberOfFiles` remain at zero even after successful upload (e.g. nwb files work in this case, so why not zarr...)
Appendix / References
To create the `zarr` asset in this workflow, the following Python script was used: