Tracking Zarr 3.0.0 Performance and Integration #352
Just a comment that the zarr developers are not planning to implement generic object storage in zarr 3.0.0, for security reasons. I'm curious whether something like https://zarr.readthedocs.io/en/stable/api/zarr/codecs/index.html#zarr.codecs.VLenBytesCodec or https://zarr.readthedocs.io/en/stable/api/zarr/codecs/index.html#zarr.codecs.VLenUTF8Codec would cover most of our use cases.
Indeed, it is worth distinguishing between ragged arrays (arrays with a variable-length axis of a "standard" dtype) and object dtype.
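To make the distinction concrete: a ragged array does not need an object dtype at all. One common layout (the one Apache Arrow uses for list arrays) stores a flat `values` array plus an `offsets` array, both with ordinary dtypes. The sketch below assumes NumPy; `to_ragged` and `get_row` are hypothetical helper names, and storing the two arrays as two regular zarr arrays is an assumption about how we might use it, not something zarr does for us.

```python
import numpy as np


def to_ragged(rows):
    """Pack a list of 1-D sequences into (values, offsets) using standard dtypes."""
    lengths = np.array([len(r) for r in rows], dtype=np.int64)
    # offsets[i]:offsets[i + 1] delimits row i inside the flat values array
    offsets = np.zeros(len(rows) + 1, dtype=np.int64)
    np.cumsum(lengths, out=offsets[1:])
    if rows:
        values = np.concatenate([np.asarray(r, dtype=np.float64) for r in rows])
    else:
        values = np.empty(0, dtype=np.float64)
    return values, offsets


def get_row(values, offsets, i):
    """Return row i as a view into the flat values array."""
    return values[offsets[i]:offsets[i + 1]]


rows = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
values, offsets = to_ragged(rows)
# values is float64 and offsets is int64, so neither needs an object dtype
```

Both arrays are plain numeric arrays, so any reader that understands the core dtypes can read them; only the interpretation (which slice is which row) is a convention on our side.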
This seems relatively easy: there is already the VLenBytesCodec, which looks like it could be extended without much work. However, like the UTF-8 codec, it isn't directly supported.
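For intuition about what a variable-length bytes codec does, here is a minimal length-prefixed encoding: an item count, then a length prefix before each item. This is roughly the layout numcodecs' VLenBytes uses, but treat the exact header format here as an assumption; `vlen_encode` and `vlen_decode` are hypothetical names, not the zarr or numcodecs API.

```python
import struct


def vlen_encode(items: list[bytes]) -> bytes:
    # Header: number of items as a little-endian uint32
    parts = [struct.pack("<I", len(items))]
    for item in items:
        # Each item: uint32 length prefix followed by the raw bytes
        parts.append(struct.pack("<I", len(item)))
        parts.append(item)
    return b"".join(parts)


def vlen_decode(buf: bytes) -> list[bytes]:
    (n,) = struct.unpack_from("<I", buf, 0)
    pos = 4
    out = []
    for _ in range(n):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        out.append(bytes(buf[pos:pos + length]))
        pos += length
    return out


# Strings fit the same scheme by encoding each one as UTF-8 first
strings = ["zarr", "ünïcode"]
encoded = vlen_encode([s.encode("utf-8") for s in strings])
decoded = [b.decode("utf-8") for b in vlen_decode(encoded)]
```

Because nothing in the buffer is executable, decoding can only ever produce bytes, which is the key difference from a pickle-based object codec.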
I think this case was always going to be a problem. Maybe this isn't something we should support, since it sounds like a fairly big security risk: someone could create a component that pickles an object which then runs malicious code when it is loaded.
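The risk is easy to demonstrate with the standard library alone. An object's `__reduce__` lets whoever *writes* a pickle choose any importable callable and its arguments, and `pickle.loads` calls it during loading. The payload below is harmless (`eval("6 * 7")`), but it could just as well be `os.system`. The second half follows the pattern from the Python `pickle` docs of overriding `Unpickler.find_class`; the allow-nothing policy and the name `restricted_loads` are illustrative choices, not something zarr ships.

```python
import io
import pickle


class Payload:
    # The pickle author chooses the callable and arguments run on load
    def __reduce__(self):
        return (eval, ("6 * 7", {}))


blob = pickle.dumps(Payload())
result = pickle.loads(blob)  # eval("6 * 7", {}) executes while loading


class RestrictedUnpickler(pickle.Unpickler):
    # Refuse every global lookup; plain lists, dicts, strings, numbers
    # and bytes are rebuilt from opcodes and never reach find_class
    def find_class(self, module, name):
        raise pickle.UnpicklingError(f"global '{module}.{name}' is forbidden")


def restricted_loads(data: bytes):
    return RestrictedUnpickler(io.BytesIO(data)).load()
```

Even a restricted unpickler only helps if every allowed global is itself safe, which is part of why a generic object codec is hard to make trustworthy.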
Arrays of string dtype are supported via VLenUTF8Codec, although that codec is "technically" not part of the V3 standard: https://zarr.readthedocs.io/en/stable/_modules/zarr/codecs/vlen_utf8.html#VLenUTF8Codec
That does scare me a little, though... I'd rather not write a bunch of files that end up being unreadable by other readers. Then again, as long as we can read every file version, there isn't a huge risk of losing data.
Oddly enough, blosc + HDF5 actually seems fairly promising and is closer to what I expected from zarr 3: https://www.blosc.org/posts/pytables-b2nd-slicing/. Especially the two-level blocking structure. @magnunor This could settle the endless "equal-sized chunks vs. chunks that span the signal dimensions" debate :) I've been testing this a little and hope to put together a short write-up that I can post. Maybe this deserves more discussion elsewhere...
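A back-of-the-envelope way to frame the chunking debate is to count how many chunks a rectangular read touches, and how many elements must be decompressed, since whole chunks are decompressed even when only part is needed. The dataset shape, both chunk shapes and both access patterns below are hypothetical numbers chosen for illustration; `chunks_touched` and `elements_decompressed` are made-up helpers, not a zarr or blosc API.

```python
from math import prod


def chunks_touched(chunks, selection):
    """Chunks intersected by a selection of (start, stop) ranges, stop exclusive."""
    n = 1
    for (start, stop), c in zip(selection, chunks):
        n *= (stop - 1) // c - start // c + 1
    return n


def elements_decompressed(chunks, selection):
    # Every touched chunk is decompressed in full
    return chunks_touched(chunks, selection) * prod(chunks)


# Hypothetical 4D dataset: (nav_y, nav_x, sig_y, sig_x) = (256, 256, 128, 128)
equal = (32, 32, 32, 32)      # equal-sized chunks on every axis
signal = (16, 16, 128, 128)   # chunks spanning the full signal dimensions

one_pattern = ((10, 11), (20, 21), (0, 128), (0, 128))  # one diffraction pattern
virtual_img = ((0, 256), (0, 256), (60, 68), (60, 68))  # small signal ROI, all nav

for name, ch in [("equal", equal), ("signal", signal)]:
    print(name, chunks_touched(ch, one_pattern), elements_decompressed(ch, virtual_img))
# equal 16 268435456
# signal 1 1073741824
```

With these numbers, signal-spanning chunks make a single-pattern read one chunk but force a virtual image to decompress the entire dataset, while equal chunks do the reverse trade. A two-level layout (blosc2 blocks inside chunks, or zarr v3 shards holding smaller inner chunks) lets the inner grid stay small for decompression while the outer unit stays large for I/O.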
Describe the functionality you would like to see.
I'm going to try to implement support for zarr 3.0.0 over the next couple of days. I'll test performance here and see if I can write a small guide to optimizing it. Specifically, I want to look at optimal sharding for 4D datasets, both for efficient data slicing and to improve storage on Windows computers. I'm not sure how `zarr` + `dask` + sharding will ultimately perform, but I assume that `dask` is not quite smart enough to handle sharding effectively. However, if the performance is good enough, it might be worth returning the zarr array rather than automatically converting it to a dask array, and only doing the conversion when necessary.

As far as implementation goes: