Add default compressors to config #2470
I am a little confused: in which cases should we use `VLenBytes` and in which cases `VLenUTF8`? Or do we need both at the same time?
Also, do we need a default compressor, or are default filters sufficient?
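For context on the first question: in numcodecs, `VLenUTF8` encodes variable-length Python `str` objects as UTF-8, while `VLenBytes` stores variable-length `bytes` objects, so a given array would normally need only the one that matches its data type. The pure-Python sketch below illustrates the difference. It approximates numcodecs' length-prefixed layout (a 4-byte little-endian item count, then a 4-byte length before each item), which is an assumption here, and the function names are hypothetical, not the numcodecs API.

```python
import struct
from typing import List


def vlen_encode_bytes(items: List[bytes]) -> bytes:
    """Frame variable-length byte strings: item count, then (length, payload) pairs."""
    out = bytearray(struct.pack("<I", len(items)))  # 4-byte little-endian item count
    for item in items:
        if not isinstance(item, bytes):
            raise TypeError(f"expected bytes, got {type(item).__name__}")
        out += struct.pack("<I", len(item))  # 4-byte length prefix per item
        out += item
    return bytes(out)


def vlen_decode_bytes(buf: bytes) -> List[bytes]:
    """Inverse of vlen_encode_bytes."""
    (count,) = struct.unpack_from("<I", buf, 0)
    pos, items = 4, []
    for _ in range(count):
        (length,) = struct.unpack_from("<I", buf, pos)
        pos += 4
        items.append(bytes(buf[pos:pos + length]))
        pos += length
    return items


def vlen_encode_utf8(items: List[str]) -> bytes:
    """Same framing, but each item must be a str and is encoded to UTF-8 first."""
    for s in items:
        if not isinstance(s, str):
            raise TypeError(f"expected str, got {type(s).__name__}")
    return vlen_encode_bytes([s.encode("utf-8") for s in items])


def vlen_decode_utf8(buf: bytes) -> List[str]:
    """Inverse of vlen_encode_utf8."""
    return [b.decode("utf-8") for b in vlen_decode_bytes(buf)]


# str data goes through the UTF-8 variant, raw bytes through the bytes variant.
text = vlen_encode_utf8(["zarr", "héllo"])
raw = vlen_encode_bytes([b"\x00\xff"])
```

Both variants share the same framing and differ only in the Python type they accept and return.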
# Conflicts:
# tests/test_v2.py
Thanks for working on this @brokkoli71. I think @normanrz is also going to review this, but I wanted to bring up an additional point.
In Zarr 2 we told people to set `zarr.storage.default_compressor = SomeCompressor()`.
This was simple, but also an odd way to manage config. What we have now is much better. However, I wonder if we should do something to catch folks trying to set the `default_compressor` variable. Thoughts?
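One possible way to catch people who still assign the Zarr 2-style global is to give the module a `types.ModuleType` subclass whose `__setattr__` rejects the removed name. A module-level `__getattr__` (PEP 562) only intercepts reads, so assignments need this class swap. This is a minimal sketch that assumes the guard should raise `AttributeError` (a `DeprecationWarning` would be a gentler alternative). `_GuardedModule` and the message text are hypothetical.

```python
import types


class _GuardedModule(types.ModuleType):
    """Module type that rejects assignments to removed Zarr 2-style globals."""

    # Hypothetical message; the exact replacement config key is not given here.
    _REMOVED = {
        "default_compressor": "set the default compressor via zarr.config instead",
    }

    def __setattr__(self, name, value):
        if name in self._REMOVED:
            raise AttributeError(
                f"{self.__name__}.{name} is no longer used; {self._REMOVED[name]}"
            )
        super().__setattr__(name, value)


# Inside a real package module this would be done with:
#     sys.modules[__name__].__class__ = _GuardedModule
storage = _GuardedModule("storage_demo")
storage.other_setting = 1  # unrelated attributes are still assignable
```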
Thanks! Couple of things:
- Print statements need removing
- The config naming could probably be improved
This would also be a good opportunity to update or add user-facing documentation on which default compressors are used for which types of data. Is that something you could add in this PR?
I think we should also have defaults for v3.
`zstd` isn't in the spec yet: zarr-developers/zarr-specs#256
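On v3 defaults: the docstring under review lists `BytesCodec` followed by `ZstdCodec` for numeric arrays. A v3 codec chain must follow a fixed order: array-to-array codecs first, then exactly one array-to-bytes codec, then bytes-to-bytes codecs. Any configurable default would therefore need that check. The sketch below assumes a hand-written name-to-kind table rather than anything read from zarr-python.

```python
# Hand-written codec-kind table (assumption, not read from zarr-python).
CODEC_KINDS = {
    "transpose": "array->array",
    "bytes": "array->bytes",
    "vlen-utf8": "array->bytes",
    "gzip": "bytes->bytes",
    "zstd": "bytes->bytes",
    "crc32c": "bytes->bytes",
}

# Position of each kind in a valid chain.
_STAGE = {"array->array": 0, "array->bytes": 1, "bytes->bytes": 2}


def validate_chain(codecs):
    """Return the chain unchanged if its codec kinds are in a valid v3 order."""
    for name in codecs:
        if name not in CODEC_KINDS:
            raise ValueError(f"unknown codec {name!r}")
    kinds = [CODEC_KINDS[name] for name in codecs]
    if kinds.count("array->bytes") != 1:
        raise ValueError("a v3 codec chain needs exactly one array->bytes codec")
    stages = [_STAGE[k] for k in kinds]
    if stages != sorted(stages):
        raise ValueError("order must be array->array, array->bytes, bytes->bytes")
    return list(codecs)


# Numeric default as described in the docstring: serializer, then compressor.
DEFAULT_NUMERIC_CODECS = validate_chain(["bytes", "zstd"])
```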
Thanks for your feedback, I will integrate it this week 👍🏼
Thanks! Could you please go through the docstrings again and add some info about the default compressor, filters, and codecs? Then, I think this is good to go.
I've left lots of small requests for changes, mainly around the docstrings - in general they are great and clear, but I think worth fixing a lot of the little issues while we're here.
I left most of the docstring comments in `asynchronous.py`, but they also apply to the other files that have updated docstrings.
`src/zarr/api/asynchronous.py` (outdated)
this collection specify the transformation from array values to stored bytes.
V3 only. V2 arrays should use `filters` and `compressor` instead.
If no codecs are provided, default codecs will be used:
- For numeric arrays, the default is `BytesCodec` and `ZstdCodec`.
- For numeric arrays, the default is `BytesCodec` and `ZstdCodec`.
Can we also document the default compression level (and any other parameters) here?
Co-authored-by: David Stansby
# Conflicts:
# tests/test_config.py
Is there a reason why none of the default compressors / codecs have a configuration?
And a second question: why aren't strings / bytes compressed with
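On the first question: one way to give the defaults a configuration is to store each default as a full codec spec with its parameters. User overrides can then either tweak parameters of the same codec or swap in a different codec. This is a minimal sketch in numcodecs' flat `{"id": ..., **params}` style. `DEFAULT_COMPRESSORS`, `resolve_compressor`, and the `level`/`checksum` values are placeholders, not zarr's shipped defaults. Only a numeric entry is shown, so the sketch does not take a position on the strings/bytes question.

```python
import copy

# Hypothetical defaults; parameter values are placeholders.
DEFAULT_COMPRESSORS = {
    "numeric": {"id": "zstd", "level": 0, "checksum": False},
}


def resolve_compressor(kind, overrides=None):
    """Return the compressor spec for `kind`, applying any user override."""
    spec = copy.deepcopy(DEFAULT_COMPRESSORS.get(kind))
    override = (overrides or {}).get(kind)
    if override is None:
        return spec
    if spec is None or override.get("id", spec["id"]) != spec["id"]:
        # A different codec: its parameters replace the defaults entirely.
        return dict(override)
    spec.update(override)  # same codec: adjust individual parameters only
    return spec
```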
This PR adds default compressors for `zarr_format=2` to `zarr.config`.
Fixes #2267
Should `_get_default_array_bytes_codec` for `zarr_format=3` also be configurable in `zarr.config`?