[WIP]: compatibility for zarr-python 3.x #9552
base: main
@@ -499,7 +498,7 @@ def test_dataset_caching(self) -> None:
     @pytest.mark.filterwarnings("ignore:deallocating CachingFileManager")
     def test_roundtrip_None_variable(self) -> None:
-        expected = Dataset({None: (("x", "y"), [[0, 1], [2, 3]])})
+        expected = Dataset({None: (("x", "y"), [[1, 1], [2, 3]])})
These test files include a number of changes that avoid creating DataArrays whose values equal the default fill value for their dtype. Without this change, the test would fail because of #5475. It's not ideal, but it sidesteps the issue.
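To illustrate why `[[0, 1], [2, 3]]` became `[[1, 1], [2, 3]]`, here is a NumPy-only sketch of the masking behind #5475. It assumes the default fill value for this integer dtype is 0 and that any value equal to the fill value is decoded as missing. `mask_fill_value` is a hypothetical helper that mimics CF-style decoding; it is not xarray's decoder.

```python
import numpy as np


def mask_fill_value(data, fill_value):
    """Hypothetical helper mimicking CF-style decoding: values equal to fill_value become NaN."""
    values = np.asarray(data, dtype="float64")  # promote so NaN can be represented
    return np.where(values == fill_value, np.nan, values)


# Assumption: the default fill value for this integer dtype is 0.
DEFAULT_FILL = 0

original = [[0, 1], [2, 3]]  # old test data
changed = [[1, 1], [2, 3]]   # new test data

# The legitimate 0 in the old data is indistinguishable from "missing".
print(np.isnan(mask_fill_value(original, DEFAULT_FILL)).sum())  # 1
print(np.isnan(mask_fill_value(changed, DEFAULT_FILL)).sum())   # 0
```

A round-trip comparison on the old data therefore sees a NaN where a 0 was written, which is why the test data avoids the fill value rather than fixing the underlying decoding here.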
Branch updated from 1ed4ef1 to bb2bb6c.
This set of changes should be backward compatible and work with zarr-python 2.x (i.e., reading and writing Zarr v2 data).
I'll work through zarr-python 3.x now. I think we might want to parametrize most of these tests by zarr_version=[2, 3] to confirm that we can read and write Zarr v2 data with zarr-python 3.x.
if consolidated is None:
    consolidated = False
if _zarr_v3():
zarr-python 3.x renamed this parameter (zarr_version became zarr_format). xarray should probably go through the same deprecation.
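A minimal sketch of such a deprecation, assuming xarray keeps accepting `zarr_version` with a `FutureWarning` for a while. `resolve_zarr_format`, `zarr_open_kwargs`, and `zarr_python_major` are hypothetical helpers, not xarray's actual code. The last two show one way to pass the format on to whichever zarr-python major version is installed, since 2.x spells the keyword `zarr_version`.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version


def zarr_python_major():
    """Major version of the installed zarr-python, or None if it is not installed."""
    try:
        return int(version("zarr").split(".")[0])
    except PackageNotFoundError:
        return None


def resolve_zarr_format(zarr_version=None, zarr_format=None):
    """Hypothetical shim: accept the old keyword but steer callers to `zarr_format`."""
    if zarr_version is not None and zarr_format is not None:
        raise ValueError("pass only one of zarr_version or zarr_format")
    if zarr_version is not None:
        warnings.warn(
            "zarr_version is deprecated, use zarr_format instead",
            FutureWarning,
            stacklevel=2,
        )
        return zarr_version
    return zarr_format


def zarr_open_kwargs(zarr_format, major):
    """Spell the format keyword the way the given zarr-python major version expects."""
    if zarr_format is None:
        return {}  # let zarr pick its default
    if major is not None and major >= 3:
        return {"zarr_format": zarr_format}
    return {"zarr_version": zarr_format}
```

For example, `zarr_open_kwargs(resolve_zarr_format(zarr_version=2), zarr_python_major())` warns once and yields `{"zarr_format": 2}` on zarr-python 3.x.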
@@ -75,8 +89,10 @@ def __init__(self, zarr_array):
         self.shape = self._array.shape

         # preserve vlen string object dtype (GH 7328)
-        if self._array.filters is not None and any(
-            [filt.codec_id == "vlen-utf8" for filt in self._array.filters]
+        if (
zarr-developers/zarr-python#2036 is probably relevant here.
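While the v3 string handling is sorted out, the zarr-python 2.x check in this hunk could be isolated into a small predicate so the v2 path stays readable. This is a sketch: `has_vlen_utf8_filter` is hypothetical, and it relies only on the `codec_id` attribute the original check already uses. The `SimpleNamespace` objects are stand-ins for numcodecs filters.

```python
from types import SimpleNamespace


def has_vlen_utf8_filter(filters):
    """Return True if a zarr v2-style filter list contains the vlen-utf8 codec.

    Duck-typed on `codec_id`, so filter objects without that attribute are ignored.
    """
    if not filters:
        return False
    return any(getattr(f, "codec_id", None) == "vlen-utf8" for f in filters)


# Hypothetical stand-ins for numcodecs filter objects.
vlen = SimpleNamespace(codec_id="vlen-utf8")
other = SimpleNamespace(codec_id="delta")
print(has_vlen_utf8_filter([other, vlen]))  # True
print(has_vlen_utf8_filter(None))           # False
```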
if _zarr_v3() and zarr_array.metadata.zarr_format == 3:
    encoding["codec_pipeline"] = [
        x.to_dict() for x in zarr_array.metadata.codecs
Maybe this instead?
-        x.to_dict() for x in zarr_array.metadata.codecs
+        zarr_array.metadata.to_dict()["codecs"]
It's a bit wasteful, since the whole metadata document has to be serialized, but presumably zarr knows better than we do how to serialize the codec pipeline?
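To compare the two spellings, here is a sketch using hypothetical stand-in classes (`FakeCodec`, `FakeArrayMetadata`), not zarr's real metadata types. It only assumes that codecs and metadata each expose `to_dict()`, as in the snippet above.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCodec:
    """Hypothetical stand-in for a zarr v3 codec object."""

    name: str
    configuration: dict = field(default_factory=dict)

    def to_dict(self):
        return {"name": self.name, "configuration": dict(self.configuration)}


@dataclass
class FakeArrayMetadata:
    """Hypothetical stand-in for zarr v3 array metadata."""

    codecs: list

    def to_dict(self):
        # Serializes the whole document; only "codecs" matters here.
        return {"zarr_format": 3, "codecs": [c.to_dict() for c in self.codecs]}


def codecs_per_item(metadata):
    # Current approach: serialize each codec ourselves.
    return [c.to_dict() for c in metadata.codecs]


def codecs_via_metadata(metadata):
    # Suggested approach: defer to the metadata's own serializer.
    return metadata.to_dict()["codecs"]


meta = FakeArrayMetadata(codecs=[FakeCodec("bytes", {"endian": "little"})])
print(codecs_per_item(meta) == codecs_via_metadata(meta))  # True
```

With these stand-ins both give the same list. The trade-off is the one raised above: the second version serializes more than it needs, but any special-casing zarr applies to the codec list at the metadata level is inherited rather than duplicated.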
Branch updated from 9f2cb2f to d11d593.
* removed open_consolidated workarounds
* removed _store_version check
* pass through zarr_version
Branch updated from a324329 to 6087e5e.
This is a work in progress toward compatibility with zarr-python 3.x. It is intended to be run against zarr-python v3 plus the open PRs referenced in #9515.
My initial goal is to ensure that users can still read Zarr v2 hierarchies without issue.
Note also that #5475 will become a bigger problem once people start writing Zarr v3 datasets.
There are still lots of failures. I'll be force-pushing changes to keep the commit history somewhat meaningful.
We might want some new tests that explicitly set zarr_version to control whether v2 or v3 data is written.
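Such a test could also check which format actually landed on disk. A sketch for local directory stores, relying on the fact that Zarr v3 stores keep their root metadata in `zarr.json` while Zarr v2 uses `.zgroup` (groups) or `.zarray` (arrays). `detect_zarr_format` is a hypothetical helper.

```python
import tempfile
from pathlib import Path


def detect_zarr_format(store_path):
    """Infer the Zarr format of a local directory store from its root metadata file."""
    root = Path(store_path)
    if (root / "zarr.json").is_file():
        return 3
    if (root / ".zgroup").is_file() or (root / ".zarray").is_file():
        return 2
    raise ValueError(f"no Zarr metadata found in {root}")


# Example: a directory that looks like a Zarr v2 group.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / ".zgroup").write_text('{"zarr_format": 2}')
    print(detect_zarr_format(tmp))  # 2
```

In a test, `assert detect_zarr_format(store) == zarr_version` right after `to_zarr(store, zarr_version=zarr_version)` would catch a silently ignored keyword.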