[WIP]: compatibility for zarr-python 3.x #9552

Draft: wants to merge 10 commits into main

Conversation

@TomAugspurger (Contributor) commented Sep 27, 2024

This is a WIP for compatibility with zarr-python 3.x. It's intended to be run against zarr-python v3 + the open PRs referenced in #9515.

My initial goal is to ensure that users can still read Zarr v2 hierarchies without issue.

I'll also note that #5475 is going to become a larger issue once people start writing Zarr-V3 datasets.

Lots of failures still. I'll be force-pushing changes to keep the commit history somewhat meaningful.

We might want some new tests that explicitly set zarr_version to control whether v2 or v3 data is written.

  • Closes Zarr Python 3 tracking issue #9515
  • Tests added
  • User visible changes (including notable bug fixes) are documented in whats-new.rst
  • New functions/methods are listed in api.rst

@@ -499,7 +498,7 @@ def test_dataset_caching(self) -> None:

     @pytest.mark.filterwarnings("ignore:deallocating CachingFileManager")
     def test_roundtrip_None_variable(self) -> None:
-        expected = Dataset({None: (("x", "y"), [[0, 1], [2, 3]])})
+        expected = Dataset({None: (("x", "y"), [[1, 1], [2, 3]])})
Contributor Author

These test files include a bunch of changes to avoid creating DataArrays with values equal to the default fill value for some type. Without this change, the test would fail thanks to #5475. It's not great, but we avoid the issue.
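A minimal sketch of the failure mode being worked around, assuming the reader masks every element equal to the array's fill value and that the default fill value for an integer array is 0. `decode_with_fill_value` and `roundtrips` are hypothetical helpers written for illustration; they are not xarray or zarr functions.

```python
import math


def decode_with_fill_value(values, fill_value):
    """Hypothetical mask-on-read step: entries equal to fill_value become NaN."""
    return [
        [math.nan if v == fill_value else float(v) for v in row]
        for row in values
    ]


def roundtrips(values, fill_value=0):
    """Return True if every element survives the (assumed) masking unchanged."""
    decoded = decode_with_fill_value(values, fill_value)
    return all(
        float(a) == b
        for row_a, row_b in zip(values, decoded)
        for a, b in zip(row_a, row_b)
    )


old_data = [[0, 1], [2, 3]]  # contains 0, the assumed default fill value
new_data = [[1, 1], [2, 3]]  # the adjusted test data from the diff above

print(roundtrips(old_data))  # False: the 0 was masked to NaN
print(roundtrips(new_data))  # True: no element equals the fill value
```

Under these assumptions, changing the single 0 to 1 is enough to keep the comparison from tripping over the masked element.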

@TomAugspurger force-pushed the fix/zarr-v3 branch 2 times, most recently from 1ed4ef1 to bb2bb6c, on September 30, 2024 14:04
@TomAugspurger (Contributor, Author) left a comment

This set of changes should be backwards compatible and work with zarr-python 2.x (so reading and writing zarr v2 data).

I'll work through zarr-python 3.x now. I think we might want to parametrize most of these tests by zarr_version=[2, 3] to confirm that we can read / write zarr v2 data with zarr-python 3.x.


if consolidated is None:
    consolidated = False
if _zarr_v3():
Contributor Author

zarr-python 3.x changed this parameter name. xarray should probably go through the same deprecation from zarr_version to zarr_format.
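One possible shape for such a deprecation, sketched with the standard `warnings` module. `resolve_zarr_format` is a hypothetical helper, not an existing xarray function, and the choice of `FutureWarning` and of raising on conflicting values are assumptions.

```python
import warnings


def resolve_zarr_format(zarr_version=None, zarr_format=None):
    """Hypothetical helper: accept the old keyword, prefer the new one."""
    if zarr_version is not None:
        warnings.warn(
            "zarr_version is deprecated, use zarr_format instead",
            FutureWarning,
            stacklevel=2,
        )
        # Refuse ambiguous calls rather than silently picking one value.
        if zarr_format is not None and zarr_format != zarr_version:
            raise ValueError(
                f"zarr_version={zarr_version!r} conflicts with "
                f"zarr_format={zarr_format!r}"
            )
        return zarr_version
    return zarr_format
```

A caller passing `zarr_version=2` would still get format 2 plus a warning, while `zarr_format=3` passes through silently.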

@@ -75,8 +89,10 @@ def __init__(self, zarr_array):
         self.shape = self._array.shape

         # preserve vlen string object dtype (GH 7328)
-        if self._array.filters is not None and any(
-            [filt.codec_id == "vlen-utf8" for filt in self._array.filters]
+        if (
Contributor Author

zarr-developers/zarr-python#2036 is probably relevant here.


if _zarr_v3() and zarr_array.metadata.zarr_format == 3:
    encoding["codec_pipeline"] = [
        x.to_dict() for x in zarr_array.metadata.codecs
Contributor Author

Maybe this instead?

Suggested change
-        x.to_dict() for x in zarr_array.metadata.codecs
+        zarr_array.metadata.to_dict()["codecs"]

A bit wasteful since everything has to be serialized, but presumably zarr knows better how to serialize the codec pipeline than we do here?
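The two options can be compared with a small sketch. `FakeCodec` and `FakeMetadata` are hypothetical stand-ins that only mimic objects exposing `to_dict()`; they are not zarr-python classes, and the codec names and settings are placeholder values.

```python
from dataclasses import dataclass, field


@dataclass
class FakeCodec:
    """Hypothetical stand-in for a codec object exposing to_dict()."""

    name: str
    configuration: dict = field(default_factory=dict)

    def to_dict(self):
        return {"name": self.name, "configuration": dict(self.configuration)}


@dataclass
class FakeMetadata:
    """Hypothetical stand-in for array metadata holding a codec pipeline."""

    shape: tuple
    codecs: tuple

    def to_dict(self):
        # Serializes everything, including fields the caller does not need.
        return {
            "shape": list(self.shape),
            "codecs": [codec.to_dict() for codec in self.codecs],
        }


metadata = FakeMetadata(
    shape=(2, 2),
    codecs=(
        FakeCodec("bytes", {"endian": "little"}),
        FakeCodec("zstd", {"level": 0}),
    ),
)

# Option 1: serialize each codec ourselves.
per_codec = [codec.to_dict() for codec in metadata.codecs]

# Option 2: let the metadata object serialize itself, then pick one key.
via_metadata = metadata.to_dict()["codecs"]

print(per_codec == via_metadata)
```

In this sketch both options give the same list. The second one already returns a list, so it would be assigned directly rather than wrapped in another pair of brackets.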
