Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

v2->v3 zarr.meta module name change breaks imports #2021

Open
ghidalgo3 opened this issue Jul 9, 2024 · 3 comments
Open

v2->v3 zarr.meta module name change breaks imports #2021

ghidalgo3 opened this issue Jul 9, 2024 · 3 comments
Labels
bug Potential issues with the zarr-python library

Comments

@ghidalgo3
Copy link

Zarr version

3.0.0a0

Numcodecs version

0.12.1

Python Version

3.10.12

Operating System

Linux

Installation

pip install zarr==3.0.0a0

Description

In ZarrV3, the reorganization of the V2 code into the v2 module requires consumers to change their import stanzas to include a .v2.. See this line in Kerchunk for example, now it needs to read from zarr.v2.meta import encode_fill_value .

I'm trying to make VirtualiZarr depend on ZarrV3 for the ArrayMetadata classes in this PR but ZarrV3 causes a problem for Kerchunk due to the module name change.

I think given the discussion on this issue, I think this is intended and the solution is for Kerchunk to add a lower bound to ZarrV3 and change its import statements. However that may be disruptive to many users if there is no plan to maintain the import paths the same.

Steps to reproduce

Simply attempt to import kerchunk.hdf.

import kerchunk.hdf # ModuleNotFoundError: No module named 'zarr.meta'

Additional output

pip freeze

aiobotocore==2.13.1
aiohttp==3.9.5
aioitertools==0.11.0
aiosignal==1.3.1
annotated-types==0.7.0
asciitree==0.3.3
asttokens==2.4.1
async-timeout==4.0.3
attrs==23.2.0
botocore==1.34.131
certifi==2024.7.4
cfgv==3.4.0
cftime==1.6.4
charset-normalizer==3.3.2
codecov==2.1.13
coverage==7.5.4
cramjam==2.8.3
crc32c==2.4.1
decorator==5.1.1
distlib==0.3.8
donfig==0.8.1.post1
exceptiongroup==1.2.1
executing==2.0.1
fasteners==0.19
fastparquet==2024.5.0
filelock==3.15.4
frozenlist==1.4.1
fsspec==2024.6.1
h5netcdf==1.3.0
h5py==3.11.0
identify==2.6.0
idna==3.7
iniconfig==2.0.0
ipython==8.26.0
jedi==0.19.1
jmespath==1.0.1
kerchunk==0.2.6
matplotlib-inline==0.1.7
multidict==6.0.5
mypy==1.10.1
mypy-extensions==1.0.0
netCDF4==1.7.1.post1
nodeenv==1.9.1
numcodecs==0.12.1
numpy==2.0.0
packaging==24.1
pandas==2.2.2
parso==0.8.4
pexpect==4.9.0
platformdirs==4.2.2
pluggy==1.5.0
pooch==1.8.2
pre-commit==3.7.1
prompt_toolkit==3.0.47
ptyprocess==0.7.0
pure-eval==0.2.2
pydantic==2.8.2
pydantic_core==2.20.1
Pygments==2.18.0
pytest==8.2.2
pytest-cov==5.0.0
pytest-mypy==0.10.3
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
requests==2.32.3
ruff==0.5.1
s3fs==2024.6.1
scipy==1.14.0
six==1.16.0
stack-data==0.6.3
tomli==2.0.1
traitlets==5.14.3
typing_extensions==4.12.2
tzdata==2024.1
ujson==5.10.0
universal_pathlib==0.2.2
urllib3==2.2.2
virtualenv==20.26.3
-e git+https://github.com/zarr-developers/VirtualiZarr@92a7e81876bf89ebab4bc1d738d65fd935b06c6e#egg=virtualizarr
wcwidth==0.2.13
wrapt==1.16.0
xarray==2024.6.0
yarl==1.9.4
-e git+https://github.com/zarr-developers/zarr-python@ace96f569490882e89c181387d27f20b0babd6a1#egg=zarr
zstandard==0.22.0
@ghidalgo3 ghidalgo3 added the bug Potential issues with the zarr-python library label Jul 9, 2024
@jhamman
Copy link
Member

jhamman commented Jul 10, 2024

Thanks for raising this issue @ghidalgo3! To be honest, we haven't developed a firm position on this part of the v3 API. My opinion is that Zarr's API surface is too large and things like encode_fill_value should not be part of it going forward. It's on us to catalog the API in v2 and v3 and explain what's in and out.

@ghidalgo3
Copy link
Author

ghidalgo3 commented Jul 11, 2024

Understood, but there seem to be several repos that depend at least on being able to import zarr.meta. Based on this query I see at least these:

  1. https://github.com/CNES/zcollection/blob/192b2ea501c93d178c330187093fae5ea9d1f31d/zcollection/meta.py#L17
  2. https://github.com/xpublish-community/xpublish
  3. https://github.com/CBI-PITT/BrAinPI/blob/70af936c311c55d955dc64a3b0cba93c8dbca544/converters/H5_zarr_store3.py#L41
  4. https://github.com/oceanhackweek/ohw20-proj-gapfree-sst/blob/d8b42ee8139aaac189355c48267b91a3be876338/make_zarr/h5-to-zarr.py#L12
  5. https://github.com/rsignell-usgs/hurricane-ike-water-levels/blob/52c8da9710e9b5784b48887c4ba04c87d9f21d76/h5-to-zarr.py#L12
  6. And of course https://github.com/fsspec/kerchunk/blob/fd86c5806c67721b4c6bfd8f47f9c277531232e1/kerchunk/hdf.py#L25

I think reducing API surface is a good idea, but breaking dependencies just through a module name change seems like it's creating unnecessary work for downstream users. Instead of removing code, could you just mark it as deprecated and let downstream users decide to stop calling it?

@jhamman
Copy link
Member

jhamman commented Sep 13, 2024

Thanks @ghidalgo3 for your work documenting these projects. In Xpublish, we've already pinned to <3 and I just opened a PR in kerchunk to do the same (fsspec/kerchunk#504). My initial reaction to this list is that there is not much to worry about here. The recommended path for users of utilities in these modules is to either a) pin the zarr version to <3 or b) vendor the needed bits in their own library.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Potential issues with the zarr-python library
Projects
None yet
Development

No branches or pull requests

2 participants