Manifest arrays use arrayv3metadata #429
base: zarr-python-3.0
…not happy about this)
That option seems great to me. Thanks for moving this along @abarciauskas-bgse
I'm going to continue to review this tomorrow, but the tests are passing and I've done an initial reorganization of the code that was in zarr.py. So if any of @TomNicholas @norlandrhagen @ayushnag @sharkinsspatial @jsignell @mpiannucci want to start to review, please go ahead 🙏🏽 I also changed the base to a new branch of main
This is absolutely great @abarciauskas-bgse ! Comments are really just minor.
@@ -13,7 +13,7 @@ dependencies:
- ujson
- universal_pathlib
- hdf5plugin
- numcodecs
- numcodecs>=0.15.1
- imagecodecs>=2024.6.1
Should list zarr explicitly in all the envs, and in fact in this upstream one we could install it from main.
@pytest.fixture
def array_v3_metadata():
Could we reimplement this fixture to internally just call the array_v3_metadata_dict fixture below?
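For illustration, one way the two fixtures could share a single source of truth is sketched below. This is only a sketch under assumptions: `FakeArrayMetadata` is a hypothetical stand-in for zarr's ArrayV3Metadata (so the snippet runs without zarr), `make_metadata_dict` is a hypothetical helper, and the metadata values are made up. The real fixtures in the PR may take arguments or return factories.

```python
from dataclasses import dataclass

import pytest


@dataclass(frozen=True)
class FakeArrayMetadata:
    """Hypothetical stand-in for ArrayV3Metadata, so the sketch runs without zarr."""

    shape: tuple[int, ...]
    data_type: str

    @classmethod
    def from_dict(cls, data: dict) -> "FakeArrayMetadata":
        return cls(shape=tuple(data["shape"]), data_type=data["data_type"])


def make_metadata_dict() -> dict:
    # Hypothetical helper holding the only copy of the (illustrative) values.
    return {"shape": (5, 20), "data_type": "int32"}


@pytest.fixture
def array_v3_metadata_dict() -> dict:
    return make_metadata_dict()


@pytest.fixture
def array_v3_metadata(array_v3_metadata_dict: dict) -> FakeArrayMetadata:
    # pytest injects the dict fixture by parameter name, so the object
    # fixture never repeats the literal values.
    return FakeArrayMetadata.from_dict(array_v3_metadata_dict)


def test_fixtures_agree(array_v3_metadata, array_v3_metadata_dict):
    assert array_v3_metadata.shape == tuple(array_v3_metadata_dict["shape"])
```

With this layout a change to the test metadata only has to be made in one place.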
``` | ||
ZArray(shape=(2920, 25, 53), chunks=(2920, 25, 53), dtype=int16, compressor=None, filters=None, fill_value=None) | ||
ArrayV3Metadata(shape=(2920, 25, 53), |
from .manifests.array import ManifestArray


CodecPipeline = Tuple[
    Union["ArrayArrayCodec", "ArrayBytesCodec", "BytesBytesCodec"], ...
If you prefer, since Python 3.10 you can use | instead of Union.
def extract_codecs(
    codecs: CodecPipeline,
) -> tuple[
    tuple[ArrayArrayCodec, ...], ArrayBytesCodec | None, tuple[BytesBytesCodec, ...]
This is a pretty complicated type that I had to stare at to work out what it is. Use TypeAlias with an informative name?
Also is it definitely the right type? Seems weird that this would be valid: ((,), None, (,))
],
def test_manifest_array_zarr_v3_with_codecs(self, create_manifestarray):
    """Test get_codecs with ManifestArray using multiple v3 codecs."""
    test_codecs = [
This could also be a global variable TEST_CODECS, or even a fixture providing multiple sets of test codecs. See also @dcherian's codecs hypothesis strategy that we could use in future (zarr-developers/zarr-python#2822).
order="C",
shape=(5, 1, 20),
zarr_format=2,
# FAILING: TypeError: no implementation found for 'numpy.concatenate' on types that implement __array_function__: [<class 'virtualizarr.manifests.array.ManifestArray'>, <class 'numpy.ndarray'>]
This is a bad bug and shouldn't be forgotten - it implies that somehow inside the manifest / ManifestArray concatenation implementation some array is the wrong type.
(This can't be being thrown from the top-level np.concatenate you can see in the test, so it must be being thrown by the lower-level np.concatenate call that is actually used to merge the manifests internally in .array_api.)
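To make the failure mode concrete, here is a small reproduction of that class of error. It is only a sketch: `DuckArray` is a hypothetical stand-in, and its assumption that `__array_function__` returns NotImplemented whenever another array type is mixed in may not match ManifestArray exactly. It shows that a single np.ndarray among the inputs is enough to trigger this exact TypeError; it does not locate the actual bug.

```python
import numpy as np


class DuckArray:
    """Hypothetical duck array that only tracks a shape."""

    def __init__(self, shape):
        self.shape = tuple(shape)

    def __array_function__(self, func, types, args, kwargs):
        # Assumption: decline any call that mixes in a different array type.
        if not all(issubclass(t, DuckArray) for t in types):
            return NotImplemented
        if func is np.concatenate:
            arrays = args[0]
            axis = kwargs.get("axis", 0)
            new_shape = list(arrays[0].shape)
            new_shape[axis] = sum(a.shape[axis] for a in arrays)
            return DuckArray(new_shape)
        return NotImplemented


# All inputs are DuckArray: the override handles the call.
same = np.concatenate([DuckArray((5, 1, 10)), DuckArray((5, 1, 10))], axis=2)

# One plain ndarray among the inputs: every override declines, so NumPy raises.
try:
    np.concatenate([DuckArray((5, 1, 10)), np.zeros((5, 1, 10))], axis=2)
    message = ""
except TypeError as err:
    message = str(err)
```

If the internal manifest merge hits the same path, checking the types of the arrays handed to the lower-level np.concatenate call would be a reasonable first step.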
"compressor", "chunks", and "shape".
Returns:
-------
ArrayV3Metadata
ArrayV3Metadata | |
ArrayV3Metadata | |
codecs = zarray._v3_codecs()

# create array if it doesn't already exist
# TODO: Should codecs be an argument to zarr's AsyncrGroup.create_array?
You're asking for an upstream change here, right?
@@ -67,13 +90,65 @@ def remove_file_uri_prefix(path: str):
    return path


def convert_v3_to_v2_metadata(
Should this (a) not live in the kerchunk-specific writer module, (b) actually live in zarr-python upstream? Or is there no use for it upstream?
This is still very much a WIP - many tests and implementations still need to be fixed.
A few notes:
Avoids the _parse_chunk_encoding_v3 function since it is a private function and may change, which is why some of that logic is replicated in convert_to_codec_pipeline
Checklist
docs/releases.rst
api.rst