Zip as possible Zarr extension (#426)
### Pull Request Checklist:
- [ ] This PR addresses an already opened issue (for bug fixes /
features)
    - This PR fixes #xyz
- [x] (If applicable) Documentation has been added / updated (for bug
fixes / features).
- [ ] (If applicable) Tests have been added.
- [x] This PR does not seem to break the templates.
- [x] CHANGELOG.rst has been updated (with summary of main changes).
- [x] Link to issue (:issue:`number`) and pull request (:pull:`number`)
has been added.

### What kind of change does this PR introduce?

* The directory parser will now assign a "zarr" format to files with
.zarr.zip or .zip extensions. The extension still needs to be included
in the pattern.
* `get_engine` returns `zarr` for the same two file extensions.
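
The format lookup described above can be sketched as a small stand-alone function. This is a simplified illustration, not the parser's actual code path: the mapping is copied from this commit, while `guess_format` is a hypothetical helper name introduced only for the example.

```python
from pathlib import Path

# Mapping copied from the SUFFIX_TO_FORMAT constant added in this commit.
SUFFIX_TO_FORMAT = {
    ".nc": "nc",
    ".nc4": "nc",
    ".zip": "zarr",
    ".zarr.zip": "zarr",
    ".zarr": "zarr",
}


def guess_format(path):
    """Hypothetical helper: return the catalog "format" value for a path."""
    suffix = Path(path).suffix
    # Unknown suffixes fall back to the suffix without its leading dot,
    # which matches the behaviour before this change.
    return SUFFIX_TO_FORMAT.get(suffix, suffix[1:])


print(guess_format("tas_day.nc4"))
print(guess_format("store.zarr.zip"))
print(guess_format("grid.h5"))
```

Suffixes absent from the mapping keep their previous behaviour, so existing catalogs with other file types should be unaffected.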

### Does this PR introduce a breaking change?
No.

### Other information:
aulemahal authored Jul 4, 2024
2 parents a758977 + 9e3e72a commit 93c9443
Showing 3 changed files with 19 additions and 3 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.rst
@@ -16,6 +16,7 @@ Internal changes
 ^^^^^^^^^^^^^^^^
 * Include domain in `weight_location` in ``regrid_dataset``. (:pull:`414`).
 * Added pins to `xarray`, `xclim`, `h5py`, and `netcdf4`. (:pull:`414`).
+* Add ``.zip`` and ``.zarr.zip`` as possible file extensions for Zarr datasets. (:pull:`426`).

 v0.9.1 (2024-06-04)
 -------------------
14 changes: 13 additions & 1 deletion src/xscen/catutils.py
@@ -39,6 +39,18 @@
 # ## File finding and path parsing ## #


+SUFFIX_TO_FORMAT = {
+    ".nc": "nc",
+    ".nc4": "nc",
+    ".zip": "zarr",
+    ".zarr.zip": "zarr",
+    ".zarr": "zarr",
+}
+"""Mapping from file suffix to format.
+This is used to populate the "format" esm catalog column from the parsed path.
+"""
+
 EXTRA_PARSE_TYPES = {}
 """Extra parse types to add to parse's default.
@@ -223,7 +235,7 @@ def _name_parser(
         return None

     d["path"] = abs_path
-    d["format"] = path.suffix[1:]
+    d["format"] = SUFFIX_TO_FORMAT.get(path.suffix, path.suffix[1:])

     if "DATES" in d:
         d["date_start"], d["date_end"] = d.pop("DATES")
7 changes: 5 additions & 2 deletions src/xscen/io.py
@@ -47,7 +47,10 @@


 def get_engine(file: Union[str, os.PathLike]) -> str:
-    """Use functionality of h5py to determine if a NetCDF file is compatible with h5netcdf.
+    """Determine which Xarray engine should be used to open the given file.
+    The .zarr, .zarr.zip and .zip extensions are recognized as Zarr datasets,
+    the rest is seen as a netCDF. If the file is HDF5, the h5netcdf engine is used.
     Parameters
     ----------
@@ -60,7 +63,7 @@ def get_engine(file: Union[str, os.PathLike]) -> str:
         Engine to use with xarray
     """
     # find the ideal engine for xr.open_mfdataset
-    if Path(file).suffix == ".zarr":
+    if Path(file).suffix in [".zarr", ".zip", ".zarr.zip"]:
         engine = "zarr"
     elif h5py.is_hdf5(file):
         engine = "h5netcdf"
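
One detail worth keeping in mind when reading the suffix check: `pathlib.Path.suffix` returns only the final extension, so a name ending in `.zarr.zip` yields `.zip`. The sketch below illustrates this under stated assumptions: `ZARR_SUFFIXES` and `is_zarr_path` are hypothetical names that mirror only the suffix comparison, not the rest of `get_engine`.

```python
from pathlib import Path

# Same list as in the updated condition; the name is hypothetical.
ZARR_SUFFIXES = [".zarr", ".zip", ".zarr.zip"]


def is_zarr_path(file):
    """Hypothetical helper mirroring only the suffix test of ``get_engine``."""
    return Path(file).suffix in ZARR_SUFFIXES


p = Path("example.zarr.zip")
last_suffix = p.suffix      # only the final extension
all_suffixes = p.suffixes   # every extension, in order

print(last_suffix, all_suffixes, is_zarr_path(p))
```

Read this way, `.zarr.zip` files are recognized through the `.zip` entry of the list, since `Path.suffix` never returns a compound extension.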