NCZarr Support Part I: Local Datasets #884

Closed
wants to merge 17 commits into from

openSourcerer9000
Contributor

In response to #672, I've added the logic to handle Zarr datasets by passing them through to netCDF-C's NCZarr protocol. protocols/zarr.py takes a Zarr dataset specified in any of the following formats and returns a valid NCZarr URI (specified here) that netCDF-C recognizes:

"http://s3.amazonaws.com/bucket/dataset.zarr"
"http://s3.amazonaws.com/bucket/dataset.zarr#mode=nczarr,s3"
"/home/path/to/dataset.zarr"
Path('/home/path/to/dataset.zarr')
"file:///home/path/to/dataset.zarr"
"file:///home/path/to/dataset.randomExt#mode=nczarr,file"
"file:///home/path/to/dataset.zarr#mode=nczarr,zip"
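
To make the mapping concrete, here is a minimal sketch of how inputs like the ones above could be normalized. It is not the code in protocols/zarr.py: the function name to_nczarr_uri is hypothetical, and the rules (leave an explicit #mode= fragment alone, otherwise pick s3 for http(s) and file for local paths) are assumptions read off the examples in the list.

```python
from pathlib import Path
from urllib.parse import urlparse


def to_nczarr_uri(dataset):
    """Hypothetical helper: turn a Zarr location into an NCZarr-style URI.

    Assumed rules, inferred from the example inputs only.
    """
    text = str(dataset)
    # An explicit mode fragment (e.g. "#mode=nczarr,zip") is passed through.
    if "#mode=" in text:
        return text
    scheme = urlparse(text).scheme
    if scheme in ("http", "https"):
        return f"{text}#mode=nczarr,s3"
    if scheme == "file":
        return f"{text}#mode=nczarr,file"
    # Bare local path (str or Path): make it absolute, then a file:// URI.
    return f"{Path(text).absolute().as_uri()}#mode=nczarr,file"
```

With this sketch, both "/home/path/to/dataset.zarr" and Path('/home/path/to/dataset.zarr') come out as a file:// URI ending in #mode=nczarr,file, while a URI that already names its mode is returned unchanged.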

Note that so far this only works on LOCAL datasets, using the default libnetcdf build installed when compliance-checker is set up with conda. NCZarr is also only fully supported on Linux at the moment, so I added an OS check that passes this caveat on to a user who tries to run against a Zarr dataset from another OS.
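
An OS check of this kind could look roughly like the sketch below. The function name nczarr_fully_supported and the warning text are hypothetical, not what the PR ships; only platform.system() and warnings.warn are standard-library calls.

```python
import platform
import warnings


def nczarr_fully_supported(system=None):
    """Hypothetical check: return True on Linux, warn and return False elsewhere.

    `system` can be passed explicitly for testing; by default it is detected.
    """
    system = system or platform.system()
    if system == "Linux":
        return True
    warnings.warn(
        f"NCZarr is only fully supported on Linux at the moment; detected {system!r}.",
        UserWarning,
    )
    return False
```

Returning a boolean rather than raising leaves it to the caller to decide whether to continue on an unsupported platform.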

Getting S3 support down for netCDF-C is an ongoing effort. Once it's solid, the S3 test that is currently commented out in test_cli.py should pass, and the checker should work on S3 Zarr datasets.

Update: it looks like this is on the home stretch, AWSome!

While I was in test_protocols.py, I also refactored it to use pytest, continuing the project's migration to pytest.

# test_protocols.py

id_url = {
    # Check that urls with Content-Type header of "application/x-netcdf" can
    # successfully be read into memory for checks.
    'netcdf_content_type': "https://gliders.ioos.us/erddap/tabledap/amelia-20180501T0000.ncCF?&time%3E=max(time)-1%20hour",
    # Tests that a connection can be made to ERDDAP's GridDAP
    'erddap': "http://coastwatch.pfeg.noaa.gov/erddap/griddap/osuChlaAnom",
    # Tests that a connection can be made to Hyrax
    'hyrax': "http://ingria.coas.oregonstate.edu/opendap/hyrax/aggregated/ocean_time_aggregation.ncml",
    # Tests that a connection can be made to a remote THREDDS endpoint
    'thredds': "http://thredds.ucar.edu/thredds/dodsC/grib/NCEP/GFS/Global_0p25deg_ana/TP",
    # Tests that a connection can be made to an SOS endpoint
    'sos': "https://data.oceansmap.com/thredds/sos/caricoos_ag/VIA/VIA.ncml",
}

When run, the tests look like:

test_protocols.py::TestProtocols::test_connection[netcdf_content_type] PASSED
test_protocols.py::TestProtocols::test_connection[erddap] PASSED
test_protocols.py::TestProtocols::test_connection[hyrax] PASSED
test_protocols.py::TestProtocols::test_connection[thredds] PASSED
test_protocols.py::TestProtocols::test_connection[sos] PASSED
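
Test IDs of the form test_connection[erddap] are what pytest produces when a dict of id-to-URL pairs is fed to pytest.mark.parametrize. The sketch below shows that pattern with two of the URLs above. The body of test_connection here is an assumption: looks_like_remote_url is a hypothetical offline stand-in, whereas the real test opens a connection to each endpoint.

```python
from urllib.parse import urlparse

import pytest

# Two entries copied from the id_url mapping in test_protocols.py.
id_url = {
    "erddap": "http://coastwatch.pfeg.noaa.gov/erddap/griddap/osuChlaAnom",
    "sos": "https://data.oceansmap.com/thredds/sos/caricoos_ag/VIA/VIA.ncml",
}


def looks_like_remote_url(url):
    """Hypothetical stand-in for the real connection check (no network access)."""
    parts = urlparse(url)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


class TestProtocols:
    # Dict values become the test arguments; dict keys become the bracketed IDs.
    @pytest.mark.parametrize("url", list(id_url.values()), ids=list(id_url.keys()))
    def test_connection(self, url):
        assert looks_like_remote_url(url)
```

Because each key becomes its own test ID, a single failing endpoint shows up by name in the report instead of failing one monolithic test.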

Member

ocefpaf commented May 22, 2024

Closing in favor of #1071.

ocefpaf closed this May 22, 2024
3 participants