CFA Virtualisation using CMIP6 example data: Unable to aggregate #793

Open
dwest77a opened this issue Jul 10, 2024 · 9 comments
Labels
bug Something isn't working

Comments

@dwest77a

dwest77a commented Jul 10, 2024

Example CMIP6 data (JASMIN)

files = [
    '/badc/cmip6/data/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-ESM2-1/ssp119/r1i1p1f2/3hr/huss/gr/v20190328/huss_3hr_CNRM-ESM2-1_ssp119_r1i1p1f2_gr_201501010300-203501010000.nc',
    '/badc/cmip6/data/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-ESM2-1/ssp119/r1i1p1f2/3hr/huss/gr/v20190328/huss_3hr_CNRM-ESM2-1_ssp119_r1i1p1f2_gr_203501010300-205501010000.nc',
    '/badc/cmip6/data/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-ESM2-1/ssp119/r1i1p1f2/3hr/huss/gr/v20190328/huss_3hr_CNRM-ESM2-1_ssp119_r1i1p1f2_gr_205501010300-207501010000.nc',
    '/badc/cmip6/data/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-ESM2-1/ssp119/r1i1p1f2/3hr/huss/gr/v20190328/huss_3hr_CNRM-ESM2-1_ssp119_r1i1p1f2_gr_207501010300-209501010000.nc',
    '/badc/cmip6/data/CMIP6/ScenarioMIP/CNRM-CERFACS/CNRM-ESM2-1/ssp119/r1i1p1f2/3hr/huss/gr/v20190328/huss_3hr_CNRM-ESM2-1_ssp119_r1i1p1f2_gr_209501010300-210101010000.nc'
]

Attempted to aggregate the first two example files (successful)

f = cf.read(files)
g = cf.aggregate(f[:2])

A normal cf.write works properly here, creating a single netCDF file that combines both files, but writing with cfa=True results in one of the following errors, depending on whether I take the whole of both Fields (116880 time steps):

RuntimeError: NetCDF: HDF error

or a subselection of the last 10 time steps from file 1 and the first 10 from file 2:

g = cf.aggregate([ f[0][-10:], f[1][:10] ])

File "/home/users/dwest77/Documents/cfa_python_dw/cf_dw/cf_python/cf/read_write/netcdf/netcdfwrite.py", line 106, in _write_as_cfa
    raise ValueError(
ValueError: Can't write <CF Field: specific_humidity(time(20), latitude(128), longitude(256)) 1> as a CFA-netCDF aggregation variable. Consider setting cfa={'strict': False}

cf-python 3.16.2 (latest)
cfdm 1.11.1.0 (latest)
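
As a quick sanity check on the 116880 time steps quoted above, the time ranges encoded in the first two file names can be parsed and counted. This is a minimal sketch, assuming a standard Gregorian calendar (the thread does not state the calendar) and a fixed 3-hourly step; `time_range` and `n_steps` are hypothetical helper names, not part of cf-python.

```python
import re
from datetime import datetime, timedelta

# Basenames of the first two files listed above.
names = [
    "huss_3hr_CNRM-ESM2-1_ssp119_r1i1p1f2_gr_201501010300-203501010000.nc",
    "huss_3hr_CNRM-ESM2-1_ssp119_r1i1p1f2_gr_203501010300-205501010000.nc",
]

STEP = timedelta(hours=3)  # assumed from the "3hr" frequency in the name
RANGE_RE = re.compile(r"_(\d{12})-(\d{12})\.nc$")


def time_range(name):
    """Return (start, end) datetimes parsed from a CMIP6-style file name."""
    match = RANGE_RE.search(name)
    if match is None:
        raise ValueError(f"no time range found in {name!r}")
    start, end = (datetime.strptime(s, "%Y%m%d%H%M") for s in match.groups())
    return start, end


def n_steps(name):
    """Count 3-hourly steps in the file's range, both end points included."""
    start, end = time_range(name)
    return (end - start) // STEP + 1


ranges = [time_range(n) for n in names]
total_steps = sum(n_steps(n) for n in names)
# True if the second file starts exactly one step after the first ends.
contiguous = ranges[1][0] - ranges[0][1] == STEP

print(total_steps, contiguous)
```

Under those assumptions each file holds 58440 steps, which adds up to the 116880 reported, and the two ranges join without a gap.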

dwest77a added the bug label Jul 10, 2024
@davidhassell
Collaborator

davidhassell commented Jul 10, 2024

Hi Dan, short answer (because I'm going home!), try:

>>> f = cf.read(files, chunks=None)
>>> cf.write(f, 'cfa.nc', cfa=True)

I did this on your data on JASMIN and it worked OK.

Long answer and explanations to follow ...
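
The suggestion above, combined with the `cfa={'strict': False}` hint from the earlier error message, can be sketched as a small helper. It is only an illustration: `write_cfa` is a hypothetical name, the output path is a placeholder, and the thread neither explains why `chunks=None` matters nor confirms that the non-strict option resolves this issue.

```python
def write_cfa(files, out_path, strict=True):
    """Read `files` without Dask chunking and write a CFA-netCDF file.

    Mirrors the two calls suggested in the comment above. `strict=False`
    passes the dictionary form of `cfa` that the error message proposes.
    """
    import cf  # imported lazily so the helper can be defined without cf-python

    # chunks=None is the setting that worked in the maintainer's test.
    fields = cf.read(files, chunks=None)

    cfa = True if strict else {"strict": False}
    cf.write(fields, out_path, cfa=cfa)
    return fields
```

Calling `write_cfa(files, "cfa.nc")` reproduces the suggested commands; passing `strict=False` is an untested variation taken from the error message, not something reported to work here.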

@dwest77a
Author

I tried this exactly as you've stated, but I still get the same RuntimeError (NetCDF: HDF error). FYI, I'm using netCDF4==1.7.1.post1. I can add my whole conda package list here if needed. I'm off as well now!

@davidhassell
Collaborator

Interesting. I was using netCDF4==1.6.5 when it worked fine, but I got a seg fault with 1.7.1.post1:

>>> cf.environment(paths=False)
Platform: Linux-3.10.0-1160.114.2.el7.x86_64-x86_64-with-glibc2.17
HDF5 library: 1.12.2
netcdf library: 4.9.3-development
udunits2 library: ~/miniconda3/lib/libudunits2.so.0
esmpy/ESMF: not available
Python: 3.12.2
dask: 2024.7.0
netCDF4: 1.6.5
psutil: 5.9.8
packaging: 23.1
numpy: 1.26.4
scipy: 1.12.0
matplotlib: not available
cftime: 1.6.3
cfunits: 3.3.7
cfplot: not available
cfdm: 1.11.1.0
cf: 3.16.2
>>>

@davidhassell
Collaborator

netCDF4==1.7.0 works for me, too, but I notice that 1.7.0 and 1.7.1 have both been yanked (https://pypi.org/project/netCDF4/#history), for some reason. Could this be related to Unidata/netcdf4-python#1343?
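
The versions mentioned so far can be compared against a local installation with a short check. This is a minimal sketch using only the standard library: the `REPORTED` table just records what was observed in this thread (it is not an official compatibility list), and `installed_netcdf4_version` and `thread_report` are hypothetical helper names.

```python
from importlib import metadata

# Outcomes reported in this thread for the CFA write, keyed by netCDF4 version.
REPORTED = {
    "1.6.5": "worked (davidhassell)",
    "1.7.0": "worked (davidhassell)",
    "1.7.1.post1": "failed (HDF error / seg fault)",
}


def installed_netcdf4_version():
    """Return the installed netCDF4 version string, or None if not installed."""
    try:
        return metadata.version("netCDF4")
    except metadata.PackageNotFoundError:
        return None


def thread_report(version):
    """Look up what this thread reported for a given netCDF4 version."""
    return REPORTED.get(version, "not reported in this thread")


version = installed_netcdf4_version()
print(f"netCDF4 {version}: {thread_report(version)}")
```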

@dwest77a
Author

A couple of questions about the above: is the HDF5 library just installed with h5py, or does it require a non-Python library to be installed? Otherwise I'll just pin the h5py and netCDF4 versions in my environment and make a note of it. It looks like the versions fall out of sync just because of a lack of coordination.

@dwest77a
Author

dwest77a commented Jul 11, 2024

I've downgraded netCDF4 to 1.6.5 and also adjusted my scipy and numpy versions to match yours. It looked like I was making progress, because a file of about 6 MB appeared, but after 4-5 minutes the process exited with the same error as before (Can't write aggregated variable...) and the file disappeared.

Note: Immediately rerunning this process took only 10 seconds to reach the same error, so I think those 4-5 minutes were spent fetching the data (if that's even supposed to happen here?)

@davidhassell
Collaborator

Hi Dan, I just defer to netCDF4 to install the correct and consistent netCDF-C and HDF5 libraries, and that has, for many years, just worked ....

Strange about your results - the write took ~1 minute for me. Are you using the C libraries installed by the python packages?

@dwest77a
Author

I haven't done any extra steps to install alternative C libraries so I would assume yes, although I wouldn't know how to check.
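
One way to check is to ask the Python packages which C library versions they report and where the packages were imported from. This is a hedged sketch: `netCDF4.__netcdf4libversion__`, `netCDF4.__hdf5libversion__` and `h5py.version.hdf5_version` are attributes those packages expose as far as I know, `c_library_report` is a hypothetical helper name, and the imports are guarded so it runs even if a package is missing. `cf.environment()`, shown earlier in the thread, prints similar information.

```python
def c_library_report():
    """Collect C library versions and module locations, where available."""
    report = {}

    try:
        import netCDF4
    except ImportError:
        report["netCDF4"] = None
    else:
        report["netCDF4"] = {
            "module": netCDF4.__file__,
            "netcdf-c": netCDF4.__netcdf4libversion__,
            "hdf5": netCDF4.__hdf5libversion__,
        }

    try:
        import h5py
    except ImportError:
        report["h5py"] = None
    else:
        report["h5py"] = {
            "module": h5py.__file__,
            "hdf5": h5py.version.hdf5_version,
        }

    return report


for package, info in c_library_report().items():
    print(package, info)
```

A module path inside the environment's site-packages indicates that the package itself came from that environment; it does not by itself prove which shared library was loaded. Differing HDF5 versions between the two packages would suggest that each is using its own copy of the library.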

@dwest77a
Author

dwest77a commented Jul 11, 2024

My current environment setup for reference

asciitree==0.3.3
binpacking==1.5.2
ceda-elasticsearch-client==0.0.1
certifi==2024.7.4
cftime==1.6.4
cfunits==3.3.7
click==8.1.7
cloudpickle==3.0.0
dask==2024.7.0
elastic-transport==8.13.1
elasticsearch==8.14.0
fasteners==0.19
h5py==3.11.0
kerchunk==0.2.5
locket==1.0.0
mypy-extensions==1.0.0
netcdf-flattener==1.2.0
netCDF4==1.6.5
numcodecs==0.12.1
numpy==1.26.4
pandas==2.2.2
partd==1.4.2
python-dateutil==2.9.0.post0
pytz==2024.1
PyYAML==6.0.1
rechunker==0.5.2
scipy==1.12.0
tabulate==0.9.0
toolz==0.12.1
tzdata==2024.1
ujson==5.10.0
zarr==2.18.2

-e git+ssh://git@github.com/NCAS-CMS/cf-python.git@ca69ad166109e1eba4d4fb816af41b8058fcaa10#egg=cf_python
-e git+ssh://git@github.com/NCAS-CMS/cfdm.git@4106b448adf87ccef7c5285ac8624daf60f9956b#egg=cfdm
-e git+ssh://git@github.com/fsspec/filesystem_spec.git@262f664574e091228251b467ac92b2a6c327034b#egg=fsspec
-e git+ssh://git@github.com/cedadev/padocc.git@72e8e3538bd8ffe335c900a4f718e998a8ec9a7a#egg=pipeline
-e git+ssh://git@github.com/dwest77a/xarray.git@bef04067dd87f9f0c1a3ae7840299e0bbdd595a8#egg=xarray
