Replies: 4 comments 6 replies
-
We routinely process ~30 netCDF4 files in the 25 MB-75 MB size range to calculate month-averaged fields from day-averaged fields. Less frequently, we process 365 netCDF4 files in the 200 MB-2 GB size range to construct year-long datasets from daily hour-averaged fields. The files are all stored locally on a Linux file system. We use a […]
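The month-averaging step in a workflow like this presumably looks something like the following sketch (the file path and the name of the time coordinate are assumptions, not taken from the comment above):
import xarray as xr
# ~30 daily-mean files covering one month (placeholder path)
ds = xr.open_mfdataset("daily_means/*.nc", parallel=True)
# Monthly means from daily means; "MS" labels each average by month start
monthly = ds.resample(time="MS").mean()
monthly.to_netcdf("monthly_means.nc")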
-
I've done O(10,000) files with a big-ish distributed cluster. Your memory usage might result from useless comparisons. See the "Note" in https://docs.xarray.dev/en/stable/user-guide/io.html#reading-multi-file-datasets (xref #8778)
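Roughly what that note suggests, assuming the files simply stack along a shared time dimension (the glob pattern and dimension name here are illustrative):
import xarray as xr
ds = xr.open_mfdataset(
    "data/*.nc",           # illustrative path; any ordered list of paths works
    parallel=True,         # open each file via dask.delayed
    combine="nested",      # files are already ordered along one dimension
    concat_dim="time",     # assumed shared record dimension
    data_vars="minimal",   # only concatenate variables that contain concat_dim
    coords="minimal",      # same for coordinates
    compat="override",     # take non-concatenated variables from the first file
)
The "minimal"/"override" settings skip the per-variable equality checks that otherwise run across every file and can dominate memory and time when opening thousands of datasets.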
-
Thanks @dcherian, this is somewhat related to a comment you made not long ago about how, once we have these many files open, operations should be faster if we use flox. I think your suggestion of changing the default values in open_mfdataset will help here. For context, I'm prototyping a way for earthaccess to take advantage of knowing how compliant/regular a given dataset is, and use that information to speed up access by inferring sensible defaults in xarray and using the right caching strategy (via fsspec).
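A rough sketch of the kind of thing I mean, not what earthaccess actually does (the URLs, block size, and dimension name are all placeholders):
import fsspec
import xarray as xr
# Placeholder URLs; in practice these would come from an earthaccess search
urls = [
    "https://example.com/granule_0001.nc",
    "https://example.com/granule_0002.nc",
]
fs = fsspec.filesystem("https")
# blockcache keeps recently read byte ranges in memory, which helps when the
# HDF5 library issues many small reads against remote files
open_files = [fs.open(u, cache_type="blockcache", block_size=8 * 1024 * 1024) for u in urls]
ds = xr.open_mfdataset(
    open_files,
    engine="h5netcdf",     # h5netcdf accepts file-like objects
    combine="nested",
    concat_dim="time",     # assumed record dimension
    data_vars="minimal",
    coords="minimal",
    compat="override",
)
With flox installed, the subsequent groupby/resample reductions then use the faster vectorized path by default.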
-
Hi @betolink. I often used […] In my experience it was very important to: […]
If the reason to open a lot of files is to then rechunk them (e.g. with rechunker), you should also pay attention to the maximum chunk size you are going to create and the compressor you use. Some compressors have an upper limit on the chunk size they can deal with, which can cause cryptic error messages or make the rechunking fail. I saw the recent efforts you put on […]
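Something like this minimal rechunker sketch is what I have in mind (store paths, the variable name, and the chunk sizes are placeholders):
import zarr
from rechunker import rechunk
# Placeholder source: one array from an existing Zarr store
source = zarr.open("source.zarr")["temperature"]
# Keep target chunks well below the compressor's limit; Blosc, for example,
# cannot handle buffers larger than ~2 GiB
plan = rechunk(
    source,
    target_chunks=(365, 180, 360),   # placeholder (time, lat, lon) chunking
    max_mem="1GB",                   # per-worker memory budget for the plan
    target_store="rechunked.zarr",
    temp_store="rechunk-tmp.zarr",
)
plan.execute()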
-
I'm curious to know: what's the max number of NetCDF/HDF5 files (remote or local) that someone has opened with xarray's
xr.open_mfdataset(files, parallel=True)
? This is using a Dask cluster. Context: I'm opening ~4000 small files and I'm seeing a lot of memory usage just on the opening step. These are small files (3 MB each); even loaded into memory they should fit on my instance.
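For concreteness, the setup looks roughly like this (the cluster size, memory limit, and file path are illustrative, not my exact configuration):
import glob
import xarray as xr
from dask.distributed import Client
# Local cluster for illustration; the real run uses a distributed cluster
client = Client(n_workers=8, threads_per_worker=1, memory_limit="8GB")
files = sorted(glob.glob("granules/*.nc"))   # ~4000 files, ~3 MB each
# parallel=True wraps each per-file open in dask.delayed, so the metadata
# reads run on the workers instead of serially on the client
ds = xr.open_mfdataset(files, parallel=True)
print(f"{ds.nbytes / 1e9:.1f} GB (lazy; nothing loaded yet)")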