
Appending ndpyramid data along a time dimension #56

Open
keltonhalbert opened this issue Mar 3, 2023 · 4 comments

Comments

@keltonhalbert

Hello,

I'm interested in using ndpyramid and the CarbonPlan Maps visualization platform to display data, with one quirk: I would like to append to an existing Zarr store as new data becomes available. I am working with 2D images that vary in time, with all other grid attributes effectively static. This may be the wrong repository for this question, since the problem could be a quirk of the mapping framework or the Zarr JavaScript library, but I figured it was worth a try.

If I read all time steps into memory, tile them, and then write, the data plays nicely with the CarbonPlan Maps viewer. This effectively follows the "3d, one variable, multiple time points" demo. However, when I tile one time step at a time and append to an existing Zarr store, something about the data or metadata structure does not work with the map viewer.

Reading a dataset's entire temporal range into memory before tiling is very memory-inefficient, especially for high-fidelity or long time-range datasets. Is there a better or preferred way to append to ndpyramid stores along the time dimension?

@csteele2

csteele2 commented Oct 5, 2023

I second this motion. The way it handles data now takes away the advantage of Zarr on the processing end, where larger-than-RAM datasets are so common in climate and weather work. I just started poking at this, but perhaps the fix could be as simple as regenerating the metadata after every write?

@caiostringari

Any updates on this functionality?

@norlandrhagen
Contributor

Hey there @keltonhalbert, thanks for opening the issue. It's possible something is going wrong in the generation phase or maybe in the maps library. When you are appending to the existing zarr store, are you reconsolidating the metadata?

I think a good start would be to create an MRE (minimal reproducible example) that builds both a standard pyramid and a pyramid created by appending, and to check for any obvious differences.

@caiostringari

I think @keltonhalbert has a scenario similar to mine, which would be something like this:

```python
import xarray as xr
import rioxarray  # registers the .rio accessor used below
from ndpyramid import pyramid_reproject

# get the sample dataset; let's pretend this is one time step of an hourly dataset such as MRMS
ds = xr.tutorial.open_dataset("air_temperature").isel(time=slice(1))
ds = ds.rio.write_crs("EPSG:4326")
dt = pyramid_reproject(ds, levels=2, clear_attrs=False, pixels_per_tile=64)
dt.to_zarr("fake_mrms/air_temperature_t1.zarr", mode="w")

# one hour later, we get a new time step and want to append it to the existing store;
# slice(1, 2) selects only the new step (slice(2) would also re-select the first one)
ds = xr.tutorial.open_dataset("air_temperature").isel(time=slice(1, 2))
ds = ds.rio.write_crs("EPSG:4326")
dt = pyramid_reproject(ds, levels=2, clear_attrs=False, pixels_per_tile=64)

# what do I do now to append to the existing store along the time dimension?

# this works, but reading the data afterwards is not that trivial
dt.to_zarr("fake_mrms/air_temperature_t2.zarr", mode="w")  # ???
```
