Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add MultiZarrToZarr.append() #404

Merged
merged 9 commits into from
Jan 29, 2024
Merged

Add MultiZarrToZarr.append() #404

merged 9 commits into from
Jan 29, 2024

Conversation

martindurant
Copy link
Member

@martindurant
Copy link
Member Author

@rsignell-usgs , have you had a chance to do any experimenting with this?

@rsignell
Copy link

@martindurant I've (finally) tested and both appending to JSON and Parquet are working fine!

https://nbviewer.org/gist/rsignell/ad1f3c66530a81e77585e05d9683984e

@martindurant
Copy link
Member Author

parquet part requires fsspec/filesystem_spec#1510 (should be merged, then released soon)

kerchunk/combine.py Outdated Show resolved Hide resolved
kerchunk/combine.py Outdated Show resolved Hide resolved
@martindurant martindurant merged commit 0636846 into fsspec:main Jan 29, 2024
5 checks passed
@martindurant martindurant deleted the append branch January 29, 2024 19:48
@maxrjones
Copy link
Contributor

@martindurant I added a Pythia chapter on the new feature in ProjectPythia/kerchunk-cookbook#52. It works great, but I got stuck for a while on the need for the coo_map={"time": "cf:time"}, since the original MultiZarrtoZarr worked fine without specifying that. Am I missing something regarding how the coordinate alignment works usually?

@martindurant
Copy link
Member Author

need for the coo_map={"time": "cf:time"}

I can't tell you how annoying these time units are in general for someone not from the field. I used to care about leap seconds https://en.wikipedia.org/wiki/Terrestrial_Time , and barycentring, but it still seemed simpler.

the original MultiZarrtoZarr worked fine without specifying that

It ... depends. Since the normal mode is to copy the values and attributes as-is, this will be no different to "cf:" and output dtype unspecified, if the inputs happened to have the same time epoch.

@rsignell
Copy link

So just to make sure I've got it right: If all the files have the same time units (e.g. "seconds since 1970-01-01 00:00") then we don't need it, but if the files in the collection we are combining have varying time units, like ["hours since 2012-01-01 00:00", "hours since 2013-01-01 00:00"] then we need the map. Correct?

@martindurant
Copy link
Member Author

Correct, but having cf: in the first case shouldn't hurt either (except that you need cftime in the env, which is a given anyway for this case)

@maxrjones
Copy link
Contributor

hmm I think in ProjectPythia/kerchunk-cookbook#52 all the files had the same time units, and excluding cf: worked in the original concatenation but failed when using append. I created a small test for demonstration in #414, @martindurant should that be expected to work?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants