Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Devel/extend #1

Merged
merged 24 commits into from
Feb 13, 2023
Merged

Devel/extend #1

merged 24 commits into from
Feb 13, 2023

Conversation

oloapinivad
Copy link
Collaborator

@oloapinivad oloapinivad commented Jan 31, 2023

This branch addresses a few issues of smmregrid so far

  • Support for xarray.Dataset
  • Support for land-sea masks
  • Support for vertical levels
  • Tests

@oloapinivad
Copy link
Collaborator Author

I was able to run it on both DataArray and Dataset, create a series of tests for multiple grids and find a draft way to deal with masks. Pressure levels seems supported by default.

@oloapinivad
Copy link
Collaborator Author

I had a bit of fights with the different versions, but tests are now in place from 3.7 to 3.10. Environment creation is now safe.

@oloapinivad oloapinivad marked this pull request as draft February 1, 2023 10:18
@oloapinivad oloapinivad changed the title [Draft] Devel/extend Devel/extend Feb 1, 2023
@oloapinivad
Copy link
Collaborator Author

Last commit introduced a few fixes to work with Datasets, especially with masks andtime_bnds. However, there is slowdown in vertical levels that should be investigated.

@jhardenberg
Copy link
Owner

jhardenberg commented Feb 1, 2023

Hi, what is the "remove dobule mask computation" commit? Did you find a way to avoid computing the mask itself or was it duplicated?

@oloapinivad
Copy link
Collaborator Author

Ahaha no I did it twice :-)

@oloapinivad
Copy link
Collaborator Author

Last commit includes a new treatment of the masks - computed once when the weights are loaded and initialised - and significantly reduces the overhead when working with xarray.Dataset() to about 10%. Speedup figures are discussed in #2, and mostly suggest that with a full loading of the files the speed up compared to CDO is about ~1.5-3x. Slower performance are achieved for very big files. However, the good news is that with the current configuration it seems that writing to the disk is not a bottleneck, so that we are faster than CDO also in this case.

I will consider this as done and ask for review from @jhardenberg
We still need to figure out how to better exploit dask for parallelisation.

@oloapinivad oloapinivad marked this pull request as ready for review February 2, 2023 16:55
@oloapinivad
Copy link
Collaborator Author

Last commit introduces a very small fix which allows for automatically transferring attributes from the original dataset to the regridded one.

@oloapinivad
Copy link
Collaborator Author

Proceeding to the merge!

@oloapinivad oloapinivad merged commit 419e3a2 into main Feb 13, 2023
@oloapinivad oloapinivad deleted the devel/extend branch April 13, 2023 14:45
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants