Refactoring of smmregrid to support more complex dataset structures #32

Open
oloapinivad opened this issue Oct 14, 2024 · 0 comments · May be fixed by #33

Comments

@oloapinivad
Collaborator

There is growing evidence that smmregrid is not flexible enough to withstand the ongoing stress coming from external applications. This issue will serve as a landing point for the development we are planning in order to make it more flexible and sustainable.

The most important limitation that I am aware of is the impossibility of dealing with an xarray Dataset whose variables do not all share exactly the same dimensions. For example, you cannot remap a source which has both oceanic and atmospheric data.

To overcome this limitation, the idea is to make smmregrid work based on multiple gridtypes instead of assuming that all the data is on the same grid. This is not very different from what CDO does.

  1. Introduce a function to identify the gridtypes available in a specific Dataset/DataArray based on shared dimensions, excluding those which are not relevant (the ones that include bnds, for example; please note that these might be required by CDO, so we should learn how to bring them along). Any gridtype can be identified by a tuple of dimensions (no strict naming required). Then we can build a dictionary and exploit all the tools we already have to detect vertical and horizontal dimensions, as well as the variables lying on each specific gridtype. This is planned to be a class, and I have already done some successful tests offline.
  2. Base all the smmregrid class objects on the gridtype. cdo_generate_weights will then return not a single xarray object but a dictionary with one entry per gridtype. All the objects of the class are expected to work with dictionaries, so that we can point to the required grid every time.
  3. As a final step, the regrid call should check the gridtype of the data fed into the method and match it with the ones available in the weights.
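
To make step 1 more concrete, here is a minimal sketch of how gridtypes could be identified from shared dimensions. It is not the actual GridInspector implementation: the function name `identify_gridtypes`, the default sets of time and bounds dimension names, and the example variables are all assumptions made for illustration. To keep it runnable without xarray, it works on a plain mapping of variable name to dimension tuple.

```python
from collections import defaultdict

# Hypothetical defaults: dimension names treated as time-like or as bounds.
TIME_DIMS = frozenset({"time"})
BOUNDS_DIMS = frozenset({"bnds", "bounds", "nv"})


def identify_gridtypes(var_dims, time_dims=TIME_DIMS, bounds_dims=BOUNDS_DIMS):
    """Group variables by the tuple of non-time dimensions they share.

    var_dims maps a variable name to its tuple of dimension names.
    Returns a dict {gridtype_dims: [variable names]}.
    """
    gridtypes = defaultdict(list)
    for name, dims in var_dims.items():
        # Skip auxiliary variables such as time_bnds or lon_bnds.
        if any(d in bounds_dims for d in dims):
            continue
        # The gridtype key is whatever remains once time is dropped.
        key = tuple(d for d in dims if d not in time_dims)
        if key:  # ignore scalars and pure time series
            gridtypes[key].append(name)
    return dict(gridtypes)


# Made-up dataset mixing atmospheric (2D and 3D) and oceanic variables.
example = {
    "tas": ("time", "lat", "lon"),
    "ta": ("time", "plev", "lat", "lon"),
    "tos": ("time", "j", "i"),
    "time_bnds": ("time", "bnds"),
}
grids = identify_gridtypes(example)
# grids == {("lat", "lon"): ["tas"],
#           ("plev", "lat", "lon"): ["ta"],
#           ("j", "i"): ["tos"]}
```

With a real Dataset `ds`, the input mapping could be built as `{name: da.dims for name, da in ds.data_vars.items()}`. Keeping the vertical dimension inside the key is one possible choice that makes 2D and 3D fields distinct gridtypes; simply skipping bounds variables, as done here, may not be enough if CDO needs them.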

Overall, this should massively improve the flexibility of the tool as well as the handling of the vertical coordinates, which will now become an inner property of the gridtype.
I also plan to move apply_weights into the Regrid class to minimize the redundancy of the variables.
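
Steps 2 and 3 could look roughly like the sketch below: weights are stored in a dictionary keyed by gridtype, and the regrid call picks the entry matching the dimensions of the incoming data. The class name `GridtypeWeights`, its methods, and the string placeholders standing in for weight objects are hypothetical and are not part of the current smmregrid API.

```python
class GridtypeWeights:
    """Hypothetical container holding one set of weights per gridtype."""

    def __init__(self, weights_by_gridtype):
        # Keys are tuples of dimension names, values are the weights.
        self.weights = dict(weights_by_gridtype)

    def match_gridtype(self, dims):
        """Return the gridtype whose dimensions are all found in dims."""
        candidates = [g for g in self.weights if set(g) <= set(dims)]
        if not candidates:
            raise KeyError(f"No weights available for dimensions {dims}")
        # Prefer the most specific match, e.g. the 3D grid over the 2D one.
        return max(candidates, key=len)

    def weights_for(self, dims):
        """Return the weights to apply to data with the given dimensions."""
        return self.weights[self.match_gridtype(dims)]


# Placeholder strings stand in for the xarray weight objects.
store = GridtypeWeights({
    ("lat", "lon"): "atm2d-weights",
    ("plev", "lat", "lon"): "atm3d-weights",
    ("j", "i"): "ocean-weights",
})

atm3d = store.weights_for(("time", "plev", "lat", "lon"))  # "atm3d-weights"
ocean = store.weights_for(("time", "j", "i"))              # "ocean-weights"
```

Matching by subset means extra dimensions such as time do not prevent a match. Picking the longest key is just one way to resolve ambiguity between a 2D grid and its 3D counterpart.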

Tricky issues that I see:

  • Deal with precomputed weights stored on disk in the correct way, so that we associate each of them with the right gridtype. This is probably more of an AQUA concern, but we should think about it.
  • Do not mix up DataArray and Dataset: currently the core regrid functions work on DataArray, but having multiple grids requires proper handling of the Dataset.
  • Find the right place to load the data: we need to access metadata quite early, so even if files are supplied, loading has to be moved to the init of the class.
  • More that I do not see now
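
For the first point above, one possible way to keep precomputed weights tied to their gridtype is to encode the dimension tuple in the file name. This is only a sketch under assumptions: the `weights_<dims>.nc` naming scheme and the three helper functions are invented for illustration, and the scheme assumes dimension names never contain a hyphen.

```python
from pathlib import Path


def weights_filename(gridtype, prefix="weights", suffix=".nc"):
    """Build a file name that encodes the gridtype dimensions."""
    return f"{prefix}_{'-'.join(gridtype)}{suffix}"


def gridtype_from_filename(filename, prefix="weights"):
    """Recover the gridtype tuple from a name made by weights_filename."""
    stem = Path(filename).stem  # drop the directory and the suffix
    return tuple(stem[len(prefix) + 1:].split("-"))


def index_weights(filenames, prefix="weights"):
    """Map each gridtype to the file holding its precomputed weights."""
    return {gridtype_from_filename(f, prefix): f for f in filenames}


files = [
    weights_filename(("lat", "lon")),
    weights_filename(("plev", "lat", "lon")),
    weights_filename(("j", "i")),
]
index = index_weights(files)
# index[("j", "i")] == "weights_j-i.nc"
```

Storing the gridtype as an attribute inside each weights file would be an alternative that does not depend on file naming.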

I did a test in my local branch and it seems feasible. I already have a nice GridInspector class that provides most of the grid information. Still, I am very far from having something decent, so I am not opening a PR yet.
