Speed evaluation of the smmregrid tool #2

Open
oloapinivad opened this issue Feb 2, 2023 · 4 comments


oloapinivad commented Feb 2, 2023

This issue keeps track of the speed tests I have been running to find the optimal configuration for the regridder, following #1.

The tests are based on files on different grids (curvilinear, gaussian, gaussian reduced, lonlat and unstructured) to cover all the possibilities, with 2D files, files with a mask (i.e. ocean files) and files with pressure levels. We also tested accessing the entire xarray.Dataset versus working on a single xarray.DataArray. Writing the NetCDF output is also assessed. All tests are run with conservative remapping.

The tests can be found in the playground notebook and are based on multiple repetitions (usually 20 for each operation): https://github.com/jhardenberg/smmregrid/blob/devel/extend/playground.ipynb
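For reference, a minimal sketch of the timing approach used in the notebook. The `Regridder` construction, its `regrid` method and the target grid file name are illustrative assumptions, not the definitive smmregrid API:

```python
# Minimal timing sketch, assuming a Regridder-like interface with a `regrid`
# method and conservative ("con") remapping; names and signatures are
# illustrative, not the definitive smmregrid API.
import timeit
from statistics import median

import xarray as xr
from smmregrid import Regridder  # assumed import path

ds = xr.open_dataset("tas-ecearth.nc")                      # one of the test files
regridder = Regridder(ds, "target_grid.nc", method="con")   # hypothetical signature

def run_dataset():
    return regridder.regrid(ds)            # whole xarray.Dataset

def run_dataarray():
    return regridder.regrid(ds["tas"])     # single xarray.DataArray

for label, func in [("Dataset", run_dataset), ("DataArray", run_dataarray)]:
    times = timeit.repeat(func, repeat=20, number=1)        # 20 repetitions per operation
    print(f"SMM ({label}): median {median(times):.4f} s")
```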


oloapinivad commented Feb 2, 2023

Commit b08b045 establishes a good starting point (times are relative to CDO = 1):

| File | NVars | NRecords | CDO | SMM (Dataset) | SMM (DataArray) | SMM (DataArray+NoMask) | SMM (Dataset+Write) | SMM (DataArray+Write) |
|---|---|---|---|---|---|---|---|---|
| onlytos-ipsl.nc | 1 | (12, 332, 362) | 1 | 0.216799 | 0.0778903 | 0.0504366 | 0.997036 | 0.77708 |
| tas-ecearth.nc | 1 | (12, 256, 512) | 1 | 0.226347 | 0.0900398 | 0.0631857 | 1.14144 | 0.958676 |
| 2t-era5.nc | 1 | (12, 73, 144) | 1 | 0.170659 | 0.094557 | 0.0610341 | 0.845712 | 0.765937 |
| tos-fesom.nc | 1 | (12, 126859) | 1 | 0.113976 | 0.0399258 | 0.0256824 | 0.755877 | 0.623671 |
| ua-ecearth.nc | 1 | (2, 19, 256, 512) | 1 | 0.398825 | 0.0677817 | 0.0441846 | 1.61761 | 1.35922 |
| mix-cesm.nc | 4 | (12, 192, 288) | 1 | 0.549228 | 0.0688021 | 0.0452126 | 1.73817 | 0.670783 |
| era5-mon.nc | 1 | (864, 721, 1440) | 1 | 0.825034 | 0.000651573 | 0.000439713 | 1.883 | 1.0852 |

A few points:

  • There is still too much overhead in the Dataset call. This is likely due to some safety checks we are using for the masks.
  • Masks add roughly 50% to the computational cost. So far it is unsafe to run without masks.
  • Performance is very good for big files (see the last era5 monthly interpolation), which suggests good scaling.
  • Writing files brings us back to square one, putting us on par with CDO (but this is expected); a rough check of the write cost is sketched below.
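As a sanity check on the "+Write" columns, a small sketch that times the NetCDF write step alone on an already-loaded file (no regridding involved); the file name is just one of the test files:

```python
# Rough check of the write overhead behind the "+Write" columns: time
# xarray's to_netcdf on an already-loaded dataset, with no regridding involved.
import timeit

import xarray as xr

ds = xr.open_dataset("tas-ecearth.nc").load()   # pull the data into memory first
t = min(timeit.repeat(lambda: ds.to_netcdf("write_test.nc"), repeat=5, number=1))
print(f"to_netcdf alone: {t:.3f} s")
```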

IMPORTANT: these numbers do not take into account the loading of the data.


oloapinivad commented Feb 2, 2023

The numbers are less impressive if we take into account loading the data into memory, i.e. xarray.load():

| File | NVars | NRecords | CDO | CDO (NoLoad) | SMM (Dataset) | SMM (DataArray) | SMM (DataArray+NoLoad) | SMM (DataArray+NoMask) | SMM (Dataset+Write) | SMM (DataArray+Write) |
|---|---|---|---|---|---|---|---|---|---|---|
| onlytos-ipsl.nc | 1 | (12, 332, 362) | 1 | 0.85244 | 0.468714 | 0.36703 | 0.0898978 | 0.261572 | 0.451468 | 0.436607 |
| tas-ecearth.nc | 1 | (12, 256, 512) | 1 | 0.976596 | 0.556708 | 0.498547 | 0.103215 | 0.35501 | 0.583348 | 0.561232 |
| 2t-era5.nc | 1 | (12, 73, 144) | 1 | 1.00737 | 0.295552 | 0.250174 | 0.0685831 | 0.163858 | 0.354942 | 0.341436 |
| tos-fesom.nc | 1 | (12, 126859) | 1 | 0.975193 | 0.354071 | 0.321254 | 0.052417 | 0.226885 | 0.370075 | 0.367297 |
| ua-ecearth.nc | 1 | (2, 19, 256, 512) | 1 | 0.886853 | 0.623172 | 0.597676 | 0.171645 | 0.476483 | 0.834928 | 0.849127 |
| mix-cesm.nc | 4 | (12, 192, 288) | 1 | 0.828504 | 0.903087 | 0.246191 | 0.0574815 | 0.172714 | 1.21837 | 0.3044 |
| era5-mon.nc | 1 | (864, 721, 1440) | 1 | 1.00124 | 0.68279 | 0.688481 | 0.338506 | 0.66355 | 0.696658 | 0.705815 |

We are still faster than CDO for a single DataArray, but the speedup is small.
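A sketch of how the loaded versus "NoLoad" cases differ: the only change is whether xarray's eager load into memory is counted inside the timed call. The `Regridder` interface is the same illustrative assumption as in the first sketch:

```python
# Load vs NoLoad sketch: the only difference is whether the eager read into
# memory (.load()) is counted inside the timed regrid call. Regridder and its
# `regrid` method are illustrative assumptions, not the definitive API.
import timeit

import xarray as xr
from smmregrid import Regridder  # assumed import path

regridder = Regridder(xr.open_dataset("tas-ecearth.nc"), "target_grid.nc", method="con")  # hypothetical

def regrid_with_load():
    da = xr.open_dataset("tas-ecearth.nc")["tas"].load()   # eager read counted in the timing
    return regridder.regrid(da)

def regrid_no_load():
    da = xr.open_dataset("tas-ecearth.nc")["tas"]          # lazy open: file I/O mostly excluded
    return regridder.regrid(da)

for label, func in [("DataArray", regrid_with_load), ("DataArray+NoLoad", regrid_no_load)]:
    t = min(timeit.repeat(func, repeat=20, number=1))
    print(f"SMM ({label}): best {t:.4f} s")
```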

@oloapinivad

Conversely, scaling becomes very poor when we use dask, and it is not clear why (the setup is sketched after the table).

| File | Workers | CDO | Dask Compute | Dask Load |
|---|---|---|---|---|
| onlytos-ipsl.nc | 0 | 1.33525 | 0.802709 | 0.80034 |
| tas-ecearth.nc | 0 | 1.31848 | 1.04944 | 0.976837 |
| onlytos-ipsl.nc | 1 | 1.36808 | 3.72327 | 2.29133 |
| tas-ecearth.nc | 1 | 1.36115 | 2.75875 | 2.80817 |
| onlytos-ipsl.nc | 2 | 1.3585 | 4.1103 | 2.21112 |
| tas-ecearth.nc | 2 | 1.40967 | 2.85582 | 2.78464 |
| onlytos-ipsl.nc | 8 | 1.356 | 7.77573 | 5.73662 |
| tas-ecearth.nc | 8 | 1.43555 | 3.41429 | 2.84304 |
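For context, a sketch of the kind of dask setup behind the "Workers" column: a LocalCluster with a configurable number of workers and chunked reads, with an explicit compute()/load() at the end. Worker count and chunking are example values, and the `Regridder` call is the same illustrative assumption as in the earlier sketches:

```python
# Dask sketch: LocalCluster with a configurable number of workers, chunked
# xarray reads, and an explicit compute()/load(). The Regridder interface is
# an illustrative assumption, as in the earlier sketches.
import xarray as xr
from dask.distributed import Client, LocalCluster
from smmregrid import Regridder  # assumed import path

cluster = LocalCluster(n_workers=2, threads_per_worker=1)   # "Workers" column
client = Client(cluster)

ds = xr.open_dataset("tas-ecearth.nc", chunks={"time": 1})  # lazy, chunked read
regridder = Regridder(ds, "target_grid.nc", method="con")   # hypothetical signature

out = regridder.regrid(ds)   # lazy dask graph when the input is chunked
out.compute()                # "Dask Compute": evaluate the graph
# out.load()                 # "Dask Load": evaluate and keep the result in memory

client.close()
cluster.close()
```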

oloapinivad mentioned this issue Feb 2, 2023
@oloapinivad

The last commit in #2 suggests significant improvements. Considering that we are not using dask yet, this can be considered a success.

| File | NVars | NRecords | CDO | CDO (NoLoad) | SMM (Dataset) | SMM (DataArray) | SMM (DataArray+NoLoad) | SMM (Dataset+Write) | SMM (DataArray+Write) |
|---|---|---|---|---|---|---|---|---|---|
| onlytos-ipsl.nc | 1 | (12, 332, 362) | 1 | 0.924445 | 0.377513 | 0.347206 | 0.078164 | 0.429603 | 0.431012 |
| tas-ecearth.nc | 1 | (12, 256, 512) | 1 | 0.927699 | 0.410739 | 0.38098 | 0.0854462 | 0.468425 | 0.461594 |
| 2t-era5.nc | 1 | (12, 73, 144) | 1 | 0.924335 | 0.204548 | 0.160671 | 0.0439383 | 0.253002 | 0.248918 |
| tos-fesom.nc | 1 | (12, 126859) | 1 | 1.01328 | 0.332096 | 0.32686 | 0.048309 | 0.379621 | 0.372826 |
| ua-ecearth.nc | 1 | (2, 19, 256, 512) | 1 | 0.898331 | 0.508237 | 0.475537 | 0.156023 | 0.74989 | 0.735353 |
| mix-cesm.nc | 4 | (12, 192, 288) | 1 | 0.886593 | 0.652437 | 0.181319 | 0.0456369 | 0.92309 | 0.248312 |
| era5-mon.nc | 1 | (864, 721, 1440) | 1 | 0.990528 | 0.757665 | 0.794592 | 0.33693 | 0.822056 | 0.803797 |

oloapinivad added a commit that referenced this issue Oct 17, 2024