Merge pull request #1 from jhardenberg/devel/extend
Devel/extend
oloapinivad authored Feb 13, 2023
2 parents 6706fa8 + 7c52b89 commit 419e3a2
Showing 18 changed files with 1,134 additions and 36 deletions.
51 changes: 51 additions & 0 deletions .github/workflows/mambatest.yml
@@ -0,0 +1,51 @@
# This workflow will install Python dependencies using Conda, run tests and lint with a matrix of Python versions
# For more information see: https://autobencoder.com/2020-08-24-conda-actions/

name: Mamba PyTest

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.7", "3.8", "3.9", "3.10"]
    defaults:
      run:
        shell: bash -el {0}
    steps:
    - uses: actions/checkout@v3
    - name: provision-with-micromamba
      uses: mamba-org/provision-with-micromamba@v14
      with:
        environment-file: environment.yml
        environment-name: smmregrid
        cache-downloads: true
        extra-specs: |
          python=${{ matrix.python-version }}
    - name: Install smmregrid
      run: |
        # install package
        pip install -e .
    - name: Lint with flake8
      run: |
        # install flake8
        conda install flake8
        # stop the build if there are Python syntax errors or undefined names
        flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
        # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
        flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
    - name: Test with pytest
      run: |
        conda install pytest
        python -m pytest
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
smmregrid.egg-info
__pycache__
*.idx
25 changes: 22 additions & 3 deletions README.md
@@ -1,11 +1,30 @@
# smmregrid
A compact regridder using sparse matrix multiplication

This repository represents a modification of the regridding routines in [climtas](https://github.com/ScottWales/climtas) by Scott Wales, which already implements efficiently this idea and has no other significant dependencies (it does not use iris or esmf for regridding).
This repository represents a modification of the regridding routines in [climtas](https://github.com/ScottWales/climtas) by Scott Wales, which already implements this idea efficiently and has no other significant dependencies (it does not use iris).
The regridder makes efficient use of sparse matrix multiplication with dask, plus some manipulation of the coordinates.

I only had to change a few lines of code to make it compatible with unstructured grids. The regridder uses efficiently sparse matrix multiplication with dask + some manipulation of the coordinates (which would have to be revised/checked again)
Please note that this tool is not meant to be "another interpolation tool", but rather a way to apply pre-computed weights (from CDO, which is currently tested, or from ESMF, which is not yet supported) within the Python environment.
The speedup is estimated at roughly 1.5 to 5 times, slightly lower if the files are then written to disk. 2D and 3D data are supported on all the grids supported by CDO, and both xarray.Dataset and xarray.DataArray can be used. Masks are treated in a simple way but are correctly transferred. Attributes are kept.

It is safer to run it through conda/mamba. Install with:

```
conda env create -f environment.yml
```

then activate the environment:

```
conda activate smmregrid
```
and install smmregrid in editable mode:

Install with
```
pip install -e .
```

Cautionary notes:
- It does not work correctly if the xarray.Dataset includes fields with different land-sea masks (e.g. temperature and SST).
- It does not support ESMF weights.

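The README describes the core idea: the interpolation weights are computed once (e.g. by CDO), and regridding then reduces to a sparse matrix product. A minimal sketch of that idea follows, using NumPy and SciPy instead of smmregrid's own dask-based code. The function name `apply_weights`, its arguments and the NaN-based mask renormalisation are assumptions made for illustration, not the package's API.

```python
import numpy as np
import scipy.sparse as sps


def apply_weights(weights, src, src_shape, dst_shape, masked=True):
    """Hypothetical helper (not smmregrid's API): regrid with precomputed weights.

    weights -- sparse matrix of shape (n_dst, n_src); row i lists how much
               each source cell contributes to destination cell i
    src     -- array whose trailing dimensions equal src_shape, e.g.
               (lat, lon) for 2D data or (time, lat, lon) for 3D data
    masked  -- if True, NaNs in src are treated as missing values and each
               destination value is renormalised by the valid weight it received
    """
    n_src = int(np.prod(src_shape))
    lead = src.shape[: src.ndim - len(src_shape)]  # e.g. (time,) or ()
    flat = src.reshape(-1, n_src)                   # (n_lead, n_src)

    if not masked:
        out = (weights @ flat.T).T                  # (n_lead, n_dst)
    else:
        valid = ~np.isnan(flat)                     # mask derived from this array only
        num = (weights @ np.where(valid, flat, 0.0).T).T
        den = (weights @ valid.astype(float).T).T
        with np.errstate(invalid="ignore", divide="ignore"):
            out = np.where(den > 0, num / den, np.nan)

    return out.reshape(lead + tuple(dst_shape))


# Toy example: average a 2x2 source grid onto one destination cell,
# with one masked (NaN) source cell.
w = sps.csr_matrix(np.full((1, 4), 0.25))
field = np.array([[[1.0, 2.0], [3.0, np.nan]]])   # (time=1, lat=2, lon=2)
print(apply_weights(w, field, (2, 2), (1, 1)))    # [[[2.]]]
```

Deriving the mask from each array's own NaNs is one simple way to handle masked fields in such a sketch; it does not describe how smmregrid itself treats masks.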
206 changes: 206 additions & 0 deletions dask_playground.ipynb
@@ -0,0 +1,206 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tests for SMM (with dask) versus CDO\n",
"\n",
"These are the same speed tests, but using dask. Surprisingly, the code is much slower, which suggests that something is wrong."
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/work/users/paolo/miniconda3/envs/DevECmean4/lib/python3.10/site-packages/distributed/node.py:182: UserWarning: Port 8787 is already in use.\n",
"Perhaps you already have a cluster running?\n",
"Hosting the HTTP server on port 44465 instead\n",
" warnings.warn(\n"
]
}
],
"source": [
"from time import time\n",
"import timeit\n",
"import os\n",
"import numpy as np\n",
"import xarray as xr\n",
"from smmregrid import cdo_generate_weights, Regridder\n",
"from smmregrid.checker import check_cdo_regrid # this is a new function introduced to verify the output\n",
"from cdo import Cdo\n",
"import pandas as pd\n",
"cdo = Cdo()\n",
"\n",
"# where and which the data are\n",
"indir='tests/data'\n",
"filelist = ['onlytos-ipsl.nc','tas-ecearth.nc', '2t-era5.nc','tos-fesom.nc']\n",
"tfile = os.path.join(indir, 'r360x180.nc')\n",
"\n",
"# method for remapping\n",
"methods = ['nn','con']\n",
"accesses = ['DataArray', 'Data']\n",
"\n",
"from dask.distributed import LocalCluster, Client\n",
"cluster = LocalCluster(ip=\"0.0.0.0\", threads_per_worker=1, n_workers=2)\n",
"client = Client(cluster)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Remapping (with weights available)\n",
"\n",
"This is the real goal of smmregrid. Here we test the remapping when the weights are pre-computed. Since SMM does not have to write anything to disk, it is several times faster (between 5 and 10 times). Running with a Dataset adds some overhead (about 20%). So far, masks do not seem to be an issue."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CDO</th>\n",
" <th>SMM (Dataset)</th>\n",
" <th>SMM (DataArray)</th>\n",
" <th>SMM (DataSet+NoMask)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>onlytos-ipsl.nc</th>\n",
" <td>1.0</td>\n",
" <td>0.726599</td>\n",
" <td>0.427539</td>\n",
" <td>0.427731</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tas-ecearth.nc</th>\n",
" <td>1.0</td>\n",
" <td>0.902123</td>\n",
" <td>0.841263</td>\n",
" <td>0.869179</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2t-era5.nc</th>\n",
" <td>1.0</td>\n",
" <td>0.673339</td>\n",
" <td>0.694263</td>\n",
" <td>0.642410</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tos-fesom.nc</th>\n",
" <td>1.0</td>\n",
" <td>0.407764</td>\n",
" <td>0.405918</td>\n",
" <td>0.411269</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" CDO SMM (Dataset) SMM (DataArray) SMM (DataSet+NoMask)\n",
"onlytos-ipsl.nc 1.0 0.726599 0.427539 0.427731\n",
"tas-ecearth.nc 1.0 0.902123 0.841263 0.869179\n",
"2t-era5.nc 1.0 0.673339 0.694263 0.642410\n",
"tos-fesom.nc 1.0 0.407764 0.405918 0.411269"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# nrepetition for the check\n",
"nr = 10\n",
"\n",
"data =[]\n",
"for filein in filelist: \n",
"\n",
" # CDO\n",
" wfile = cdo.gencon(tfile, input = os.path.join(indir,filein))\n",
" one = timeit.timeit(lambda: cdo.remap(tfile + ',' + wfile, input = os.path.join(indir,filein), returnXDataset = True), number = nr)\n",
" #print(filein + ': Exectime CDO Remap ' + str(one/nr))\n",
"\n",
" # SMM\n",
" xfield = xr.open_mfdataset(os.path.join(indir,filein))\n",
" wfield = cdo_generate_weights(os.path.join(indir,filein), tfile, method = 'con')\n",
" interpolator = Regridder(weights=wfield)\n",
" # heuristic: select the variables that have a time dimension but no bnds dimension\n",
" myvar = [var for var in xfield.data_vars \n",
" if 'time' in xfield[var].dims and 'bnds' not in xfield[var].dims]\n",
" two = timeit.timeit(lambda: interpolator.regrid(xfield), number = nr)\n",
" three = timeit.timeit(lambda: interpolator.regrid(xfield[myvar]), number = nr)\n",
" four = timeit.timeit(lambda: interpolator.regrid(xfield[myvar], masked = False), number = nr)\n",
" data.append([one, two, three, four])\n",
"\n",
" #print(filein + ': Exectime SMM Remap (DataSet) ' + str(two/nr))\n",
" #print(filein + ': Exectime SMM Remap (DataArray) ' + str(three/nr))\n",
" #print(filein + ': Exectime SMM Remap (DataSet+NoMask) ' + str(four/nr))\n",
"\n",
"cnames = ['CDO', 'SMM (Dataset)', 'SMM (DataArray)', 'SMM (DataSet+NoMask)']\n",
"df = pd.DataFrame(data, index = filelist, columns = cnames)\n",
"df.div(df[cnames[0]],axis =0)\n",
"\n",
"client.shutdown()\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "DevECmean4",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "d1a27f430e855354fabe9b58ad426cbc88af57f8b66247655f5de977d5b44f64"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
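The timing table in the notebook divides each run time by the CDO run time for the same file, so CDO is 1.0 and a value r below 1 means SMM needed r times the CDO time. As a small worked example, the sketch below converts two of the recorded SMM columns (values copied from the notebook output) into speed-up factors, 1/r. The dictionary layout is only an illustration.

```python
# Relative run times copied from the notebook output (CDO = 1.0 for each file).
ratios = {
    "onlytos-ipsl.nc": {"SMM (Dataset)": 0.726599, "SMM (DataArray)": 0.427539},
    "tas-ecearth.nc": {"SMM (Dataset)": 0.902123, "SMM (DataArray)": 0.841263},
    "2t-era5.nc": {"SMM (Dataset)": 0.673339, "SMM (DataArray)": 0.694263},
    "tos-fesom.nc": {"SMM (Dataset)": 0.407764, "SMM (DataArray)": 0.405918},
}

# A ratio r means SMM took r times the CDO run time, i.e. it was 1/r times faster.
speedups = {
    fname: {col: round(1.0 / r, 2) for col, r in cols.items()}
    for fname, cols in ratios.items()
}

for fname, cols in speedups.items():
    print(fname, cols)
```

For these values SMM comes out roughly 1.1 to 2.5 times faster than CDO in this dask run.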
10 changes: 8 additions & 2 deletions environment.yml
@@ -5,11 +5,17 @@ name : smmregrid
channels:
  - conda-forge
dependencies:
  - python>=3.8,<3.11
  - python>=3.7,<3.11
  - numpy
  - netcdf4
  - dask
  - xarray
  - cfgrib
  - xesmf
  - sparse
  - cfunits
  - cdo
  - python-cdo
  - pytest
  - ipykernel
  - pip:
    - sparse==0.13.0