
Devel/extend #1

Merged
merged 24 commits into from
Feb 13, 2023
51 changes: 51 additions & 0 deletions .github/workflows/mambatest.yml
@@ -0,0 +1,51 @@
# This workflow will install Python dependencies using Conda, then run tests and lint across a matrix of Python versions
# For more information see: https://autobencoder.com/2020-08-24-conda-actions/

name: Mamba PyTest

on:
  push:
    branches: [ main ]
  pull_request:
    branches: [ main ]
  workflow_dispatch:

permissions:
  contents: read

jobs:
  build:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        python-version: ["3.7", "3.8", "3.9", "3.10"]
    defaults:
      run:
        shell: bash -el {0}
    steps:
      - uses: actions/checkout@v3
      - name: provision-with-micromamba
        uses: mamba-org/provision-with-micromamba@v14
        with:
          environment-file: environment.yml
          environment-name: smmregrid
          cache-downloads: true
          extra-specs: |
            python=${{ matrix.python-version }}
      - name: Install smmregrid
        run: |
          # install package
          pip install -e .
      - name: Lint with flake8
        run: |
          # install flake8
          conda install flake8
          # stop the build if there are Python syntax errors or undefined names
          flake8 . --count --select=E9,F63,F7,F82 --show-source --statistics
          # exit-zero treats all errors as warnings. The GitHub editor is 127 chars wide
          flake8 . --count --exit-zero --max-complexity=10 --max-line-length=127 --statistics
      - name: Test with pytest
        run: |
          conda install pytest
          python -m pytest
3 changes: 3 additions & 0 deletions .gitignore
@@ -0,0 +1,3 @@
smmregrid.egg-info
__pycache__
*.idx
25 changes: 22 additions & 3 deletions README.md
@@ -1,11 +1,30 @@
# smmregrid
A compact regridder using sparse matrix multiplication

This repository represents a modification of the regridding routines in [climtas](https://github.com/ScottWales/climtas) by Scott Wales, which already implements this idea efficiently and has no other significant dependencies (it does not use iris or esmf for regridding).
This repository represents a modification of the regridding routines in [climtas](https://github.com/ScottWales/climtas) by Scott Wales, which already implements this idea efficiently and has no other significant dependencies (it does not use iris).
The regridder efficiently uses sparse matrix multiplication with dask, plus some manipulation of the coordinates.

I only had to change a few lines of code to make it compatible with unstructured grids. The regridder efficiently uses sparse matrix multiplication with dask, plus some manipulation of the coordinates (which would have to be revised/checked again).
Please note that this tool is not meant to be "yet another interpolation tool", but rather a method to apply pre-computed weights (generated with CDO, which is currently tested; ESMF weights are not yet supported) within the Python environment.
The speedup is estimated to be between ~1.5 and ~5 times, slightly lower if the files are then written to disk. 2D and 3D data are supported on all the grids supported by CDO; both xarray.Dataset and xarray.DataArray can be used. Masks are treated in a simple way but are correctly transferred. Attributes are kept.

It is safer to run it through conda/mamba. Create the environment with:

```
conda env create -f environment.yml
```

then activate the environment:

```
conda activate smmregrid
```
and install smmregrid in editable mode:

```
pip install -e .
```

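A minimal usage sketch (file names are placeholders; the calls follow the pattern used in `dask_playground.ipynb`): weights are pre-computed once with CDO and then applied within Python.

```
import xarray as xr
from smmregrid import cdo_generate_weights, Regridder

# pre-compute conservative remapping weights with CDO (paths are examples)
weights = cdo_generate_weights('input.nc', 'r360x180.nc', method='con')

# build the regridder once from the pre-computed weights...
interpolator = Regridder(weights=weights)

# ...and apply it to an xarray.Dataset or xarray.DataArray
field = xr.open_mfdataset('input.nc')
regridded = interpolator.regrid(field)
```

As measured in the benchmark notebook, regridding a whole Dataset adds a small overhead (~20%) compared to a single DataArray.
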
Cautionary notes:
- It does not work correctly if the xarray.Dataset includes fields with different land-sea masks (e.g. temperature and SST).
- It does not support ESMF weights.

206 changes: 206 additions & 0 deletions dask_playground.ipynb
@@ -0,0 +1,206 @@
{
"cells": [
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Tests for SMM (with dask) versus CDO\n",
"\n",
"There are the same speed test but using dask. Surprisingly, the code is much slower. There should be something wrong"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stderr",
"output_type": "stream",
"text": [
"/work/users/paolo/miniconda3/envs/DevECmean4/lib/python3.10/site-packages/distributed/node.py:182: UserWarning: Port 8787 is already in use.\n",
"Perhaps you already have a cluster running?\n",
"Hosting the HTTP server on port 44465 instead\n",
" warnings.warn(\n"
]
}
],
"source": [
"from time import time\n",
"import timeit\n",
"import os\n",
"import numpy as np\n",
"import xarray as xr\n",
"from smmregrid import cdo_generate_weights, Regridder\n",
"from smmregrid.checker import check_cdo_regrid # this is a new function introduced to verify the output\n",
"from cdo import Cdo\n",
"import pandas as pd\n",
"cdo = Cdo()\n",
"\n",
"# where and which the data are\n",
"indir='tests/data'\n",
"filelist = ['onlytos-ipsl.nc','tas-ecearth.nc', '2t-era5.nc','tos-fesom.nc']\n",
"tfile = os.path.join(indir, 'r360x180.nc')\n",
"\n",
"# method for remapping\n",
"methods = ['nn','con']\n",
"accesses = ['DataArray', 'Data']\n",
"\n",
"from dask.distributed import LocalCluster, Client\n",
"cluster = LocalCluster(ip=\"0.0.0.0\", threads_per_worker=1, n_workers=2)\n",
"client = Client(cluster)"
]
},
{
"attachments": {},
"cell_type": "markdown",
"metadata": {},
"source": [
"# Remapping (with weights available)\n",
"\n",
"This is the real goal of smmregrid. Here we test the computation of the remap when the weights are pre-computed. Considering that SMM does not have to write anything to disk, it is several times faster, between 5 to 10. Running with Dataset implies a bit of overhead (20%). Masks so far does not seem to be an issue."
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"data": {
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>CDO</th>\n",
" <th>SMM (Dataset)</th>\n",
" <th>SMM (DataArray)</th>\n",
" <th>SMM (DataSet+NoMask)</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>onlytos-ipsl.nc</th>\n",
" <td>1.0</td>\n",
" <td>0.726599</td>\n",
" <td>0.427539</td>\n",
" <td>0.427731</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tas-ecearth.nc</th>\n",
" <td>1.0</td>\n",
" <td>0.902123</td>\n",
" <td>0.841263</td>\n",
" <td>0.869179</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2t-era5.nc</th>\n",
" <td>1.0</td>\n",
" <td>0.673339</td>\n",
" <td>0.694263</td>\n",
" <td>0.642410</td>\n",
" </tr>\n",
" <tr>\n",
" <th>tos-fesom.nc</th>\n",
" <td>1.0</td>\n",
" <td>0.407764</td>\n",
" <td>0.405918</td>\n",
" <td>0.411269</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"</div>"
],
"text/plain": [
" CDO SMM (Dataset) SMM (DataArray) SMM (DataSet+NoMask)\n",
"onlytos-ipsl.nc 1.0 0.726599 0.427539 0.427731\n",
"tas-ecearth.nc 1.0 0.902123 0.841263 0.869179\n",
"2t-era5.nc 1.0 0.673339 0.694263 0.642410\n",
"tos-fesom.nc 1.0 0.407764 0.405918 0.411269"
]
},
"execution_count": 2,
"metadata": {},
"output_type": "execute_result"
}
],
"source": [
"# nrepetition for the check\n",
"nr = 10\n",
"\n",
"data =[]\n",
"for filein in filelist: \n",
"\n",
" # CDO\n",
" wfile = cdo.gencon(tfile, input = os.path.join(indir,filein))\n",
" one = timeit.timeit(lambda: cdo.remap(tfile + ',' + wfile, input = os.path.join(indir,filein), returnXDataset = True), number = nr)\n",
" #print(filein + ': Exectime CDO Remap ' + str(one/nr))\n",
"\n",
" # SMM\n",
" xfield = xr.open_mfdataset(os.path.join(indir,filein))\n",
" wfield = cdo_generate_weights(os.path.join(indir,filein), tfile, method = 'con')\n",
" interpolator = Regridder(weights=wfield)\n",
" # var as the one which have time and not have bnds (could work)\n",
" myvar = [var for var in xfield.data_vars \n",
" if 'time' in xfield[var].dims and 'bnds' not in xfield[var].dims]\n",
" two = timeit.timeit(lambda: interpolator.regrid(xfield), number = nr)\n",
" three = timeit.timeit(lambda: interpolator.regrid(xfield[myvar]), number = nr)\n",
" four = timeit.timeit(lambda: interpolator.regrid(xfield[myvar], masked = False), number = nr)\n",
" data.append([one, two, three, four])\n",
"\n",
" #print(filein + ': Exectime SMM Remap (DataSet) ' + str(two/nr))\n",
" #print(filein + ': Exectime SMM Remap (DataArray) ' + str(three/nr))\n",
" #print(filein + ': Exectime SMM Remap (DataSet+NoMask) ' + str(four/nr))\n",
"\n",
"cnames = ['CDO', 'SMM (Dataset)', 'SMM (DataArray)', 'SMM (DataSet+NoMask)']\n",
"df = pd.DataFrame(data, index = filelist, columns = cnames)\n",
"df.div(df[cnames[0]],axis =0)\n",
"\n",
"client.shutdown()\n"
]
}
],
"metadata": {
"kernelspec": {
"display_name": "DevECmean4",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.10.8"
},
"orig_nbformat": 4,
"vscode": {
"interpreter": {
"hash": "d1a27f430e855354fabe9b58ad426cbc88af57f8b66247655f5de977d5b44f64"
}
}
},
"nbformat": 4,
"nbformat_minor": 2
}
10 changes: 8 additions & 2 deletions environment.yml
@@ -5,11 +5,17 @@ name : smmregrid
channels:
- conda-forge
dependencies:
- python>=3.8,<3.11
- python>=3.7,<3.11
- numpy
- netcdf4
- dask
- xarray
- cfgrib
- xesmf
- sparse
- cfunits
- cdo
- python-cdo
- pytest
- ipykernel
- pip:
- sparse==0.13.0