Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove unit registry, use cf-xarray's or any other. #57

Merged
merged 9 commits into from
Feb 11, 2025
3 changes: 2 additions & 1 deletion CHANGELOG.rst
Original file line number Diff line number Diff line change
Expand Up @@ -9,7 +9,8 @@ Contributors: Pascal Bourgault (:user:`aulemahal`), Éric Dupuis (:user:`coxipi`

Changes
^^^^^^^
* No change.
* Remove the units registry declaration and instead use whatever is set as pint's application registry.
Code still assumes it is a registry based upon the one in cf-xarray (which exports the `cf` formatter). (:issue:`44`, :pull:`57`).

Fixes
^^^^^
Expand Down
31 changes: 30 additions & 1 deletion docs/usage.rst
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@
Basic Usage
===========

`xsdba` performs a training on a source dataset `hist` to reproduce a target `ref`. "ref" is short for reference, and "hist" for historical: In climate services, it is common to refer to the training data as historical, a period of time where observations are available. The training is then applied to a dataset `sim` to obtain an adjusted dataset. This may simply be the training data `hist` which is not adjusted in the first training part. An example with a basic Quantile Mapping is given below.
``xsdba`` performs a training on a source dataset `hist` to reproduce a target `ref`. "ref" is short for reference, and "hist" for historical: In climate services, it is common to refer to the training data as historical, a period of time where observations are available. The training is then applied to a dataset `sim` to obtain an adjusted dataset. This may simply be the training data `hist` which is not adjusted in the first training part. An example with a basic Quantile Mapping is given below.

.. code-block:: python

Expand All @@ -14,3 +14,32 @@ Basic Usage
# Perform adjust for data outside the training period, `sim`
adj = ADJ.adjust(sim=sim)
..


Units handling
--------------

``xsdba`` implements some unit handling through ``pint``. High level adjustment objects will (usually) parse units from the passed xarray objects, check that the different inputs have matching units and perform conversions when units are compatible but don't match. ``xsdba`` imports `cf-xarray's unit registry <https://cf-xarray.readthedocs.io/en/latest/units.html>`_ by default and, as such, expects units matching the `CF conventions <https://cfconventions.org/Data/cf-conventions/cf-conventions-1.12/cf-conventions.html#units>`_. This is very similar to ``xclim``, except for two features:

- ``xclim`` is able to detect more complex conversions based on the CF "standard name". ``xsdba`` will not implement this, the user is expected to provide coherent inputs to the adjustment objects.
- ``xclim`` has a convenience "hydro" context that allows for conversion between rates and fluxes, thicknesses and amounts of liquid water quantities by using an implicit the water density (1000 kg/m³).

The context of that last point can still be used if ``xclim`` is imported beside ``xsdba`` :

.. code-block:: python

import xsdba
import xclim # importing xclim makes xsdba use xclim's unit registry

pr_ref # reference precipitation data in kg m-2 s-1
pr_hist # historical precipitation data in mm/d
# In normal xsdba, the two quantities are not compatible.
# But with xclim's hydro context, an implicit density allows for conversion

with xclim.core.units.units.context("hydro"):
QDM = xsdba.QuantileDeltaMapping.train(ref=ref, hist=hist)

..


Under the hood, ``xsdba`` picks whatever unit registry has been declared the "application registry" (`see pint's doc <https://pint.readthedocs.io/en/stable/api/base.html#pint.get_application_registry>`_). However, it expects some features as declared in ``cf-xarray``, so a compatible registry (as xclim's) must be used.
2 changes: 1 addition & 1 deletion src/xsdba/properties.py
Original file line number Diff line number Diff line change
Expand Up @@ -167,7 +167,7 @@ def _var(da: xr.DataArray, *, group: str | Grouper = "time") -> xr.DataArray:
if group.prop != "group":
da = da.groupby(group.name)
out = da.var(dim=group.dim)
out.attrs["units"] = str((units(u) ** 2).units)
out.attrs["units"] = f"{(units(u) ** 2).units:cf}"
return out


Expand Down
50 changes: 13 additions & 37 deletions src/xsdba/units.py
Original file line number Diff line number Diff line change
Expand Up @@ -6,24 +6,14 @@
from __future__ import annotations

import inspect
from copy import deepcopy
from functools import wraps
from typing import Any, cast

import pint
from pint import UndefinedUnitError

# this dependency is "necessary" for convert_units_to
# this dependency is "necessary" for convert_units_to and all unit printing (which use the CF formatter)
# if we only do checks, we could get rid of it

try:
# allows to use cf units
import cf_xarray.units
except ImportError: # noqa: S110
# cf-xarray is not installed, this will not be used
pass

import cf_xarray.units # noqa: F401
import numpy as np
import pint
import xarray as xr

from xsdba.base import parse_offset
Expand All @@ -42,23 +32,7 @@
"units2str",
]

# shamelessly adapted from `cf-xarray` (which adopted it from MetPy and xclim itself)
units = deepcopy(cf_xarray.units.units)
# Changing the default string format for units/quantities.
# CF is implemented by cf-xarray, g is the most versatile float format.
# The following try/except logic can be removed when xclim drops support numpy <2.0.
try:
units.formatter.default_format = "gcf"
except UndefinedUnitError:
units.default_format = "gcf"
# Switch this flag back to False. Not sure what that implies, but it breaks some tests.
units.force_ndarray_like = False # noqa: F841
# Another alias not included by cf_xarray
units.define("@alias percent = pct")

# Default context.
null = pint.Context("none")
units.add_context(null)
units = pint.get_application_registry()

FREQ_UNITS = {
"D": "d",
Expand Down Expand Up @@ -221,7 +195,8 @@ def units2str(value: xr.DataArray | str | units.Quantity | units.Unit) -> str:
pint.Unit
Units of the data array.
"""
return str(units2pint(value))
# Ensure we use CF's formatter. (default with xclim, but not with only cf-xarray)
return f"{units2pint(value):cf}"


# XC
Expand Down Expand Up @@ -269,8 +244,8 @@ def pint_multiply(
f = f.to(out_units)
else:
f = f.to_reduced_units()
out: xr.DataArray = da * f.magnitude
out = out.assign_attrs(units=str(f.units))
out: xr.DataArray = da * float(f.magnitude)
out = out.assign_attrs(units=f"{f.units:cf}")
return out


Expand Down Expand Up @@ -299,7 +274,7 @@ def pint2cfattrs(value: units.Quantity | units.Unit, is_difference=None) -> dict
Units following CF-Convention, using symbols.
"""
value = value if isinstance(value, pint.Unit | units.Unit) else value.units
s = str(value)
s = f"{value:cf}"
if "delta_" in s:
is_difference = True
s = s.replace("delta_", "")
Expand Down Expand Up @@ -344,13 +319,14 @@ def convert_units_to( # noqa: C901
target_unit = units2str(target)
source_unit = units2str(source)
if target_unit == source_unit:
return source if isinstance(source, str) is False else str2pint(source).m
return source if not isinstance(source, str) else float(str2pint(source).m)
else: # Convert units
if isinstance(source, xr.DataArray):
out = source.copy(data=units.convert(source.data, source_unit, target_unit))
out = out.assign_attrs(units=target_unit)
else:
out = str2pint(source).to(target_unit).m
else: # scalar
# explicit float cast because cf-xarray's registry ouputs 0-dim arrays
out = float(str2pint(source).to(target_unit).m)
return out


Expand Down
6 changes: 0 additions & 6 deletions tests/test_units.py
Original file line number Diff line number Diff line change
Expand Up @@ -19,9 +19,6 @@ def test_fraction(self):
q = 5 * units.percent
assert q.to("dimensionless") == 0.05

q = 5 * units.parse_units("pct")
assert q.to("dimensionless") == 0.05


class TestConvertUnitsTo:
@pytest.mark.parametrize(
Expand All @@ -40,9 +37,6 @@ def test_pint2str(self):
u = units("percent")
assert str(u.units) == "%"

u = units("pct")
assert str(u.units) == "%"

def test_units2pint(self, timelonlatseries):
pytest.importorskip("cf-xarray")
u = units2pint(timelonlatseries([1, 2], attrs={"units": "kg m-2 s-1"}))
Expand Down
Loading