Merge pull request #1149 from Ouranosinc/sdba-props-as-inds

Statistical properties as indicators
Ouranosinc · Aug 30, 2022 · 67508f8
2 parents b2090ff + c8fe47a
commit 67508f8

Showing 12 changed files with 420 additions and 374 deletions.
diff --git a/HISTORY.rst b/HISTORY.rst
@@ -11,7 +11,9 @@ New features and enhancements
 * Adjustment methods of `SBCK <https://github.com/yrobink/SBCK>`_ are wrapped into xclim when that package is installed. (:issue:`1109`, :pull:`1115`).
     - Wrapped SBCK tests are also properly run in the tox testing ensemble. (:pull:`1119`).
 * Method ``FAO_PM98`` (based on Penman-Monteith formula) to compute potential evapotranspiration. (:pull:`1122`).
+* New indices for droughts: SPI (standardized precipitations) and SPEI (standardized water budgets) (:issue:`131`, :pull:`1096`)
 * Most numba functions of ``sdba.nbutils`` now use the lazy compilation mode. This significantly accelerates the import time of xclim. (:issue:`1135`, :pull:`1167`).
+* Statistical properties and measures from ``xclim.sdba`` are now Indicator subclasses (:pull:`1149`).
 
 New indicators
 ^^^^^^^^^^^^^^
@@ -24,7 +26,7 @@ New indicators
 Breaking changes
 ^^^^^^^^^^^^^^^^
 * `scipy` has been temporarily pinned below version 1.9 until lmoments3 tests can be rewritten to account for the new API. (:issue:`1142`, :pull:`1143`).
-* Now requires `xarray>=2022.06.0` (:pull:`1151`).
+* `xclim` now requires `xarray>=2022.06.0`. (:pull:`1151`).
 * Documentation CI (ReadTheDocs) builds will now fail if there are any misconfigured pages, internal link/reference warnings, or broken external hyperlinks. (:issue:`1094`, :pull:`1131`, :issue:`1139`, :pull:`1140`, :pull:`1160`).
 * Call signatures for generic indices have been reordered and/or modified to accept `op`, and optionally `constrain`, in many cases, and `condition`/`conditional`/`operation` has been renamed to `op` for consistency. (:issue:`389`, :pull:`1157`). The affected indices are as follows:
     - `get_op`, `compare`, `threshold_count`, `get_daily_events`, `count_level_crossings`, `count_occurrences`, `first_occurrence`, `last_occurrence`, `spell_length`, `thresholded_statistics`, `temperature_sum`, `degree_days`.
@@ -34,6 +36,7 @@ Breaking changes
     - ``xclim.indices._multivariate.daily_freezethaw_cycles`` -> Replaceable with the generic ``multiday_temperature_swing`` with `thresh_tasmax='0 degC'`, `thresh_tasmin='0 degC'`, `window=1`, and `op='sum'`. The indicator version (``xclim.atmos.daily_freezethaw_cycles``) is unaffected.
     - ``xclim.indices.generic.select_time`` -> Was previously moved to ``xclim.core.calendar``.
 * The `clix-meta` indicator table parsing function (``xclim.core.utils.adapt_clix_meta_yaml``) has been adapted to support the new "op" operator handler. (:pull:`1157`).
+* Because they have been reimplemented as Indicator subclasses, statistical properties and measures of ``xclim.sdba`` no longer preserve attributes of their inputs by default. Use ``xclim.set_options(keep_attrs=True)`` to get the previous behaviour. (:pull:`1149`).
 
 Bug fixes
 ^^^^^^^^^

diff --git a/docs/notebooks/sdba-advanced.ipynb b/docs/notebooks/sdba-advanced.ipynb
@@ -697,12 +697,11 @@
    "source": [
     "## Tests for sdba\n",
     "\n",
-    "It can be useful to perform diagnostic tests on adjusted simulations to assess if the bias correction method is working\n",
-    "properly or to compare two different bias correction techniques.\n",
+    "It can be useful to perform diagnostic tests on adjusted simulations to assess if the bias correction method is working properly or to compare two different bias correction techniques.\n",
     "\n",
-    "A diagnostic test includes calculations of a property (mean, 20-year return value, annual cycle amplitude, ...) on the\n",
-    " simulation and on the scenario (adjusted simulation), then a measure (bias, relative bias, ratio, ...) of the\n",
-    "  difference. The property collapse the time dimension of the simulation/scenario and returns one value by grid point."
+    "A diagnostic test includes calculations of a property (mean, 20-year return value, annual cycle amplitude, ...) on the simulation and on the scenario (adjusted simulation), then a measure (bias, relative bias, ratio, ...) of the difference. Usually, the property collapses the time dimension of the simulation/scenario and returns one value per grid point.\n",
+    "\n",
+    "You'll find those in ``xclim.sdba.properties`` and ``xclim.sdba.measures``, where they are implemented as special subclasses of xclim's ``Indicator``, which means they can be worked with the same way as conventional indicators (used in yaml modules for example)."
    ]
   },
   {
@@ -762,21 +761,21 @@
    "outputs": [],
    "source": [
     "# calculate the mean warm Spell Length Distribution\n",
-    "sim_prop = xc.sdba.properties.spell_length_distribution(\n",
+    "sim_prop = sdba.properties.spell_length_distribution(\n",
     "    da=sim, thresh=\"28 degC\", op=\">\", stat=\"mean\", group=\"time\"\n",
     ")\n",
     "\n",
     "\n",
-    "scen_prop = xc.sdba.properties.spell_length_distribution(\n",
+    "scen_prop = sdba.properties.spell_length_distribution(\n",
     "    da=scen, thresh=\"28 degC\", op=\">\", stat=\"mean\", group=\"time\"\n",
     ")\n",
     "\n",
-    "ref_prop = xc.sdba.properties.spell_length_distribution(\n",
+    "ref_prop = sdba.properties.spell_length_distribution(\n",
     "    da=ref_future, thresh=\"28 degC\", op=\">\", stat=\"mean\", group=\"time\"\n",
     ")\n",
     "# measure the difference between the prediction and the reference with an absolute bias of the properties\n",
-    "measure_sim = xc.sdba.measures.bias(sim_prop, ref_prop)\n",
-    "measure_scen = xc.sdba.measures.bias(scen_prop, ref_prop)\n",
+    "measure_sim = sdba.measures.bias(sim_prop, ref_prop)\n",
+    "measure_scen = sdba.measures.bias(scen_prop, ref_prop)\n",
     "\n",
     "plt.figure(figsize=(5, 3))\n",
     "plt.plot(measure_sim.location, measure_sim.values, \".\", label=\"biased model (sim)\")\n",
@@ -807,20 +806,21 @@
    "outputs": [],
    "source": [
     "# calculate the mean warm Spell Length Distribution\n",
-    "sim_prop = xc.sdba.properties.spell_length_distribution(\n",
+    "sim_prop = sdba.properties.spell_length_distribution(\n",
     "    da=sim, thresh=\"28 degC\", op=\">\", stat=\"mean\", group=\"time.season\"\n",
     ")\n",
     "\n",
-    "scen_prop = xc.sdba.properties.spell_length_distribution(\n",
+    "scen_prop = sdba.properties.spell_length_distribution(\n",
     "    da=scen, thresh=\"28 degC\", op=\">\", stat=\"mean\", group=\"time.season\"\n",
     ")\n",
     "\n",
-    "ref_prop = xc.sdba.properties.spell_length_distribution(\n",
+    "ref_prop = sdba.properties.spell_length_distribution(\n",
     "    da=ref_future, thresh=\"28 degC\", op=\">\", stat=\"mean\", group=\"time.season\"\n",
     ")\n",
-    "# measure the difference between the prediction and the reference with an absolute bias the properties\n",
-    "measure_sim = xc.sdba.measures.bias(sim_prop, ref_prop)\n",
-    "measure_scen = xc.sdba.measures.bias(scen_prop, ref_prop)\n",
+    "# Properties are often associated with the same measures. This correspondence is implemented in xclim:\n",
+    "measure = sdba.properties.spell_length_distribution.get_measure()\n",
+    "measure_sim = measure(sim_prop, ref_prop)\n",
+    "measure_scen = measure(scen_prop, ref_prop)\n",
     "\n",
     "fig, axs = plt.subplots(2, 2, figsize=(9, 6))\n",
     "axs = axs.ravel()\n",
@@ -860,7 +860,7 @@
    "name": "python",
    "nbconvert_exporter": "python",
    "pygments_lexer": "ipython3",
-   "version": "3.10.4"
+   "version": "3.10.5"
   },
   "toc": {
    "base_numbering": 1,

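Note on the notebook change above: the property/measure workflow it describes can be sketched without xclim. This is only an illustration in plain Python. The helpers `mean_property` and `bias_measure` are hypothetical stand-ins, not xclim functions; the real implementations live in ``xclim.sdba.properties`` and ``xclim.sdba.measures`` and operate on DataArrays.

```python
from statistics import mean


# Hypothetical stand-in for a statistical property (not an xclim function).
def mean_property(series_by_point):
    """Collapse the time dimension: return one value per grid point."""
    return {point: mean(values) for point, values in series_by_point.items()}


# Hypothetical stand-in for a statistical measure (not an xclim function).
def bias_measure(sim_prop, ref_prop):
    """Absolute bias of a property: simulation minus reference, per grid point."""
    return {point: sim_prop[point] - ref_prop[point] for point in ref_prop}


# Toy time series for two grid points, "A" and "B".
ref = {"A": [10.0, 12.0, 14.0], "B": [20.0, 22.0, 24.0]}
sim = {"A": [11.0, 13.0, 15.0], "B": [19.0, 21.0, 23.0]}

ref_prop = mean_property(ref)
sim_prop = mean_property(sim)
bias = bias_measure(sim_prop, ref_prop)
```

The same two-step shape (property first, then measure against the reference) is what the notebook cells do with `spell_length_distribution` and `bias`.
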
diff --git a/docs/sdba.rst b/docs/sdba.rst
@@ -80,12 +80,12 @@ SDBA User API
 
 .. automodule:: xclim.sdba.properties
    :members:
-   :exclude-members: register_statistical_properties
+   :exclude-members: StatisticalProperty
    :noindex:
 
 .. automodule:: xclim.sdba.measures
    :members:
-   :exclude-members: check_same_units_and_convert
+   :exclude-members: StatisticalMeasure
    :noindex:
 
 Developer tools
@@ -109,10 +109,10 @@ Developer tools
    :members:
    :noindex:
 
-.. autofunction:: xclim.sdba.properties.register_statistical_properties
+.. autoclass:: xclim.sdba.properties.StatisticalProperty
    :noindex:
 
-.. autofunction:: xclim.sdba.measures.check_same_units_and_convert
+.. autoclass:: xclim.sdba.measures.StatisticalMeasure
    :noindex:
 
 .. only:: html or text

diff --git a/xclim/core/formatting.py b/xclim/core/formatting.py
@@ -446,7 +446,7 @@ def gen_call_string(funcname: str, *args, **kwargs):
     Example
     -------
     >>> A = xr.DataArray([1], dims=("x",), name="A")
-    >>> gen_call_string("func", A, b=2.0, c="3", d=[4, 5, 6])
+    >>> gen_call_string("func", A, b=2.0, c="3", d=[10] * 100)
     "func(A, b=2.0, c='3', d=<list>)"
     """
     elements = []
@@ -457,7 +457,9 @@ def gen_call_string(funcname: str, *args, **kwargs):
         elif isinstance(val, (int, float, str, bool)) or val is None:
             rep = repr(val)
         else:
-            rep = f"<{type(val).__name__}>"
+            rep = repr(val)
+            if len(rep) > 50:
+                rep = f"<{type(val).__name__}>"
 
         if name is not None:
             rep = f"{name}={rep}"
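
Note on the `gen_call_string` change above: the new fallback branch keeps the full `repr` of a value unless it is longer than 50 characters, in which case it falls back to a type placeholder. A minimal standalone sketch of that branch follows. The helper name `short_repr` and its `max_len` parameter are assumptions made for illustration and do not exist in xclim.

```python
def short_repr(val, max_len=50):
    """Return repr(val), or a '<TypeName>' placeholder if the repr exceeds max_len.

    Hypothetical helper mirroring only the fallback branch shown in the diff.
    """
    rep = repr(val)
    if len(rep) > max_len:
        rep = f"<{type(val).__name__}>"
    return rep


short = short_repr([4, 5, 6])      # short list: the full repr is kept
long_ = short_repr([10] * 100)     # long list: replaced by a type placeholder
```

This is why the docstring example switched from `d=[4, 5, 6]` to `d=[10] * 100`: the short list would now be printed in full rather than as `<list>`.
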

diff --git a/xclim/core/indicator.py b/xclim/core/indicator.py
@@ -374,7 +374,8 @@ class Indicator(IndicatorRegistrar):
     fields could also be present if the indicator was created from outside xclim.
 
     var_name:
-      Output variable(s) name(s).
+      Output variable(s) name(s). For derived single-output indicators, this field is not
+      inherited from the parent indicator and defaults to the identifier.
     standard_name:
       Variable name, must be in the CF standard names table (this is not checked).
     long_name:
@@ -428,7 +429,11 @@ def __new__(cls, **kwds):
         parameters = cls._ensure_correct_parameters(parameters)
 
         # If needed, wrap compute with declare units
-        if "compute" in kwds and not hasattr(kwds["compute"], "in_units"):
+        if (
+            "compute" in kwds
+            and not hasattr(kwds["compute"], "in_units")
+            and "_variable_mapping" in kwds
+        ):
             # We actually need the inverse mapping (to get cmip6 name -> arg name)
             inv_var_map = dict(map(reversed, kwds["_variable_mapping"].items()))
             # parameters has already been updated above.
@@ -460,9 +465,9 @@ def __new__(cls, **kwds):
 
         # Priority given to passed realm -> parent's realm -> location of the class declaration (official inds only)
         kwds.setdefault("realm", cls.realm or xclim_realm)
-        if kwds["realm"] not in ["atmos", "seaIce", "land", "ocean"]:
+        if kwds["realm"] not in ["atmos", "seaIce", "land", "ocean", "generic"]:
             raise AttributeError(
-                "Indicator's realm must be given as one of 'atmos', 'seaIce', 'land' or 'ocean'"
+                "Indicator's realm must be given as one of 'atmos', 'seaIce', 'land', 'ocean' or 'generic'"
             )
 
         # Create new class object
@@ -591,12 +596,12 @@ def _ensure_correct_parameters(cls, parameters):
         """
         for name, meta in parameters.items():
             if not meta.injected:
-                if meta.kind <= InputKind.OPTIONAL_VARIABLE and meta.units is _empty:
-                    raise ValueError(
-                        f"Input variable {name} is missing expected units. Units are "
-                        "parsed either from the declare_units decorator or from the "
-                        "variable mapping (arg name to CMIP6 name) passed in `input`"
-                    )
+                # if meta.kind <= InputKind.OPTIONAL_VARIABLE and meta.units is _empty:
+                #     raise ValueError(
+                #         f"Input variable {name} is missing expected units. Units are "
+                #         "parsed either from the declare_units decorator or from the "
+                #         "variable mapping (arg name to CMIP6 name) passed in `input`"
+                #     )
                 if meta.kind == InputKind.OPTIONAL_VARIABLE:
                     meta.default = None
                 elif meta.kind in [InputKind.VARIABLE]:
@@ -626,8 +631,6 @@ def _parse_output_attrs(
         if isinstance(cf_attrs, dict):
             # Single output indicator, but we store as a list anyway.
             cf_attrs = [cf_attrs]
-        elif cf_attrs is None and parent_cf_attrs:
-            cf_attrs = deepcopy(parent_cf_attrs)
         elif cf_attrs is None:
             # Attributes were passed the "old" way, with lists or strings directly (only _cf_names)
             # We need to get the number of outputs first, defaulting to the length of parent's cf_attrs or 1
@@ -655,7 +658,7 @@ def _parse_output_attrs(
                         attrs[name] = value
         # else we assume a list of dicts
 
-        # For single output, var_name defauls to identifer.
+        # For single output, var_name defaults to identifier.
         if len(cf_attrs) == 1 and "var_name" not in cf_attrs[0]:
             cf_attrs[0]["var_name"] = identifier
 
@@ -703,9 +706,9 @@ def from_dict(
         data = data.copy()
         if "base" in data:
             if isinstance(data["base"], str):
-                cls = registry.get(
-                    data["base"].upper(), base_registry.get(data["base"])
-                )
+                parts = data["base"].split(".")
+                registry_id = ".".join([*parts[:-1], parts[-1].upper()])
+                cls = registry.get(registry_id, base_registry.get(data["base"]))
                 if cls is None:
                     raise ValueError(
                         f"Requested base class {data['base']} is neither in the "

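Note on the `from_dict` change above: the registry lookup key is now built by upper-casing only the last dotted component of `base`, so a module-qualified base keeps its prefix untouched. A small sketch of that string manipulation, using a hypothetical helper name (`base_to_registry_id`) and illustrative input strings that are not claimed to be actual registry keys:

```python
def base_to_registry_id(base):
    """Upper-case only the final dotted component of a base identifier.

    Hypothetical helper reproducing the two lines added in the diff.
    """
    parts = base.split(".")
    return ".".join([*parts[:-1], parts[-1].upper()])


plain = base_to_registry_id("tg_mean")
qualified = base_to_registry_id("some.module.my_property")
```

With the previous `data["base"].upper()` call, the whole string, module path included, would have been upper-cased.
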
diff --git a/xclim/ensembles/_base.py b/xclim/ensembles/_base.py
@@ -379,7 +379,8 @@ def _ens_align_datasets(
             time = xr.decode_cf(ds).time
 
             if resample_freq is not None:
-                counts = time.resample(time=resample_freq).count()
+                # Cast to bool to avoid bug in flox/numpy_groupies (xarray-contrib/flox#137)
+                counts = time.astype(bool).resample(time=resample_freq).count()
                 if any(counts > 1):
                     raise ValueError(
                         f"Alignment of dataset #{i:02d} failed: "

diff --git a/xclim/sdba/base.py b/xclim/sdba/base.py
@@ -5,8 +5,9 @@
 """
 from __future__ import annotations
 
-from inspect import signature
-from typing import Callable, Mapping, Sequence, Union
+from inspect import _empty, signature
+from types import FunctionType
+from typing import Callable, Mapping, Sequence
 
 import dask.array as dsk
 import jsonpickle
@@ -56,11 +57,20 @@ def parameters(self):
 
     def __repr__(self):
         """Return a string representation."""
+        # Get default values from the init signature
+        defaults = {
+            # A default value of None could mean an empty mutable object
+            n: [p.default] if p.default is not None else [[], {}, set(), None]
+            for n, p in signature(self.__init__).parameters.items()
+            if p.default is not _empty
+        }
+        # The representation only includes the parameters with a value different from their default
+        # and those not explicitly excluded.
         params = ", ".join(
             [
                 f"{k}={repr(v)}"
                 for k, v in self.items()
-                if k not in self._repr_hide_params
+                if k not in self._repr_hide_params and v not in defaults.get(k, [])
             ]
         )
         return f"{self.__class__.__name__}({params})"
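
Note on the `__repr__` change above: parameters whose value equals the default from the `__init__` signature are now omitted from the representation, and a default of `None` is also treated as matching an empty list, dict or set. The sketch below reproduces that logic on a hypothetical `Params` class. The class and its arguments are assumptions for illustration, not part of `xclim.sdba.base`, and it uses the public `inspect.Parameter.empty` sentinel rather than the private `inspect._empty` imported in the diff.

```python
from inspect import Parameter, signature


class Params(dict):
    """Hypothetical dict-based parameter container, for illustration only."""

    _repr_hide_params = ["secret"]

    def __init__(self, a, b=2, c=None, secret="x"):
        super().__init__(a=a, b=b, c=c, secret=secret)

    def __repr__(self):
        # Collect default values from the init signature.
        defaults = {
            # A default of None could stand for an empty mutable object.
            n: [p.default] if p.default is not None else [[], {}, set(), None]
            for n, p in signature(self.__init__).parameters.items()
            if p.default is not Parameter.empty
        }
        # Keep only parameters that are not hidden and differ from their default.
        params = ", ".join(
            f"{k}={v!r}"
            for k, v in self.items()
            if k not in self._repr_hide_params and v not in defaults.get(k, [])
        )
        return f"{type(self).__name__}({params})"


all_defaults = repr(Params(1))
changed_b = repr(Params(1, b=3, c=[]))
changed_c = repr(Params(1, c=[4]))
```

One consequence of this design is that the printed representation is shorter but no longer lists every argument, so it relies on the defaults staying stable between versions.
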