Level creation now supports aggregation method mode #1078

TonioF · 2024-10-01T16:44:22Z

[Description of PR]

Checklist:

Add unit tests and/or doctests in docstrings
Add docstrings and API docs for any new/modified user-facing classes and functions
New/modified features documented in docs/source/*
Changes documented in CHANGES.md
GitHub CI passes
AppVeyor CI passes
Test coverage remains or increases (target 100%)

… the value which is most frequent

TonioF · 2024-10-01T17:25:31Z

Solves #913

forman

Actually all good, thanks! However, please see my note on using Dask for the mode operation if the input is a dask array too.

forman · 2024-10-07T10:20:52Z

xcube/core/subsampling.py

@@ -109,6 +113,10 @@ def subsample_dataset(
    return xr.Dataset(data_vars=new_data_vars, attrs=dataset.attrs)


+def _mode(x, axis, **kwargs):
+    return stats.mode(x, axis, nan_policy="omit", **kwargs).mode


This eagerly loads all data into memory, even for dask arrays. Please switch to a dask-based version if x is a dask array.

forman · 2024-10-07T13:35:48Z

xcube/core/subsampling.py

+def _mode(x, axis, **kwargs):
+    def _scipy_mode(x, axis, **kwargs):
+        return stats.mode(x, axis, nan_policy="omit", **kwargs).mode
+    if isinstance(x, da.Array):
+        return x.map_blocks(_scipy_mode, axis=axis, dtype=x.dtype, **kwargs)
+    return _scipy_mode(x, axis, **kwargs)


Nice! However, a dedicated test would be even nicer :)
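A dedicated test along the requested lines could look like the sketch below. It is not the test from this PR: `mode_along_axis` is a hypothetical stand-in for `_mode`, and the sketch assumes SciPy 1.9 or later (for the `keepdims` argument) together with Dask and NumPy. Unlike the snippet above, it rechunks the reduced axis into a single chunk and passes `drop_axis`, on the assumption that each block must see the full axis and that the output loses that axis.

```python
import dask.array as da
import numpy as np
from scipy import stats


def mode_along_axis(x, axis):
    """Most frequent value along a non-negative `axis` (hypothetical helper)."""

    def _scipy_mode(block, axis):
        # keepdims=False removes the reduced axis from the result.
        return stats.mode(block, axis=axis, nan_policy="omit", keepdims=False).mode

    if isinstance(x, da.Array):
        # Each block must span the whole reduced axis, otherwise every
        # chunk would report its own mode.
        x = x.rechunk({axis: -1})
        # drop_axis tells dask that the output has one dimension less.
        return x.map_blocks(_scipy_mode, axis=axis, drop_axis=axis, dtype=x.dtype)
    return _scipy_mode(x, axis)


def test_mode_numpy_and_dask_agree():
    data = np.array([[1, 1, 2], [3, 3, 3], [4, 5, 5]])
    expected = [1, 3, 5]

    # Plain NumPy input is reduced eagerly.
    assert mode_along_axis(data, 1).tolist() == expected

    # Dask input stays lazy until compute() is called.
    lazy = mode_along_axis(da.from_array(data, chunks=(2, 2)), 1)
    assert isinstance(lazy, da.Array)
    assert lazy.compute().tolist() == expected
```

The test only checks that the eager and lazy paths agree on a small integer array; NaN handling via `nan_policy="omit"` is not covered here.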

forman

Holy cow! You should now do some refactoring to make the code more readable and comprehensible again. See my suggestions.

forman · 2024-10-08T09:25:15Z

xcube/core/subsampling.py

+                    dim = dict()
+                    if x_name in var.dims:
+                        dim[x_name] = step
+                    if y_name in var.dims:
+                        dim[y_name] = step
+                    var_coarsen = var.coarsen(dim=dim, boundary="pad", coord_func="min")
+                    new_var: xr.DataArray = getattr(var_coarsen, agg_method)()


Extract a function _agg_builtin and declare its return type as xr.DataArray, in line with the helper _agg_mode below.

forman · 2024-10-08T09:26:19Z

xcube/core/subsampling.py

@@ -109,6 +114,37 @@ def subsample_dataset(
    return xr.Dataset(data_vars=new_data_vars, attrs=dataset.attrs)


+def _mode(var: xr.DataArray, x_name: str, y_name: str, step: int):


Rename to _agg_mode and declare return type xr.DataArray

forman · 2024-10-08T09:33:55Z

xcube/core/subsampling.py

+    dim = dict()
+    drop_axis = []
+    if x_name in var.dims:
+        dim[x_name] = step
+        drop_axis.append(var.dims.index(x_name))
+    if y_name in var.dims:
+        dim[y_name] = step
+        drop_axis.append(var.dims.index(y_name))
+    var_coarsen = var.coarsen(dim=dim, boundary="pad", coord_func="min")
+    if drop_axis[0] > drop_axis[1]:
+        drop_axis[0] += 2
+        drop_axis[1] += 1
+    else:
+        drop_axis[0] += 1
+        drop_axis[1] += 2


Extract a helper function _get_drop_axis() that just computes drop_axis from var, x_name, and y_name. The remaining code duplicates the first lines of the new helper _agg_builtin(), see above. You can now extract the common part from both _agg_mode() and _agg_builtin():

def _coarsen(var, x_name, y_name, step):
    dim = dict()
    if x_name in var.dims:
        dim[x_name] = step
    if y_name in var.dims:
        dim[y_name] = step
    return var.coarsen(dim=dim, boundary="pad", coord_func="min")

Also explain the magic numbers 1, 2 used for drop_axis. They are not clear to me.
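One plausible reading of the offsets, sketched here with NumPy only: if coarsening splits every coarsened dimension into a (coarse, window) pair, then the window axis of a dimension sits one position after its original index, shifted further by the number of coarsened dimensions before it. That would give +1 for the earlier of the two spatial dimensions and +2 for the later one. The helper name `window_axes` and the reshape-based coarsening are assumptions made for illustration, not code from this PR.

```python
import numpy as np


def window_axes(dims, coarsened_dims):
    """Axis indices of the window dimensions after each coarsened
    dimension has been split into (coarse, window). Hypothetical helper."""
    axes = []
    n_split = 0  # number of coarsened dims seen so far
    for i, dim in enumerate(dims):
        if dim in coarsened_dims:
            n_split += 1
            axes.append(i + n_split)
    return axes


dims = ("time", "y", "x")
axes = window_axes(dims, {"y", "x"})
print(axes)  # [2, 4]: y at index 1 gets +1, x at index 2 gets +2

# Emulate the split with a reshape: (time, y, x) -> (time, y, y_win, x, x_win)
step = 2
data = np.arange(2 * 4 * 6).reshape(2, 4, 6)
blocks = data.reshape(2, 4 // step, step, 6 // step, step)

# Reducing over the window axes leaves the coarse (time, y, x) grid.
coarse = blocks.max(axis=tuple(axes))
print(coarse.shape)  # (2, 2, 3)
```

Under this reading the two branches in the snippet above only decide which of x and y comes first in var.dims; a loop over the dimensions in order avoids the case distinction.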

forman

Great!

TonioF added 2 commits October 1, 2024 18:37

Level creation now supports aggregation method mode to aggregate to…

7227fc1

… the value which is most frequent

updated docs

aa0bf7a

TonioF added the enhancement and DOORS labels Oct 1, 2024

TonioF requested a review from forman October 1, 2024 17:24

TonioF marked this pull request as ready for review October 1, 2024 17:24

forman requested changes Oct 7, 2024

forman assigned TonioF Oct 7, 2024

consider dask arrays

f7514c4

TonioF requested a review from forman October 7, 2024 13:25

forman approved these changes Oct 7, 2024

made function dask compatible

17d5d0d

forman self-requested a review October 8, 2024 09:19

forman requested changes Oct 8, 2024

TonioF requested a review from forman October 8, 2024 12:29

refactoring

3510ab5

forman approved these changes Oct 8, 2024

TonioF merged commit 16cef10 into main Oct 8, 2024
1 of 2 checks passed

TonioF deleted the toniof-xxx-agg_most_frequent branch October 8, 2024 12:33
