Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Level creation now supports aggregation method mode #1078

Merged
merged 5 commits into from
Oct 8, 2024

Conversation

TonioF
Copy link
Contributor

@TonioF TonioF commented Oct 1, 2024

[Description of PR]

Checklist:

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/source/*
  • Changes documented in CHANGES.md
  • GitHub CI passes
  • AppVeyor CI passes
  • Test coverage remains or increases (target 100%)

@TonioF TonioF added enhancement New feature or request DOORS labels Oct 1, 2024
@TonioF TonioF requested a review from forman October 1, 2024 17:24
@TonioF TonioF marked this pull request as ready for review October 1, 2024 17:24
@TonioF
Copy link
Contributor Author

TonioF commented Oct 1, 2024

Solves #913

Copy link
Member

@forman forman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually all good, thanks! However, please see my note on using Dask for the mode operation if input is a dask arrray too.

@@ -109,6 +113,10 @@ def subsample_dataset(
return xr.Dataset(data_vars=new_data_vars, attrs=dataset.attrs)


def _mode(x, axis, **kwargs):
return stats.mode(x, axis, nan_policy="omit", **kwargs).mode
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This eagerly loads all data into memory, even for dask arrays. Please switch to dask version if x is a dask array.

@TonioF TonioF requested a review from forman October 7, 2024 13:25
Comment on lines 117 to 122
def _mode(x, axis, **kwargs):
def _scipy_mode(x, axis, **kwargs):
return stats.mode(x, axis, nan_policy="omit", **kwargs).mode
if isinstance(x, da.Array):
return x.map_blocks(_scipy_mode, axis=axis, dtype=x.dtype, **kwargs)
return _scipy_mode(x, axis, **kwargs)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice! However, a dedicated test would be even nicer :)

@forman forman self-requested a review October 8, 2024 09:19
Copy link
Member

@forman forman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Holy cow! You should now perform some refactorings to make code more readable and comprehensive again. See my suggestions.

Comment on lines 77 to 83
dim = dict()
if x_name in var.dims:
dim[x_name] = step
if y_name in var.dims:
dim[y_name] = step
var_coarsen = var.coarsen(dim=dim, boundary="pad", coord_func="min")
new_var: xr.DataArray = getattr(var_coarsen, agg_method)()
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract function _agg_builtin and declare return type xr.DataArray. In accordance to helper _agg_mode below.

@@ -109,6 +114,37 @@ def subsample_dataset(
return xr.Dataset(data_vars=new_data_vars, attrs=dataset.attrs)


def _mode(var: xr.DataArray, x_name: str, y_name: str, step: int):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rename to _agg_mode and declare return type xr.DataArray

Comment on lines 118 to 132
dim = dict()
drop_axis = []
if x_name in var.dims:
dim[x_name] = step
drop_axis.append(var.dims.index(x_name))
if y_name in var.dims:
dim[y_name] = step
drop_axis.append(var.dims.index(y_name))
var_coarsen = var.coarsen(dim=dim, boundary="pad", coord_func="min")
if drop_axis[0] > drop_axis[1]:
drop_axis[0] += 2
drop_axis[1] += 1
else:
drop_axis[0] += 1
drop_axis[1] += 2
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extract helper function _get_drop_axis() that just computes drop_axis from var, x_name, y_name. The remaining code is a duplication of the first lines of new helper _agg_builtin(), see above. You can now extract from both _agg_mode() and _agg_builtin() common:

def _coarsen(vat, step, x_name, y_name, step):
  dim = dict()
  if x_name in var.dims:
      dim[x_name] = step
  if y_name in var.dims:
      dim[y_name] = step
  return var.coarsen(dim=dim, boundary="pad", coord_func="min")

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also explain the magic numbers 1, 2 used for drop_axis. They are not clear to me.

@TonioF TonioF requested a review from forman October 8, 2024 12:29
Copy link
Member

@forman forman left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great!

@TonioF TonioF merged commit 16cef10 into main Oct 8, 2024
1 of 2 checks passed
@TonioF TonioF deleted the toniof-xxx-agg_most_frequent branch October 8, 2024 12:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
DOORS enhancement New feature or request
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants