Level creation now supports aggregation method mode #1078
… the value which is most frequent
Solves #913
Actually all good, thanks! However, please see my note on using Dask for the mode
operation if the input is a Dask array too.
xcube/core/subsampling.py
Outdated
@@ -109,6 +113,10 @@ def subsample_dataset(
    return xr.Dataset(data_vars=new_data_vars, attrs=dataset.attrs)


def _mode(x, axis, **kwargs):
    return stats.mode(x, axis, nan_policy="omit", **kwargs).mode
This eagerly loads all data into memory, even for Dask arrays. Please switch to a Dask-based version if x
is a Dask array.
xcube/core/subsampling.py
Outdated
def _mode(x, axis, **kwargs):
    def _scipy_mode(x, axis, **kwargs):
        return stats.mode(x, axis, nan_policy="omit", **kwargs).mode
    if isinstance(x, da.Array):
        return x.map_blocks(_scipy_mode, axis=axis, dtype=x.dtype, **kwargs)
    return _scipy_mode(x, axis, **kwargs)
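For readers following the Dask discussion, here is a minimal standalone sketch of such a dispatch. It is not the code that was merged. The name mode_along_axis is hypothetical, and the sketch assumes a recent SciPy release whose stats.mode accepts keepdims. Two details matter when reducing with map_blocks: the reduced axis has to sit in a single chunk, because the function only ever sees one block at a time, and drop_axis has to tell Dask that the output has one dimension fewer than the input.

```python
import dask.array as da
import numpy as np
from scipy import stats


def _scipy_mode(x, axis):
    # nan_policy="omit" ignores NaN values when counting occurrences.
    return stats.mode(x, axis=axis, nan_policy="omit", keepdims=False).mode


def mode_along_axis(x, axis):
    """Hypothetical helper: most frequent value along `axis`, lazy for Dask input."""
    if isinstance(x, da.Array):
        # Each block must contain the whole reduced axis.
        x = x.rechunk({axis: -1})
        # drop_axis declares that every output block loses `axis`.
        return x.map_blocks(_scipy_mode, axis=axis, drop_axis=axis, dtype=x.dtype)
    return _scipy_mode(x, axis)


data = np.array([[1.0, 2.0, 2.0],
                 [1.0, np.nan, 3.0],
                 [4.0, 2.0, 3.0]])
eager = mode_along_axis(data, 0)
lazy = mode_along_axis(da.from_array(data, chunks=(2, 2)), 0)
```

The Dask branch stays lazy until lazy.compute() is called, which is what the review comment asks for.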
Nice! However, a dedicated test would be even nicer :)
Holy cow! You should now do some refactoring to make the code more readable and comprehensible again. See my suggestions.
xcube/core/subsampling.py
Outdated
dim = dict()
if x_name in var.dims:
    dim[x_name] = step
if y_name in var.dims:
    dim[y_name] = step
var_coarsen = var.coarsen(dim=dim, boundary="pad", coord_func="min")
new_var: xr.DataArray = getattr(var_coarsen, agg_method)()
Extract a function _agg_builtin and declare its return type as xr.DataArray, in line with the helper _agg_mode below.
xcube/core/subsampling.py
Outdated
@@ -109,6 +114,37 @@ def subsample_dataset(
    return xr.Dataset(data_vars=new_data_vars, attrs=dataset.attrs)


def _mode(var: xr.DataArray, x_name: str, y_name: str, step: int):
Rename to _agg_mode and declare return type xr.DataArray.
xcube/core/subsampling.py
Outdated
dim = dict()
drop_axis = []
if x_name in var.dims:
    dim[x_name] = step
    drop_axis.append(var.dims.index(x_name))
if y_name in var.dims:
    dim[y_name] = step
    drop_axis.append(var.dims.index(y_name))
var_coarsen = var.coarsen(dim=dim, boundary="pad", coord_func="min")
if drop_axis[0] > drop_axis[1]:
    drop_axis[0] += 2
    drop_axis[1] += 1
else:
    drop_axis[0] += 1
    drop_axis[1] += 2
Extract a helper function _get_drop_axis() that just computes drop_axis from var, x_name, and y_name. The remaining code duplicates the first lines of the new helper _agg_builtin(), see above. You can now extract the common part from both _agg_mode() and _agg_builtin():
def _coarsen(var, x_name, y_name, step):
    dim = dict()
    if x_name in var.dims:
        dim[x_name] = step
    if y_name in var.dims:
        dim[y_name] = step
    return var.coarsen(dim=dim, boundary="pad", coord_func="min")
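To make the suggested refactoring concrete, here is a rough sketch of how _agg_builtin() might shrink once the shared coarsening step is factored out. The parameter order and the agg_method argument of _agg_builtin() are assumptions made for illustration; the thread does not show the final signatures.

```python
import numpy as np
import xarray as xr


def _coarsen(var: xr.DataArray, x_name: str, y_name: str, step: int):
    # Only coarsen the spatial dimensions that the variable actually has.
    dim = {}
    if x_name in var.dims:
        dim[x_name] = step
    if y_name in var.dims:
        dim[y_name] = step
    return var.coarsen(dim=dim, boundary="pad", coord_func="min")


def _agg_builtin(
    var: xr.DataArray, x_name: str, y_name: str, step: int, agg_method: str
) -> xr.DataArray:
    # Hypothetical signature: look up a built-in reducer such as "mean" or "max".
    return getattr(_coarsen(var, x_name, y_name, step), agg_method)()


var = xr.DataArray(np.arange(16.0).reshape(4, 4), dims=("y", "x"))
coarse = _agg_builtin(var, "x", "y", 2, "mean")
```

With the helper in place, _agg_mode() would only need to add its own mode reduction on top of the same _coarsen() result.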
Also explain the magic numbers 1, 2 used for drop_axis. They are not clear to me.
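One plausible reading of the offsets, offered as an interpretation rather than something stated in the thread: a block-wise reduction splits each coarsened dimension of length n into a pair (n // step, step). The new window axis sits directly after its dimension, and every coarsened dimension further to the left shifts the later axes by one more. With two coarsened dimensions, the one with the smaller index therefore gets +1 and the one with the larger index gets +2. The plain NumPy sketch below mimics that layout; window_axes is a hypothetical helper that returns the axes in ascending order, and the assumption that xarray's coarsen reshapes data this way is not verified here.

```python
import numpy as np


def window_axes(dims, x_name, y_name):
    """Hypothetical helper: window-axis positions after block-splitting x and y."""
    coarsened = sorted(dims.index(n) for n in (x_name, y_name) if n in dims)
    # The k-th coarsened dim (in axis order) is shifted by the k window axes
    # inserted before it, and its own window axis follows right after it.
    return [i + k + 1 for k, i in enumerate(coarsened)]


step = 2
data = np.arange(2 * 4 * 6).reshape(2, 4, 6)  # dims: (time, y, x)
# Split y into (2, step) and x into (3, step): shape (2, 2, 2, 3, 2).
blocks = data.reshape(2, 4 // step, step, 6 // step, step)
axes = window_axes(("time", "y", "x"), "x", "y")  # [2, 4], i.e. 1 + 1 and 2 + 2
coarse = blocks.mean(axis=tuple(axes))  # shape (2, 2, 3)
```

Reducing over exactly these axes is what a drop_axis argument would have to describe, which would account for the constants in the snippet under review.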
Great!
[Description of PR]
Checklist:
docs/source/*
CHANGES.md