<!--Please ensure the PR fulfills the following requirements! -->
<!-- If this is your first PR, make sure to add your details to the AUTHORS.rst! -->
### Pull Request Checklist:
- [x] This PR addresses an already opened issue (for bug fixes / features)
  - This PR fixes #2000 and fixes #1820 and should please @tlogan2000 .
- [x] Tests for the changes have been added (for bug fixes / features)
  - [x] (If applicable) Documentation has been added / updated (for bug fixes / features)
- [x] CHANGELOG.rst has been updated (with summary of main changes)
  - [x] Link to issue (:issue:`number`) and pull request (:pull:`number`) has been added

### What kind of change does this PR introduce?

Refactor of the Missing objects. I tried to follow a more orthodox OOP approach. In the new design:

- Objects are initialized with their options, then called with the data (plus `freq`, `src_timestep` and `indexer`).
- Subclasses should override:
  + `__init__`, to explicitly override the signature and document their options. This method should not do anything else.
  + `validate`, a static method that returns False on invalid options (same as before).
  + `is_missing`, which receives `null`, `count` and `freq`. It does the same as before.
  + (optionally) `_validate_src_timestep`, to validate the `src_timestep` at call time. This is only useful for `MissingWMO`, which is restricted to daily inputs.
- Before, input validation was done in a few places. Now it is done almost only in `__call__`, which is not meant to be overridden.
- The methods no longer receive `null` as a `DataArrayResample` object, but as a normal `DataArray`. This allows a bit more flexibility, which I use to optimise `MissingWMO` by using `resample_map` on the `longest_run` condition. Benchmarking to come.
- New `MissingTwoSteps` subclass used by `MissingPct` and `AtLeastNValid` (and `MissingWMO`, but not in a new way). This adds a `subfreq` option which can be used to divide the mask computation into two steps:
  1. Compute the mask at `subfreq` using the given method.
  2. Merge the sub-groups at the target `freq` using the "any" method.

### Does this PR introduce a breaking change?

Yes, `MissingBase` and all its children have been modified in breaking ways. However, these were not exposed in the public API. The convenience functions should work as they did before. Some users, though, might have implemented custom missing methods. These will break, sorry. I hope the new way makes more sense.

### Other information:

I have yet to run `mypy` and similar tools to see if I really fixed #2000. Also, I'll add some benchmarking to see if my change impacted performance. In preliminary tests, `missing_wmo` ran at least 10x faster on a dataset of 100 years x 50 points, and it had 1000x fewer dask tasks.
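
To make the subclassing contract and the two-step computation described above more concrete, here is a minimal, dependency-free sketch. It is not the code of this PR: `MissingSketch`, `MissingPctSketch` and `two_step_mask` are hypothetical names, plain dictionaries and lists stand in for `DataArray` objects, `is_missing` is called once per period rather than on the whole array, and checking the options in the base `__init__` is an assumption.

```python
"""Toy illustration of the subclassing contract; not the PR's actual code."""


class MissingSketch:
    """Hypothetical stand-in for the refactored base class."""

    def __init__(self, **options):
        # Assumption: options are checked once, when the object is created.
        if not self.validate(**options):
            raise ValueError(f"Invalid options: {options}")
        self.options = options

    @staticmethod
    def validate(**options) -> bool:
        """Return False on invalid options."""
        return True

    def is_missing(self, null, count, freq) -> bool:
        """Decide whether one period is missing; subclasses override this."""
        raise NotImplementedError

    def __call__(self, data, freq):
        # `data` maps a period label to its values; None marks a missing value.
        mask = {}
        for period, values in data.items():
            null = [value is None for value in values]
            mask[period] = self.is_missing(null, len(values), freq)
        return mask


class MissingPctSketch(MissingSketch):
    """Toy rule: a period is missing if too large a fraction is null."""

    def __init__(self, tolerance: float = 0.1):
        # Only declares and documents the option; no other work happens here.
        super().__init__(tolerance=tolerance)

    @staticmethod
    def validate(tolerance: float) -> bool:
        return 0 <= tolerance <= 1

    def is_missing(self, null, count, freq) -> bool:
        if count == 0:
            return True  # an empty period is treated as missing in this toy
        return sum(null) / count >= self.options["tolerance"]


def two_step_mask(method, data, freq, subfreq):
    """Two-step mask: `data` maps each target period to {sub-period: values}."""
    result = {}
    for period, subgroups in data.items():
        sub_mask = method(subgroups, subfreq)    # step 1: mask at `subfreq`
        result[period] = any(sub_mask.values())  # step 2: merge with "any"
    return result


check = MissingPctSketch(tolerance=0.5)
monthly = {"2000-01": [1.0, None, 3.0, 4.0], "2000-02": [None, None, None, 2.0]}
monthly_mask = check(monthly, freq="MS")
yearly_mask = two_step_mask(check, {"2000": monthly}, freq="YS", subfreq="MS")
```

In this toy, February is flagged because three of its four values are null, and the "any" merge then flags the whole year even though January passes on its own.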