bucket resampler get_max, get_min are not optimally parallelised #432
Labels: enhancement, performance (improves speed or decreases memory consumption, but does not otherwise change functionality)
Despite impressive improvements in #368, there remains room to improve the parallelisation of the `BucketResampler` methods `get_max` and `get_min`. Much of their run time is spent in the `@dask.delayed`-decorated function `_get_statistics`, which is not parallelised, even though in principle it could be. Perhaps the `resample_blocks` function (to be) introduced in #341 could help.
Code Sample, a minimal, complete, and verifiable piece of code
Problem description
The bucket resampler methods `get_max` and `get_min` spend most of their wall-clock time in unparallelised code, so a run takes longer than it needs to.
Expected Output
I expect a Dask visualisation showing that 800% CPU is used 100% of the time.
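Full utilisation of this kind is what a map-reduce structure would allow: each chunk computes partial per-bucket maxima independently, and only a cheap elementwise maximum over the partial results is left to run at the end. The following is a minimal sketch under stated assumptions, not pyresample's implementation. `chunk_bucket_max` and `bucket_max` are hypothetical names. Bucket indices are assumed to be precomputed integers, with out-of-range indices and non-finite values ignored. Empty buckets are assumed to be reported as NaN. A thread pool stands in for the Dask workers. In a Dask graph, the same idea would be one task per input chunk rather than a single delayed task over the whole array.

```python
from concurrent.futures import ThreadPoolExecutor

import numpy as np


def chunk_bucket_max(idxs, data, n_buckets):
    """Hypothetical helper: partial per-bucket maxima for one chunk.

    Buckets that receive no valid data from this chunk stay at -inf.
    """
    out = np.full(n_buckets, -np.inf)
    # Assumption: out-of-range bucket indices and non-finite values are ignored.
    valid = (idxs >= 0) & (idxs < n_buckets) & np.isfinite(data)
    # np.maximum.at is unbuffered, so every repeated bucket index is honoured
    # (out[idxs] = np.maximum(out[idxs], data) keeps only the last write per bucket).
    np.maximum.at(out, idxs[valid], data[valid])
    return out


def bucket_max(idxs, data, n_buckets, n_chunks=8, max_workers=8):
    """Hypothetical map-reduce bucket maximum.

    Map: each chunk is reduced independently (one task per chunk).
    Reduce: the partial results are combined with an elementwise maximum.
    """
    idx_chunks = np.array_split(idxs, n_chunks)
    data_chunks = np.array_split(data, n_chunks)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        partials = list(
            pool.map(chunk_bucket_max, idx_chunks, data_chunks, [n_buckets] * n_chunks)
        )
    result = np.max(np.stack(partials), axis=0)
    # Assumption: buckets that never received a value are reported as NaN.
    result[np.isneginf(result)] = np.nan
    return result


if __name__ == "__main__":
    idxs = np.array([0, 1, 1, 3, 0, 3, 3, 1])
    data = np.array([1.0, 5.0, 2.0, 7.0, 4.0, np.nan, 9.0, 3.0])
    # Bucket maxima: 4.0, 5.0, nan, 9.0, nan
    print(bucket_max(idxs, data, n_buckets=5, n_chunks=3))
```

The same partial-then-combine pattern would apply to `get_min`, using `np.minimum.at`, `+inf` as the initial value and an elementwise minimum in the reduce step.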
Actual Result, Traceback if applicable
In reality, 800% CPU is used less than 40% of the time. More than 60% of the time is spent in the task `_get_statistics`, which is exactly the `@dask.delayed`-decorated function used to calculate the maximum.
Versions of Python, package at hand and relevant dependencies
pyresample main: v1.23.0-46-g0cb8914