-
Notifications
You must be signed in to change notification settings - Fork 283
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify meta in dask.array.map_blocks #5989
base: main
Are you sure you want to change the base?
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
Additional details and impacted files@@ Coverage Diff @@
## main #5989 +/- ##
=======================================
Coverage 89.82% 89.82%
=======================================
Files 88 88
Lines 23146 23155 +9
Branches 5043 5045 +2
=======================================
+ Hits 20790 20799 +9
Misses 1624 1624
Partials 732 732 ☔ View full report in Codecov by Sentry. |
Hmm linkcheck fail is this one... The website really is down right now :-( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This does seem basically sound, but I have a couple of worries to raise.
We can go with this if you don't like the suggestions.
lib/iris/_lazy_data.py
Outdated
# Assume operation does not change dtype and meta if not specified. | ||
if "meta" not in kwargs: | ||
kwargs["meta"] = da.utils.meta_from_array(data) | ||
if "dtype" in kwargs: | ||
kwargs["meta"] = kwargs["meta"].astype(kwargs["dtype"]) | ||
else: | ||
kwargs["dtype"] = kwargs["meta"].dtype | ||
|
||
result = data.map_blocks(func, *args, chunks=out_chunks, **kwargs) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm finding this special handling of "dtype" and/or "meta" rather confusing.
And I note that it is currently only exercised in tests, and nowhere in the actual code.
So I'm wondering if we can do without ??
If not, maybe they need documenting as a special case, or even listing as explicit keys : in the current code, it is rather confusing that these keywords, which specifically control map_blocks
, are part of kwargs but will not be passed down to func
like the docstring says they are (!).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was concerned about the assumption that the output dtype
is always the same as the input dtype
, that's why I thought it might be good to be able to specify this. I checked the dtype
s coming out of the functions that use map_complete_blocks
and tried to specify those in c1e5e1a. Does it look better now?
lib/iris/_lazy_data.py
Outdated
@@ -536,9 +536,12 @@ def lazy_elementwise(lazy_array, elementwise_op): | |||
# This makes good practical sense for unit conversions, as a Unit.convert | |||
# call may cast to float, or not, depending on unit equality : Thus, it's | |||
# much safer to get udunits to decide that for us. | |||
dtype = elementwise_op(np.zeros(1, lazy_array.dtype)).dtype | |||
meta = da.utils.meta_from_array(lazy_array) | |||
new_meta = elementwise_op(meta) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This now requires elementwise_op
to work on an empty-shaped array, which was not the case before, and may not work for all array operations.
So I think that passing a minimal array of the correct dtype, as before, would be "safer".
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I restored the previous behaviour in c1e5e1a
@pp-mo This is ready for another review. The readthedocs build appears to be failing because it tries to download shapefiles from a server that is offline. |
🚀 Pull Request
Description
Specify meta in
dask.array.map_blocks
to avoid having the wrong chunktype in the output array.Closes #5988
Consult Iris pull request check list
Add any of the below labels to trigger actions on this PR: