Accounting for missing values in active storage operations. #18
Comments
Oh bugger. I had forgotten about all the edge cases ... I don't think DAP has to handle this, does it? Insofar as they have the NetCDF file itself and the NetCDF semantics available server-side ... the active storage will not.
This could be an 80/20 situation:
In the situation where we can't handle it, we default to normal storage operations, of course ...
We should at least force "normal" operations for now, if any of these are present in the metadata.
Sounds like a good way forward. CMIP6 metadata mandates that you should use both … Looking further ahead, providing a single number to the storage is probably no harder than providing "a few", but, as you say, no need to worry about that at this moment.
I suggest we make a few dummy files by extending …
Very good question, David! I think the missing data value (whether it be …
In this case, we process on the client, surely, as netCDF4-python deals with all cases. |
Yes, we need to process on the client in all cases where the server can't handle it directly ... |
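For reference, a minimal sketch of the client-side behaviour being relied on here: netCDF4-python applies the missing-value attributes itself and hands back a masked array, so a client-side fallback reduction already respects them. The file and variable names below are hypothetical, purely for illustration.

```python
import netCDF4
import numpy as np

# Hypothetical file and variable names, purely for illustration.
ds = netCDF4.Dataset("tas_land_only.nc")
var = ds.variables["tas"]

# With the default auto-masking, netCDF4-python honours _FillValue,
# missing_value and valid_* and returns the data with those points masked.
data = var[:]

# Reductions on a numpy masked array ignore the masked (missing) points,
# which is exactly what a client-side fallback computation relies on.
print("minimum over valid points:", np.ma.min(data))

ds.close()
```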
What about error handling? What if a chunk is all missing? I think the right answer would be to return a missing value, and that has to be handled above.
(Edited - sent prematurely) I think that makes sense, as that also handles the case where all chunks are missing, for which the reduced answer is the mdi. That implies that the methods (like …
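A rough sketch of that contract, using hypothetical helper names and plain numpy: a per-chunk reduction returns the missing-data indicator when the chunk has no valid points, and the combining step recognises that and drops the chunk; if every chunk is all-missing, the overall answer is the mdi too.

```python
import numpy as np

def chunk_min(chunk: np.ndarray, mdi: float) -> float:
    """Hypothetical per-chunk minimum aware of a single missing-data
    indicator (mdi).  Returns the mdi itself when the chunk contains
    no valid points, so the caller can recognise an all-missing chunk."""
    valid = chunk[chunk != mdi]
    if valid.size == 0:
        return mdi
    return float(valid.min())

def combine_minima(chunk_results: list, mdi: float) -> float:
    """Combine per-chunk results, ignoring chunks that came back as mdi.
    If every chunk was all-missing, the overall answer is the mdi."""
    valid = [r for r in chunk_results if r != mdi]
    return min(valid) if valid else mdi

# Example: the second chunk is entirely missing and drops out of the result.
mdi = -1e20
chunks = [np.array([2.0, mdi, 5.0]), np.array([mdi, mdi])]
print(combine_minima([chunk_min(c, mdi) for c in chunks], mdi))  # -> 2.0
```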
After today's conversation, we decided that a reasonable option to avoid a potentially infinite-length vector of "missing values" would be to support up to four numbers of missing information: valid_min, valid_max, missing_value, and _FillValue. If there were a vector of missing numbers in play, we'd simply default to "non-computational" storage.
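A minimal sketch of that decision, with hypothetical names (not anything in the codebase): accept up to the four scalar attributes, and signal a fallback to ordinary (non-computational) storage reads as soon as missing_value turns out to be a vector.

```python
import numpy as np

# The four kinds of missing information we agreed to support directly.
SUPPORTED_ATTRS = ("valid_min", "valid_max", "missing_value", "_FillValue")

def missing_info(attrs: dict):
    """Hypothetical helper: return the missing-value numbers to pass to
    active storage, or None to signal a fallback to normal storage reads."""
    info = {}
    for name in SUPPORTED_ATTRS:
        if name not in attrs:
            continue
        value = attrs[name]
        # A vector missing_value means we cannot hand the storage a small,
        # bounded set of numbers, so force non-computational storage.
        if np.ndim(value) > 0 and np.size(value) > 1:
            return None
        info[name] = value
    return info

# Usage: a scalar missing_value is fine, a vector triggers the fallback.
print(missing_info({"missing_value": -1e20, "_FillValue": -1e20}))
print(missing_info({"missing_value": [-1e20, -9999.0]}))  # -> None
```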
@valeriupredoi Can you please look and see if we have access to those missing value attributes in the zarr dataset object itself? (i.e. will it be easy for us to pass them to …)
they are inside the bellows - see e.g. here, but accessing and manipulating them from the API is a different dish of curry. I will investigate in more detail next week, ESMValX-releases permitting 👍
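For what it's worth, a hedged sketch of where those attributes tend to live on a zarr array: the fill value is a first-class array property, while CF-style attributes only exist if whatever created the store copied them across, in which case they sit in `.attrs`. The store path below is an assumption for illustration, and it presumes the store opens as an array rather than a group.

```python
import zarr

# Hypothetical array store, purely for illustration.
z = zarr.open("example.zarr", mode="r")

# The zarr fill value is a first-class property of the array ...
print("zarr fill_value:", z.fill_value)

# ... whereas CF-style attributes are only present if the tool that wrote
# the store copied them across, in which case they live in .attrs.
for name in ("_FillValue", "missing_value", "valid_min", "valid_max", "valid_range"):
    if name in z.attrs:
        print(name, "=", z.attrs[name])
```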
Argh, the interpretation for …
No answers yet, just a statement of need.
Missing values need to be accounted for during active operations. For instance, a land-surface temperature minimum needs to ignore a missing_value of -1e20 over the oceans. Therefore the missing values (of which there can be 0 to many) need to be passed to the active storage, similarly to how the data type needs to be passed.

Things get complicated because there are many different ways of specifying missing values (https://docs.unidata.ucar.edu/nug/current/attribute_conventions.html), some of which are not simple numbers:

- _FillValue
- missing_value (which may be a scalar or vector)
- the valid_min number, or the first of the valid_range numbers
- the valid_max number, or the second of the valid_range numbers

All of these methods are used in the wild. A sketch of how they combine into a mask follows below.
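To make the combinations concrete, here is a hedged sketch in plain numpy (with a hypothetical attribute dict, not any existing API) of how those attributes translate into a single mask of missing points, following the Unidata attribute conventions linked above.

```python
import numpy as np

def missing_mask(data: np.ndarray, attrs: dict) -> np.ndarray:
    """Illustrative only: mark points as missing according to _FillValue,
    missing_value (scalar or vector), valid_min/valid_max or valid_range."""
    mask = np.zeros(data.shape, dtype=bool)

    if "_FillValue" in attrs:
        mask |= data == attrs["_FillValue"]

    if "missing_value" in attrs:
        for mv in np.atleast_1d(attrs["missing_value"]):
            mask |= data == mv

    # valid_range plays the role of both valid_min and valid_max.
    vmin = attrs.get("valid_min", attrs.get("valid_range", [None, None])[0])
    vmax = attrs.get("valid_max", attrs.get("valid_range", [None, None])[1])
    if vmin is not None:
        mask |= data < vmin
    if vmax is not None:
        mask |= data > vmax

    return mask

# Example: a -1e20 missing_value over the oceans is excluded from a minimum.
data = np.array([271.3, -1e20, 265.8, -1e20])
attrs = {"missing_value": -1e20, "_FillValue": -1e20}
print(data[~missing_mask(data, attrs)].min())  # -> 265.8
```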
The fixed missing values are typically floats which need to match the values in the data exactly, so a string decimal representation created by the client might not convert back to the exact binary representation on the storage. Does DAP deal with this, I wonder?
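The exactness concern can be illustrated like this: a decimal string parsed on the client becomes a 64-bit double, which is not bit-identical to the 32-bit value stored in the file, so a naive equality test on the storage side would miss the points; re-narrowing to the data's own type, or sending a round-trippable form such as the hex float, preserves the match. This is a plain numpy illustration, not a statement about what DAP actually transmits.

```python
import numpy as np

# The value stored in the file is a 32-bit float.
stored = np.float32(0.1)

# A client that parses the decimal string "0.1" gets a 64-bit double,
# which is not bit-identical to the float32 in the data ...
parsed = float("0.1")
print(stored == parsed)              # False: different float32/float64 patterns

# ... re-narrowing to the data's own type restores the match.
print(stored == np.float32(parsed))  # True

# A round-trippable text form (hex float) also preserves the exact value.
hex_form = float(stored).hex()
print(np.float32(float.fromhex(hex_form)) == stored)  # True
```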