Multiple operations instead of means #33
Comments
@mkjpryor, I'm pretty sure you saw this coming ...
I thought about this recently, but the thought disappeared before I did anything with it. For me, Option 1. seems like the most sensible of the easy options. Option 4. doesn't seem like it would be too taxing to implement.
I would want My use case for this would be that cf-python uses
This is interesting. I'm thinking you mean something like a chaining framework on the storage that always did 1) read the data from disk and then 2) get the results of the following list of operations on the in-memory data. E.g. the client asks for "range" and gets back from
Thanks Mark, David. ok, I'm implementing Option 1 for now ... clearly option 4 is an extension, and we can consider that in the (possibly near) future ... and yes David, the interpretation you have above is what I had in mind.
I would argue that the generic operation of masking, i.e. returning reduced data based on a condition, should be an integral part of AS and be performed by the storage unit/its software. Masking reduces data by a lot, it is used very often in the field, and it is relatively costly because data needs to be looked at chunk by chunk. I don't know if this was in the initial design, but I reckon it has to be, the sooner the better. For now, I too agree that returning
@valeriupredoi how would you define the condition? Comparison with a null value, or something more complex?
We had a bit of a discussion about masking today. The bottom line is that doing anything beyond what we have done for missing data would likely have minimal impact in the use cases we discussed - since the real benefit of all this work will come with high volume data, which will likely be compressed, in which case masking is already efficiently hidden. Unmasking before returning a chunk from storage is not likely to be beneficial in this situation.
The option for carrying out a series of operations would, as @markgoddard suggests, require relatively little work in client and servers, but we won't consider it until we have everything else working.
In working through the implications of implementing means in chunks, it is notable that once missing data is in play, we need to return two numbers from the `reduce_chunk` method: the `sum` and the `count`, because means over chunks will need to be weighted by the actual number of values being averaged.

There are a number of ways we could implement this:
1. Always return `(X, N)`, where `X` is the result of the expected operation and `N` is the number of values contributing.
2. Return `(X, N)` when required (e.g. for means), otherwise return `(X, None)` or `(X,)`.
3. Return `X`, except when it needs to be `(X, N)`.
4. Something else.
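
To illustrate why both the sum and the count are needed, here is a minimal sketch of the "always return `(X, N)`" approach. The function names, the `method` argument, and the use of `None` as the missing-data marker are all assumptions made for illustration; they are not the project's actual `reduce_chunk` signature.

```python
def reduce_chunk_sketch(chunk, method, missing=None):
    """Hypothetical per-chunk reduction that always returns (X, N).

    X is the result of `method` over the non-missing values and
    N is the number of values that contributed to it.
    """
    valid = [v for v in chunk if v != missing]
    n = len(valid)
    if n == 0:
        return None, 0  # nothing contributed from this chunk
    if method == "sum":
        return sum(valid), n
    if method == "min":
        return min(valid), n
    if method == "max":
        return max(valid), n
    raise ValueError(f"unsupported method: {method}")


def combine_mean(partials):
    """Combine per-chunk (sum, count) pairs into one weighted mean."""
    total = sum(x for x, n in partials if n > 0)
    count = sum(n for _, n in partials)
    return total / count if count else None


chunks = [[1.0, 2.0, None], [3.0, None, None], [4.0, 5.0, 6.0]]
partials = [reduce_chunk_sketch(c, "sum") for c in chunks]
overall_mean = combine_mean(partials)
```

With these chunks the weighted result is 3.5, whereas averaging the three per-chunk means (1.5, 3.0 and 5.0) without the counts would give roughly 3.17, which is why `N` has to travel with `X`.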
The something else option could be slightly more interesting: do we think it's a smart idea to say we could chain a series of methods and expect a series of results, in a lightweight sort of caching?
Obvious use cases would be:
This could be facilitated by handing over not just "a method" but a list of one or more methods, and expecting back a list of one or more results.
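
As a rough sketch of that idea, the chunk could be filtered once and then passed through each requested method in turn. The name `reduce_chunk_multi`, the `REDUCERS` table, and the `None` missing-data marker are hypothetical and only serve the illustration; a real implementation would read the chunk from storage first.

```python
# Hypothetical table of supported reductions (an assumption for this sketch).
REDUCERS = {"min": min, "max": max, "sum": sum}


def reduce_chunk_multi(chunk, methods, missing=None):
    """Apply several reductions to one in-memory chunk.

    Returns one (X, N) pair per requested method, in the same order,
    so the data only has to be read and filtered once.
    """
    unknown = [m for m in methods if m not in REDUCERS]
    if unknown:
        raise ValueError(f"unsupported methods: {unknown}")
    valid = [v for v in chunk if v != missing]
    n = len(valid)
    return [(REDUCERS[m](valid) if n else None, n) for m in methods]


# A client interested in the range could ask for min and max together.
results = reduce_chunk_multi([4.0, None, 1.0, 7.0], ["min", "max", "sum"])
```

Here `results` holds `(1.0, 3)`, `(7.0, 3)` and `(12.0, 3)`: one pair per method, each carrying the count of contributing values.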