-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Future changes to Awkward Array behavior class resolution #1035
Comments
I took another look at anndata/anndata/_core/views.py Lines 202 to 257 in b2965fc
It seems like the point of this array class is to intercept |
Tagging @ivirshup for visibility :) |
Hey, sorry about the delayed response.
Yes.
Yes, but... It's not just copy-on-write for the cc: @grst |
@ivirshup Is this required? Would it break your invariants if accessing an |
The overall idea is that subsetting an anndata does not actually subset all of its contents until you need them. E.g. subsetting is lazy. However, we track this at the Since we don't want to write back updates, we just want it to act like you took a subset, we have copy on write behavior for each element of the
How would you set any values on an awkward array? Or do you mean only when it's an anndata style view? |
I can see that this is confusing! I'll reply exclusively on #1040 from now on. |
Going to keep this open until we are happy with a solution. Current status summarized here: #1040 (comment) |
I met with @jpivarski and @agoose77 yesterday. We came to the conclusion that we might not need any copy-on-write behavior for views of awkward arrays, because a slice of an awkward array is always a (shallow) copy anyway. There's therefore no risk of modifying the original awkward array when setting a record on an awkward array within an Only downside is that AnnDataView behaves differently for awkward arrays than for other backends. Updating the awkward array will not trigger an In any case I'll first come up with a bunch of test cases to make sure everything works as expected. |
This issue has been automatically marked as stale because it has not had recent activity. |
This is still an issue. The proposed fix is in #1070, but pending some more fundamental decision on how to cache views. |
@agoose77, it is still the case and this is what is proposed in #1070. However, @ivirshup pointed out in #1070 (comment) that when creating the view only when accessing the data, it is impossible to assign any data to a record, e.g. using v.obsm["awk"]["b"] = [5, 6] because a new shallow copy will be created every time This is why in the most recent version I suggested to create the shallow copy already during creating of the AnnDataView, i.e. on v = a[:2] in the same example, which Isaac referred to as "caching" the view. But apparently this might conflict with the proposed fix for an issue with garbage collection (#1119). In any case, I don't think we need anything from the awkward side, but just need to find a way how to reconcile cached views with garbage collection. |
This issue has been automatically marked as stale because it has not had recent activity. |
This is not an existing bug, but I'm raising it now so that we're on the same page.
Over at Awkward Array, we've come to the conclusion that the
__array__
mechanism, used to facilitate categorical and string arrays, should be a built-in feature, rather than implemented using Awkward behaviors. We were planning upon introducing a new parameter that mirrors__record__
to indicate the nominal type of an array. This change would mean__array__
no longer affects the loaded class (type(arr)
) of an Awkward Array. Anndata uses the__array__
parameter to wrap Awkward Arrays in custom view wrappers.The mechanism that we were originally thinking of was a new parameter
__list__
that can only attach to list-types (arrays with >1 dimension). Clearly, this is less general than the existing functionality, so I was wondering if you'd be able to clarify the kinds of Awkward Arrays that anndata needs to support.Before we get into the details, the most important point is that we do not want to make existing use cases impossible. So, I'm hoping that we can figure out what anndata needs, and rework our proposal.
When we were conceiving of these changes, we did not have any examples of custom arrays in the forefront of our minds. We did know of the use of
__array__
in anndata, but it certainly slipped my mind during these discussions. Pinging @jpivarski for visiblity.The text was updated successfully, but these errors were encountered: