Fix view behavior for AwkwardArrays #1070
base: main
grst commented Jul 24, 2023 (edited by flying-sheep)
- Closes Future changes to Awkward Array behavior class resolution #1035
- Follow up of Fix for awkward 2.3 #1040
- Release note added (or unnecessary)
I think this does the trick
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1070      +/-   ##
==========================================
- Coverage   84.75%   82.56%   -2.20%
==========================================
  Files          36       36
  Lines        5149     5132      -17
==========================================
- Hits         4364     4237     -127
- Misses        785      895     +110
Shouldn’t this have some tests?
This is already covered by existing tests in |
Co-authored-by: Angus Hollands <[email protected]>
anndata/_core/views.py
Outdated
array = ak.with_parameter(self, _PARAM_NAME, None)
array = ak.with_parameter(array, "__list__", None)
array = ak.with_parameter(array, _PARAM_NAME, None)
array = ak.with_parameter(array, "__record__", None)
This should use `with_name`, so that we can traverse into the layout to find the record node.
- array = ak.with_parameter(array, "__record__", None)
+ array = ak.with_name(array, None)
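To illustrate the difference the suggestion is pointing at, here is a minimal sketch using a toy layout model. The `Node` class and both helper functions are hypothetical stand-ins written for this example; they are not Awkward Array's layout classes or its implementation. The sketch only assumes that a top-level parameter update touches the outermost node, while a name update walks down to the record nodes.

```python
from __future__ import annotations

from dataclasses import dataclass, field, replace
from typing import Optional


@dataclass
class Node:
    """Toy layout node (hypothetical; not an Awkward Array class)."""

    kind: str  # "list" or "record"
    parameters: dict = field(default_factory=dict)
    content: Optional[Node] = None


def set_top_level_parameter(node: Node, key: str, value) -> Node:
    """Return a copy with the parameter set on the outermost node only."""
    return replace(node, parameters={**node.parameters, key: value})


def set_record_name(node: Node, name) -> Node:
    """Return a copy with "__record__" set on every record node in the layout."""
    params = dict(node.parameters)
    if node.kind == "record":
        params["__record__"] = name
    content = None if node.content is None else set_record_name(node.content, name)
    return replace(node, parameters=params, content=content)


# A list of named records: the record node sits below the list node.
layout = Node("list", content=Node("record", {"__record__": "ExampleRecord"}))

top_only = set_top_level_parameter(layout, "__record__", None)
traversed = set_record_name(layout, None)

# The top-level update leaves the nested record name in place.
name_after_top_only = top_only.content.parameters["__record__"]
# The traversal clears the name where it actually lives.
name_after_traversal = traversed.content.parameters["__record__"]
```

In this toy model the top-level update adds a `"__record__"` entry to the list node and never reaches the record node, which is the case the traversal handles.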
I imagine that after our discussion @grst could not get that working (see #1040 (comment)).
Is there a reason traversing is better than what we have (assuming @grst could not get `with_name` working)? We only ever deal with this at the top level.
@grst I get the sense that it might be helpful to have a meeting to discuss this. Would you be able to find time for a Zoom call at some point?
Agreed, I sent an email with a poll to find a timeslot.
@ivirshup, @ilan-gold, @agoose77, ready for re-review. Feels good to get rid of all that code 💥 Currently the tests fail for |
@grst it is possible to modify the original array if you don't perform any row-based slices, i.e I can't recall to what extent we touched on this on the call. If the process of pulling out the Awkward Array always returns a shallow copy, this wouldn't be a problem. |
Hi @agoose77, returning the shallow copy (or slice) works as expected. The remaining problem is that there's no handler implemented for the |
Is that blocking for this PR, or tangential? I'd like to make a release candidate by the end of the week, so would like to merge soon if possible.
If you don't mind, I can mark the respective tests as
|
Which tests are these?
So just to clarify what this PR does. For this block:
import anndata as ad, awkward as ak, numpy as np
a = ad.AnnData(
    np.ones((3, 3)),
    obsm={"awk": ak.Array([{"a": 1}, {"a": 2}, {"a": 3}])}
)
v = a[:2]
v.obsm["awk"]["b"] = [5, 6]
display(v)
display(v.obsm["awk"])
On main:
On this PR:
Is this the desired behavior? Is
The test didn't catch that when modifying an awkward array in an AnnDataView, the changes did not persist within that view. The issue was that I had been retrieving the awkward array from the view once and then running the tests on it. To test this properly, I need to retrieve the awkward array from the AnnData view every time. It turns out that doing so gives me a fresh copy of the original awkward array, and modifications only affect that copy, not the original data.
I solved the issue described in the previous commit by adding logic to `AlignedViewMixin` that can copy objects of a certain type (for now only awkward arrays) upon view creation.
Thanks for catching this! I thought my tests covered this, but it turns out I retrieved the awkward array from the view only once and did all tests on that copy. Instead, retrieving
To solve this, it is necessary to perform a shallow copy of the awkward array on view creation (which is a cheap operation). I added this behavior here:
Maybe this could also be useful for pandas data frames with copy-on-write behavior in the future?
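The problem and the fix described above can be sketched without anndata or awkward. Everything below is hypothetical: plain dicts stand in for awkward arrays, and `FreshCopyView` and `CopyOnCreateView` are illustrative classes, not the `AlignedViewMixin` code from this PR. The first class hands out a new shallow copy on every access, so edits are lost; the second copies once, at view creation, so edits persist in the view and never reach the parent.

```python
import copy


class FreshCopyView:
    """Hypothetical view that returns a new shallow copy on every access."""

    def __init__(self, parent: dict):
        self._parent = parent

    def __getitem__(self, key):
        return copy.copy(self._parent[key])


class CopyOnCreateView:
    """Hypothetical view that shallow-copies selected types once, at creation."""

    copy_types = (dict,)  # stand-in for "awkward arrays"

    def __init__(self, parent: dict):
        self._data = {
            key: copy.copy(value) if isinstance(value, self.copy_types) else value
            for key, value in parent.items()
        }

    def __getitem__(self, key):
        return self._data[key]


parent = {"awk": {"a": [1, 2, 3]}}

fresh = FreshCopyView(parent)
fresh["awk"]["b"] = [5, 6]  # lands on a throwaway copy
edit_visible_in_fresh = "b" in fresh["awk"]

cached = CopyOnCreateView(parent)
cached["awk"]["b"] = [5, 6]  # lands on the copy held by the view
edit_visible_in_cached = "b" in cached["awk"]

# The parent never sees the new field, and the copy is shallow:
# the existing field is the same object as in the parent.
parent_untouched = "b" not in parent["awk"]
shares_existing_data = cached["awk"]["a"] is parent["awk"]["a"]
```

This also shows why a test has to fetch the array from the view on every check: holding on to the first object returned by `FreshCopyView` would hide the bug.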
I think caching the "view"s could be a good idea. I may need to think for a bit on how this affects my current thoughts on how to fix the garbage collection problem:
Currently, this won't work with the solution proposed there, since the AlignedMappings are created on the fly to prevent circular references from being formed.
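A rough sketch of the tension described here, using made-up classes rather than anndata's `AlignedMapping`: a mapping that is created on every access cannot hold cached state, while a mapping stored on its parent with a strong back-reference would form a reference cycle. One possible way around this, shown purely as an assumption and not as the approach anndata takes, is a stored mapping that only holds a weak reference to its parent.

```python
import gc
import weakref


class OnTheFlyMapping:
    """Hypothetical mapping that keeps a strong reference to its parent."""

    def __init__(self, parent):
        self.parent = parent
        self.cache = {}


class OnTheFlyParent:
    """Creates a new mapping per access, so no cycle but also no cached state."""

    @property
    def mapping(self):
        return OnTheFlyMapping(self)


class WeakMapping:
    """Hypothetical mapping that refers back to its parent only weakly."""

    def __init__(self, parent):
        self._parent_ref = weakref.ref(parent)
        self.cache = {}

    @property
    def parent(self):
        return self._parent_ref()


class CachingParent:
    """Stores one mapping; the weak back-reference avoids a reference cycle."""

    def __init__(self):
        self._mapping = WeakMapping(self)

    @property
    def mapping(self):
        return self._mapping


on_the_fly = OnTheFlyParent()
on_the_fly.mapping.cache["awk"] = "copy"
state_lost = "awk" not in on_the_fly.mapping.cache

caching = CachingParent()
caching.mapping.cache["awk"] = "copy"
state_kept = "awk" in caching.mapping.cache

# With the cyclic collector switched off, CPython reference counting alone
# frees the parent, because nothing holds a strong reference back to it.
gc.disable()
try:
    parent_ref = weakref.ref(caching)
    del caching
    freed_without_gc = parent_ref() is None
finally:
    gc.enable()
```

The last check relies on CPython's reference counting; other interpreters may free the object later.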