Future changes to Awkward Array behavior class resolution #1035

agoose77 · 2023-06-28T22:34:25Z

This is not an existing bug, but I'm raising it now so that we're on the same page.

Over at Awkward Array, we've come to the conclusion that the __array__ mechanism, used to facilitate categorical and string arrays, should be a built-in feature, rather than implemented using Awkward behaviors. We were planning upon introducing a new parameter that mirrors __record__ to indicate the nominal type of an array. This change would mean __array__ no longer affects the loaded class (type(arr)) of an Awkward Array. Anndata uses the __array__ parameter to wrap Awkward Arrays in custom view wrappers.

The mechanism that we were originally thinking of was a new parameter __list__ that can only attach to list-types (arrays with >1 dimension). Clearly, this is less general than the existing functionality, so I was wondering if you'd be able to clarify the kinds of Awkward Arrays that anndata needs to support.

Before we get into the details, the most important point is that we do not want to make existing use cases impossible. So, I'm hoping that we can figure out what anndata needs, and rework our proposal.

When we were conceiving of these changes, we did not have any examples of custom arrays in the forefront of our minds. We did know of the use of __array__ in anndata, but it certainly slipped my mind during these discussions. Pinging @jpivarski for visiblity.

The text was updated successfully, but these errors were encountered:

agoose77 · 2023-06-29T09:31:31Z

I took another look at

anndata/anndata/_core/views.py

Lines 202 to 257 in b2965fc

    
           try: 
        
               from ..compat import awkward as ak 
        
               import weakref 
        
               # Registry to store weak references from AwkwardArrayViews to their parent AnnData container 
        
               _registry = weakref.WeakValueDictionary() 
        
               _PARAM_NAME = "_view_args" 
        
               class AwkwardArrayView(_ViewMixin, AwkArray): 
        
                   @property 
        
                   def _view_args(self): 
        
                       """Override _view_args to retrieve the values from awkward arrays parameters. 
        
                       Awkward arrays cannot be subclassed like other python objects. Instead subclasses need 
        
                       to be attached as "behavior". These "behaviors" cannot take any additional parameters (as we do 
        
                       for other data types to store `_view_args`). Therefore, we need to store `_view_args` using awkward's 
        
                       parameter mechanism. These parameters need to be json-serializable, which is why we can't store 
        
                       ElementRef directly, but need to replace the reference to the parent AnnDataView container with a weak 
        
                       reference. 
        
                       """ 
        
                       parent_key, attrname, keys = self.layout.parameter(_PARAM_NAME) 
        
                       parent = _registry[parent_key] 
        
                       return ElementRef(parent, attrname, keys) 
        
                   def __copy__(self) -> AwkArray: 
        
                       """ 
        
                       Turn the AwkwardArrayView into an actual AwkwardArray with no special behavior. 
        
                       Need to override __copy__ instead of `.copy()` as awkward arrays don't implement `.copy()` 
        
                       and are copied using python's standard copy mechanism in `aligned_mapping.py`. 
        
                       """ 
        
                       array = self 
        
                       # makes a shallow copy and removes the reference to the original AnnData object 
        
                       array = ak.with_parameter(self, _PARAM_NAME, None) 
        
                       array = ak.with_parameter(array, "__array__", None) 
        
                       return array 
        
               @as_view.register(AwkArray) 
        
               def as_view_awkarray(array, view_args): 
        
                   parent, attrname, keys = view_args 
        
                   parent_key = f"target-{id(parent)}" 
        
                   _registry[parent_key] = parent 
        
                   # TODO: See https://github.com/scverse/anndata/pull/647#discussion_r963494798_ for more details and 
        
                   # possible strategies to stack behaviors. 
        
                   # A better solution might be based on xarray-style "attrs", once this is implemented 
        
                   # https://github.com/scikit-hep/awkward/issues/1391#issuecomment-1412297114 
        
                   if type(array).__name__ != "Array": 
        
                       raise NotImplementedError( 
        
                           "Cannot create a view of an awkward array with __array__ parameter. " 
        
                           "Please open an issue in the AnnData repo and describe your use-case." 
        
                       ) 
        
                   array = ak.with_parameter(array, _PARAM_NAME, (parent_key, attrname, keys)) 
        
                   array = ak.with_parameter(array, "__array__", "AwkwardArrayView") 
        
                   return array 
        
               ak.behavior["AwkwardArrayView"] = AwkwardArrayView

It seems like the point of this array class is to intercept __setitem__ for copy-on-write semantics. Could you clarify whether my understanding is correct. If so, then we don't need the array class at all? Awkward Arrays are (mostly) immutable, as the structure of the array is stored in a separate object to the high-level interface. I'm thinking that we just need to ensure that any view produces a new outer ak.Array via ak.Array(array).

agoose77 · 2023-06-29T09:31:44Z

Tagging @ivirshup for visibility :)

ivirshup · 2023-07-06T13:12:45Z

Hey, sorry about the delayed response.

It seems like the point of this array class is to intercept setitem for copy-on-write semantics.

Yes.

Awkward Arrays are (mostly) immutable, as the structure of the array is stored in a separate object to the high-level interface. I'm thinking that we just need to ensure that any view produces a new outer ak.Array via ak.Array(array).

Yes, but... It's not just copy-on-write for the ak.Array, it's also copy on write for the parent AnnData object.

cc: @grst

ivirshup · 2023-07-06T13:38:08Z

Also @grst, for some context I believe the change mentioned here happened in awkward 2.3 and is causing test failures. I think the fix is super easy, but would appreciate your eyes on it (#1040).

@agoose77 could you also take a look at the linked PR?

agoose77 · 2023-07-06T14:05:26Z

Yes, but... It's not just copy-on-write for the ak.Array, it's also copy on write for the parent AnnData object.

@ivirshup Is this required? Would it break your invariants if accessing an ak.Array always returned a shallow copy?

ivirshup · 2023-07-06T14:44:09Z

The overall idea is that subsetting an anndata does not actually subset all of its contents until you need them. E.g. subsetting is lazy. However, we track this at the AnnData object level, not at each element level.

Since we don't want to write back updates, we just want it to act like you took a subset, we have copy on write behavior for each element of the AnnData. Since this is just tracked at the AnnData level, making one element "actual" means we have to make all elements actual. So we need to have a way for the parent anndata to know a modification has been made to a child element – hence our view classes.

Is this required? Would it break your invariants if accessing an ak.Array always returned a shallow copy?

How would you set any values on an awkward array? Or do you mean only when it's an anndata style view?

agoose77 · 2023-07-06T14:45:34Z

I can see that this is confusing! I'll reply exclusively on #1040 from now on.

ivirshup · 2023-07-24T13:13:47Z

Going to keep this open until we are happy with a solution. Current status summarized here: #1040 (comment)

grst · 2023-08-22T07:23:30Z

I met with @jpivarski and @agoose77 yesterday.

We came to the conclusion that we might not need any copy-on-write behavior for views of awkward arrays, because a slice of an awkward array is always a (shallow) copy anyway. There's therefore no risk of modifying the original awkward array when setting a record on an awkward array within an AnnDataView. We could therefore get rid of all the custom code around awkward array views which is, ultimately, a hack that abuses custom behaviors for something they were not designed for.

Only downside is that AnnDataView behaves differently for awkward arrays than for other backends. Updating the awkward array will not trigger an init_as_actual.

In any case I'll first come up with a bunch of test cases to make sure everything works as expected.

github-actions · 2023-10-23T02:05:49Z

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

grst · 2023-10-23T06:25:09Z

This is still an issue. The proposed fix is in #1070, but pending some more fundamental decision on how to cache views.

agoose77 · 2023-10-23T09:08:22Z

@grst I'm a little lost with where things are! My understanding from #1035 is that we don't need to create array classes, and instead can simply shallow copy arrays where appropriate. Is that no longer the case?

grst · 2023-10-23T13:48:03Z

@agoose77, it is still the case and this is what is proposed in #1070.

However, @ivirshup pointed out in #1070 (comment) that when creating the view only when accessing the data, it is impossible to assign any data to a record, e.g. using

v.obsm["awk"]["b"] = [5, 6]

because a new shallow copy will be created every time v.obsm["awk"] is accessed.

This is why in the most recent version I suggested to create the shallow copy already during creating of the AnnDataView, i.e. on

v = a[:2]

in the same example, which Isaac referred to as "caching" the view.

But apparently this might conflict with the proposed fix for an issue with garbage collection (#1119).

In any case, I don't think we need anything from the awkward side, but just need to find a way how to reconcile cached views with garbage collection.

github-actions · 2023-12-25T02:10:04Z

This issue has been automatically marked as stale because it has not had recent activity.
Please add a comment if you want to keep the issue open. Thank you for your contributions!

agoose77 added the Bug 🐛 label Jun 28, 2023

ivirshup added the type: awkward array 😐 label Jul 6, 2023

ivirshup mentioned this issue Jul 6, 2023

Fix for awkward 2.3 #1040

Merged

5 tasks

ivirshup added the topic: views 👀 label Jul 6, 2023

ivirshup closed this as completed in #1040 Jul 24, 2023

ivirshup reopened this Jul 24, 2023

grst linked a pull request Jul 24, 2023 that will close this issue

Fix view behavior for AwkwardArrays #1070

Open

3 tasks

grst self-assigned this Aug 22, 2023

github-actions bot added the stale label Oct 23, 2023

grst removed the stale label Oct 23, 2023

github-actions bot added the stale label Dec 25, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Future changes to Awkward Array behavior class resolution #1035

Future changes to Awkward Array behavior class resolution #1035

agoose77 commented Jun 28, 2023 •

edited

Loading

agoose77 commented Jun 29, 2023

agoose77 commented Jun 29, 2023

ivirshup commented Jul 6, 2023 •

edited

Loading

ivirshup commented Jul 6, 2023 •

edited

Loading

agoose77 commented Jul 6, 2023 •

edited

Loading

ivirshup commented Jul 6, 2023

agoose77 commented Jul 6, 2023

ivirshup commented Jul 24, 2023

grst commented Aug 22, 2023

github-actions bot commented Oct 23, 2023

grst commented Oct 23, 2023

agoose77 commented Oct 23, 2023

grst commented Oct 23, 2023

github-actions bot commented Dec 25, 2023

Future changes to Awkward Array behavior class resolution #1035

Future changes to Awkward Array behavior class resolution #1035

Comments

agoose77 commented Jun 28, 2023 • edited Loading

agoose77 commented Jun 29, 2023

agoose77 commented Jun 29, 2023

ivirshup commented Jul 6, 2023 • edited Loading

ivirshup commented Jul 6, 2023 • edited Loading

agoose77 commented Jul 6, 2023 • edited Loading

ivirshup commented Jul 6, 2023

agoose77 commented Jul 6, 2023

ivirshup commented Jul 24, 2023

grst commented Aug 22, 2023

github-actions bot commented Oct 23, 2023

grst commented Oct 23, 2023

agoose77 commented Oct 23, 2023

grst commented Oct 23, 2023

github-actions bot commented Dec 25, 2023

agoose77 commented Jun 28, 2023 •

edited

Loading

ivirshup commented Jul 6, 2023 •

edited

Loading

ivirshup commented Jul 6, 2023 •

edited

Loading

agoose77 commented Jul 6, 2023 •

edited

Loading