Fix garbage collection #1119

ivirshup · 2023-09-04T14:53:40Z

Closes Anndata not properly garbage collected #360
Tests added
Release note added (or unnecessary)

Still needs some tests.

Demo

Set up

import anndata as ad
import numpy as np

ANNDATA_FILENAME = 'test.h5ad'

X = np.ones((10_000, 10_000))
ad.AnnData(
    X,
    layers={"X_again": X}
).write_h5ad(ANNDATA_FILENAME, compression="lzf")
del X

Benchmarking script

import anndata as ad
import numpy as np
import tracemalloc

ANNDATA_FILENAME = 'test.h5ad'
RUNS = 10

def display_top(snapshot, key_type='lineno'):
    # Return the total traced size in bytes, ignoring importlib and unknown frames.
    snapshot = snapshot.filter_traces((
        tracemalloc.Filter(False, "<frozen importlib._bootstrap>"),
        tracemalloc.Filter(False, "<unknown>"),
    ))
    top_stats = snapshot.statistics(key_type)
    total = sum(stat.size for stat in top_stats)
    return total

def trace_function(func, arg, n):
    # Record the total traced memory after each of n repeated calls.
    total = np.zeros(n)
    # One call before tracing starts, so `data` is already bound when the loop begins.
    data = func(arg).copy()
    tracemalloc.start()
    for i in range(n):
        # Rebinding `data` drops the only reference held here to the previous result.
        data = func(arg).copy()
        snapshot = tracemalloc.take_snapshot()
        total[i] = display_top(snapshot)
    tracemalloc.stop()
    return total

def func(pth):
    # Read the full file, then return a copy of every second observation.
    data = ad.read_h5ad(pth)
    return data[::2].copy()

total = trace_function(func, ANNDATA_FILENAME, RUNS)
total

Memory usage on main:

array([3.20191228e+09, 4.00357741e+09, 4.80520541e+09, 5.60682786e+09,
       4.00358264e+09, 4.80520492e+09, 5.60682752e+09, 6.40845017e+09,
       7.21007283e+09, 8.01169533e+09])

Memory usage on this branch:

array([2.40190353e+09, 2.40195236e+09, 2.40196104e+09, 2.40196308e+09,
       2.40196497e+09, 2.40196565e+09, 2.40196656e+09, 2.40196712e+09,
       2.40196768e+09, 2.40196820e+09])
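
A quick sanity check on these numbers, as a sketch rather than part of the original benchmark: the step between consecutive runs on main can be compared with the size of one subset copy returned by func. The array shape and the two stored arrays come from the set-up script; the 8-byte item size assumes the default float64 dtype of np.ones.

```python
# Sizes from the set-up script: a 10_000 x 10_000 matrix stored twice,
# once as X and once as the layer "X_again".
n_obs, n_vars = 10_000, 10_000
itemsize = 8  # assumption: float64, the default dtype of np.ones
n_arrays = 2

# func() returns data[::2].copy(), i.e. every second row of both arrays.
subset_bytes = (n_obs // 2) * n_vars * itemsize * n_arrays

# First four totals reported for main, copied from the output above.
main = [3.20191228e+09, 4.00357741e+09, 4.80520541e+09, 5.60682786e+09]
growth_per_run = [after - before for before, after in zip(main, main[1:])]
growth_ratio = [step / subset_bytes for step in growth_per_run]

# Steady-state total reported for this branch, relative to one subset copy.
branch_ratio = 2.40196e+09 / subset_bytes

print(subset_bytes, growth_ratio, branch_ratio)
```

Under these assumptions the per-run growth on main is within about 1% of one subset copy (800 MB), while the total on this branch stays flat at roughly three times that size.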

codecov · 2023-09-04T15:04:22Z

Codecov Report

Attention: Patch coverage is 96.17834% with 6 lines in your changes missing coverage. Please review.

Project coverage is 83.92%. Comparing base (b2bdd7f) to head (e138fc9).
Report is 75 commits behind head on main.

Files with missing lines	Patch %	Lines
src/anndata/_core/aligned_mapping.py	96.22%	4 Missing ⚠️
src/anndata/_core/raw.py	93.75%	1 Missing ⚠️
...anndata/experimental/multi_files/_anncollection.py	0.00%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##             main    #1119      +/-   ##
==========================================
- Coverage   85.95%   83.92%   -2.04%     
==========================================
  Files          36       36              
  Lines        5747     5823      +76     
==========================================
- Hits         4940     4887      -53     
- Misses        807      936     +129

Flag	Coverage Δ
gpu-tests	`?`

Flags with carried forward coverage won't be shown.

Files with missing lines	Coverage Δ
src/anndata/_core/anndata.py	`83.63% <100.00%> (-1.21%)`	⬇️
src/anndata/_core/file_backing.py	`91.26% <100.00%> (+1.04%)`	⬆️
src/anndata/_io/write.py	`76.74% <ø> (ø)`
src/anndata/tests/helpers.py	`81.73% <100.00%> (-4.50%)`	⬇️
src/anndata/_core/raw.py	`76.97% <93.75%> (-6.60%)`	⬇️
...anndata/experimental/multi_files/_anncollection.py	`70.68% <0.00%> (ø)`
src/anndata/_core/aligned_mapping.py	`93.36% <96.22%> (+1.46%)`	⬆️

... and 9 files with indirect coverage changes

scverse-benchmark · 2024-06-03T15:00:36Z

Benchmark changes

Change	Before [`b2bdd7f`]	After [`e138fc9`]	Ratio	Benchmark (Parameter)
-	1.31359e+07	1.70902e+06	0.13	anndata.GarbargeCollectionSuite.track_peakmem_garbage_collection
+	1.20863	1.51659	1.25	readwrite.H5ADReadSuite.track_read_full_memratio('pbmc3k')

Comparison: https://github.com/scverse/anndata/compare/b2bdd7f926d54c9ae7a1b56a4e97d37e6e4d1dad..e138fc99fa4d5ab7e9351a45b905e13a4fc2415f
Last changed: Mon, 22 Jul 2024 12:07:58 +0000

More details: https://github.com/scverse/anndata/pull/1119/checks?check_run_id=27745934862

ilan-gold · 2024-06-03T15:14:18Z

Two big outstanding problems:

1. The following sort of thing no longer works:

   AnnData.layers

2. I/O memory has spiked according to the benchmarks.
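
One way class-level access such as AnnData.layers can break once an attribute becomes a descriptor is sketched below. This is only a guess at the mechanism, since the comment does not say why the access stopped working; NaiveProperty, GuardedProperty, Demo, and _store are hypothetical names, not anndata code.

```python
class NaiveProperty:
    """Hypothetical descriptor that assumes it is always accessed on an instance."""

    def __get__(self, obj, objtype=None):
        # On class-level access (Demo.naive) obj is None, so this raises.
        return obj._store


class GuardedProperty:
    """Hypothetical descriptor that also supports class-level access."""

    def __get__(self, obj, objtype=None):
        if obj is None:
            # Accessed on the class: hand back the descriptor itself.
            return self
        return obj._store


class Demo:
    naive = NaiveProperty()
    guarded = GuardedProperty()

    def __init__(self):
        self._store = {}


try:
    Demo.naive
    class_access_failed = False
except AttributeError:
    class_access_failed = True

class_level_value = Demo.guarded
instance_value = Demo().guarded
```

Returning the descriptor for a None instance is the usual Python convention (property does the same); whether the PR should return the descriptor or something else is a design choice this sketch does not settle.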

ilan-gold · 2024-06-04T11:45:55Z

Looking at this further (https://github.com/scverse/anndata/pull/1119/checks?check_run_id=25779040354), there is actually no improvement over main, so I am not sure what's up with that.

ilan-gold · 2024-06-05T13:54:10Z

benchmarks/benchmarks/anndata.py

+class GarbargeCollectionSuite:
+    runs = 10
+
+    # https://github.com/pythonprofilers/memory_profiler/issues/402 and other backend does not pick this up


OK @ivirshup @flying-sheep, independent of this issue: were you aware that https://pypi.org/project/memory-profiler/ is a line-by-line profiler? I think this is probably not very good for us, no? When I read get_peakmem (i.e., the function in utils that we wrote based on memory_profiler), I would expect it to track the literal peak memory, not the peak over individual line operations.

Similarly: https://scverse.zulipchat.com/#narrow/stream/393966-scanpy-anndata-dev/topic/Benchmarking/near/442513634
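
To illustrate the distinction being drawn here, the following sketch uses the standard-library tracemalloc module rather than memory_profiler, so it says nothing about how get_peakmem actually behaves. It only shows that a measurement taken after a call returns can miss an allocation that existed inside it; transient_allocation is a hypothetical helper.

```python
import tracemalloc

N_BYTES = 10_000_000


def transient_allocation(n_bytes):
    # Allocate a large temporary buffer that is freed before returning.
    buf = bytearray(n_bytes)
    return len(buf)


tracemalloc.start()
transient_allocation(N_BYTES)
# `current_after` is what a sample taken after the call sees;
# `peak` is the true high-water mark since tracing started.
current_after, peak = tracemalloc.get_traced_memory()
tracemalloc.stop()

print(current_after, peak)
```

A tracker that samples only between statements of the caller reports something close to current_after, while the literal peak includes the temporary buffer.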

ilan-gold · 2024-06-18T09:29:47Z

@flying-sheep I would be ready to go on this

flying-sheep · 2024-06-20T11:56:20Z

I would still like the things in here addressed: #1119 (review)

I fixed everything that I didn’t talk about there, but:

1. I’m still unclear on why the changes help / what they’re supposed to do.
2. The double indirection with its “only pass by reference” structure seems fragile and possibly too complex to me. If someone can answer 1., the complexity argument could go away, but it’s still fragile.

ilan-gold · 2024-07-02T12:21:45Z

TODO: try using a weakref within the axis arrays
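
As a toy illustration of what a weak back-reference buys, and not a description of the axis-array code itself: a child that holds a strong reference to its parent forms a reference cycle, so the parent is only reclaimed by the cycle collector, whereas a weakref lets plain reference counting free it. StrongBackref, WeakBackref, Owner, and freed_by_refcount_alone are hypothetical names.

```python
import gc
import weakref


class StrongBackref:
    """Hypothetical child holding a strong reference to its parent."""

    def __init__(self, parent):
        self.parent = parent


class WeakBackref:
    """Hypothetical child holding only a weak reference to its parent."""

    def __init__(self, parent):
        self._parent_ref = weakref.ref(parent)

    @property
    def parent(self):
        # Returns None once the parent has been collected.
        return self._parent_ref()


class Owner:
    """Hypothetical stand-in for a container that owns one child mapping."""

    def __init__(self, backref_cls):
        self.mapping = backref_cls(self)


def freed_by_refcount_alone(backref_cls):
    # Disable the cycle collector so only reference counting can free the owner.
    gc.disable()
    try:
        owner = Owner(backref_cls)
        probe = weakref.ref(owner)
        del owner
        return probe() is None
    finally:
        gc.enable()
        gc.collect()  # clean up any cycle left behind


strong_freed = freed_by_refcount_alone(StrongBackref)
weak_freed = freed_by_refcount_alone(WeakBackref)
print(strong_freed, weak_freed)
```

On CPython this should print False True: the strong back-reference keeps the owner alive until a collection runs, which matters when the owner holds large arrays.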

flying-sheep · 2024-07-08T07:54:08Z

Hm, maybe this is just a complex problem and can’t be simplified much more without making other parts more complex.

The issue here is the complexity of the paths that data takes:

- When initializing a new AnnData object, AlignedMappingProperty.__set__ creates the AnnData._{name} attribute as a simple dict. __get__ then initializes the ephemeral AlignedMapping & *Actual¹ objects with it.

  Here my argument applies: shouldn’t there be a single source of truth, where that dict is the canonical data container and only one reference to it is stored?

- When copying an AnnData object, we currently .copy() the ephemeral AlignedMapping & *Actual object, temporarily making it the data holder, and only then assign the attribute on the new AnnData object in AlignedMappingProperty.__set__. This works, but extends the responsibility of the AxisArrays beyond “ephemeral API wrapper around a dict stored in the AnnData object”.

¹Classes like Layers, … that have a _data attribute they share with an AnnData object.
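
The two data paths described above can be sketched with a stripped-down descriptor. This is a simplified illustration under stated assumptions, not the anndata implementation: MappingView, MappingProperty, and Container are hypothetical names, and only the `_data` attribute and the `_{name}` storage convention are taken from the comment.

```python
class MappingView:
    """Hypothetical ephemeral wrapper around a dict owned by the container."""

    def __init__(self, store):
        self._data = store  # shared with the container, not copied

    def __getitem__(self, key):
        return self._data[key]

    def __setitem__(self, key, value):
        self._data[key] = value

    def copy(self):
        # Hands back a new plain dict, which the caller may assign elsewhere.
        return dict(self._data)


class MappingProperty:
    """Hypothetical descriptor: keeps a plain dict on the owner, wraps it on access."""

    def __set_name__(self, owner, name):
        self.name = name

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        # A fresh wrapper on every access; the container never stores it.
        return MappingView(getattr(obj, f"_{self.name}"))

    def __set__(self, obj, value):
        # The dict stored as obj._{name} is the single data container.
        setattr(obj, f"_{self.name}", dict(value))


class Container:
    layers = MappingProperty()

    def __init__(self, layers=None):
        self.layers = layers if layers is not None else {}


c = Container({"X_again": 1})
first, second = c.layers, c.layers  # two distinct wrappers, one shared dict
c.layers["other"] = 2               # writes go through to c._layers
d = Container(c.layers.copy())      # copy path: wrapper -> new dict -> __set__
```

In this sketch the wrapper never becomes the owner of the data: copy returns a plain dict, and only the descriptor's __set__ decides where it is stored.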

Co-authored-by: Isaac Virshup <[email protected]>

ivirshup added 2 commits September 4, 2023 15:39

Make creation of aligned mapping lazy

a9a10af

Use weakref for filemanager

ca0759a

ivirshup mentioned this pull request Sep 8, 2023

Fix view behavior for AwkwardArrays #1070

Open

3 tasks

grst mentioned this pull request Oct 23, 2023

Future changes to Awkward Array behavior class resolution #1035

Open

ilan-gold added this to the 0.10.8 milestone Jun 3, 2024

ilan-gold self-assigned this Jun 3, 2024

Merge branch 'main' into remove-backrefs

92615d8

ilan-gold added the skip-gpu-ci label Jun 3, 2024

ilan-gold added 4 commits June 3, 2024 16:24

(chore): add benchmark

cb53b77

(fix): use right gen_adata function

2df03b5

(fix): change name

38d218f

(chore): fewer runs

a9eba7a

ilan-gold added the benchmark label Jun 3, 2024

ilan-gold reviewed Jun 3, 2024

src/anndata/_core/anndata.py (outdated)

ilan-gold added 4 commits June 4, 2024 11:17

(fix): benchmark test name

be8ae55

Merge branch 'main' into remove-backrefs

82fc74c

(fix): return cls for None obj

e0bff0e

Merge branch 'remove-backrefs' of https://github.com/ivirshup/anndata …

ccbdaf3

…into remove-backrefs

ilan-gold added 5 commits June 4, 2024 14:08

(fix): try track_peakmem

6ed9963

(fix): remove track_ name

ce4e2d8

(fix): docs

0c9e563

(fix): do custom peakmem track

c971dfb

(fix): n -> runs

9e08151

ilan-gold force-pushed the remove-backrefs branch from 4cf6ea9 to 9e08151 Compare June 5, 2024 13:53

ilan-gold reviewed Jun 5, 2024

(fix): comment

c771079

ilan-gold reviewed Jun 7, 2024

src/anndata/_core/aligned_mapping.py (outdated)

flying-sheep and others added 2 commits June 7, 2024 17:44

Add type hints for properties

f750954

Merge branch 'main' into remove-backrefs

ab4cd79

ilan-gold approved these changes Jun 18, 2024

flying-sheep modified the milestones: 0.10.8, 0.10.9 Jun 18, 2024

Merge branch 'main' into pr/ivirshup/1119

cf4377c

ilan-gold assigned flying-sheep Jul 2, 2024

flying-sheep added 6 commits July 2, 2024 15:46

fmt

48ae69d

cleanup

480b0a4

more fmt

24b89a0

dedupe and test

05ebee9

Simplify copy

866abbd

Merge branch 'main' into pr/ivirshup/1119

3595197

flying-sheep added 5 commits July 22, 2024 13:31

docs and typing

015d6ad

fix parent_mapping type

3a1a007

Fix I

73fc94c

fix docs

ba61e2b

fix 3.9

e138fc9

flying-sheep approved these changes Jul 22, 2024

flying-sheep merged commit d51f84c into scverse:main Jul 22, 2024
15 of 16 checks passed

lumberbot-app bot added the Still Needs Manual Backport label Jul 22, 2024

flying-sheep pushed a commit that referenced this pull request Jul 22, 2024

Backport PR #1119: Fix garbage collection

da86a60

flying-sheep removed the Still Needs Manual Backport label Jul 22, 2024

scverse deleted a comment from lumberbot-app bot Jul 22, 2024

flying-sheep added a commit that referenced this pull request Jul 22, 2024

Backport PR #1119: Fix garbage collection (#1567)

af919b6

Co-authored-by: Isaac Virshup <[email protected]>
