(feat): read_elem_as_dask method #1469

Merged
merged 160 commits into from
Jul 23, 2024
Changes from 13 commits
Commits
d111f04
(feat): `read_elem_lazy` method
ilan-gold Apr 11, 2024
00be7f0
(revert): error message
ilan-gold Apr 11, 2024
fd635d7
(refactor): declare `is_csc` reading elem directly in h5
ilan-gold Apr 11, 2024
f5e7fda
(chore): `read_elem_lazy` -> `read_elem_as_dask`
ilan-gold Apr 12, 2024
ae5396c
(chore): remove string handling
ilan-gold Apr 12, 2024
664336a
(refactor): use `elem` for h5 where posssble
ilan-gold Apr 12, 2024
2370215
Merge branch 'main' into ig/read_dask_elem
ilan-gold Apr 17, 2024
52002b6
(chore): remove invlaud syntax
ilan-gold Apr 17, 2024
5ab1ad1
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Apr 17, 2024
aa1006e
(fix): put dask import inside function
ilan-gold Apr 17, 2024
dda7d83
(refactor): try maybe open?
ilan-gold Apr 17, 2024
fd418f0
Merge branch 'main' into ig/read_dask_elem
ilan-gold May 27, 2024
23b0bfd
Merge branch 'main' into ig/read_dask_elem
ilan-gold May 27, 2024
97b8031
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jun 3, 2024
1fc4cc3
(fix): revert `encoding-version`
ilan-gold Jun 3, 2024
5ca71ea
(chore): document `create_sparse_store` test function
ilan-gold Jun 3, 2024
3672c18
(chore): sort indices to prevent warning
ilan-gold Jun 3, 2024
33c3599
(fix): remove utility function `make_dask_array`
ilan-gold Jun 3, 2024
157e710
(chore): `read_sparse_as_dask_h5` -> `read_sparse_as_dask`
ilan-gold Jun 3, 2024
375000d
(feat): make params of `h5_chunks` and `stride`
ilan-gold Jun 3, 2024
241904a
(chore): add distributed test
ilan-gold Jun 3, 2024
42d0d22
(fix): `TypeVar` bind
ilan-gold Jun 3, 2024
0bba2c0
(chore): release note
ilan-gold Jun 4, 2024
0d0b43a
(chore): `0.10.8` -> `0.11.0`
ilan-gold Jun 5, 2024
762d4c6
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jun 26, 2024
c935fe0
(fix): `ruff` for default `pytest.fixture` `scope`
ilan-gold Jun 26, 2024
23e0ea2
Apply suggestions from code review
ilan-gold Jul 1, 2024
5b96c77
(fix): `Any` to `DaskArray`
ilan-gold Jul 1, 2024
0907a4e
(fix): type `make_index` + fix undeclared
ilan-gold Jul 1, 2024
20ced16
[pre-commit.ci] auto fixes from pre-commit.com hooks
pre-commit-ci[bot] Jul 1, 2024
36ae8f2
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 1, 2024
bb6607e
fix rest
flying-sheep Jul 1, 2024
419691b
(fix): use `chunks` kwarg
ilan-gold Jul 2, 2024
a23df34
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 2, 2024
fd2376a
(feat): expose `chunks` as an option to `read_elem_as_dask` via `data…
ilan-gold Jul 2, 2024
ae723d0
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Jul 2, 2024
42b1093
(fix): `test_read_dispatched_null_case` test
ilan-gold Jul 2, 2024
78de057
(fix): disallowed spread syntax?
ilan-gold Jul 2, 2024
717b997
(refactor): reuse `compute_chunk_layout_for_axis_shape` functionality
ilan-gold Jul 2, 2024
2b86293
(fix): remove unneeded `slice` arguments
ilan-gold Jul 3, 2024
8d5a9df
(fix): revert message
ilan-gold Jul 3, 2024
449fc1a
(refactor): `make_index` -> `make_block_indexer`
ilan-gold Jul 3, 2024
1522de3
(fix): export from `experimental`
ilan-gold Jul 3, 2024
71c150d
(fix): `callback` signature for `test_read_dispatched_null_case
ilan-gold Jul 3, 2024
b441366
(chore): `get_elem_name` helper
ilan-gold Jul 3, 2024
0307a1d
(chore): use `H5Group` consistently
ilan-gold Jul 3, 2024
ee075cd
(refactor): make `chunks` public facing API instead of `dataset_kwargs`
ilan-gold Jul 3, 2024
89acec4
(fix): regsiter for group not array
ilan-gold Jul 3, 2024
48b7630
(chore): add warning test
ilan-gold Jul 3, 2024
8712582
(chore): make arg order consistent
ilan-gold Jul 3, 2024
cda8aa7
(feat): add `callback` typing for `read_dispatched`
ilan-gold Jul 5, 2024
e8f62f4
(chore): use `npt.NDArray`
ilan-gold Jul 5, 2024
f6e48ac
(fix): remove uneceesary union
ilan-gold Jul 5, 2024
4de3246
(chore): release note
ilan-gold Jul 5, 2024
ba817e0
(fix); try protocol docs
ilan-gold Jul 5, 2024
438d28d
(feat): create `InMemoryElem` + `DictElemType` to remove `Any`
ilan-gold Jul 5, 2024
296ea3f
(chore): refactor `DictElemType` -> `InMemoryArrayOrScalarType` for r…
ilan-gold Jul 5, 2024
cf13a57
(fix): use `Union`
ilan-gold Jul 5, 2024
d02ba49
(fix): more `Union`
ilan-gold Jul 5, 2024
6970a97
(refactor): `InMemoryElem` -> `InMemoryReadElem`
ilan-gold Jul 5, 2024
2282351
(chore): add needed types to public export + docs fix
ilan-gold Jul 5, 2024
810cd0a
Merge branch 'main' into ig/read_dask_elem
flying-sheep Jul 8, 2024
a996081
(chore): type `write_elem` functions
ilan-gold Jul 8, 2024
f6e457b
(chore): create `write_callback` protocol
ilan-gold Jul 8, 2024
a0b4057
Merge branch 'main' into ig/protocol_for_callback
ilan-gold Jul 8, 2024
4416526
(chore): export + docs
ilan-gold Jul 8, 2024
fbe44f0
(fix): add string descriptions
ilan-gold Jul 8, 2024
8c1f01d
(fix): try sphinx protocol doc
ilan-gold Jul 8, 2024
a7d412a
(fix): try ignoring exports
ilan-gold Jul 8, 2024
4d56396
(fix): remap callback internal usages
ilan-gold Jul 8, 2024
2012ee5
(fix): add docstring
ilan-gold Jul 8, 2024
f65f065
Discard changes to pyproject.toml
flying-sheep Jul 9, 2024
8f6ea49
re-add dep
flying-sheep Jul 9, 2024
155a21e
Fix docs
flying-sheep Jul 9, 2024
daae3e5
Almost works
flying-sheep Jul 9, 2024
c415ae4
works!
flying-sheep Jul 9, 2024
00010b8
(chore): use pascal-case
ilan-gold Jul 9, 2024
0bd87fc
(feat): type read/write funcs in callback
ilan-gold Jul 9, 2024
5997678
(fix): use generic for `Read` as well.
ilan-gold Jul 9, 2024
f208332
(fix): need more aliases
ilan-gold Jul 9, 2024
eb69fcb
Split table, format
flying-sheep Jul 9, 2024
477bbef
(refactor): move to `_types` file
ilan-gold Jul 9, 2024
103cad6
Merge branch 'ig/protocol_for_callback' of github.com:scverse/anndata…
ilan-gold Jul 9, 2024
8d23f6f
bump scanpydoc
flying-sheep Jul 9, 2024
9b647c2
Some basic syntax fixes
flying-sheep Jul 9, 2024
d6d01bc
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 9, 2024
5ef93e1
(fix): change `Read{Callback}` type for kwargs
ilan-gold Jul 9, 2024
9cfe908
(chore): test `chunks `argument
ilan-gold Jul 9, 2024
99fc6db
(fix): type `read_recarray`
ilan-gold Jul 9, 2024
b5bccc3
(fix): `GroupyStorageType` not `StorageType`
ilan-gold Jul 9, 2024
e5ea2b0
(fix): little type fixes
ilan-gold Jul 9, 2024
6ac72d6
(fix): clarify `H5File` typing
ilan-gold Jul 9, 2024
989dc65
(fix): dask doc
ilan-gold Jul 9, 2024
36b0207
(fix): dask docs
ilan-gold Jul 9, 2024
dadfb4d
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 9, 2024
ca6cf66
(fix): typing
ilan-gold Jul 9, 2024
eabaf35
(fix): handle case when `chunks` is `None`
ilan-gold Jul 9, 2024
4c398c3
(feat): add string-array reading
ilan-gold Jul 9, 2024
d6fc8a4
(fix): remove `string-array` because it is not tested
ilan-gold Jul 9, 2024
33aebb2
(refactor): clean up tests
ilan-gold Jul 10, 2024
701cd85
(fix): overfetching problem
ilan-gold Jul 10, 2024
43b21a2
Fix circular import
flying-sheep Jul 11, 2024
0e22449
add some typing
flying-sheep Jul 11, 2024
ec546f4
fix mapping types
flying-sheep Jul 11, 2024
7c2e4da
Fix Read/Write
flying-sheep Jul 11, 2024
1ba5b99
Fix one more
flying-sheep Jul 11, 2024
49c0d49
unify names
flying-sheep Jul 11, 2024
3666735
claift ReadCallback signature
flying-sheep Jul 11, 2024
3a332ad
Fix type aliases
flying-sheep Jul 11, 2024
d0f4d13
(fix): clean up typing to use `RWAble`
ilan-gold Jul 11, 2024
6e89e14
Merge branch 'main' into ig/protocol_for_callback
ilan-gold Jul 11, 2024
ea29cfa
(fix): use `Union`
ilan-gold Jul 11, 2024
f4ff236
(fix): add qualname override
ilan-gold Jul 11, 2024
f50b286
(fix): ignore dask and masked array
ilan-gold Jul 11, 2024
712e085
(fix): ignore erroneous class warning
ilan-gold Jul 11, 2024
24dd18b
(fix): upgrade `scanpydoc`
ilan-gold Jul 11, 2024
79d3fdc
(fix): use `MutableMapping` instead of `dict` due to broken docstring
ilan-gold Jul 11, 2024
9a2be00
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 11, 2024
d3bcddf
Add data docs
flying-sheep Jul 11, 2024
84fdc96
Revert "(fix): use `MutableMapping` instead of `dict` due to broken d…
flying-sheep Jul 11, 2024
2608bc3
(fix): add clarification
ilan-gold Jul 11, 2024
e551e18
Simplify
flying-sheep Jul 11, 2024
13e3bb1
Merge branch 'ig/protocol_for_callback' into ig/read_dask_elem
ilan-gold Jul 11, 2024
2935e45
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 11, 2024
bf0be15
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Jul 11, 2024
9d37fc8
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 12, 2024
1ffe43e
(fix): remove double `dask` intersphinx
ilan-gold Jul 12, 2024
f9df5bc
(fix): remove `_types.DaskArray` from type checking block
ilan-gold Jul 12, 2024
a85da39
(refactor): use `block_info` for resolving fetch location
ilan-gold Jul 15, 2024
3bef77c
Merge branch 'ig/read_dask_elem' of github.com:scverse/anndata into i…
ilan-gold Jul 15, 2024
899184f
(fix): dtype for reading
ilan-gold Jul 15, 2024
efb70ec
(fix): ignore import cycle problem (why??)
ilan-gold Jul 16, 2024
118f43c
(fix): add issue
ilan-gold Jul 16, 2024
f742a0a
(fix): subclass `Reader` to remove `datasetkwargs`
ilan-gold Jul 18, 2024
ae68731
(fix): add message tp errpr
ilan-gold Jul 18, 2024
f5e7760
Update tests/test_io_elementwise.py
ilan-gold Jul 18, 2024
96b13a3
(fix): correct `self.callback` check
ilan-gold Jul 18, 2024
9c68e36
(fix): erroneous diffs
ilan-gold Jul 18, 2024
410aeda
(fix): extra `read_elem` `dataset_kwargs`
ilan-gold Jul 18, 2024
31a30c4
(fix): remove more `dataset_kwargs` nonsense
ilan-gold Jul 18, 2024
80fe8cb
(chore): add docs
ilan-gold Jul 18, 2024
b314248
(fix): use `block_info` for dense
ilan-gold Jul 18, 2024
02d4735
(fix): more erroneous diffs
ilan-gold Jul 18, 2024
6e5534a
(fix): use context again
ilan-gold Jul 18, 2024
d26cfe8
(fix): change size by dimension in tests
ilan-gold Jul 22, 2024
94e43a3
(refactor): clean up `get_elem_name`
ilan-gold Jul 22, 2024
5160016
(fix): try new sphinx for error
ilan-gold Jul 22, 2024
43da9a3
(fix): return type
ilan-gold Jul 22, 2024
9735ced
(fix): protocol for reading
ilan-gold Jul 22, 2024
f1730c3
(fix): bring back ignored warning
ilan-gold Jul 22, 2024
9861b56
Fix docs
flying-sheep Jul 22, 2024
235096a
almost fix typing
flying-sheep Jul 22, 2024
dce9f07
add wrapper
flying-sheep Jul 22, 2024
2725ef2
move into type checking
flying-sheep Jul 22, 2024
ffe89f0
(fix): small type fxes
ilan-gold Jul 22, 2024
6cb231e
Merge branch 'main' into ig/read_dask_elem
ilan-gold Jul 22, 2024
75a64fc
block info types
flying-sheep Jul 22, 2024
3f734fe
simplify
flying-sheep Jul 22, 2024
c4c2356
rename
flying-sheep Jul 22, 2024
cc67a9b
simplify more
flying-sheep Jul 22, 2024
6 changes: 5 additions & 1 deletion src/anndata/_io/specs/__init__.py
@@ -1,21 +1,25 @@
 from __future__ import annotations

-from . import methods
+from . import lazy_methods, methods
 from .registry import (
+    _LAZY_REGISTRY,  # noqa: F401
     _REGISTRY,  # noqa: F401
     IOSpec,
     Reader,
     Writer,
     get_spec,
     read_elem,
+    read_elem_as_dask,
     write_elem,
 )

 __all__ = [
     "methods",
+    "lazy_methods",
     "write_elem",
     "get_spec",
     "read_elem",
+    "read_elem_as_dask",
     "Reader",
     "Writer",
     "IOSpec",
103 changes: 103 additions & 0 deletions src/anndata/_io/specs/lazy_methods.py
@@ -0,0 +1,103 @@
+from __future__ import annotations
+
+from contextlib import contextmanager
+from pathlib import Path
+
+import h5py
+import numpy as np
+from scipy import sparse
+
+import anndata as ad
+from anndata.compat import H5Array, H5Group, ZarrArray, ZarrGroup
+
+from .registry import _LAZY_REGISTRY, IOSpec
+
+# TODO: settings
+stride = 100
+h5_chunks = 1000
+
+
+def make_dask_array(is_csc, shape, make_dask_chunk, dtype):
+    import dask.array as da
+
+    chunks = [None, None]
+    major_index = int(is_csc)
+    minor_index = (is_csc + 1) % 2
+    chunks[minor_index] = (shape[minor_index],)
+    chunks[major_index] = (stride,) * (shape[major_index] // stride) + (
+        shape[major_index] % stride,
+    )
+    memory_format = [sparse.csr_matrix, sparse.csc_matrix][major_index]
+    da_mtx = da.map_blocks(
+        make_dask_chunk,
+        dtype=dtype,
+        chunks=chunks,
+        meta=memory_format((0, 0), dtype=np.float32),
+    )
+    return da_mtx
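Note on the chunk computation in `make_dask_array`: the expression `(stride,) * (n // stride) + (n % stride,)` always appends a remainder entry, so it ends in a zero-length chunk whenever `n` is an exact multiple of `stride` (for example 1000 and 100). The following is a minimal, dependency-free sketch of one way to build the same layout without that trailing zero. The names `naive_layout`, `chunk_layout` and `sparse_chunks` are hypothetical and are not taken from the PR.

```python
def naive_layout(axis_len: int, stride: int) -> tuple[int, ...]:
    # Mirrors the expression used in the diff, for comparison only.
    return (stride,) * (axis_len // stride) + (axis_len % stride,)


def chunk_layout(axis_len: int, stride: int) -> tuple[int, ...]:
    """Split an axis into chunks of `stride`, adding a shorter last chunk only if needed."""
    if axis_len < 0 or stride <= 0:
        raise ValueError("axis_len must be >= 0 and stride must be > 0")
    n_full, remainder = divmod(axis_len, stride)
    layout = (stride,) * n_full
    if remainder:
        layout += (remainder,)
    return layout


def sparse_chunks(shape: tuple[int, int], stride: int, is_csc: bool):
    """Chunk only the major axis (rows for CSR, columns for CSC); keep the minor axis whole."""
    major = int(is_csc)
    minor = 1 - major
    chunks = [None, None]
    chunks[major] = chunk_layout(shape[major], stride)
    chunks[minor] = (shape[minor],)
    return tuple(chunks)


print(naive_layout(1000, 100)[-1])                  # 0
print(chunk_layout(1000, 100)[-1])                  # 100
print(sparse_chunks((250, 40), 100, is_csc=False))  # ((100, 100, 50), (40,))
print(sparse_chunks((250, 40), 100, is_csc=True))   # ((250,), (40,))
```

Every layout still sums to the axis length, which is the property a chunked array needs.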


+def make_index(is_csc, stride, shape, block_id):
+    index = (
+        slice(
+            block_id[is_csc] * stride,
+            min((block_id[is_csc] * stride) + stride, shape[0]),
+        ),
+    )
+    if is_csc:
+        return (slice(None, None, None),) + index
+    return index
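To make the mapping from a dask `block_id` to an on-disk slice concrete, here is a small sketch that needs neither dask nor anndata. It assumes the chunking scheme used in this diff: only the major axis is chunked, in steps of `stride`. The function name `block_indexer_sketch` is hypothetical, and the sketch clamps against the length of the major axis, which is an assumption about the intended behaviour rather than a copy of the code above.

```python
def block_indexer_sketch(
    is_csc: bool,
    stride: int,
    shape: tuple[int, int],
    block_id: tuple[int, int],
) -> tuple[slice, slice]:
    """Return the 2D index that selects one block of a row- or column-chunked matrix."""
    major = int(is_csc)  # 0 for CSR (rows), 1 for CSC (columns)
    start = block_id[major] * stride
    stop = min(start + stride, shape[major])  # the last block may be shorter
    major_slice = slice(start, stop)
    if is_csc:
        return (slice(None), major_slice)
    return (major_slice, slice(None))


# CSR matrix with 250 rows: the third row block covers rows 200..249.
print(block_indexer_sketch(False, 100, (250, 40), (2, 0)))
# CSC matrix with 250 columns: the second column block covers columns 100..199.
print(block_indexer_sketch(True, 100, (40, 250), (0, 1)))
```

Returning a full two-element index in both cases keeps the caller independent of the storage orientation.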


+@contextmanager
+def maybe_open_h5(filename_or_elem: str | ZarrGroup, elem_name: str):
+    if isinstance(filename_or_elem, str):
+        file = h5py.File(filename_or_elem, "r")
+        try:
+            yield file[elem_name]
+        finally:
+            file.close()
+    else:
+        try:
+            yield filename_or_elem
+        finally:
+            pass
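The "open it only if I was given a path" pattern in `maybe_open_h5` can be tried out without h5py. The sketch below uses plain text files as a stand-in for HDF5 files; `maybe_open` and `demo` are hypothetical names. It shows the property the pattern relies on: a resource opened inside the context manager is closed on exit, while an object passed in by the caller is left untouched.

```python
import os
import tempfile
from contextlib import contextmanager


@contextmanager
def maybe_open(path_or_handle, mode="r"):
    """Yield an open handle; close it afterwards only if it was opened here."""
    if isinstance(path_or_handle, (str, os.PathLike)):
        handle = open(path_or_handle, mode)
        try:
            yield handle
        finally:
            handle.close()
    else:
        # The caller owns this object, so it is passed through unchanged.
        yield path_or_handle


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "example.txt")
        with open(path, "w") as f:
            f.write("hello")

        with maybe_open(path) as opened:
            from_path = opened.read()
        closed_after_path = opened.closed

        with open(path) as existing:
            with maybe_open(existing) as passed:
                from_handle = passed.read()
            still_open = not existing.closed
    return from_path, closed_after_path, from_handle, still_open


print(demo())
```

Opening inside each task matters for the h5py case because, as the code comment in this file notes, dask cannot share h5py objects when using `dask.distributed`.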


+@_LAZY_REGISTRY.register_read(H5Group, IOSpec("csc_matrix", "0.1.0"))
+@_LAZY_REGISTRY.register_read(H5Group, IOSpec("csr_matrix", "0.1.0"))
+@_LAZY_REGISTRY.register_read(ZarrGroup, IOSpec("csc_matrix", "0.1.0"))
+@_LAZY_REGISTRY.register_read(ZarrGroup, IOSpec("csr_matrix", "0.1.0"))
+def read_sparse_as_dask_h5(elem, _reader):
+    filename_or_elem = elem.file.filename if isinstance(elem, H5Group) else elem
+    elem_name = elem.name if isinstance(elem, H5Group) else Path(elem.path).name
Member commented:

Please read the pathlib docs. This should almost definitely be PurePosixPath. I really hope we don’t use filesystem paths for other abstract paths anywhere, that would really mess things up.

Contributor Author replied:

What should be a PurePosixPath? We don't control the elem, so I think you need to cast it.

Member replied:

elem.path is the relative path inside of the zarr hierarchy, right? Converting that to Path (a concrete path) makes no sense. It’s probably not too bad, but as said: please read the intro here: https://docs.python.org/3/library/pathlib.html

Contributor Author replied:

@flying-sheep I was just going to ask - why a PurePosixPath instead of PurePath? I guess since it is relative?
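The distinction discussed in this thread can be shown with the standard library alone. `Path` instantiates a concrete, platform-dependent class (`PosixPath` or `WindowsPath`), whereas `PurePosixPath` always treats `/` as the only separator and never touches the filesystem. The sketch below assumes that keys inside a zarr hierarchy are `/`-separated strings; the helper `elem_name` and the example keys are made up for illustration.

```python
from pathlib import PurePosixPath, PureWindowsPath


def elem_name(store_path: str) -> str:
    """Last component of a '/'-separated in-store path, independent of the host OS."""
    return PurePosixPath(store_path).name


print(elem_name("obsm/X_pca"))  # X_pca

# Windows path semantics also split on backslashes, so the same key can parse
# differently. PureWindowsPath is used here to show that behaviour on any OS.
key = "layers/counts\\raw"
print(PurePosixPath(key).name)    # counts\raw
print(PureWindowsPath(key).name)  # raw
```

A pure path class also documents intent: the value is an abstract key, not a location on disk.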

+    shape = elem.attrs["shape"]
+    dtype = elem["data"].dtype
+    is_csc = elem.attrs["encoding-type"] == "csc_matrix"
+
+    def make_dask_chunk(block_id=None):
+        # We need to open the file in each task since `dask` cannot share h5py objects when using `dask.distributed`
+        # https://github.com/scverse/anndata/issues/1105
+        with maybe_open_h5(filename_or_elem, elem_name) as f:
+            mtx = ad.experimental.sparse_dataset(f)
+            index = make_index(is_csc, stride, shape, block_id)
+            chunk = mtx[index]
+        return chunk
+
+    return make_dask_array(is_csc, shape, make_dask_chunk, dtype)


+@_LAZY_REGISTRY.register_read(H5Array, IOSpec("array", "0.2.0"))
+def read_h5_array(elem, _reader):
+    import dask.array as da
+
+    if not hasattr(elem, "chunks") or elem.chunks is None:
+        return da.from_array(elem, chunks=(h5_chunks,) * len(elem.shape))
+    return da.from_array(elem)

Codecov (codecov/patch) check warning on line 96 of src/anndata/_io/specs/lazy_methods.py: added line #L96 was not covered by tests.


+@_LAZY_REGISTRY.register_read(ZarrArray, IOSpec("array", "0.2.0"))
+def read_zarr_array(elem, _reader):
+    import dask.array as da
+
+    return da.from_zarr(elem)
22 changes: 18 additions & 4 deletions src/anndata/_io/specs/registry.py
@@ -145,9 +145,7 @@ def get_reader(
         if (src_type, spec, modifiers) in self.read:
             return self.read[(src_type, spec, modifiers)]
         else:
-            raise IORegistryError._from_read_parts(
-                "read", _REGISTRY.read, src_type, spec
-            )
+            raise IORegistryError._from_read_parts("read", self.read, src_type, spec)

     def has_reader(
         self, src_type: type, spec: IOSpec, modifiers: frozenset[str] = frozenset()
@@ -176,7 +174,7 @@ def get_partial_reader(
             return self.read_partial[(src_type, spec, modifiers)]
         else:
             raise IORegistryError._from_read_parts(
-                "read_partial", _REGISTRY.read_partial, src_type, spec
+                "read_partial", self.read_partial, src_type, spec
             )

     def get_spec(self, elem: Any) -> IOSpec:
@@ -188,6 +186,7 @@ def get_spec(self, elem: Any) -> IOSpec:


 _REGISTRY = IORegistry()
+_LAZY_REGISTRY = IORegistry()


 @singledispatch
@@ -332,6 +331,21 @@ def read_elem(elem: StorageType) -> Any:
     return Reader(_REGISTRY).read_elem(elem)


+def read_elem_as_dask(elem: StorageType) -> Any:
+    """
+    Read an element from a store lazily.
+
+    Assumes that the element is encoded using the anndata encoding. This function will
+    determine the encoded type using the encoding metadata stored in elem's attributes.
+
+    Params
+    ------
+    elem
+        The stored element.
+    """
+    return Reader(_LAZY_REGISTRY).read_elem(elem)
+
+
 def write_elem(
     store: GroupStorageType,
     k: str,
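The registry changes above follow a simple idea: eager and lazy readers live in two separate registry instances, and each instance must report errors from its own table (`self.read`) rather than from a module-level global. The following is a toy illustration of that idea in plain Python. `MiniRegistry`, `EAGER`, `LAZY` and `read_with` are hypothetical stand-ins, not anndata's `IORegistry` or `Reader`, and a generator stands in for a lazy dask array.

```python
class MiniRegistry:
    """Toy registry keyed by (source type, spec name)."""

    def __init__(self, name: str):
        self.name = name
        self.read = {}

    def register_read(self, src_type: type, spec: str):
        def decorator(func):
            self.read[(src_type, spec)] = func
            return func

        return decorator

    def get_reader(self, src_type: type, spec: str):
        try:
            return self.read[(src_type, spec)]
        except KeyError:
            # Report against this instance's own table, not a global one.
            known = sorted(s for _, s in self.read)
            raise KeyError(
                f"{self.name}: no reader for {src_type.__name__}/{spec}; known specs: {known}"
            ) from None


EAGER = MiniRegistry("eager")
LAZY = MiniRegistry("lazy")


@EAGER.register_read(list, "array")
def read_eager(elem):
    return list(elem)  # materialise immediately


@LAZY.register_read(list, "array")
def read_lazy(elem):
    return (x for x in elem)  # defer work until iterated


def read_with(registry: MiniRegistry, elem, spec: str):
    return registry.get_reader(type(elem), spec)(elem)


print(read_with(EAGER, [1, 2, 3], "array"))
print(list(read_with(LAZY, [1, 2, 3], "array")))
```

Because the lookup function is shared and only the registry differs, a lazy entry point can be a thin wrapper around the same reader machinery, which is the shape `read_elem_as_dask` takes in this diff.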
88 changes: 68 additions & 20 deletions tests/test_io_elementwise.py
@@ -15,7 +15,14 @@
 from scipy import sparse

 import anndata as ad
-from anndata._io.specs import _REGISTRY, IOSpec, get_spec, read_elem, write_elem
+from anndata._io.specs import (
+    _REGISTRY,
+    IOSpec,
+    get_spec,
+    read_elem,
+    read_elem_as_dask,
+    write_elem,
+)
 from anndata._io.specs.registry import IORegistryError
 from anndata.compat import H5Group, ZarrGroup, _read_attr
 from anndata.tests.helpers import (
@@ -47,6 +54,39 @@ def store(request, tmp_path) -> H5Group | ZarrGroup:
             file.close()


+sparse_formats = ["csr", "csc"]
+SIZE = 1000
+
+
+@pytest.fixture(scope="function", params=sparse_formats)
+def sparse_format(request):
+    return request.param
+
+
+def create_dense_store(store):
+    X = np.random.randn(SIZE, SIZE)
+
+    write_elem(store, "X", X)
+    return store
+
+
+def create_sparse_store(sparse_format, store):
+    import dask.array as da
+
+    X = sparse.random(
+        SIZE,
+        SIZE,
+        format=sparse_format,
+        density=0.01,
+        random_state=np.random.default_rng(),
+    )
+    X_dask = da.from_array(X, chunks=(100, 100))
+
+    write_elem(store, "X", X)
+    write_elem(store, "X_dask", X_dask)
+    return store
Member commented:

If we're going to use these in more places, I would suggest not making them square. This has definitely caused problems for us in the past. Maybe even make the shape/chunk size a parameter?

I would also write a short docstring about what is actually being returned. I think it's a little strange that one of these groups has two things in it, while the other only has one.

Contributor Author replied:

I think we can just do the chunking along the axis. The shape shouldn't matter. We can resolve it from sparse_format.
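The suggestion to avoid square test matrices can be motivated with a small, dependency-free experiment. The sketch compares two hypothetical indexers for a matrix chunked along its major axis: one always clamps the block end against `shape[0]`, the other clamps against the length of the major axis. Both function names, and the helper `covered`, are illustrative only. On a square shape the two agree, so a square-only test cannot tell them apart.

```python
def clamp_with_rows(is_csc, stride, shape, block_id):
    """Hypothetical indexer that always clamps against the number of rows."""
    major = int(is_csc)
    start = block_id[major] * stride
    return slice(start, min(start + stride, shape[0]))


def clamp_with_major(is_csc, stride, shape, block_id):
    """Hypothetical indexer that clamps against the length of the chunked axis."""
    major = int(is_csc)
    start = block_id[major] * stride
    return slice(start, min(start + stride, shape[major]))


def covered(slc, axis_len):
    """Number of positions a slice selects on an axis of the given length."""
    return len(range(*slc.indices(axis_len)))


square, wide = (1000, 1000), (50, 250)
block = (0, 1)  # second column block of a CSC matrix

# Square shape: both variants select the same 100 columns.
print(clamp_with_rows(True, 100, square, block) == clamp_with_major(True, 100, square, block))

# Wide shape: clamping against 50 rows yields an empty column slice.
print(covered(clamp_with_rows(True, 100, wide, block), wide[1]))   # 0
print(covered(clamp_with_major(True, 100, wide, block), wide[1]))  # 100
```

Parametrising the test shape, as proposed in the comment above, would exercise exactly this kind of axis mix-up.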



 @pytest.mark.parametrize(
     "value,encoding_type",
     [
@@ -126,30 +166,38 @@ def test_io_spec_cupy(store, value, encoding_type):
     assert get_spec(store[key]) == _REGISTRY.get_spec(value)


-@pytest.mark.parametrize("sparse_format", ["csr", "csc"])
-def test_dask_write_sparse(store, sparse_format):
-    import dask.array as da
+def test_dask_write_sparse(sparse_format, store):
+    x_sparse_store = create_sparse_store(sparse_format, store)
+    X_from_disk = read_elem(x_sparse_store["X"])
+    X_dask_from_disk = read_elem(x_sparse_store["X_dask"])

-    X = sparse.random(
-        1000,
-        1000,
-        format=sparse_format,
-        density=0.01,
-        random_state=np.random.default_rng(),
-    )
-    X_dask = da.from_array(X, chunks=(100, 100))
+    assert_equal(X_from_disk, X_dask_from_disk)
+    assert_equal(dict(x_sparse_store["X"].attrs), dict(x_sparse_store["X_dask"].attrs))

-    write_elem(store, "X", X)
-    write_elem(store, "X_dask", X_dask)
+    assert x_sparse_store["X_dask/indptr"].dtype == np.int64
+    assert x_sparse_store["X_dask/indices"].dtype == np.int64

-    X_from_disk = read_elem(store["X"])
-    X_dask_from_disk = read_elem(store["X_dask"])
+
+@pytest.mark.parametrize("arr_type", ["dense", *sparse_formats])
+def test_read_lazy_2d_dask(arr_type, store):
+    if arr_type == "dense":
+        arr_store = create_dense_store(store)
+    else:
+        arr_store = create_sparse_store(arr_type, store)
+    X_dask_from_disk = read_elem_as_dask(arr_store["X"])
+    X_from_disk = read_elem(arr_store["X"])

-    assert_equal(X_from_disk, X_dask_from_disk)
-    assert_equal(dict(store["X"].attrs), dict(store["X_dask"].attrs))
+    random_int_indices = np.random.randint(0, SIZE, (SIZE // 10,))
+    random_bool_mask = np.random.randn(SIZE) > 0
+    index_slice = slice(0, SIZE // 10)
+    for index in [random_int_indices, index_slice, random_bool_mask]:
+        assert_equal(X_from_disk[index, :], X_dask_from_disk[index, :])
+        assert_equal(X_from_disk[:, index], X_dask_from_disk[:, index])

-    assert store["X_dask/indptr"].dtype == np.int64
-    assert store["X_dask/indices"].dtype == np.int64
+    if arr_type in {"csr", "csc"}:
+        assert arr_store["X_dask/indptr"].dtype == np.int64
+        assert arr_store["X_dask/indices"].dtype == np.int64


 @pytest.mark.parametrize("sparse_format", ["csr", "csc"])
@@ -195,7 +243,7 @@ def test_write_anndata_to_root(store):
     ["attribute", "value"],
     [
         ("encoding-type", "floob"),
-        ("encoding-version", "10000.0"),
+        ("encoding-version", "SIZE0.0"),
     ],
 )
 def test_read_iospec_not_found(store, attribute, value):