-
Notifications
You must be signed in to change notification settings - Fork 154
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
(feat): read_elem_as_dask
method
#1469
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #1469 +/- ##
==========================================
- Coverage 86.48% 84.17% -2.32%
==========================================
Files 36 37 +1
Lines 5823 5932 +109
==========================================
- Hits 5036 4993 -43
- Misses 787 939 +152
|
read_elem_lazy
methodread_elem_dask
method
read_elem_dask
methodread_elem_as_dask
method
tests/test_io_elementwise.py
Outdated
def create_dense_store(store): | ||
X = np.random.randn(SIZE, SIZE) | ||
|
||
write_elem(store, "X", X) | ||
return store | ||
|
||
|
||
def create_sparse_store(sparse_format, store): | ||
import dask.array as da | ||
|
||
X = sparse.random( | ||
SIZE, | ||
SIZE, | ||
format=sparse_format, | ||
density=0.01, | ||
random_state=np.random.default_rng(), | ||
) | ||
X_dask = da.from_array(X, chunks=(100, 100)) | ||
|
||
write_elem(store, "X", X) | ||
write_elem(store, "X_dask", X_dask) | ||
return store |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If we're going to use these in more places, I would suggest not making them square. This has definitely caused problems for us in the past. Maybe even make the shape/ chunk size a parameter?
I would also write a short doc string about what is actually being returned. I think it's a little strange that one of these groups has two things in it, while other only has one.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can just do the chunking along the axis. The shape shouldn't matter. We can reoslve it from sparse_format
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks great. I'm actually surprised how easily the subclassing seems to have worked out.
I've left one comment, which I think should be addressed but I hope is simple. One more minor nit which is simple, but can be left.
I think get_elem_name
should live next to anndata._core.file_backing.filename
. I don't know that file_backing
is actually the best module for that, and maybe it would fit better in a util
module. Also, get_elem_name
seems like an actually implemented version of AnnData._io.specs.registry.elem_key
. I think we should just delete elem_key
.
OK now I’m done lol |
My thinking in terms of the naming for this would be to use this for
xarray
as well but we don't have to.read_elem_as_dask
#1430 and closes Distributed writing for H5ad format due to h5py objects being unserializable #1105 via our careful handling of the file descriptor