Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add H5AD read/write helpers #88

Merged
merged 10 commits into from
Sep 4, 2023
Merged

Add H5AD read/write helpers #88

merged 10 commits into from
Sep 4, 2023

Conversation

lazappi
Copy link
Collaborator

@lazappi lazappi commented Jul 6, 2023

Add user functions for reading/writing H5AD files to/from SingleCellExperiment/Seurat objects. This will likely be the main interface for most normal users.

Also adds functions for creating dummy data objects (based on the dummy_data() test helper).

@lazappi
Copy link
Collaborator Author

lazappi commented Jul 11, 2023

@rcannood The CI here (and on other PRs) is failing on all platforms with an error that I think is coming from your {anndata} package (in the round-trip tests). I can't replicate it locally, though to work out what is going on. Any thoughts on how you want to handle this?

@rcannood
Copy link
Collaborator

Sigh! That issue has been popping up everywhere. It's because scipy 1.11 introduced some breaking changes. The development version of reticulate (at rstudio/reticulate) has resolved the issue, but the changes still need to be released on CRAN. I'll add a workaround to the repo for now.

@rcannood
Copy link
Collaborator

@lazappi The main branch contains a workaround by switching to the development version of reticulate for now. Can you merge the main branch into this branch?

Copy link
Collaborator

@rcannood rcannood left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These changes make sense to me, though I'd like to allow users to also return hdf5anndata objects instead of only Seurat or SCE. Is this ok with you?

R/HDF5-write.R Outdated Show resolved Hide resolved
R/HDF5-read.R Outdated Show resolved Hide resolved
R/HDF5-write.R Outdated Show resolved Hide resolved
R/anndataR-package.R Outdated Show resolved Hide resolved
R/HDF5-write.R Show resolved Hide resolved
@lazappi
Copy link
Collaborator Author

lazappi commented Aug 8, 2023

These changes make sense to me, though I'd like to allow users to also return hdf5anndata objects instead of only Seurat or SCE. Is this ok with you?

I would prefer not to do this. Part of the reason for having these wrapper functions is to hide the HDF5AnnData interface object from most users. I can't see many use cases for interacting with it directly and I feel like anybody "advanced" enough to want to use it should be able to create the objects manually.

@rcannood
Copy link
Collaborator

Personally, I see no downsides to enabling this functionality, and again it's something that I actually need in order to use the package myself. Sure, I could do it the roundabout way without using read_h5ad, but I don't know why this should be the case when we might as well just add this functionality.

However, I can agree to not making the default. Would this be okay with you?

@lazappi
Copy link
Collaborator Author

lazappi commented Aug 29, 2023

I guess I can be convinced but I would maybe want to see what other people think. I think we just have a different philosophy/use case in mind. I know you have a use for it and could use it safely but you aren't a normal user and I worry about the average person misusing the interface objects if it's too easy to create them. Maybe I'm just paranoid though.

* Remove temporary workaround

* use `inherits()` instead of `is()`

* Add HDF5AnnData and InMemoryAnnData to `read_h5ad()`

* fix incorrect merge

* try to fix error in R devel

* fix docs

* add terminal newline

* remove conditions

* undo previous commit
@rcannood
Copy link
Collaborator

I merged some changes into this PR.

@lazappi Do you have an idea why the devel branch fails when reading the obs in certain unit tests?

@rcannood
Copy link
Collaborator

Namely:

 ── Error ('test-HDF5-read.R:127:3'): reading H5AD as Seurat works ─────────���────
<subscriptOutOfBoundsError/error/condition>
Error in `element[["mask"]]`: subscript out of bounds
Backtrace:
    ▆
 1. ├─anndataR::read_h5ad(file, to = "Seurat") at test-HDF5-read.R:127:2
 2. │ └─anndataR::to_Seurat(adata) at anndataR/R/HDF5-read.R:452:2
 3. │   └─base::ncol(obj$obs) at anndataR/R/Seurat.R:42:2
 4. └─anndataR (local) `<fn>`() at anndataR/R/Seurat.R:42:2
 5.   └─anndataR:::read_h5ad_element(private$.h5obj, "/obs", include_index = FALSE) at anndataR/R/HDF5AnnData.R:43:8
 6.     └─anndataR (local) read_fun(file = file, name = name, version = version, ...) at anndataR/R/HDF5-read.R:67:2
 7.       └─anndataR:::read_h5ad_collection(file, name, column_order) at anndataR/R/HDF5-read.R:361:2
 8.         └─anndataR:::read_h5ad_element(...) at anndataR/R/HDF5-read.R:416:4
 9.           └─anndataR (local) read_fun(file = file, name = name, version = version, ...) at anndataR/R/HDF5-read.R:67:2
── Error ('test-HDF5AnnData.R:32:3'): reading obs works ────────────────────────
<subscriptOutOfBoundsError/error/condition>
Error in `element[["mask"]]`: subscript out of bounds
Backtrace:
    ▆
 1. └─anndataR (local) `<fn>`() at test-HDF5AnnData.R:32:2
 2.   └─anndataR:::read_h5ad_element(private$.h5obj, "/obs", include_index = FALSE) at anndataR/R/HDF5AnnData.R:43:8
 3.     └─anndataR (local) read_fun(file = file, name = name, version = version, ...) at anndataR/R/HDF5-read.R:67:2
 4.       └─anndataR:::read_h5ad_collection(file, name, column_order) at anndataR/R/HDF5-read.R:361:2
 5.         └─anndataR:::read_h5ad_element(...) at anndataR/R/HDF5-read.R:416:4
 6.           └─anndataR (local) read_fun(file = file, name = name, version = version, ...) at anndataR/R/HDF5-read.R:67:2

@lazappi
Copy link
Collaborator Author

lazappi commented Sep 1, 2023

I think this might have to do with the changes to {rhdf5}. The nullable arrays are now read natively and you don't have to mess around with the masks yourself. I've fixed this in {zellkonverter} already so I can port that other here if you like.

@rcannood
Copy link
Collaborator

rcannood commented Sep 1, 2023

I see! Is this related to #83 ?

@lazappi
Copy link
Collaborator Author

lazappi commented Sep 1, 2023

Yes. Although I'm not sure if we are testing against the release or devel versions of Bioconductor packages (this change is only in devel).

@rcannood rcannood mentioned this pull request Sep 1, 2023
@rcannood
Copy link
Collaborator

rcannood commented Sep 1, 2023

The devel R CMD check uses the upcoming release of R, so it probably also has to use the development version of Bioconductor.

I created a PR (#90) which merges #83 into #88, which seems to have solved the issue. @lazappi Is it ok to merge #90?

@lazappi
Copy link
Collaborator Author

lazappi commented Sep 1, 2023

I made one minor comment but otherwise yes.

* support rhdf5 nullable vector and boolean enum implementation

- use `as.vector()` to reduce code duplication when reading
  nullable vectors
- closes #82

* implement and use read_h5ad_attributes

- 'quietly' to suppress warnings; set to TRUE when relevant warnings
  are handled by calling code

* use 'anndataR-category-unknown' as a class of warning

- versions of rhdf5 that don't support ENUM; more user-friendly message
- use to suppress irrelevant warnings in unit tests

* Tidy after merge and style

* More tidying...

* Roxygenise

* Fix namespace

---------

Co-authored-by: Martin Morgan <[email protected]>
Co-authored-by: Luke Zappia <[email protected]>
@rcannood
Copy link
Collaborator

rcannood commented Sep 2, 2023

Thanks! #90 was merged. Can this PR also be merged?

@lazappi
Copy link
Collaborator Author

lazappi commented Sep 4, 2023

Yeah, I think so

@rcannood rcannood merged commit 2b117bc into main Sep 4, 2023
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants