H5ad helpers with pr83 #90

Merged
merged 9 commits into from
Sep 1, 2023
54 changes: 31 additions & 23 deletions R/HDF5-read.R
@@ -64,6 +64,7 @@ read_h5ad_element <- function(file, name, type = NULL, version = NULL, ...) {
       "' for element '", name, "'"
     )
   )
+
   read_fun(file = file, name = name, version = version, ...)
 }

@@ -183,18 +184,7 @@ read_h5ad_rec_array <- function(file, name, version = "0.2.0") {
 #'
 #' @return a boolean vector
 read_h5ad_nullable_boolean <- function(file, name, version = "0.1.0") {
-  version <- match.arg(version)
-
-  element <- rhdf5::h5read(file, name)
-
-  # Get mask and convert to Boolean
-  mask <- as.logical(element[["mask"]])
-
-  # Get values and set missing
-  element <- as.logical(element[["values"]])
-  element[mask] <- NA
-
-  return(element)
+  as.logical(read_h5ad_nullable(file, name, version))
 }

 #' Read H5AD nullable integer
@@ -207,16 +197,32 @@ read_h5ad_nullable_boolean <- function(file, name, version = "0.1.0") {
 #'
 #' @return an integer vector
 read_h5ad_nullable_integer <- function(file, name, version = "0.1.0") {
+  as.integer(read_h5ad_nullable(file, name, version))
+}
+
+#' Read H5AD nullable
+#'
+#' Read a nullable vector (boolean or integer) from an H5AD file
+#'
+#' @param file Path to a H5AD file or an open H5AD handle
+#' @param name Name of the element within the H5AD file
+#' @param version Encoding version of the element to read
+#'
+#' @return a nullable vector
+read_h5ad_nullable <- function(file, name, version = "0.1.0") {
   version <- match.arg(version)

   element <- rhdf5::h5read(file, name)

-  # Get mask and convert to Boolean
-  mask <- as.logical(element[["mask"]])
-
-  # Get values and set missing
-  element <- as.integer(element[["values"]])
-  element[mask] <- NA_integer_
+  # Some versions of rhdf5 automatically apply mask, in which case
+  # there is no 'mask' element
+  if (!is.null(names(element))) {
+    # Get mask and convert to Boolean
+    mask <- as.logical(element[["mask"]])
+    # Get values and set missing
+    element <- as.vector(element[["values"]])
+    element[mask] <- NA
+  }

   return(element)
 }
@@ -272,15 +278,17 @@ read_h5ad_categorical <- function(file, name, version = "0.2.0") {

levels <- element[["categories"]]

ordered <- element[["ordered"]]
if (is.null(ordered)) {
attributes <- rhdf5::h5readAttributes(file, name)
ordered <- attributes[["ordered"]]
if (is.na(ordered)) {
Collaborator

I haven't tested this, but I think this might need to be is.null(), as in the original line.

Collaborator Author

Hmm. Assuming attributes is a named list not containing the key ordered, that would make sense. However, when I change is.na(ordered) to is.null(ordered) and run the tests again, I get the following error:

Error (test-HDF5AnnData.R:32:3): reading obs works
Error in `if (ordered) "ordered"`: argument is not interpretable as logical
Backtrace:
    ▆
 1. └─anndataR (local) `<fn>`() at test-HDF5AnnData.R:32:2
 2.   └─anndataR:::read_h5ad_element(private$.h5obj, "/obs", include_index = FALSE) at anndataR/R/HDF5AnnData.R:43:8
 3.     └─anndataR (local) read_fun(file = file, name = name, version = version, ...) at anndataR/R/HDF5-read.R:68:2
 4.       └─anndataR:::read_h5ad_collection(file, name, column_order) at anndataR/R/HDF5-read.R:369:2
 5.         └─anndataR:::read_h5ad_element(...) at anndataR/R/HDF5-read.R:424:4
 6.           └─anndataR (local) read_fun(file = file, name = name, version = version, ...) at anndataR/R/HDF5-read.R:68:2
 7.             └─base::factor(codes, labels = levels, ordered = ordered) at anndataR/R/HDF5-read

I can't even open the example hdf5 file:

> devtools::load_all()
> file <- system.file("extdata", "example.h5ad", package = "anndataR")
> adata <- HDF5AnnData$new(file)
> obs <- adata$obs
Error in if (ordered) "ordered" : 
  argument is not interpretable as logical

Collaborator Author

@rcannood rcannood Sep 1, 2023

When I add print(attributes) before the if-statement, I see:

> adata <- HDF5AnnData$new(file)
> adata
$`encoding-type`
[1] "categorical"

$`encoding-version`
[1] "0.2.0"

$ordered
[1] NA

Collaborator Author

Going to assume is.na(ordered) is the correct condition. If we find out that ordered can sometimes be NULL as well, we can always add a second condition.
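
For reference, a minimal sketch of what such a combined condition could look like, in case `ordered` does turn out to be NULL in some files. The helper name `is_ordered_unknown` is hypothetical and not part of anndataR; the `attributes` list below just mimics the printed output above. Note that `is.na(NULL)` returns a zero-length logical, so `if (is.na(ordered))` on its own would fail for a NULL value.

```r
# Hypothetical helper (not part of anndataR): treat a missing (NULL) and an
# unreadable (NA) 'ordered' attribute the same way.
is_ordered_unknown <- function(ordered) {
  # `||` short-circuits, so is.na() is only reached for a length-1 value
  is.null(ordered) || length(ordered) != 1 || is.na(ordered)
}

# Mimics the attributes printed above: the key exists, but its value is NA
attributes <- list(
  `encoding-type` = "categorical",
  `encoding-version` = "0.2.0",
  ordered = NA
)

is.null(attributes[["ordered"]])             # FALSE, which is why is.null() alone misses this case
is.na(attributes[["ordered"]])               # TRUE
is_ordered_unknown(attributes[["ordered"]])  # TRUE
is_ordered_unknown(NULL)                     # TRUE
is_ordered_unknown(FALSE)                    # FALSE
```

With a check like this, the fallback branch (warn and set `ordered <- FALSE`) would run for both cases, and `factor()` would never see a non-logical `ordered` argument.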

# This version of {rhdf5} doesn't yet support ENUM type attributes so we
# can't tell if the categorical should be ordered,
# see https://github.com/grimbough/rhdf5/issues/125
warning(
"Unable to determine if categorical '", name,
"' is ordered, assuming it isn't"
)

ordered <- FALSE
}

@@ -412,12 +420,12 @@ read_h5ad_collection <- function(file, name, column_order) {
   columns <- list()
   for (col_name in column_order) {
     new_name <- paste0(name, "/", col_name)
-    encoding <- rhdf5::h5readAttributes(file, new_name)
+    encoding <- read_h5ad_encoding(file, new_name)
     columns[[col_name]] <- read_h5ad_element(
       file = file,
       name = new_name,
-      type = encoding$`encoding-type`,
-      version = encoding$`encoding-version`
+      type = encoding$type,
+      version = encoding$version
     )
   }
   columns
21 changes: 21 additions & 0 deletions man/read_h5ad_nullable.Rd
