-
-
Notifications
You must be signed in to change notification settings - Fork 11
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
export generate_dataset()
function
#152
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've made some comments but I think it's pretty good. I would also update the examples to use this but maybe not everyone wants to do that.
layer_types = c( | ||
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas", # | ||
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix", | ||
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas", | ||
"integer_csparse_with_nas", "integer_rsparse_with_nas" | ||
), | ||
obs_types = c( | ||
"character", "integer", "factor", "factor_ordered", "logical", "numeric", "character_with_nas", | ||
"integer_with_nas", "factor_with_nas", "factor_ordered_with_nas", "logical_with_nas", "numeric_with_nas" | ||
), | ||
var_types = c( | ||
"character", "integer", "factor", "factor_ordered", "logical", "numeric", "character_with_nas", | ||
"integer_with_nas", "factor_with_nas", "factor_ordered_with_nas", "logical_with_nas", "numeric_with_nas" | ||
), | ||
obsm_types = c( | ||
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas", | ||
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix", | ||
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas", | ||
"integer_csparse_with_nas", "integer_rsparse_with_nas", "character", "integer", "factor", "factor_ordered", | ||
"logical", "numeric", "character_with_nas", "integer_with_nas", "factor_with_nas", "factor_ordered_with_nas", | ||
"logical_with_nas", "numeric_with_nas" | ||
), | ||
varm_types = c( | ||
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas", | ||
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix", | ||
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas", | ||
"integer_csparse_with_nas", "integer_rsparse_with_nas", "character", "integer", "factor", "factor_ordered", | ||
"logical", "numeric", "character_with_nas", "integer_with_nas", "factor_with_nas", "factor_ordered_with_nas", | ||
"logical_with_nas", "numeric_with_nas" | ||
), | ||
obsp_types = c( | ||
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas", | ||
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix", | ||
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas", | ||
"integer_csparse_with_nas", "integer_rsparse_with_nas" | ||
), | ||
varp_types = c( | ||
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas", | ||
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix", | ||
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas", | ||
"integer_csparse_with_nas", "integer_rsparse_with_nas" | ||
), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like that all these are included but maybe we could make an easy way to generate a reduced list for examples?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see. Not sure how we could go about this, other than perhaps:
generate_dummy_dataset <- function(n_obs, n_vars) {
generate_dataset(
n_obs = n_obs,
n_vars = n_vars,
obs_types = c(...), # small set of types
...
)
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Aha, wait! We could turn it around. We could set the defaults generate_dataset()
to something simple, and then create a helper function in the tests which contains all of the types.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I implemented a solution for this in 206f11a
R/generate_dataset.R
Outdated
X <- t(dummy$layers[["integer_csparse"]]) | ||
colnames(X) <- dummy$obs_names | ||
rownames(X) <- dummy$var_names | ||
X <- t(list$layers[["integer_csparse"]]) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should this be list$X
in case integer_csparse
wasn't generated? Same with the ones below.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True -- though I think the Seurat conversion should be revised in light of Seuratv5. Maybe we can just pass all layers to Seurat without needing to be selective.
@@ -10,6 +12,8 @@ list_generators <- function() { | |||
#' | |||
#' @param class Name of the AnnData class. Must be one of `"HDF5AnnData"` | |||
#' or `"InMemoryAnnData"`. | |||
#' | |||
#' @noRd | |||
get_generator <- function(class = c("HDF5AnnData", "InMemoryAnnData")) { | |||
# TODO: also support directly passing the correct class? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not quite sure what the purpose of this is. Is generator
here different to the dataset generators?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes -- the generator function is used by the from_Seurat
and from_SingleCellExperiment
to create the correct type of AnnData object:
from_SingleCellExperiment <- function(sce, output_class = c("InMemory", "HDF5AnnData"), ...) { # nolint
stopifnot(
inherits(sce, "SingleCellExperiment")
)
# fetch generator
generator <- get_generator(output_class)
# ... create obs, var, ...
generator$new(
X = x,
obs = obs,
var = var,
obs_names = obs_names,
var_names = var_names,
layers = layers,
...
)
}
I agree, however, that the term generator
is now somewhat confusing, given that there is a generate_dataset()
function. Fortunately, the get_generator
function is an internal one, so we can rename it to whatever we like :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, it might be better to have a more specific name. object_generator
or converter
maybe?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point! I renamed the generator to anndata constructor in 5093884
Co-authored-by: Luke Zappia <[email protected]>
Changes:
Closes #136.