Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

export generate_dataset() function #152

Merged
merged 18 commits into from
Dec 5, 2023
Merged

export generate_dataset() function #152

merged 18 commits into from
Dec 5, 2023

Conversation

rcannood
Copy link
Collaborator

@rcannood rcannood commented Dec 4, 2023

Changes:

  • Re-exports the dataset generator function.
  • Reworked the arguments of the dataset generator function
  • Renamed internal functions
  • Fixed print method

Closes #136.

@rcannood rcannood requested a review from lazappi December 4, 2023 10:06
Copy link
Collaborator

@lazappi lazappi left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've made some comments but I think it's pretty good. I would also update the examples to use this but maybe not everyone wants to do that.

R/generate_dataset.R Outdated Show resolved Hide resolved
Comment on lines +33 to +74
layer_types = c(
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas", #
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix",
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas",
"integer_csparse_with_nas", "integer_rsparse_with_nas"
),
obs_types = c(
"character", "integer", "factor", "factor_ordered", "logical", "numeric", "character_with_nas",
"integer_with_nas", "factor_with_nas", "factor_ordered_with_nas", "logical_with_nas", "numeric_with_nas"
),
var_types = c(
"character", "integer", "factor", "factor_ordered", "logical", "numeric", "character_with_nas",
"integer_with_nas", "factor_with_nas", "factor_ordered_with_nas", "logical_with_nas", "numeric_with_nas"
),
obsm_types = c(
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas",
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix",
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas",
"integer_csparse_with_nas", "integer_rsparse_with_nas", "character", "integer", "factor", "factor_ordered",
"logical", "numeric", "character_with_nas", "integer_with_nas", "factor_with_nas", "factor_ordered_with_nas",
"logical_with_nas", "numeric_with_nas"
),
varm_types = c(
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas",
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix",
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas",
"integer_csparse_with_nas", "integer_rsparse_with_nas", "character", "integer", "factor", "factor_ordered",
"logical", "numeric", "character_with_nas", "integer_with_nas", "factor_with_nas", "factor_ordered_with_nas",
"logical_with_nas", "numeric_with_nas"
),
obsp_types = c(
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas",
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix",
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas",
"integer_csparse_with_nas", "integer_rsparse_with_nas"
),
varp_types = c(
"numeric_matrix", "numeric_dense", "numeric_csparse", "numeric_rsparse", "numeric_matrix_with_nas",
"numeric_dense_with_nas", "numeric_csparse_with_nas", "numeric_rsparse_with_nas", "integer_matrix",
"integer_dense", "integer_csparse", "integer_rsparse", "integer_matrix_with_nas", "integer_dense_with_nas",
"integer_csparse_with_nas", "integer_rsparse_with_nas"
),
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like that all these are included but maybe we could make an easy way to generate a reduced list for examples?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Not sure how we could go about this, other than perhaps:

generate_dummy_dataset <- function(n_obs, n_vars) {
  generate_dataset(
    n_obs = n_obs,
    n_vars = n_vars,
    obs_types = c(...), # small set of types
    ...
  )
}

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, wait! We could turn it around. We could set the defaults generate_dataset() to something simple, and then create a helper function in the tests which contains all of the types.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I implemented a solution for this in 206f11a

X <- t(dummy$layers[["integer_csparse"]])
colnames(X) <- dummy$obs_names
rownames(X) <- dummy$var_names
X <- t(list$layers[["integer_csparse"]])
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be list$X in case integer_csparse wasn't generated? Same with the ones below.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

True -- though I think the Seurat conversion should be revised in light of Seuratv5. Maybe we can just pass all layers to Seurat without needing to be selective.

@@ -10,6 +12,8 @@ list_generators <- function() {
#'
#' @param class Name of the AnnData class. Must be one of `"HDF5AnnData"`
#' or `"InMemoryAnnData"`.
#'
#' @noRd
get_generator <- function(class = c("HDF5AnnData", "InMemoryAnnData")) {
# TODO: also support directly passing the correct class?
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not quite sure what the purpose of this is. Is generator here different to the dataset generators?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes -- the generator function is used by the from_Seurat and from_SingleCellExperiment to create the correct type of AnnData object:

from_SingleCellExperiment <- function(sce, output_class = c("InMemory", "HDF5AnnData"), ...) { # nolint
  stopifnot(
    inherits(sce, "SingleCellExperiment")
  )

  # fetch generator
  generator <- get_generator(output_class)

  # ... create obs, var, ...

  generator$new(
    X = x,
    obs = obs,
    var = var,
    obs_names = obs_names,
    var_names = var_names,
    layers = layers,
    ...
  )
}

I agree, however, that the term generator is now somewhat confusing, given that there is a generate_dataset() function. Fortunately, the get_generator function is an internal one, so we can rename it to whatever we like :)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it might be better to have a more specific name. object_generator or converter maybe?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point! I renamed the generator to anndata constructor in 5093884

tests/testthat/test-generate_dataset.R Show resolved Hide resolved
@rcannood rcannood merged commit 1fd62d5 into main Dec 5, 2023
7 checks passed
@rcannood rcannood deleted the export_dataset_generator branch December 5, 2023 15:28
lazappi added a commit that referenced this pull request Dec 6, 2023
* origin/main:
  Version update (#161)
  re-enable tests for writing compressed files (#140)
  re-enable tests (#139)
  re-enable tests (#138)
  Add stripTrailingWhitespace option to .Rproj (#157)
  export `generate_dataset()` function (#152)
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Move dummy data functions back to main package
2 participants