Add formatting checks to scpca-nf #823

Open
7 tasks
jashapiro opened this issue Feb 20, 2025 · 0 comments

As described in our Slab discussion, we would like to add data formatting checks to scpca-nf.

While that notebook proposes adding such checks within the same script where the files are created, I think we should instead write separate scripts to perform the checks. That will provide more separation between the generating code and the checks, and help ensure that the files being checked are the final versions that were written to disk.

Each check script should have the prefix test_ (or something similar, depending on discussion), to distinguish it from scripts that are used for processing.

Formatting tests that fail should not cause the Nextflow process to fail, but should instead generate an error file with a description of the problem(s). This file can then be used to print an error with Nextflow's log.error() function, which will cause the error to be printed to the log (and screen if running locally), and can also be passed along to a separate process to collate and report on errors.
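As a rough illustration of the non-failing behavior described above, here is a minimal Python sketch of a check script. Everything in it is an assumption made for illustration: the function names, the `library_id` and `sample_id` field names, and the convention that an empty error file means all checks passed. A real check would read the SCE or AnnData file from disk rather than a plain dictionary.

```python
import sys
import tempfile
from pathlib import Path


def check_required_keys(metadata, required):
    """Return one problem message per required key missing from metadata."""
    return [f"missing required field: {key}" for key in required if key not in metadata]


def write_error_report(problems, report_path):
    """Write problems one per line; an empty file means no problems were found."""
    Path(report_path).write_text("".join(f"{p}\n" for p in problems))
    return len(problems)


def main(metadata, report_path):
    # Hypothetical required fields, chosen only for this example.
    problems = check_required_keys(metadata, required=["library_id", "sample_id"])
    write_error_report(problems, report_path)
    # Always return 0: a formatting problem is reported in the error file,
    # not by failing the surrounding Nextflow process.
    return 0


if __name__ == "__main__":
    # Stand-in for metadata read from a file on disk.
    example_metadata = {"library_id": "L1"}
    report = Path(tempfile.mkdtemp()) / "example_errors.txt"
    exit_code = main(example_metadata, report)
    print(report.read_text(), end="")
    sys.exit(exit_code)
```

Writing the report file unconditionally is one possible design choice: the process can then always declare the file as an output, and downstream code only needs to test whether it is empty before calling log.error().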

These scripts could be run as part of the Nextflow process that generates the files that they are testing, or we could establish a separate process that runs all of the tests, possibly combining this with the process that publishes outputs. The advantage of including the scripts in either the generating or the publishing process is that this will reduce the number of times that the files need to be transferred between processes.
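Whichever process runs the checks, the error files still need to be gathered for reporting. The following Python sketch shows one way a collation step might work, assuming (hypothetically) that each check writes a plain-text report named with an `_errors.txt` suffix, one problem per line, and that an empty report means the file passed. The function names and file naming are illustrative only.

```python
from pathlib import Path


def collate_reports(report_dir, pattern="*_errors.txt"):
    """Map each non-empty report file name to its list of problem messages."""
    collated = {}
    for report in sorted(Path(report_dir).glob(pattern)):
        problems = [
            line.strip() for line in report.read_text().splitlines() if line.strip()
        ]
        if problems:  # skip empty reports, which indicate no problems
            collated[report.name] = problems
    return collated


def format_summary(collated):
    """Build a human-readable summary of all collected problems."""
    if not collated:
        return "All formatting checks passed."
    lines = []
    for name, problems in collated.items():
        lines.append(f"{name}: {len(problems)} problem(s)")
        lines.extend(f"  - {p}" for p in problems)
    return "\n".join(lines)
```

The summary string could then be written to a single report file or passed to the log, depending on how the reporting process ends up being structured.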

The files we expect to require format testing are listed below:

  • unfiltered SCE
  • filtered SCE
  • processed SCE
  • merged SCEs (there was some discussion about whether this is necessary given the computational time, but I think the check is worth it)
  • unfiltered AnnData (while it is tempting to trust zellkonverter to handle the conversion, it seems advisable to make sure that changes in that library do not cause unexpected effects)
  • filtered AnnData
  • processed AnnData

One question is whether these scripts should be combined with the metrics calculations that are to be done after we make decisions on those metrics in #822.
