As described in our Slab discussion, we would like to add data formatting checks to scpca-nf.
While that notebook proposes adding such checks within the same script where the files are created, I think we should instead write separate scripts to perform the checks. This will provide more separation between the generating code and the checks, and help ensure that the files being checked are the final versions that were written to disk.
Each check script should have the prefix test_ (or something similar, depending on discussion), to distinguish them from scripts that are used for processing.
Formatting tests that fail should not cause the Nextflow process to fail, but should instead generate an error file with a description of the problem(s). This file can then be used to print an error with Nextflow's log.error() function, which will cause the error to be printed to the log (and screen if running locally), and can also be passed along to a separate process to collate and report on errors.
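As a rough illustration of the "report, don't fail" behavior described above, here is a minimal Python sketch. Everything in it is hypothetical: the function names, the one-problem-per-line error file layout, and the idea of checking for required field names are assumptions for discussion, not existing scpca-nf code. The real checks would read the SCE or AnnData file from disk instead of taking a list of names.

```python
from pathlib import Path


def find_missing(found_names, required_names):
    """Return one problem description per required name that is absent."""
    return [
        f"missing required field: {name}"
        for name in required_names
        if name not in found_names
    ]


def write_error_file(problems, error_path):
    """Write one problem per line; an empty file means all checks passed."""
    Path(error_path).write_text("".join(f"{problem}\n" for problem in problems))
    return len(problems)


def run_check(found_names, required_names, error_path):
    """Run the check and record problems, but always return exit status 0."""
    problems = find_missing(found_names, required_names)
    write_error_file(problems, error_path)
    # Returning 0 even when problems exist keeps the Nextflow process from failing;
    # the error file is what downstream steps would inspect.
    return 0
```

A wrapper script with the `test_` prefix could call `run_check()` and pass its return value to `sys.exit()`, and the workflow could then read the error file and hand its contents to `log.error()`.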
These scripts could be run as part of the Nextflow process that generates the files that they are testing, or we could establish a separate process that runs all of the tests, possibly combining this with the process that publishes outputs. The advantage of including the scripts either in the generating or publishing process is that this will reduce the number of times that the files need to be transferred between processes.
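If we go with a separate process that gathers the results, the collation step might look something like the sketch below. This is only an assumption about how it could work: the `*_errors.txt` naming pattern, the function names, and the report layout are all made up for illustration, and it assumes each check writes a plain-text error file with one problem per line (empty when the check passes).

```python
from pathlib import Path


def collate_error_files(error_dir, pattern="*_errors.txt"):
    """Map each non-empty error file name to its list of problem lines."""
    report = {}
    for path in sorted(Path(error_dir).glob(pattern)):
        lines = [line for line in path.read_text().splitlines() if line.strip()]
        if lines:
            report[path.name] = lines
    return report


def format_report(report):
    """Build a readable summary that could be logged or saved to a file."""
    if not report:
        return "All format checks passed."
    parts = []
    for name, problems in report.items():
        parts.append(f"{name}: {len(problems)} problem(s)")
        parts.extend(f"  - {problem}" for problem in problems)
    return "\n".join(parts)
```

Keeping the collation separate from the checks means the individual `test_` scripts only need to agree on the error file format, whichever process ends up running them.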
The files we expect to require format testing are listed below:
- unfiltered SCE
- filtered SCE
- processed SCE
- merged SCEs (there was some discussion about whether this is necessary given the computational time, but I think the check is worth it)
- unfiltered AnnData (while it is tempting to trust zellkonverter to handle these, it seems advisable to make sure that changes in that library do not cause unexpected effects)
- filtered AnnData
- processed AnnData
One question is whether these scripts should be combined with the metrics calculations that are to be done after we make decisions on those metrics in #822.