Add new scripts to ASCC for 0.5.0 #56

DLBPointon · 2024-08-07T14:50:30Z

Description of feature

From Eerik:
There are two scripts in the old non-nf-core ASCC repository that seem to be missing from the sanger-tol/ascc version:
https://github.com/sanger-tol/cobiontcheck/blob/compleasm2_fcs_test/filter_fasta_by_length.py
This optionally filters the input assembly to remove sequences longer than a given length, and it runs before anything else is done with the assembly. Its purpose is to stop the pipeline from choking on huge FASTA sequences (it was Jo Wood's idea to simply leave huge sequences out of runs). James Torrance sets up his runs so that sequences longer than 1.9 Gb are excluded. FCS-GX is hardcoded not to work with sequences longer than 1.9 Gb anyway.
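For illustration only, here is a minimal sketch of length-based FASTA filtering in plain Python. It is not the `filter_fasta_by_length.py` script linked above, and its function names, the command-line shape, and the 1.9 Gb default (taken here as 1,900,000,000 bp) are all hypothetical assumptions.

```python
import sys
from typing import Iterable, Iterator, List, TextIO, Tuple

# Hypothetical default: 1.9 Gb expressed as base pairs (assumption based on the FCS-GX limit above)
DEFAULT_MAX_LENGTH = 1_900_000_000


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def filter_records(records, max_length: int, dropped: List[str]):
    """Yield records no longer than max_length; append headers of longer ones to `dropped`."""
    for header, seq in records:
        if len(seq) > max_length:
            dropped.append(header)
        else:
            yield header, seq


def format_record(header: str, seq: str, width: int = 80) -> str:
    """Return one FASTA record with the sequence wrapped at `width` characters."""
    body = "".join(seq[i:i + width] + "\n" for i in range(0, len(seq), width))
    return f">{header}\n{body}"


def main(argv: List[str], out: TextIO = sys.stdout) -> None:
    # Hypothetical usage: python filter_fasta_sketch.py assembly.fa [max_length_bp] > filtered.fa
    path = argv[1]
    limit = int(argv[2]) if len(argv) > 2 else DEFAULT_MAX_LENGTH
    dropped: List[str] = []
    with open(path) as handle:
        for header, seq in filter_records(read_fasta(handle), limit, dropped):
            out.write(format_record(header, seq))
    for name in dropped:
        print(f"Excluded (longer than {limit} bp): {name}", file=sys.stderr)


if __name__ == "__main__":
    main(sys.argv)
```

The sketch streams records one at a time, but each kept sequence is still held in memory as a single string. For multi-gigabase sequences, counting the length while streaming would be a lighter alternative.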
https://github.com/sanger-tol/cobiontcheck/blob/compleasm2_fcs_test/find_taxid_in_taxdump.py
This checks whether the taxID supplied by the user exists in the NCBI taxdump. It also runs at the start of the pipeline. The taxID may be missing from the taxdump either because the taxdump is out of date or because the user has provided a faulty taxID. Running the check at the start of the run catches the error early.
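As a rough illustration, the check can be done by scanning `nodes.dmp` from the NCBI taxdump, where fields are separated by `\t|\t` and the first field is the tax_id. This is a hypothetical sketch, not the `find_taxid_in_taxdump.py` script linked above; the function names and command-line shape are assumptions.

```python
import sys
from typing import Iterable, List


def taxid_in_nodes(lines: Iterable[str], taxid: str) -> bool:
    """Return True if `taxid` is the first field (tax_id) of any nodes.dmp line."""
    target = taxid.strip()
    for line in lines:
        # nodes.dmp fields are separated by "\t|\t"; only the first field is needed
        if line.split("\t|\t", 1)[0].strip() == target:
            return True
    return False


def main(argv: List[str]) -> None:
    # Hypothetical usage: python find_taxid_sketch.py /path/to/taxdump/nodes.dmp 9606
    nodes_path, taxid = argv[1], argv[2].strip()
    if not taxid.isdigit():
        sys.exit(f"ERROR: taxID '{taxid}' is not a non-negative integer")
    with open(nodes_path) as handle:
        if not taxid_in_nodes(handle, taxid):
            sys.exit(
                f"ERROR: taxID {taxid} not found in {nodes_path}; "
                "the taxdump may be out of date or the taxID may be wrong"
            )
    print(f"taxID {taxid} found in {nodes_path}")


if __name__ == "__main__":
    main(sys.argv)
```

Exiting with a non-zero status on a missing taxID is what lets a workflow stop early. The taxdump also ships `merged.dmp` and `delnodes.dmp`, which could be consulted to give a more specific error for merged or deleted taxIDs.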
So I think these two scripts should be included in the sanger-tol/ascc pipeline. I can try to make Nextflow modules for them myself if you're working on other things.

These should be relatively easy additions.

DLBPointon added the enhancement label (New feature or request) Aug 7, 2024

DLBPointon added this to the Release 1 milestone Aug 7, 2024

DLBPointon added a commit that referenced this issue Aug 8, 2024

Adding new scripts for filtering and double checking data #56

a599125

DLBPointon added a commit that referenced this issue Aug 8, 2024

Updates for #56

aeb7097

DLBPointon added a commit that referenced this issue Aug 8, 2024

Updates for #56

7405a9b

DLBPointon linked a pull request Aug 9, 2024 that will close this issue

Many additions #48

Merged

DLBPointon closed this as completed in 6008006 Sep 11, 2024

DLBPointon closed this as completed in #48 Sep 11, 2024
