Description of feature
From Eerik
There are two scripts in the old non-nf-core ASCC repository that seem to be missing from the sanger-tol/ascc version:
https://github.com/sanger-tol/cobiontcheck/blob/compleasm2_fcs_test/filter_fasta_by_length.py
This optionally filters the input assembly to remove sequences longer than a given length. It runs before anything else is done with the assembly, so that the pipeline does not choke on huge FASTA sequences (it was Jo Wood's idea to simply leave huge sequences out of runs). James Torrance currently sets up his runs so that sequences longer than 1.9 Gb are left out. FCS-GX is hardcoded not to work with sequences longer than 1.9 Gb anyway.
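To make the intended behaviour concrete, here is a minimal sketch of a length filter. It is not the code from `filter_fasta_by_length.py`; the function names, the 80-column line wrapping, and the choice to return the dropped headers are all assumptions made for illustration.

```python
import io


def read_fasta(handle):
    """Yield (header, sequence) tuples from an open FASTA handle."""
    header, chunks = None, []
    for line in handle:
        line = line.rstrip("\n")
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line.strip())
    if header is not None:
        yield header, "".join(chunks)


def filter_fasta_by_length(in_handle, out_handle, max_length):
    """Copy sequences no longer than max_length; return headers of dropped ones.

    Hypothetical helper: the name mirrors the script but the logic is a sketch.
    """
    removed = []
    for header, seq in read_fasta(in_handle):
        if len(seq) > max_length:
            removed.append(header)
            continue
        out_handle.write(f">{header}\n")
        # Wrap sequence lines at 80 columns (an arbitrary choice).
        for i in range(0, len(seq), 80):
            out_handle.write(seq[i:i + 80] + "\n")
    return removed


if __name__ == "__main__":
    demo = io.StringIO(">a desc\nACGT\nAC\n>b\nACGTACGTAC\n>c\nA\n")
    kept = io.StringIO()
    # In a real run the cutoff would be something like 1_900_000_000 (1.9 Gb).
    print(filter_fasta_by_length(demo, kept, max_length=6))
    print(kept.getvalue(), end="")
```

Returning the dropped headers would let a Nextflow module log which sequences were excluded, but whether the pipeline needs that is open.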
https://github.com/sanger-tol/cobiontcheck/blob/compleasm2_fcs_test/find_taxid_in_taxdump.py
This checks whether the taxID given by the user exists in the NCBI taxdump file. It is also run at the start of the pipeline. A taxID can be missing from the taxdump either because the taxdump is out of date or because the user provided a wrong taxID. Running the check at the start catches the error early instead of partway through a run.
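As a rough illustration of the check, the sketch below scans a taxdump `nodes.dmp` file, whose first `\t|\t`-delimited field is the taxID. Which taxdump file the original `find_taxid_in_taxdump.py` reads is not stated here, so the use of `nodes.dmp` and the function name `taxid_in_nodes` are assumptions.

```python
import io


def taxid_in_nodes(handle, taxid):
    """Return True if taxid appears as the first field of a nodes.dmp-style file.

    Hypothetical helper; assumes the NCBI taxdump layout where fields are
    separated by tab-pipe-tab and the first field is the tax_id.
    """
    target = str(taxid)
    for line in handle:
        first_field = line.split("\t|", 1)[0].strip()
        if first_field == target:
            return True
    return False


if __name__ == "__main__":
    nodes = io.StringIO(
        "1\t|\t1\t|\tno rank\t|\n"
        "9606\t|\t9605\t|\tspecies\t|\n"
    )
    if not taxid_in_nodes(nodes, 9606):
        # Fail fast so a bad taxID stops the run before any heavy steps start.
        raise SystemExit("taxID not found in taxdump: outdated taxdump or wrong taxID?")
    print("taxID found")
```

Comparing whole fields rather than searching for a substring avoids false matches, for example `960` matching `9606`.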
So I think these two scripts should be included in the sanger-tol/ascc pipeline. I can try to make Nextflow modules of them myself if you're working on other things.
These should be relatively easy additions.