[TheiaProk wfs] upgrade StxTyper version and OPERON outputs #750

Draft
wants to merge 10 commits into base: main
2 changes: 2 additions & 0 deletions docs/workflows/genomic_characterization/theiaprok.md
@@ -2055,8 +2055,10 @@ The TheiaProk workflows automatically activate taxa-specific sub-workflows after
| staphopiasccmec_types_and_mecA_presence | String | staphopia-sccmec Hamming distance file | FASTA, ONT, PE, SE |
| staphopiasccmec_version | String | staphopia-sccmec presence and absence TSV file | FASTA, ONT, PE, SE |
| stxtyper_all_hits | String | Comma-separated list of matches of all types. Includes complete, partial, frameshift, internal stop, and novel hits. List is de-duplicated so multiple identical hits are only listed once. For example if 5 partial stx2 hits are detected in the genome, only 1 "stx2" will be listed in this field. To view the potential subtype for each partial hit, the user will need to view the stxtyper_report TSV file. | FASTA, ONT, PE, SE |
| stxtyper_ambiguous_hits | String | Comma-separated list of matches that have the OPERON output of "AMBIGUOUS", meaning that ambiguous bases (e.g., N) were found in the query sequence. | FASTA, ONT, PE, SE |
| stxtyper_complete_operons | String | Comma-separated list of all COMPLETE operons detected by StxTyper. Shows multiple hits if present in the results. | FASTA, ONT, PE, SE |
| stxtyper_docker | String | Name of docker image used by the stxtyper task. | FASTA, ONT, PE, SE |
| stxtyper_extended_operons | String | Comma-separated list of all EXTENDED operons detected by StxTyper. An operon is reported as EXTENDED when the coding sequence extends beyond the reference stop codon for one or both of the reference proteins. | FASTA, ONT, PE, SE |
| stxtyper_novel_hits | String | Comma-separated list of matches that have the OPERON output of "COMPLETE_NOVEL". Possible outputs "stx1", "stx2", or "stx1,stx2" | FASTA, ONT, PE, SE |
| stxtyper_num_hits | Int | Number of "hits" or rows present in the `stxtyper_report` TSV file | FASTA, ONT, PE, SE |
| stxtyper_partial_hits | String | Possible outputs "stx1", "stx2", or "stx1,stx2". Tells the user that there was a partial hit to either the A or B subunit, but does not describe which subunit, only the possible types from the PARTIAL matches. | FASTA, ONT, PE, SE |
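
The de-duplication described for `stxtyper_all_hits` can be illustrated with a small shell sketch. This is not the task's code: `dedupe_hits` is a hypothetical helper, and unlike the task's `sort | uniq` step (which compares whole lines), it assumes each input list should first be split on commas so that repeated types inside a single list also collapse.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper (not part of task_stxtyper.wdl): merge several
# comma-separated lists into one de-duplicated, comma-separated string.
# Assumption of this sketch: lists are split on commas before de-duplication.
dedupe_hits() {
  # Each argument is one comma-separated list, or the literal placeholder "None".
  printf '%s\n' "$@" \
    | tr ',' '\n' \
    | sed '/^None$/d' \
    | sort -u \
    | paste -sd, -
}

# Five partial stx2 hits, one complete stx1a operon, and one empty category.
all_hits="$(dedupe_hits "stx2,stx2,stx2,stx2,stx2" "stx1a" "None")"
echo "${all_hits}"
```

With these inputs the five `stx2` entries collapse to one, which matches the behaviour the table describes for partial hits.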
25 changes: 21 additions & 4 deletions tasks/species_typing/escherichia_shigella/task_stxtyper.wdl
@@ -5,7 +5,7 @@ task stxtyper {
File assembly
String samplename
Boolean enable_debugging = false # Additional messages are printed and files in $TMPDIR are not removed after running
String docker = "us-docker.pkg.dev/general-theiagen/staphb/stxtyper:1.0.24"
String docker = "us-docker.pkg.dev/general-theiagen/staphb/stxtyper:1.0.40"
Int disk_size = 50
Int cpu = 1
Int memory = 4
@@ -27,6 +27,7 @@ task stxtyper {
--nucleotide ~{assembly} \
--name ~{samplename} \
--output ~{samplename}_stxtyper.tsv \
--threads ~{cpu} \
~{true='--debug' false='' enable_debugging} \
--log ~{samplename}_stxtyper.log

@@ -40,9 +41,9 @@ task stxtyper {
echo "DEBUG: No hits found in StxTyper output TSV. Exiting task with exit code 0 now."

# create empty output files
touch stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt
touch stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt stxtyper_extended_operons.txt stxtyper_ambiguous_hits.txt
# put "none" into all of them so task does not fail
echo "None" | tee stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt
echo "None" | tee stxtyper_all_hits.txt stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt stxtyper_extended_operons.txt stxtyper_ambiguous_hits.txt
exit 0
fi

@@ -88,11 +89,25 @@ task stxtyper {
if [ "$(grep --silent -E 'FRAMESHIFT|INTERNAL_STOP' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]; then
echo "None" > stxtyper_stx_frameshifts_or_internal_stop_hits.txt
fi

### extended operons
echo "DEBUG: Parsing extended operons..."
awk -F'\t' -v OFS=, '$4 == "EXTENDED" {print $3}' ~{samplename}_stxtyper.tsv | paste -sd, - | tee stxtyper_extended_operons.txt
if [ "$(grep --silent 'EXTENDED' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]; then
echo "None" > stxtyper_extended_operons.txt
fi

### ambiguous hits
echo "DEBUG: Parsing ambiguous hits..."
awk -F'\t' -v OFS=, '$4 == "AMBIGUOUS" {print $3}' ~{samplename}_stxtyper.tsv | paste -sd, - | tee stxtyper_ambiguous_hits.txt
if [ "$(grep --silent 'AMBIGUOUS' ~{samplename}_stxtyper.tsv; echo $?)" -gt 0 ]; then
echo "None" > stxtyper_ambiguous_hits.txt
fi

echo "DEBUG: generating stx_type_all string output now..."
# sort and uniq so there are no duplicates; then paste into a single comma-separated line with commas
# sed is to remove any instances of "None" from the output
cat stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt | sed '/None/d' | sort | uniq | paste -sd, - > stxtyper_all_hits.txt
cat stxtyper_complete_operons.txt stxtyper_partial_hits.txt stxtyper_stx_frameshifts_or_internal_stop_hits.txt stx_novel_hits.txt stxtyper_extended_operons.txt stxtyper_ambiguous_hits.txt | sed '/None/d' | sort | uniq | paste -sd, - > stxtyper_all_hits.txt

fi
echo "DEBUG: Finished parsing StxTyper output TSV."
@@ -109,6 +124,8 @@ task stxtyper {
String stxtyper_partial_hits = read_string("stxtyper_partial_hits.txt")
String stxtyper_frameshifts_or_internal_stop_hits = read_string("stxtyper_stx_frameshifts_or_internal_stop_hits.txt")
String stxtyper_novel_hits = read_string("stx_novel_hits.txt")
String stxtyper_extended_operons = read_string("stxtyper_extended_operons.txt")
String stxtyper_ambiguous_hits = read_string("stxtyper_ambiguous_hits.txt")
}
runtime {
docker: "~{docker}"
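
The new EXTENDED and AMBIGUOUS parsing steps can be tried outside the workflow with a toy TSV. The sketch below is illustrative only: `hits_for_status` is a hypothetical helper, and the four-column layout (name, target_contig, stx_type, operon) is an assumption chosen to agree with the task's use of `$3` and `$4`. Instead of a second `grep` over the file, it falls back to "None" when the captured list is empty.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Toy stand-in for ~{samplename}_stxtyper.tsv.
# Assumed columns for this sketch: name, target_contig, stx_type, operon.
tsv="$(mktemp)"
printf 'name\ttarget_contig\tstx_type\toperon\n' > "${tsv}"
printf 'sampleA\tcontig_1\tstx2a\tCOMPLETE\n' >> "${tsv}"
printf 'sampleA\tcontig_7\tstx2\tEXTENDED\n' >> "${tsv}"
printf 'sampleA\tcontig_9\tstx1\tEXTENDED\n' >> "${tsv}"

# Hypothetical helper: print column 3 for every row whose column 4 equals the
# requested operon status, joined with commas; print "None" if no row matches.
hits_for_status() {
  local status="$1" file="$2" hits
  hits="$(awk -F'\t' -v s="${status}" '$4 == s {print $3}' "${file}" | paste -sd, -)"
  echo "${hits:-None}"
}

extended="$(hits_for_status EXTENDED "${tsv}")"
ambiguous="$(hits_for_status AMBIGUOUS "${tsv}")"
echo "extended=${extended}"
echo "ambiguous=${ambiguous}"
rm -f "${tsv}"
```

One design difference worth noting: the task's `grep` matches the status string anywhere in the TSV, while the `awk` filter compares column 4 exactly. Testing the captured list directly keeps the filter and the fallback in agreement.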
4 changes: 2 additions & 2 deletions tests/workflows/theiaprok/test_wf_theiaprok_illumina_pe.yml
@@ -514,9 +514,9 @@
- path: miniwdl_run/wdl/tasks/utilities/data_export/task_broad_terra_tools.wdl
md5sum: 59e18911ba07c16e01df38abe0e70477
- path: miniwdl_run/wdl/workflows/theiaprok/wf_theiaprok_illumina_pe.wdl
md5sum: a6130333e29e84f7fc5488fd717cee95
md5sum: 7215782a19b40e3414a622b1db758c4f
- path: miniwdl_run/wdl/workflows/utilities/wf_merlin_magic.wdl
md5sum: adf5789053a7f720f4555be4472270d0
md5sum: 98d1cf63c946ec480fdab9b8a0e1b4dd
- path: miniwdl_run/wdl/workflows/utilities/wf_read_QC_trim_pe.wdl
contains: ["version", "QC", "output"]
- path: miniwdl_run/workflow.log
4 changes: 2 additions & 2 deletions tests/workflows/theiaprok/test_wf_theiaprok_illumina_se.yml
@@ -485,9 +485,9 @@
- path: miniwdl_run/wdl/tasks/utilities/data_export/task_broad_terra_tools.wdl
md5sum: 59e18911ba07c16e01df38abe0e70477
- path: miniwdl_run/wdl/workflows/theiaprok/wf_theiaprok_illumina_se.wdl
md5sum: c019ebbd2d88234d580a83d776de0d4a
md5sum: 2ef93d54c79615b57df794b91965d7e4
- path: miniwdl_run/wdl/workflows/utilities/wf_merlin_magic.wdl
md5sum: adf5789053a7f720f4555be4472270d0
md5sum: 98d1cf63c946ec480fdab9b8a0e1b4dd
- path: miniwdl_run/wdl/workflows/utilities/wf_read_QC_trim_se.wdl
md5sum: 09d9f68b9ca8bf94b6145ff9bed2edd1
- path: miniwdl_run/workflow.log
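
The `md5sum` values in these test files change whenever the referenced WDL files change. As a rough sketch of how a replacement entry could be produced, the snippet below hashes a file and prints it in the same `path` / `md5sum` shape. `md5_entry` is a hypothetical helper, the temporary file stands in for a real WDL, and GNU `md5sum` is assumed to be available.

```shell
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical helper: hash a local file and print a YAML entry in the shape
# used by the test files above. The recorded path is passed separately because
# the tests reference files under miniwdl_run/wdl/.
md5_entry() {
  local file="$1" recorded_path="$2" sum
  sum="$(md5sum "${file}" | awk '{print $1}')"
  printf -- '- path: %s\n  md5sum: %s\n' "${recorded_path}" "${sum}"
}

# Stand-in file for illustration; a real run would point at the edited WDL.
wdl="$(mktemp)"
printf 'version 1.0\n' > "${wdl}"

entry="$(md5_entry "${wdl}" "miniwdl_run/wdl/workflows/utilities/wf_merlin_magic.wdl")"
echo "${entry}"
rm -f "${wdl}"
```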
2 changes: 2 additions & 0 deletions workflows/theiaprok/wf_theiaprok_fasta.wdl
@@ -586,6 +586,8 @@ workflow theiaprok_fasta {
String? stxtyper_partial_hits = merlin_magic.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = merlin_magic.stxtyper_stx_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = merlin_magic.stxtyper_novel_hits
String? stxtyper_extended_operons = merlin_magic.stxtyper_extended_operons
String? stxtyper_ambiguous_hits = merlin_magic.stxtyper_ambiguous_hits
# Listeria Typing
File? lissero_results = merlin_magic.lissero_results
String? lissero_version = merlin_magic.lissero_version
2 changes: 2 additions & 0 deletions workflows/theiaprok/wf_theiaprok_illumina_pe.wdl
@@ -867,6 +867,8 @@ workflow theiaprok_illumina_pe {
String? stxtyper_partial_hits = merlin_magic.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = merlin_magic.stxtyper_stx_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = merlin_magic.stxtyper_novel_hits
String? stxtyper_extended_operons = merlin_magic.stxtyper_extended_operons
String? stxtyper_ambiguous_hits = merlin_magic.stxtyper_ambiguous_hits
# Shigella sonnei Typing
File? sonneityping_mykrobe_report_csv = merlin_magic.sonneityping_mykrobe_report_csv
File? sonneityping_mykrobe_report_json = merlin_magic.sonneityping_mykrobe_report_json
2 changes: 2 additions & 0 deletions workflows/theiaprok/wf_theiaprok_illumina_se.wdl
@@ -795,6 +795,8 @@ workflow theiaprok_illumina_se {
String? stxtyper_partial_hits = merlin_magic.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = merlin_magic.stxtyper_stx_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = merlin_magic.stxtyper_novel_hits
String? stxtyper_extended_operons = merlin_magic.stxtyper_extended_operons
String? stxtyper_ambiguous_hits = merlin_magic.stxtyper_ambiguous_hits
# Shigella sonnei Typing
File? sonneityping_mykrobe_report_csv = merlin_magic.sonneityping_mykrobe_report_csv
File? sonneityping_mykrobe_report_json = merlin_magic.sonneityping_mykrobe_report_json
2 changes: 2 additions & 0 deletions workflows/theiaprok/wf_theiaprok_ont.wdl
@@ -737,6 +737,8 @@ workflow theiaprok_ont {
String? stxtyper_partial_hits = merlin_magic.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = merlin_magic.stxtyper_stx_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = merlin_magic.stxtyper_novel_hits
String? stxtyper_extended_operons = merlin_magic.stxtyper_extended_operons
String? stxtyper_ambiguous_hits = merlin_magic.stxtyper_ambiguous_hits
# Shigella sonnei Typing
File? sonneityping_mykrobe_report_csv = merlin_magic.sonneityping_mykrobe_report_csv
File? sonneityping_mykrobe_report_json = merlin_magic.sonneityping_mykrobe_report_json
2 changes: 2 additions & 0 deletions workflows/utilities/wf_merlin_magic.wdl
@@ -811,6 +811,8 @@ workflow merlin_magic {
String? stxtyper_partial_hits = stxtyper.stxtyper_partial_hits
String? stxtyper_stx_frameshifts_or_internal_stop_hits = stxtyper.stxtyper_frameshifts_or_internal_stop_hits
String? stxtyper_novel_hits = stxtyper.stxtyper_novel_hits
String? stxtyper_extended_operons = stxtyper.stxtyper_extended_operons
String? stxtyper_ambiguous_hits = stxtyper.stxtyper_ambiguous_hits
# Shigella sonnei Typing
File? sonneityping_mykrobe_report_csv = sonneityping.sonneityping_mykrobe_report_csv
File? sonneityping_mykrobe_report_json = sonneityping.sonneityping_mykrobe_report_json