Commit

Recalculate AFs before consensus building
wm75 committed Feb 2, 2022
1 parent 970e809 commit 6f56432
Showing 3 changed files with 60 additions and 27 deletions.
@@ -1,5 +1,38 @@
# Changelog

+ ## [0.3] 2022-02-02
+
+ ### Fixed
+ - Apply AF thresholds on unbiased AF values recalculated from DP4 and DP fields
+ instead of on AF values provided by the variant caller to ensure proper
+ variant gating (into consensus and ambiguous variants) for lofreq-called
+ data.
+
+ https://github.com/CSB5/lofreq/issues/80 means that lofreq-calculated AF
+ values are lower bounds of true AFs when bases are excluded from calling
+ based on base quality. The extent of AF underestimation depends on the
+ fraction of bases with sub-threshold (30 for this workflow) base qualities.
+
+ By recalculating AFs as (DP4[2] + DP4[3]) / DP we avoid this issue for
+ variant calls generated by lofreq. For variants called with Galaxy's medaka
+ consensus/variant wrappers, original AFs are computed with the exact same
+ formula so the recalculation by the workflow does not affect variant gating
+ in that case.
+
+ ### Changed
+ - Increase the default for consensus variant allele frequency threshold to 0.75.
+ Correct calculation of unbiased AF values increases the typical AF of
+ consensus variants more than enough to justify the change.
+ - The following tools are updated to their latest wrapper versions or revisions:
+
+ - bcftools_consensus
+ - collapse_collections
+ - the gops subtract and merge tools
+ - snpsift
+
+ None of these updates are expected to impact the generated consensus
+ sequences.
+
## [0.2.2] 2021-12-13

### Added
@@ -12,6 +45,7 @@
Added RO-Crate metadata file. No functional changes.

## [0.2] - 2021-04-30

### Changed
- Lower the default for consensus variant allele frequency threshold to 0.7
(from 0.8).
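The AF recalculation described in the 0.3 changelog entry can be sketched in Python as below. This is an illustrative helper, not code from the workflow (which performs the comparison inside SnpSift filter expressions); `recalculated_af` is a hypothetical name, and the sketch assumes the usual DP4 layout of high-quality ref-forward, ref-reverse, alt-forward and alt-reverse read counts, so that DP4[2] + DP4[3] counts the reads supporting the alternate allele.

```python
from typing import Sequence


def recalculated_af(dp4: Sequence[int], dp: int) -> float:
    """Return (DP4[2] + DP4[3]) / DP for one VCF record.

    Assumes DP4 = (ref fwd, ref rev, alt fwd, alt rev), so indices 2 and 3
    hold the alt-supporting read counts.
    """
    if len(dp4) != 4:
        raise ValueError("DP4 must have exactly four values")
    if dp <= 0:
        raise ValueError("DP must be positive to compute an allele frequency")
    return (dp4[2] + dp4[3]) / dp


# Made-up counts for illustration only: 78 alt reads at a depth of 100.
af = recalculated_af((10, 12, 40, 38), 100)
print(af)          # 0.78
print(af >= 0.75)  # True, i.e. it reaches the new 0.75 consensus default
```

For medaka-called variants the changelog notes that the caller's own AF already uses this formula, so applying the helper there would reproduce the original value.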
@@ -5,16 +5,16 @@ This workflow aims at generating reliable consensus sequences from variant
calls according to transparent criteria that capture at least some of the
complexity of variant calling.

- It takes a collection of VCFs and a collection of the corresponding
- aligned reads (for the purpose of calculating genome-wide coverage) such as
- produced by any of the four variant calling workflows in
+ It takes a collection of VCFs (with DP and DP4 INFO fields) and a collection of
+ the corresponding aligned reads (for the purpose of calculating genome-wide
+ coverage) such as produced by any of the variant calling workflows in
https://github.com/galaxyproject/iwc/tree/main/workflows/sars-cov-2-variant-calling
and generates a collection of viral consensus sequences and a multisample FASTA
of all these sequences.

- Each consensus sequence is guaranteed to capture all called, filter-passing
- variants as defined in the VCF of its sample that reach a user-defined
- consensus allele frequency threshold.
+ Each consensus sequence is guaranteed to capture all called, filter-passing (as
+ per the FILTER column of the VCF input) variants found in the VCF of its sample
+ that reach a user-defined consensus allele frequency threshold.

Filter-failing variants and variants below a second user-defined minimal
allele frequency threshold will be ignored.
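The gating described here, combined with the switch to recalculated AFs, can be sketched as a small classifier. This is a hypothetical Python model, not the workflow's implementation: the `Variant` class and `classify` function are invented for illustration, a missing FILTER value is modelled as `None` or `"."` (standing in for the `na FILTER` clause of the workflow's SnpSift expressions), and the band boundaries follow those expressions (`>=` for consensus variants, strict `>` and `<` for ambiguous ones).

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class Variant:
    filter_col: Optional[str]       # FILTER column; None or "." if missing
    dp: int                         # INFO/DP
    dp4: Tuple[int, int, int, int]  # INFO/DP4: ref fwd, ref rev, alt fwd, alt rev


def passes_filter(v: Variant) -> bool:
    # Counterpart of ( FILTER = 'PASS' ) | ( na FILTER )
    return v.filter_col in (None, ".", "PASS")


def classify(v: Variant, min_af: float, consensus_af: float) -> str:
    """Return "consensus", "ambiguous" or "ignored" for one variant."""
    if not passes_filter(v):
        return "ignored"
    alt = v.dp4[2] + v.dp4[3]
    # Compare alt reads with threshold * DP, i.e. AF >= threshold when DP > 0
    if alt >= consensus_af * v.dp:
        return "consensus"
    if min_af * v.dp < alt < consensus_af * v.dp:
        return "ambiguous"
    return "ignored"


# Illustrative thresholds and counts only; 0.25 is not taken from the workflow.
print(classify(Variant("PASS", 100, (10, 12, 40, 38)), 0.25, 0.75))   # consensus
print(classify(Variant(None, 100, (30, 30, 20, 20)), 0.25, 0.75))     # ambiguous
print(classify(Variant("LowQual", 100, (0, 0, 50, 50)), 0.25, 0.75))  # ignored
```

Note that a variant sitting exactly at the minimal threshold (AF equal to `min_af`) falls outside the ambiguous band in this model, because that comparison is strict.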
@@ -11,7 +11,7 @@
"format-version": "0.1",
"license": "MIT",
"name": "COVID-19: consensus construction",
- "release": "0.2.2",
+ "release": "0.3",
"steps": {
"0": {
"annotation": "Collection of VCFs produced by upstream workflows for variation analysis",
@@ -71,7 +71,7 @@
"y": 687.5
},
"tool_id": null,
- "tool_state": "{\"default\": 0.7, \"parameter_type\": \"float\", \"optional\": true}",
+ "tool_state": "{\"default\": 0.75, \"parameter_type\": \"float\", \"optional\": true}",
"tool_version": null,
"type": "parameter_input",
"uuid": "52664bd7-b500-40a1-935b-8ac6df7003e5",
@@ -267,7 +267,7 @@
"owner": "iuc",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
- "tool_state": "{\"components\": [{\"__index__\": 0, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \"((FILTER = 'PASS') | ( na FILTER )) & (AF >= \"}}, {\"__index__\": 1, \"param_type\": {\"select_param_type\": \"float\", \"__current_case__\": 2, \"component_value\": {\"__class__\": \"ConnectedValue\"}}}, {\"__index__\": 2, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \")\"}}], \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_state": "{\"components\": [{\"__index__\": 0, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \"( ( FILTER = 'PASS' ) | ( na FILTER ) ) & ( ( DP4[2] + DP4[3] ) >= ( \"}}, {\"__index__\": 1, \"param_type\": {\"select_param_type\": \"float\", \"__current_case__\": 2, \"component_value\": {\"__class__\": \"ConnectedValue\"}}}, {\"__index__\": 2, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \" * DP ) )\"}}], \"__page__\": null, \"__rerun_remap_job_id__\": null}",
"tool_version": "0.1.1",
"type": "tool",
"uuid": "8ec54933-36d6-404c-b0a6-5a14b04eb30e",
@@ -321,7 +321,7 @@
"owner": "iuc",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
- "tool_state": "{\"components\": [{\"__index__\": 0, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \"(AF > \"}}, {\"__index__\": 1, \"param_type\": {\"select_param_type\": \"float\", \"__current_case__\": 2, \"component_value\": {\"__class__\": \"ConnectedValue\"}}}, {\"__index__\": 2, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \") & (AF < \"}}, {\"__index__\": 3, \"param_type\": {\"select_param_type\": \"float\", \"__current_case__\": 2, \"component_value\": {\"__class__\": \"ConnectedValue\"}}}, {\"__index__\": 4, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \") & ((FILTER = 'PASS') | ( na FILTER ))\"}}], \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_state": "{\"components\": [{\"__index__\": 0, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \"( ( DP4[2] + DP4[3] ) > ( \"}}, {\"__index__\": 1, \"param_type\": {\"select_param_type\": \"float\", \"__current_case__\": 2, \"component_value\": {\"__class__\": \"ConnectedValue\"}}}, {\"__index__\": 2, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \" * DP ) ) & ( ( DP4[2] + DP4[3] ) < ( \"}}, {\"__index__\": 3, \"param_type\": {\"select_param_type\": \"float\", \"__current_case__\": 2, \"component_value\": {\"__class__\": \"ConnectedValue\"}}}, {\"__index__\": 4, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \" * DP ) ) & ( ( FILTER = 'PASS' ) | ( na FILTER ) )\"}}], \"__page__\": null, \"__rerun_remap_job_id__\": null}",
"tool_version": "0.1.1",
"type": "tool",
"uuid": "51ac3588-c801-4551-8345-b04cc3571fea",
@@ -480,7 +480,7 @@
},
"tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_filter/4.3+t.galaxy1",
"tool_shed_repository": {
- "changeset_revision": "2e497a770bca",
+ "changeset_revision": "5fab4f81391d",
"name": "snpsift",
"owner": "iuc",
"tool_shed": "toolshed.g2.bx.psu.edu"
@@ -542,7 +542,7 @@
},
"tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_filter/4.3+t.galaxy1",
"tool_shed_repository": {
- "changeset_revision": "2e497a770bca",
+ "changeset_revision": "5fab4f81391d",
"name": "snpsift",
"owner": "iuc",
"tool_shed": "toolshed.g2.bx.psu.edu"
@@ -656,7 +656,7 @@
},
"tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_extractFields/4.3+t.galaxy0",
"tool_shed_repository": {
- "changeset_revision": "09d6806c609e",
+ "changeset_revision": "5fab4f81391d",
"name": "snpsift",
"owner": "iuc",
"tool_shed": "toolshed.g2.bx.psu.edu"
@@ -714,7 +714,7 @@
},
"tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/snpsift/snpSift_extractFields/4.3+t.galaxy0",
"tool_shed_repository": {
- "changeset_revision": "09d6806c609e",
+ "changeset_revision": "5fab4f81391d",
"name": "snpsift",
"owner": "iuc",
"tool_shed": "toolshed.g2.bx.psu.edu"
@@ -1191,7 +1191,7 @@
},
"tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/merge/gops_merge_1/1.0.0",
"tool_shed_repository": {
- "changeset_revision": "0926c81f382c",
+ "changeset_revision": "381cd27bf67a",
"name": "merge",
"owner": "devteam",
"tool_shed": "toolshed.g2.bx.psu.edu"
@@ -1253,7 +1253,7 @@
},
"tool_id": "toolshed.g2.bx.psu.edu/repos/devteam/subtract/gops_subtract_1/1.0.0",
"tool_shed_repository": {
- "changeset_revision": "7a2a604ae9c8",
+ "changeset_revision": "0145969324c4",
"name": "subtract",
"owner": "devteam",
"tool_shed": "toolshed.g2.bx.psu.edu"
@@ -1389,7 +1389,7 @@
},
"26": {
"annotation": "",
- "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/bcftools_consensus/bcftools_consensus/1.10",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/iuc/bcftools_consensus/bcftools_consensus/1.10+galaxy1",
"errors": null,
"id": 26,
"input_connections": {
@@ -1434,15 +1434,15 @@
"output_name": "output_file"
}
},
- "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/bcftools_consensus/bcftools_consensus/1.10",
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/iuc/bcftools_consensus/bcftools_consensus/1.10+galaxy1",
"tool_shed_repository": {
- "changeset_revision": "e522022137f6",
+ "changeset_revision": "92182c270ce4",
"name": "bcftools_consensus",
"owner": "iuc",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
- "tool_state": "{\"chain\": \"false\", \"input_file\": {\"__class__\": \"ConnectedValue\"}, \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"fasta_ref\": {\"__class__\": \"ConnectedValue\"}}, \"rename\": \"true\", \"sec_default\": {\"mask\": {\"__class__\": \"ConnectedValue\"}, \"iupac_codes\": \"false\", \"sample\": \"\", \"select_haplotype\": null}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
- "tool_version": "1.10",
+ "tool_state": "{\"chain\": \"false\", \"input_file\": {\"__class__\": \"ConnectedValue\"}, \"reference_source\": {\"reference_source_selector\": \"history\", \"__current_case__\": 1, \"fasta_ref\": {\"__class__\": \"ConnectedValue\"}}, \"rename\": \"true\", \"sec_default\": {\"mask\": {\"__class__\": \"ConnectedValue\"}, \"iupac_codes\": \"false\", \"sample\": \"\", \"select_haplotype\": null}, \"sec_restrict\": {\"include\": \"\", \"exclude\": \"\"}, \"__page__\": null, \"__rerun_remap_job_id__\": null}",
+ "tool_version": "1.10+galaxy1",
"type": "tool",
"uuid": "698a9a8c-f6fe-430b-a7e0-cf9e1d058d02",
"workflow_outputs": [
@@ -1455,7 +1455,7 @@
},
"27": {
"annotation": "",
- "content_id": "toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.2",
+ "content_id": "toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0",
"errors": null,
"id": 27,
"input_connections": {
@@ -1492,15 +1492,15 @@
"output_name": "output"
}
},
- "tool_id": "toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/4.2",
+ "tool_id": "toolshed.g2.bx.psu.edu/repos/nml/collapse_collections/collapse_dataset/5.1.0",
"tool_shed_repository": {
- "changeset_revision": "830961c48e42",
+ "changeset_revision": "90981f86000f",
"name": "collapse_collections",
"owner": "nml",
"tool_shed": "toolshed.g2.bx.psu.edu"
},
"tool_state": "{\"filename\": {\"add_name\": \"false\", \"__current_case__\": 1}, \"input_list\": {\"__class__\": \"ConnectedValue\"}, \"one_header\": \"false\", \"__page__\": null, \"__rerun_remap_job_id__\": null}",
- "tool_version": "4.2",
+ "tool_version": "5.1.0",
"type": "tool",
"uuid": "acae0f3e-448b-483e-a44a-3e09ce9b3e77",
"workflow_outputs": [
@@ -1516,6 +1516,5 @@
"COVID-19",
"covid19.galaxyproject.org"
],
- "uuid": "06dc40a7-99f2-4b3c-ae21-b3dcb239306f",
- "version": 0
+ "uuid": "06dc40a7-99f2-4b3c-ae21-b3dcb239306f"
}
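In the `.ga` file, the new consensus-filter `tool_state` is a JSON document stored as a string inside the workflow JSON; its text components and the connected float threshold are concatenated to form the SnpSift filter expression. The Python sketch below decodes that string twice with the standard `json` module and reassembles the expression for a given threshold. `render_expression` is a hypothetical helper, and rendering the float with `str()` is an assumption about formatting, not necessarily what the Galaxy tool does.

```python
import json

# The "tool_state" line of the consensus-filter step after this commit,
# wrapped in braces so it parses as a standalone JSON object.
GA_LINE = r'''{"tool_state": "{\"components\": [{\"__index__\": 0, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \"( ( FILTER = 'PASS' ) | ( na FILTER ) ) & ( ( DP4[2] + DP4[3] ) >= ( \"}}, {\"__index__\": 1, \"param_type\": {\"select_param_type\": \"float\", \"__current_case__\": 2, \"component_value\": {\"__class__\": \"ConnectedValue\"}}}, {\"__index__\": 2, \"param_type\": {\"select_param_type\": \"text\", \"__current_case__\": 0, \"component_value\": \" * DP ) )\"}}], \"__page__\": null, \"__rerun_remap_job_id__\": null}"}'''


def render_expression(ga_line: str, threshold: float) -> str:
    step = json.loads(ga_line)              # first decode: the .ga JSON
    state = json.loads(step["tool_state"])  # second decode: the embedded string
    parts = []
    for comp in sorted(state["components"], key=lambda c: c["__index__"]):
        param = comp["param_type"]
        if param["select_param_type"] == "text":
            parts.append(param["component_value"])
        else:
            # Float component wired to a workflow input (ConnectedValue);
            # substitute the threshold supplied at run time.
            parts.append(str(threshold))
    return "".join(parts)


print(render_expression(GA_LINE, 0.75))
# ( ( FILTER = 'PASS' ) | ( na FILTER ) ) & ( ( DP4[2] + DP4[3] ) >= ( 0.75 * DP ) )
```

Because the rendered expression compares the alt-read count `DP4[2] + DP4[3]` with `threshold * DP` instead of dividing, it is equivalent to testing `(DP4[2] + DP4[3]) / DP >= threshold` for any record with DP > 0.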

1 comment on commit 6f56432

wm75 commented on 6f56432 Mar 15, 2022

If CSB5/lofreq#126 makes it into a new release, we can return to using lofreq's original AF again.
