
Commit

Friederike requests
AitorOP committed Aug 16, 2024
1 parent 3c243d8 commit 9950a69
Showing 7 changed files with 74 additions and 20 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -42,9 +42,9 @@ It's listed on [Elixir - Tools and Data Services Registry](https://bio.tools/nf-
Depending on the options and samples provided, the pipeline can currently perform the following:

- Form consensus reads from UMI sequences (`fgbio`)
- Sequencing quality control and trimming (enabled by `--trim_fastq`) (`FastQC`, `fastp`,`bedtools`)
- Sequencing quality control and trimming (enabled by `--trim_fastq`) (`FastQC`, `fastp`)
- Map Reads to Reference (`BWA-mem`, `BWA-mem2`, `dragmap` or `Sentieon BWA-mem`)
- Process BAM file (`GATK MarkDuplicates`, `GATK BaseRecalibrator` and `GATK ApplyBQSR` or `Sentieon LocusCollector` and `Sentieon Dedup`)
- Process BAM file (`GATK MarkDuplicates`, `bedtools`, `GATK BaseRecalibrator` and `GATK ApplyBQSR` or `Sentieon LocusCollector` and `Sentieon Dedup`)
- Summarise alignment statistics (`samtools stats`, `mosdepth`)
- Variant calling (enabled by `--tools`, see [compatibility](https://nf-co.re/sarek/latest/docs/usage#which-variant-calling-tool-is-implemented-for-which-data-type)):
- `ASCAT`
1 change: 1 addition & 0 deletions conf/modules/lofreq.config
@@ -23,6 +23,7 @@ process {
path: { "${params.outdir}/variant_calling/lofreq/${meta.id}/" },
pattern: "*{vcf.gz,vcf.gz.tbi}"
]
max_cpus: 4
}
}
}
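The `pattern` in the `publishDir` block above is a glob with a brace alternation, so only the compressed VCF and its tabix index are published. The sketch below assumes shell-style semantics where `{a,b}` expands to alternatives, and shows which file names that pattern selects; `expand_braces` and `is_published` are hypothetical helpers for illustration, not part of Nextflow.

```python
import fnmatch
import re


def expand_braces(pattern: str) -> list[str]:
    """Expand {a,b,...} groups (no nesting) into separate glob patterns."""
    match = re.search(r"\{([^{}]*)\}", pattern)
    if match is None:
        return [pattern]
    head, tail = pattern[: match.start()], pattern[match.end():]
    expanded = []
    for option in match.group(1).split(","):
        # Recurse so any further brace groups in the tail are expanded too.
        expanded.extend(expand_braces(head + option + tail))
    return expanded


def is_published(filename: str, pattern: str = "*{vcf.gz,vcf.gz.tbi}") -> bool:
    """Return True if the file name matches any alternative of the pattern."""
    return any(fnmatch.fnmatchcase(filename, p) for p in expand_braces(pattern))


for name in ["sample2.vcf.gz", "sample2.vcf.gz.tbi", "sample2.vcf", "sample2.bed"]:
    print(name, is_published(name))
```

Under these assumptions an uncompressed `.vcf` or a `.bed` file produced in the same work directory would not be copied to `variant_calling/lofreq/`.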
35 changes: 18 additions & 17 deletions docs/output.md
@@ -16,12 +16,12 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [Trim adapters](#trim-adapters)
- [Split FastQ files](#split-fastq-files)
- [UMI consensus](#umi-consensus)
- [Bedtools](#bedtools)
- [Map to Reference](#map-to-reference)
- [BWA](#bwa)
- [BWA-mem2](#bwa-mem2)
- [DragMap](#dragmap)
- [Sentieon BWA mem](#sentieon-bwa-mem)
- [Bedtools](#bedtools)
- [Mark Duplicates](#mark-duplicates)
- [GATK MarkDuplicates (Spark)](#gatk-markduplicates-spark)
- [Sentieon LocusCollector and Dedup](#sentieon-locuscollector-and-dedup)
@@ -160,22 +160,6 @@ These files are intermediate and by default not placed in the output-folder kept

</details>

#### Bedtools

[Bedtools](https://github.com/arq5x/bedtools2) utilities are a swiss-army knife of tools for a wide-range of genomics analysis tasks. The most widely-used tools enable genome arithmetic. Bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely-used genomic file formats such as BAM, BED, GFF/GTF, VCF.
While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.

<details markdown="1">
<summary>Output files for all samples</summary>

**Output directory: `{outdir}/reports/bedtools/`**

- `<sample>.bed`
- New .bed file with the news changes.
</details>

</details>

### Map to Reference

#### BWA
@@ -213,6 +197,23 @@ The alignment files (BAM or CRAM) produced by the chosen aligner are not publish
- BAM file and index
</details>

#### Bedtools

[Bedtools](https://github.com/arq5x/bedtools2) utilities are a Swiss Army knife of tools for a wide range of genomics analysis tasks. The most widely used tools enable genome arithmetic: Bedtools allows one to intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic file formats such as BAM, BED, GFF/GTF and VCF.
While each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), quite sophisticated analyses can be conducted by combining multiple bedtools operations on the UNIX command line.

<details markdown="1">
<summary>Output files for all samples</summary>

**Output directory: `{outdir}/reports/bedtools/`**

- `<sample>.bed`
  - `bedtools sort` reorders the intervals in the .bed file so that genomic regions appear sorted by chromosome and then by ascending start position.
  - `bedtools merge` combines overlapping or adjacent (book-ended) regions into a single interval, reducing redundancy and producing longer intervals that cover all of the original regions.
</details>

</details>

### Mark Duplicates

During duplicate marking, read pairs that are likely to have originated from duplicates of the same original DNA fragments through some artificial processes are identified. These are considered to be non-independent observations, so all but a single read pair within each set of duplicates are marked, causing the marked pairs to be ignored by default during the variant discovery process.
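The two bedtools operations described in the output section above can be sketched in plain Python. This is a minimal illustration, assuming BED intervals held as `(chrom, start, end)` tuples with 0-based half-open coordinates, lexicographic chromosome order (as with a default `bedtools sort`), and default `bedtools merge` behaviour where overlapping and book-ended intervals are joined; it does not reproduce the tools' other options.

```python
from typing import List, Tuple

Interval = Tuple[str, int, int]  # (chrom, start, end), 0-based half-open as in BED


def sort_intervals(intervals: List[Interval]) -> List[Interval]:
    """Order by chromosome name, then by start position."""
    return sorted(intervals, key=lambda iv: (iv[0], iv[1]))


def merge_intervals(intervals: List[Interval]) -> List[Interval]:
    """Join overlapping or book-ended intervals; input must already be sorted."""
    merged: List[Interval] = []
    for chrom, start, end in intervals:
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Overlaps or touches the previous interval: extend it.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


regions = [("chr2", 50, 80), ("chr1", 100, 200), ("chr1", 150, 300),
           ("chr1", 300, 350), ("chr1", 10, 20)]
print(merge_intervals(sort_intervals(regions)))
```

Merging only works on sorted input, which is why the sort step has to run first: a linear scan can only extend the most recent interval.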
2 changes: 1 addition & 1 deletion nextflow_schema.json
@@ -112,7 +112,7 @@
"fa_icon": "fas fa-toolbox",
"description": "Tools to use for duplicate marking, variant calling and/or for annotation.",
"help_text": "Multiple tools separated with commas.\n\n**Variant Calling:**\n\nGermline variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: DeepVariant, FreeBayes, GATK HaplotypeCaller, mpileup, Sentieon Haplotyper, Strelka, Lofreq\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit\n\nTumor-only somatic variant calling can currently be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, mpileup, Mutect2, Strelka\n- Structural Variants: Manta, TIDDIT\n- Copy-number: CNVKit, ControlFREEC\n\nSomatic variant calling can currently only be performed with the following variant callers:\n- SNPs/Indels: FreeBayes, Mutect2, Strelka\n- Structural variants: Manta, TIDDIT\n- Copy-Number: ASCAT, CNVKit, Control-FREEC\n- Microsatellite Instability: MSIsensorpro\n\n> **NB** Mutect2 for somatic variant calling cannot be combined with `--no_intervals`\n\n**Annotation:**\n \n- snpEff, VEP, merge (both consecutively), and bcftools annotate (needs `--bcftools_annotation`).\n\n> **NB** As Sarek will use bgzip and tabix to compress and index VCF files annotated, it expects VCF files to be sorted when starting from `--step annotate`.",
"pattern": "^((ascat|bcfann|cnvkit|controlfreec|deepvariant|freebayes|haplotypecaller|sentieon_dnascope|sentieon_haplotyper|manta|merge|mpileup|msisensorpro|mutect2|lofreq|ngscheckmate|sentieon_dedup|snpeff|strelka|tiddit|vep)?,?)*(?<!,)$"
"pattern": "^((ascat|bcfann|cnvkit|controlfreec|deepvariant|freebayes|haplotypecaller|lofreq|sentieon_dnascope|sentieon_haplotyper|manta|merge|mpileup|msisensorpro|mutect2|ngscheckmate|sentieon_dedup|snpeff|strelka|tiddit|vep)?,?)*(?<!,)$"
},
"skip_tools": {
"type": "string",
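The updated `pattern` for `tools` accepts a comma-separated list of known tool names and uses the negative lookbehind `(?<!,)` to reject a trailing comma. Assuming the schema validator applies this pattern as an ordinary regular expression (Python's `re` supports the same lookbehind syntax), the check on a `--tools` value can be reproduced as below; `tools_is_valid` is a hypothetical helper.

```python
import re

# Pattern copied from the updated `tools` entry in nextflow_schema.json.
TOOLS_PATTERN = re.compile(
    r"^((ascat|bcfann|cnvkit|controlfreec|deepvariant|freebayes|haplotypecaller|lofreq"
    r"|sentieon_dnascope|sentieon_haplotyper|manta|merge|mpileup|msisensorpro|mutect2"
    r"|ngscheckmate|sentieon_dedup|snpeff|strelka|tiddit|vep)?,?)*(?<!,)$"
)


def tools_is_valid(value: str) -> bool:
    """Return True if the --tools string satisfies the schema pattern."""
    return TOOLS_PATTERN.match(value) is not None


for value in ["lofreq", "lofreq,mutect2", "lofreq,", "lofreqx"]:
    print(f"{value!r}: {tools_is_valid(value)}")
```

Moving `lofreq` into alphabetical position does not change which values match; the alternation is order-insensitive here because no tool name is a prefix that would block a longer one from matching.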
@@ -33,7 +33,10 @@ workflow BAM_VARIANT_CALLING_TUMOR_ONLY_LOFREQ {

vcf = Channel.empty().mix(LOFREQ.out.vcf)
.map{ meta, vcf -> [ meta + [ variantcaller:'lofreq' ], vcf ] }

versions = versions.mix(LOFREQ.out.versions)
versions = versions.mix(SORT_INTERVALS.out.versions)
versions = versions.mix(MERGE_INTERVALS.out.versions)

emit:
vcf
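The `.map` closure in the LoFreq subworkflow tags every VCF by adding a `variantcaller` key to the sample's meta map. In Groovy, `meta + [ variantcaller:'lofreq' ]` returns a new map and leaves `meta` unchanged. A rough Python analogue, treating channel items as `(meta, vcf)` tuples (an assumption made purely for illustration; `tag_variantcaller` is hypothetical), is:

```python
from typing import Dict, List, Tuple

Meta = Dict[str, str]
Item = Tuple[Meta, str]  # (meta map, path to the VCF)


def tag_variantcaller(items: List[Item], caller: str = "lofreq") -> List[Item]:
    """Mimic `.map{ meta, vcf -> [ meta + [ variantcaller:caller ], vcf ] }`."""
    # {**meta, key: value} builds a new dict, like Groovy's Map.plus,
    # so the original meta map is left untouched.
    return [({**meta, "variantcaller": caller}, vcf) for meta, vcf in items]


calls = [({"id": "sample2"}, "results/variant_calling/lofreq/sample2/sample2.vcf.gz")]
print(tag_variantcaller(calls))
print(calls[0][0])  # still {'id': 'sample2'}
```

Keeping the meta map immutable matters in Nextflow because the same map object may be shared by several downstream channels.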
12 changes: 12 additions & 0 deletions tests/config/pytesttags.yml
@@ -536,6 +536,18 @@ strelka_bp:
  - tests/csv/3.0/recalibrated_somatic.csv
  - tests/test_strelka_bp.yml

## lofreq
lofreq:
  - conf/modules/lofreq.config
  - modules/nf-core/bedtools/sort/**
  - modules/nf-core/bedtools/merge/**
  - modules/nf-core/mosdepth/**
  - modules/nf-core/lofreq/callparallel/**
  - subworkflows/local/bam_variant_calling_tumor_only_lofreq/**
  - subworkflows/local/bam_variant_calling_tumor_only_all/**
  - tests/csv/3.0/recalibrated_tumoronly.csv
  - tests/test_lofreq.yml

## tiddit
tiddit:
  - conf/modules/tiddit.config
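Each top-level key in `tests/config/pytesttags.yml` names a test tag and lists the paths that belong to it. One plausible way to consume such a file, assumed here for illustration rather than taken from the pipeline's CI, is to select a tag whenever a changed file matches one of its glob patterns. The sketch uses a hand-copied subset of the `lofreq` entry and Python's `fnmatch`, whose `*` also matches `/`, so a trailing `**` acts as a recursive wildcard; `tags_for_changes` is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Dict, List, Set

# Subset of the `lofreq` entry added to pytesttags.yml, as a Python dict.
PYTEST_TAGS: Dict[str, List[str]] = {
    "lofreq": [
        "conf/modules/lofreq.config",
        "modules/nf-core/bedtools/sort/**",
        "modules/nf-core/bedtools/merge/**",
        "modules/nf-core/lofreq/callparallel/**",
        "subworkflows/local/bam_variant_calling_tumor_only_lofreq/**",
        "tests/test_lofreq.yml",
    ],
}


def tags_for_changes(changed_files: List[str],
                     tags: Dict[str, List[str]] = PYTEST_TAGS) -> Set[str]:
    """Return every tag with at least one pattern matching a changed file."""
    return {
        tag
        for tag, patterns in tags.items()
        if any(fnmatchcase(path, pattern)
               for path in changed_files for pattern in patterns)
    }


print(tags_for_changes(["modules/nf-core/bedtools/merge/main.nf"]))
print(tags_for_changes(["README.md"]))
```

This is why the new bedtools modules are listed under `lofreq`: under this reading, editing them would rerun the LoFreq test.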
37 changes: 37 additions & 0 deletions tests/test_lofreq.yml
@@ -0,0 +1,37 @@
- name: Run variant calling on tumor only sample with lofreq
  command: nextflow run main.nf -profile test,tools_tumoronly --tools lofreq --outdir results
  tags:
    - lofreq
    - tumor_only
    - variant_calling
  files:
    - path: results/csv/variantcalled.csv
      md5sum: 5cce88d8a0961c51e15120c6cffc1de4
    - path: results/csv/mapped.csv
      md5sum: 85c4d7e1fed217509c3f5c9cbd93539f
    - path: results/csv/recalibrated.csv
      md5sum: 4251894dfed507f5b4a59b97cdea68cf
    - path: results/multiqc
    - path: results/sort
    - path: results/merge
    - path: results/reports/bcftools/lofreq/sample2/sample2.bcftools_stats.txt
      md5sum: 795d766515702e277fecfe54cef17eb0
    # conda changes md5sums for test
    - path: results/reports/vcftools/lofreq/sample2/sample2.FILTER.summary
      md5sum: be7ff84cf917483f02a6ae28edae999d
    - path: results/reports/vcftools/lofreq/sample2/sample2.TsTv.qual
      md5sum: e5d29ea7ac3d1ddfe77ae4574615c366
    # conda changes md5sums for test
    - path: results/reports/samtools/sample2/sample2.recal.cram.stats
      md5sum: 345e7084e5dda88fe368894d19ee50de
    - path: results/reports/samtools/sample2/sample2.sorted.cram.stats
      md5sum: 6e3505dc1d2ea5db94232fdd8e33ae84
    # conda changes md5sums for test
    - path: results/variant_calling/lofreq/sample2/sample2.vcf.gz
      md5sum: 15b7a969076d113d6fb18f00c9312a76
    # binary changes md5sums on reruns
    - path: results/reports/mosdepth/sample2/sample2.recal.mosdepth.global.dist.txt
    - path: results/reports/mosdepth/sample2/sample2.recal.mosdepth.region.dist.txt
    - path: results/reports/mosdepth/sample2/sample2.recal.mosdepth.summary.txt
    - path: results/reports/mosdepth/sample2/sample2.recal.regions.bed.gz
    - path: results/reports/mosdepth/sample2/sample2.recal.regions.bed.gz.csi
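
In the LoFreq test definition, entries with an `md5sum` are compared against the file's MD5 checksum, while entries listed with only a `path` (such as `results/multiqc` or the mosdepth reports, whose checksums change between runs) are only required to exist. Assuming that reading of pytest-workflow's `path`/`md5sum` fields, a stripped-down checker might look like this; `md5_of` and `check_file` are hypothetical helpers, not part of pytest-workflow.

```python
import hashlib
import tempfile
from pathlib import Path
from typing import Optional


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def check_file(path: str, md5sum: Optional[str] = None) -> bool:
    """Pass if the path exists and, when an md5sum is given, the digest matches."""
    target = Path(path)
    if not target.exists():
        return False
    return md5sum is None or md5_of(target) == md5sum


with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "sample2.txt"
    sample.write_bytes(b"hello\n")
    print(check_file(str(sample), hashlib.md5(b"hello\n").hexdigest()))  # True
    print(check_file(str(sample), "0" * 32))  # False
    print(check_file(str(Path(tmp) / "missing.txt")))  # False
```

Existence-only checks also work for directories such as `results/sort`, since no checksum is computed when `md5sum` is omitted.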
