Align parabricks subworkflow #6876

Status: Open. Wants to merge 69 commits into base: master.

Changes from 57 commits (of 69 total):
df512a6 - add parabricks (famosab, Oct 28, 2024)
296188a - remove config tag (famosab, Oct 28, 2024)
661bdd4 - fix typo (famosab, Oct 28, 2024)
b7abc0a - fix typo (famosab, Oct 28, 2024)
83bd443 - fix typo (famosab, Oct 28, 2024)
dc00c7b - update paths (famosab, Oct 28, 2024)
ec71494 - update paths (famosab, Oct 28, 2024)
202d305 - remove ch (famosab, Oct 28, 2024)
68d2e46 - change gpu access (famosab, Oct 29, 2024)
b3e43e8 - change fasta (famosab, Oct 29, 2024)
99cd5c1 - update container (famosab, Oct 29, 2024)
d34c6df - low memory (famosab, Oct 29, 2024)
41c206c - indey (famosab, Oct 29, 2024)
72c6ea5 - index bwamem (famosab, Oct 29, 2024)
55901ff - index bwamem (famosab, Oct 29, 2024)
b662fa5 - index bwa (famosab, Oct 29, 2024)
2e6a27d - add index file (famosab, Oct 29, 2024)
2e8dfa9 - add index file (famosab, Oct 29, 2024)
6e7b6f5 - add index file (famosab, Oct 29, 2024)
7eced10 - add index file (famosab, Oct 29, 2024)
5cbee34 - stage in (famosab, Oct 29, 2024)
7367d15 - stage in (famosab, Oct 29, 2024)
e676ee9 - workdir (famosab, Oct 29, 2024)
280feec - revert workdir (famosab, Oct 29, 2024)
472a3a9 - revert workdir (famosab, Oct 29, 2024)
ad8cd22 - add bwa index (famosab, Oct 29, 2024)
5e6202a - add bwa index link (famosab, Oct 29, 2024)
e0227b6 - add bwa index link (famosab, Oct 29, 2024)
3ce2d86 - add bwa index link (famosab, Oct 29, 2024)
87daaf9 - rm stage (famosab, Oct 29, 2024)
cd9faa6 - please work now (famosab, Oct 29, 2024)
332eea6 - remove fq2bam from this PR (famosab, Oct 30, 2024)
7dceaa3 - Merge branch 'master' into parabricks-sbwf (famosab, Oct 30, 2024)
f5c8cc4 - update tests (famosab, Oct 30, 2024)
a27baac - change inputs in test and to fq2bam (famosab, Oct 30, 2024)
f9088af - add low memory (famosab, Oct 30, 2024)
fcd7bd8 - adjust applybqsr input (famosab, Oct 30, 2024)
38bbe78 - adjust io to be consistent (famosab, Oct 30, 2024)
52be7aa - Merge branch 'master' into parabricks-sbwf (famosab, Oct 30, 2024)
c179f67 - Merge branch 'master' into parabricks-sbwf (famosab, Nov 15, 2024)
0450b3c - Merge branch 'master' into parabricks-sbwf (famosab, Nov 18, 2024)
84ff84f - wip (famosab, Nov 18, 2024)
a39b3db - Merge branch 'parabricks-sbwf' of github.com:famosab/modules into par… (famosab, Nov 18, 2024)
1084460 - try applybqsr (famosab, Nov 18, 2024)
87576ae - Merge branch 'master' into parabricks-sbwf (famosab, Nov 18, 2024)
cad4876 - minor updates (sateeshperi, Nov 18, 2024)
7bb0222 - update snap (famosab, Nov 18, 2024)
9748d56 - update snap (famosab, Nov 18, 2024)
eb13562 - update snap - problem is the naming in applybqsr (famosab, Nov 19, 2024)
2a4c49f - add tag gpu (famosab, Dec 2, 2024)
64963ad - Merge branch 'master' into parabricks-sbwf (famosab, Dec 2, 2024)
8ce25d8 - Merge branch 'master' into parabricks-sbwf (famosab, Dec 16, 2024)
82a0754 - update meta (famosab, Dec 16, 2024)
d760937 - Merge branch 'parabricks-sbwf' of github.com:famosab/modules into par… (famosab, Dec 16, 2024)
cd5ec00 - update config (famosab, Dec 16, 2024)
83df033 - Merge branch 'master' into parabricks-sbwf (famosab, Dec 16, 2024)
f4ca194 - Merge branch 'master' into parabricks-sbwf (famosab, Dec 17, 2024)
5c967c1 - Apply suggestions from code review (famosab, Dec 18, 2024)
c357cc9 - Merge branch 'master' into parabricks-sbwf (famosab, Dec 18, 2024)
27309d1 - Merge branch 'master' into parabricks-sbwf (famosab, Dec 18, 2024)
91712d7 - Merge branch 'master' into parabricks-sbwf (famosab, Dec 18, 2024)
788040e - Merge branch 'master' into parabricks-sbwf (famosab, Dec 20, 2024)
7ce98df - Merge branch 'master' into parabricks-sbwf (famosab, Jan 7, 2025)
cd4314d - Merge branch 'master' into parabricks-sbwf (famosab, Jan 9, 2025)
ce2b452 - Merge branch 'master' into parabricks-sbwf (famosab, Jan 13, 2025)
7544071 - Merge branch 'master' into parabricks-sbwf (famosab, Jan 21, 2025)
9ce6067 - Merge branch 'master' into parabricks-sbwf (sateeshperi, Jan 24, 2025)
2a7d739 - Merge branch 'master' into parabricks-sbwf (famosab, Jan 29, 2025)
656fca3 - Merge branch 'nf-core:master' into parabricks-sbwf (famosab, Jan 31, 2025)
54 changes: 54 additions & 0 deletions subworkflows/nf-core/fastq_align_parabricks/main.nf
@@ -0,0 +1,54 @@
//
// Alignment and BQSR with Nvidia CLARA Parabricks
//
include { PARABRICKS_FQ2BAM    } from '../../../modules/nf-core/parabricks/fq2bam/main'
include { PARABRICKS_APPLYBQSR } from '../../../modules/nf-core/parabricks/applybqsr/main'

workflow FASTQ_ALIGN_PARABRICKS {

    take:
    ch_reads         // channel: [mandatory] meta, reads
    ch_fasta         // channel: [mandatory] meta, fasta
    ch_index         // channel: [mandatory] meta, index
    ch_interval_file // channel: [optional for parabricks] meta, intervals_bed_combined
    ch_known_sites   // channel: [optional for parabricks] known_sites_indels

    main:
    ch_versions          = Channel.empty()
    ch_bam               = Channel.empty()
    ch_bai               = Channel.empty()
    ch_bqsr_table        = Channel.empty()
    ch_qc_metrics        = Channel.empty()
    ch_duplicate_metrics = Channel.empty()

    PARABRICKS_FQ2BAM(
        ch_reads,
        ch_fasta,
        ch_index,
        ch_interval_file,
        ch_known_sites
    )

    // Collect FQ2BAM outputs
    ch_bam               = PARABRICKS_FQ2BAM.out.bam
    ch_bai               = PARABRICKS_FQ2BAM.out.bai
    ch_qc_metrics        = PARABRICKS_FQ2BAM.out.qc_metrics
    ch_bqsr_table        = PARABRICKS_FQ2BAM.out.bqsr_table
    ch_duplicate_metrics = PARABRICKS_FQ2BAM.out.duplicate_metrics
    ch_versions          = ch_versions.mix(PARABRICKS_FQ2BAM.out.versions)

    // Apply BQSR
    PARABRICKS_APPLYBQSR(
        ch_bam,
        ch_bai,
        ch_bqsr_table.ifEmpty([]),
        ch_interval_file,
        ch_fasta
    )
    ch_versions = ch_versions.mix(PARABRICKS_APPLYBQSR.out.versions)

    emit:
    bam      = PARABRICKS_APPLYBQSR.out.bam // channel: [ [meta], bam ]
    bai      = PARABRICKS_APPLYBQSR.out.bai // channel: [ [meta], bai ]
    versions = ch_versions                  // channel: [ versions.yml ]
}
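As a sketch of how a pipeline might wire this subworkflow in (the file names and channel contents below are illustrative placeholders, not part of the PR; the input shapes follow the take: block above):

```nextflow
include { FASTQ_ALIGN_PARABRICKS } from './subworkflows/nf-core/fastq_align_parabricks/main'

workflow {
    // [ meta, reads ]: one fastq for single-end, two for paired-end
    ch_reads = Channel.of([
        [ id:'sample1', single_end:false ],
        [ file('sample1_R1.fastq.gz'), file('sample1_R2.fastq.gz') ]
    ])
    ch_fasta       = Channel.value([ [ id:'reference' ], file('genome.fasta') ])
    ch_index       = Channel.value([ [ id:'reference' ], file('bwa_index_dir') ])
    ch_intervals   = Channel.value([ [ id:'intervals' ], file('targets.interval_list') ])
    // known-sites VCF is passed as a bare path, mirroring the tests below
    ch_known_sites = file('known_indels.vcf.gz')

    FASTQ_ALIGN_PARABRICKS(ch_reads, ch_fasta, ch_index, ch_intervals, ch_known_sites)

    // emitted BAM/BAI are the recalibrated outputs of PARABRICKS_APPLYBQSR
    FASTQ_ALIGN_PARABRICKS.out.bam.view { meta, bamfile -> "BQSR BAM for ${meta.id}: ${bamfile}" }
}
```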
63 changes: 63 additions & 0 deletions subworkflows/nf-core/fastq_align_parabricks/meta.yml
@@ -0,0 +1,63 @@
# yaml-language-server: $schema=https://raw.githubusercontent.com/nf-core/modules/master/subworkflows/yaml-schema.json
name: "fastq_align_parabricks"
description: Align a fastq file using GPU-based acceleration
keywords:
  - fastq
  - align
  - parabricks
  - gpu
  - preprocessing
components:
  - parabricks/fq2bam
  - parabricks/applybqsr
input:
  - ch_reads:
      type: file
      description: |
        Channel containing reads (either one file for single-end or two files for paired-end)
        Structure: [ val(meta), path(fastq1), path(fastq2) ]
  - ch_fasta:
      type: file
      description: |
        Channel containing the reference fasta file
        Structure: [ val(meta), path(fasta) ]
  - ch_index:
      type: file
      description: |
        Channel containing the reference BWA index
        Structure: [ val(meta), path(.{amb,ann,bwt,pac,sa}) ]
  - ch_interval_file:
      type: file
      description: |
        (optional) file(s) containing genomic intervals for use in base
        quality score recalibration (BQSR)
        Structure: [ val(meta), path(.{bed,interval_list,picard,list,intervals}) ]
  - ch_known_sites:
      type: file
      description: |
        (optional) known sites file(s) for calculating BQSR. markdups must
        be true to perform BQSR.
        Structure: [ path(vcf) ]
output:
  - bam:
      type: file
      description: |
        Channel containing BAM files
        Structure: [ val(meta), path(bam) ]
      pattern: "*.bam"
  - bai:
      type: file
      description: |
        Channel containing BAM index (BAI) files
        Structure: [ val(meta), path(bai) ]
      pattern: "*.bai"
  - versions:
      type: file
      description: |
        File containing software versions
        Structure: [ path(versions.yml) ]
      pattern: "versions.yml"
authors:
  - "@famosab"
maintainers:
  - "@famosab"
106 changes: 106 additions & 0 deletions subworkflows/nf-core/fastq_align_parabricks/tests/main.nf.test
@@ -0,0 +1,106 @@
nextflow_workflow {

    name "Test Subworkflow FASTQ_ALIGN_PARABRICKS"
    script "../main.nf"
    workflow "FASTQ_ALIGN_PARABRICKS"
    config "./nextflow.config"

    tag "subworkflows"
    tag "subworkflows_nfcore"
    tag "subworkflows/fastq_align_parabricks"
    tag "parabricks"
    tag "parabricks/fq2bam"
    tag "parabricks/applybqsr"
    tag "bwa"
    tag "bwa/index"
    tag "gpu"

    setup {
        run("BWA_INDEX") {
            script "../../../../modules/nf-core/bwa/index/main.nf"
            process {
                """
                input[0] = Channel.of([
                    [ id:'test' ], // meta map
                    file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
                ])
                """
            }
        }
    }

    test("sarscov2 single-end [fastq_gz]") {

        when {
            workflow {
                """
                input[0] = Channel.of([
                    [ id:'test', single_end:true ],
                    [ file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true) ]
                ])
                input[1] = Channel.value([
                    [ id: 'reference' ],
                    file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
                ])
                input[2] = BWA_INDEX.out.index
                input[3] = Channel.value([
                    [ id: 'intervals' ],
                    file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/picard/baits.interval_list', checkIfExists: true)
                ])
                input[4] = file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/vcf/test.vcf.gz', checkIfExists: true)
                """
            }
        }

        then {
            assertAll(
                { assert workflow.success },
                { assert snapshot(
                    workflow.out.bam.collect { meta, bamfile -> bam(bamfile).getReadsMD5() },
                    workflow.out.bai.collect { meta, bai -> file(bai).name },
                    workflow.out.versions
                ).match()
                }
            )
        }
    }

    test("sarscov2 paired-end [fastq_gz]") {

        when {
            workflow {
                """
                input[0] = Channel.of([
                    [ id:'test', single_end:false ],
                    [
                        file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_1.fastq.gz', checkIfExists: true),
                        file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/fastq/test_2.fastq.gz', checkIfExists: true)
                    ]
                ])
                input[1] = Channel.value([
                    [ id: 'reference' ],
                    file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/genome.fasta', checkIfExists: true)
                ])
                input[2] = BWA_INDEX.out.index
                input[3] = Channel.value([
                    [ id: 'intervals' ],
                    file(params.modules_testdata_base_path + 'genomics/sarscov2/genome/picard/baits.interval_list', checkIfExists: true)
                ])
                input[4] = file(params.modules_testdata_base_path + 'genomics/sarscov2/illumina/vcf/test.vcf.gz', checkIfExists: true)
                """
            }
        }

        then {
            assertAll(
                { assert workflow.success },
                { assert snapshot(
                    workflow.out.bam.collect { meta, bamfile -> bam(bamfile).getReadsMD5() },
                    workflow.out.bai.collect { meta, bai -> file(bai).name },
                    workflow.out.versions
                ).match()
                }
            )
        }
    }
}
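One detail worth calling out in the assertions above: `bam(bamfile).getReadsMD5()` hashes only the alignment records, not the whole file, so the snapshot stays stable even when the BAM header differs between runs (for example @PG lines that embed tool versions or command lines). A minimal sketch of the same nf-test idiom, with hypothetical workflow outputs:

```groovy
// nf-test assertion sketch (hypothetical outputs, same idiom as the tests above)
assertAll(
    { assert workflow.success },
    { assert snapshot(
        // read-level MD5 is reproducible; a whole-file checksum would not be,
        // because the header can carry run-specific @PG metadata
        workflow.out.bam.collect { meta, bamfile -> bam(bamfile).getReadsMD5() },
        // the BAI is binary and compression-dependent, so only its name is snapshotted
        workflow.out.bai.collect { meta, bai -> file(bai).name }
    ).match() }
)
```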
@@ -0,0 +1,40 @@
{
    "sarscov2 single-end [fastq_gz]": {
        "content": [
            [
                "7e2bd786d964e42ddbc2ab0c9f340b09"
            ],
            [
                "test.bqsr.bam.bai"
            ],
            [
                "versions.yml:md5,4d671c4d60b6a0279cfca507525daa77",
                "versions.yml:md5,df165e28f025dad39d826caead132115"
            ]
        ],
        "meta": {
            "nf-test": "0.9.2",
            "nextflow": "24.10.0"
        },
        "timestamp": "2024-11-19T15:25:23.622710503"
    },
    "sarscov2 paired-end [fastq_gz]": {
        "content": [
            [
                "73e8e89cda8fce1cf07bdebff0f793ec"
            ],
            [
                "test.bqsr.bam.bai"
            ],
            [
                "versions.yml:md5,4d671c4d60b6a0279cfca507525daa77",
                "versions.yml:md5,df165e28f025dad39d826caead132115"
            ]
        ],
        "meta": {
            "nf-test": "0.9.2",
            "nextflow": "24.10.0"
        },
        "timestamp": "2024-11-19T15:26:09.183487496"
    }
}
10 changes: 10 additions & 0 deletions subworkflows/nf-core/fastq_align_parabricks/tests/nextflow.config
@@ -0,0 +1,10 @@
process {

    withName: 'PARABRICKS_FQ2BAM' {
        ext.args = '--low-memory'
    }
    // Ref: https://forums.developer.nvidia.com/t/problem-with-gpu/256825/6
    // Parabricks’s fq2bam requires 24GB of memory.
    memory = '24.GB'
    // Using --low-memory for testing
}

Review comment (Contributor), on the memory comment:

    Should we maybe do a

    Suggested change:
        // Parabricks’s fq2bam requires 24GB of memory.
        // resourceLimits = [cpus: 6GB, memory: 24.GB]

    https://docs.nvidia.com/clara/parabricks/latest/documentation/tooldocs/man_fq2bam.html#man-fq2bam

    Also it would be awesome to do something like --memory-limit ${task.memory} / 2 by default, or make sure there are 16 CPUs per GPU requested.

    Just trying to push the resourceLimits syntax to the limits here 😆
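Picking up the review suggestion above, a test config that also caps resources via Nextflow's resourceLimits directive might look like the following sketch (not the merged file; the cpus value of 6 is an illustrative assumption, and resourceLimits requires a recent Nextflow release):

```nextflow
process {
    // Clamp whatever a process requests to what the test runner can provide;
    // Nextflow reduces any larger cpus/memory request down to these limits.
    resourceLimits = [ cpus: 6, memory: 24.GB ]

    withName: 'PARABRICKS_FQ2BAM' {
        ext.args = '--low-memory' // shrink fq2bam's footprint for test data
        memory   = '24.GB'        // per the NVIDIA docs, fq2bam wants ~24 GB
    }
}
```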