dragen-transcriptome-pipeline/4.2.4__20240830041834

Overview

MD5Sum: d8bbfbee2f713b2ea768d5ea8a8285b9

Documentation

Documentation for dragen-transcriptome-pipeline v4.2.4

Dockstore

Dockstore Version Link

ICAv2

Tenant: umccr-prod

Bundles Generated

Bundle Name: dragen_transcriptome_pipeline_with_validation_data__4_2_4__20240830041834 / Bundle Version v9_r3__20240830041834

Description
This bundle has been generated by the release of workflows/dragen-transcriptome-pipeline/4.2.4/dragen-transcriptome-pipeline__4.2.4.cwl. The pipeline can be found at https://github.com/umccr/cwl-ica/releases/tag/dragen-transcriptome-pipeline/4.2.4__20240830041834.

Version Description
The bundle version description is currently redundant because versions cannot be appended to bundles. Regardless, the bundle version is v9_r3.

Bundle ID: dbf1f105-23a6-4b00-9b7a-a9412f50f274

  • Bundle Link
    Pipeline Project ID: 5844391a-69db-4b52-86b5-6a0d55c2386f
    Pipeline Project Name: pipelines
    Pipeline ID: 1e53ae07-08a6-458b-9fa3-9cf7430409a0
    Pipeline Code: dragen-transcriptome-pipeline__4_2_4__20240830041834

Projects

  • development
  • staging

Datasets

  • dragen_hash_table_v9_r3_alt_masked_cnv_hla_rna
  • hg38_fasta
  • arriba_2_4_0
  • hg38_v39_gencode_annotation
  • wts_validation_fastq__SBJ00480
  • wts_validation_fastq__SBJ00028
  • wts_validation_fastq__SBJ00061
  • wts_validation_fastq__SBJ00188
  • wts_validation_fastq__SBJ00199
  • wts_validation_fastq__SBJ00236
  • wts_validation_fastq__SBJ00238
  • wts_multiqc__2023_07_21__4_2_4__Ref_1_Good__SBJ01563
  • wts_multiqc__2023_07_21__4_2_4__Ref_2_Good__SBJ01147
  • wts_multiqc__2023_07_21__4_2_4__Ref_3_Good__SBJ01620
  • wts_multiqc__2023_07_21__4_2_4__Ref_4_Bad__SBJ01286
  • wts_multiqc__2023_07_21__4_2_4__Ref_5_Bad__SBJ01673

Bundle Name: dragen_transcriptome_pipeline_prod__4_2_4__20240830041834 / Bundle Version v9_r3__20240830041834

Description
This bundle has been generated by the release of workflows/dragen-transcriptome-pipeline/4.2.4/dragen-transcriptome-pipeline__4.2.4.cwl. The pipeline can be found at https://github.com/umccr/cwl-ica/releases/tag/dragen-transcriptome-pipeline/4.2.4__20240830041834.

Version Description
The bundle version description is currently redundant because versions cannot be appended to bundles. Regardless, the bundle version is v9_r3.

Bundle ID: e70dac7e-23c7-4e52-9cff-16a65640afcb

  • Bundle Link
    Pipeline Project ID: 5844391a-69db-4b52-86b5-6a0d55c2386f
    Pipeline Project Name: pipelines
    Pipeline ID: 1e53ae07-08a6-458b-9fa3-9cf7430409a0
    Pipeline Code: dragen-transcriptome-pipeline__4_2_4__20240830041834

Projects

  • production

Datasets

  • dragen_hash_table_v9_r3_alt_masked_cnv_hla_rna
  • hg38_fasta
  • arriba_2_4_0
  • hg38_v39_gencode_annotation
  • wts_multiqc__2023_07_21__4_2_4__Ref_1_Good__SBJ01563
  • wts_multiqc__2023_07_21__4_2_4__Ref_2_Good__SBJ01147
  • wts_multiqc__2023_07_21__4_2_4__Ref_3_Good__SBJ01620
  • wts_multiqc__2023_07_21__4_2_4__Ref_4_Bad__SBJ01286
  • wts_multiqc__2023_07_21__4_2_4__Ref_5_Bad__SBJ01673
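
The release tag and pipeline code listed above appear to follow a regular pattern built from the workflow name, version, and release timestamp. The sketch below reproduces that pattern in Python. The function names are hypothetical, and the pattern is inferred only from the identifiers on this page, so treat it as an observation rather than a documented naming rule.

```python
def release_tag(name: str, version: str, timestamp: str) -> str:
    """Tag as it appears in the release URL: <name>/<version>__<timestamp>."""
    return f"{name}/{version}__{timestamp}"


def pipeline_code(name: str, version: str, timestamp: str) -> str:
    """Pipeline code as listed under each bundle: dots in the version become underscores."""
    return f"{name}__{version.replace('.', '_')}__{timestamp}"


if __name__ == "__main__":
    args = ("dragen-transcriptome-pipeline", "4.2.4", "20240830041834")
    print(release_tag(*args))
    print(pipeline_code(*args))
```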

Visual Overview

dragen-transcriptome-pipeline

Inputs Template

Yaml

# yaml-language-server: $schema=https://github.com/umccr/cwl-ica/releases/download/dragen-transcriptome-pipeline%2F4.2.4__20240830041834/dragen-transcriptome-pipeline__4.2.4__20240830041834.schema.json

# algorithm (Optional)
# Default value: proportional
# Docs: Counting algorithm:
# uniquely-mapped-reads(default) or proportional.
algorithm: "proportional"

# annotation file (Required)
# Docs: Path to annotation transcript file.
annotation_file:
  class: File
  location: icav2://project_id/path/to/file

# bam input (Optional)
# Docs: Input a BAM file for WTS analysis
bam_input:
  class: File
  location: icav2://project_id/path/to/file

# blacklist (Required)
# Docs: File with blacklist range
blacklist:
  class: File
  location: icav2://project_id/path/to/file

# cl config (Optional)
# Docs: command line config to supply additional config values on the command line.
cl_config: string

# contigs (Optional)
# Docs: Optional - List of interesting contigs
# If not specified, defaults to 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y
contigs: string

# cytobands (Required)
# Docs: Coordinates of the Giemsa staining bands.
cytobands:
  class: File
  location: icav2://project_id/path/to/file

# enable duplicate marking (Required)
# Docs: Mark identical alignments as duplicates
enable_duplicate_marking: false

# enable map align (Optional)
# Docs: Enabled by default.
# Set this value to false if using bam_input AND tumor_bam_input
enable_map_align: false

# enable map align output (Required)
# Docs: Do you wish to have the output bam files present
enable_map_align_output: false

# enable rna gene fusion (Optional)
# Docs: Optional - Enable the DRAGEN Gene Fusion module - defaults to true
enable_rna_gene_fusion: false

# enable rna quantification (Optional)
# Docs: Optional - Enable the quantification module - defaults to true
enable_rna_quantification: false

# enable sort (Optional)
# Docs: True by default, only set this to false if using --bam-input as input parameter
enable_sort: false

# fastq list (Optional)
# Docs: CSV file that contains a list of FASTQ files
# to process. read_1 and read_2 components in the CSV file must be presigned urls.
fastq_list:
  class: File
  location: icav2://project_id/path/to/file

# Row of fastq lists (Optional)
# Docs: The row of fastq lists.
# Each row has the following attributes:
#   * RGID
#   * RGLB
#   * RGSM
#   * Lane
#   * Read1File
#   * Read2File (optional)
fastq_list_rows:
- rgid: string
  rglb: string
  rgsm: string
  lane: string
  read_1:
    class: File
    location: icav2://project_id/path/to/file
  read_2:
    class: File
    location: icav2://project_id/path/to/file

# java mem (Optional)
# Default value: 20G
# Docs: Set desired Java heap memory size
java_mem: "20G"

# license instance id location (Optional)
# Docs: Location of the license instance identity file.
# Optional value, default set to /opt/instance-identity,
# which is a path inside the dragen container. Supply your own if needed.
lic_instance_id_location:
  class: File
  location: icav2://project_id/path/to/file

# output file prefix (Required)
# Docs: The prefix given to all output files
output_prefix: string

# protein domains (Required)
# Docs: GFF3 file containing the genomic coordinates of protein domains.
protein_domains:
  class: File
  location: icav2://project_id/path/to/file

# qc reference samples (Required)
# Docs: Reference samples for multiQC report
qc_reference_samples:
- class: Directory
  location: icav2://project_id/path/to/dir/

# read trimming (Optional)
# Docs: To enable trimming filters in hard-trimming mode, set to a comma-separated list of the trimmer tools 
# you would like to use. To disable trimming, set to none. During mapping, artifacts are removed from all reads.
# Read trimming is disabled by default.
read_trimmers: string

# reference Fasta (Required)
# Docs: FastA file with genome sequence
reference_fasta:
  class: File
  location: icav2://project_id/path/to/file

# reference tar (Required)
# Docs: Path to ref data tarball
reference_tar:
  class: File
  location: icav2://project_id/path/to/file

# soft read trimming (Optional)
# Docs: To enable trimming filters in soft-trimming mode, set to a comma-separated list of the trimmer tools 
# you would like to use. To disable soft trimming, set to none. During mapping, reads are aligned as if trimmed,
# and bases are not removed from the reads. Soft-trimming is enabled for the polyg filter by default.
soft_read_trimmers: string

# trim adapter r1 5prime (Optional)
# Docs: Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 1. 
# NB: the sequences should be in reverse order (with respect to their appearance in the FASTQ) but not complemented.
trim_adapter_r1_5prime:
  class: File
  location: icav2://project_id/path/to/file

# trim adapter read1 (Optional)
# Docs: Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 1.
trim_adapter_read1:
  class: File
  location: icav2://project_id/path/to/file

# trim adapter read2 (Optional)
# Docs: Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 2.
trim_adapter_read2:
  class: File
  location: icav2://project_id/path/to/file

# trim adapter stringency (Optional)
# Docs: Specify the minimum number of adapter bases required for trimming
trim_adapter_stringency: string

# trim adapter r2 5prime (Optional)
# Docs: Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 2.
# NB: the sequences should be in reverse order (with respect to their appearance in the FASTQ) but not complemented.
trim_dapter_r2_5prime:
  class: File
  location: icav2://project_id/path/to/file

# trim r1 3prime (Optional)
# Docs: Specify the minimum number of bases to trim from the 3' end of Read 1 (default: 0).
trim_r1_3prime: string

# trim r1 5prime (Optional)
# Docs: Specify the minimum number of bases to trim from the 5' end of Read 1 (default: 0).
trim_r1_5prime: string

# trim r2 3prime (Optional)
# Docs: Specify the minimum number of bases to trim from the 3' end of Read 2 (default: 0).
trim_r2_3prime: string

# trim r2 5prime (Optional)
# Docs: Specify the minimum number of bases to trim from the 5' end of Read 2 (default: 0).
trim_r2_5prime: string

Json

{
    "algorithm": "proportional",
    "annotation_file": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "bam_input": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "blacklist": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "cl_config": "string",
    "contigs": "string",
    "cytobands": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "enable_duplicate_marking": false,
    "enable_map_align": false,
    "enable_map_align_output": false,
    "enable_rna_gene_fusion": false,
    "enable_rna_quantification": false,
    "enable_sort": false,
    "fastq_list": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "fastq_list_rows": [
        {
            "rgid": "string",
            "rglb": "string",
            "rgsm": "string",
            "lane": "string",
            "read_1": {
                "class": "File",
                "location": "icav2://project_id/path/to/file"
            },
            "read_2": {
                "class": "File",
                "location": "icav2://project_id/path/to/file"
            }
        }
    ],
    "java_mem": "20G",
    "lic_instance_id_location": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "output_prefix": "string",
    "protein_domains": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "qc_reference_samples": [
        {
            "class": "Directory",
            "location": "icav2://project_id/path/to/dir/"
        }
    ],
    "read_trimmers": "string",
    "reference_fasta": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "reference_tar": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "soft_read_trimmers": "string",
    "trim_adapter_r1_5prime": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "trim_adapter_read1": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "trim_adapter_read2": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "trim_adapter_stringency": "string",
    "trim_dapter_r2_5prime": {
        "class": "File",
        "location": "icav2://project_id/path/to/file"
    },
    "trim_r1_3prime": "string",
    "trim_r1_5prime": "string",
    "trim_r2_3prime": "string",
    "trim_r2_5prime": "string"
}
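
Before submitting an inputs file derived from the template above, it can help to check that every required input is present and that no template placeholder has been left in. The following is a minimal sketch, not part of the pipeline. The required keys are those marked "Optional: False" in the Inputs section of this release (note that `java_mem` is listed there as required even though the template labels it optional with a default of 20G). The function names are hypothetical.

```python
from __future__ import annotations

import json

# Inputs marked "Optional: False" in the Inputs section of this release.
REQUIRED_INPUTS = (
    "annotation_file",
    "blacklist",
    "cytobands",
    "enable_duplicate_marking",
    "enable_map_align_output",
    "java_mem",
    "output_prefix",
    "protein_domains",
    "qc_reference_samples",
    "reference_fasta",
    "reference_tar",
)


def missing_required(inputs: dict) -> list[str]:
    """Return the required input keys that are absent from the inputs mapping."""
    return [key for key in REQUIRED_INPUTS if key not in inputs]


def placeholder_values(inputs: dict) -> list[str]:
    """Return top-level keys whose value still looks like a template placeholder."""
    flagged = []
    for key, value in inputs.items():
        # Serialise so that nested File/Directory objects and lists are searched too.
        text = json.dumps(value)
        if "icav2://project_id/path/to/" in text or value == "string":
            flagged.append(key)
    return flagged
```

A caller would load the JSON inputs with `json.load`, then refuse to launch if either function returns a non-empty list.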

Outputs Template

{
    "arriba_output_directory": {
        "class": "Directory",
        "location": "icav2://project_id/path/to/dir/"
    },
    "dragen_transcriptome_output_directory": {
        "class": "Directory",
        "location": "icav2://project_id/path/to/dir/"
    },
    "multiqc_output_directory": {
        "class": "Directory",
        "location": "icav2://project_id/path/to/dir/"
    },
    "qualimap_output_directory": {
        "class": "Directory",
        "location": "icav2://project_id/path/to/dir/"
    }
}
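
The output locations in the template above use the same `icav2://project_id/path` form as the inputs. As a rough illustration, the sketch below splits such a URI into its project component and path using Python's standard `urllib.parse`. The helper names are hypothetical, and it assumes the first URI component is the project identifier, as the placeholder `project_id` suggests.

```python
from urllib.parse import urlparse


def split_icav2_uri(uri: str) -> tuple:
    """Split an icav2:// URI into (project component, path)."""
    parsed = urlparse(uri)
    if parsed.scheme != "icav2":
        raise ValueError(f"not an icav2 URI: {uri!r}")
    return parsed.netloc, parsed.path


def output_locations(outputs: dict) -> dict:
    """Map each output name to the (project, path) pair of its location."""
    return {name: split_icav2_uri(value["location"]) for name, value in outputs.items()}
```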

Overrides Template

Zipped workflow

[
    "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/arriba_drawing_step",
    "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/arriba_fusion_step",
    "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/create_arriba_output_directory",
    "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/create_dummy_file_step",
    "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/dragen_qc_step",
    "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/run_dragen_transcriptome_step",
    "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/run_qualimap_step"
]

Packed workflow

[
    "#main/arriba_drawing_step",
    "#main/arriba_fusion_step",
    "#main/create_arriba_output_directory",
    "#main/create_dummy_file_step",
    "#main/dragen_qc_step",
    "#main/run_dragen_transcriptome_step",
    "#main/run_qualimap_step"
]
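
The two lists above name the same seven steps. Only the prefix differs: the zipped workflow uses `workflow.cwl#dragen-transcriptome-pipeline--4.2.4/`, while the packed workflow uses `#main/`. A small sketch of converting one form to the other, assuming that correspondence holds for every step (the function name is hypothetical):

```python
def zipped_to_packed(step_id: str) -> str:
    """Convert a zipped-workflow step ID to its packed-workflow form."""
    # Drop everything up to and including '#', then the workflow ID before '/'.
    _, _, fragment = step_id.partition("#")
    _, _, step_name = fragment.partition("/")
    if not step_name:
        raise ValueError(f"unexpected step ID: {step_id!r}")
    return f"#main/{step_name}"


if __name__ == "__main__":
    print(zipped_to_packed(
        "workflow.cwl#dragen-transcriptome-pipeline--4.2.4/run_qualimap_step"
    ))
```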

Inputs

algorithm

ID: algorithm

Optional: True
Type: string
Docs:
Counting algorithm:
uniquely-mapped-reads(default) or proportional.

annotation file

ID: annotation_file

Optional: False
Type: File
Docs:
Path to annotation transcript file.

bam input

ID: bam_input

Optional: True
Type: File
Docs:
Input a BAM file for WTS analysis

blacklist

ID: blacklist

Optional: False
Type: File
Docs:
File with blacklist range

cl config

ID: cl_config

Optional: True
Type: string
Docs:
command line config to supply additional config values on the command line.

contigs

ID: contigs

Optional: True
Type: string
Docs:
Optional - List of interesting contigs
If not specified, defaults to 1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,X,Y
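
Because `contigs` is a single comma-separated string, a caller who wants to extend or trim the default list may find it easier to build the value programmatically. A minimal sketch (the names are hypothetical, and appending an extra contig such as `MT` is only an illustration, not something this page says the pipeline accepts):

```python
# Autosomes 1 to 22 followed by the sex chromosomes, matching the documented default.
DEFAULT_CONTIGS = [str(n) for n in range(1, 23)] + ["X", "Y"]


def contigs_value(contigs=None) -> str:
    """Join contig names into the comma-separated string the input expects."""
    return ",".join(DEFAULT_CONTIGS if contigs is None else contigs)
```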

cytobands

ID: cytobands

Optional: False
Type: File
Docs:
Coordinates of the Giemsa staining bands.

enable duplicate marking

ID: enable_duplicate_marking

Optional: False
Type: boolean
Docs:
Mark identical alignments as duplicates

enable map align

ID: enable_map_align

Optional: True
Type: boolean
Docs:
Enabled by default.
Set this value to false if using bam_input AND tumor_bam_input

enable map align output

ID: enable_map_align_output

Optional: False
Type: boolean
Docs:
Do you wish to have the output bam files present

enable rna gene fusion

ID: enable_rna_gene_fusion

Optional: True
Type: boolean
Docs:
Optional - Enable the DRAGEN Gene Fusion module - defaults to true

enable rna quantification

ID: enable_rna_quantification

Optional: True
Type: boolean
Docs:
Optional - Enable the quantification module - defaults to true

enable sort

ID: enable_sort

Optional: True
Type: boolean
Docs:
True by default, only set this to false if using --bam-input as input parameter

fastq list

ID: fastq_list

Optional: True
Type: File
Docs:
CSV file that contains a list of FASTQ files
to process. read_1 and read_2 components in the CSV file must be presigned urls.

Row of fastq lists

ID: fastq_list_rows

Optional: True
Type: fastq-list-row[]
Docs:
The row of fastq lists.
Each row has the following attributes:

  • RGID
  • RGLB
  • RGSM
  • Lane
  • Read1File
  • Read2File (optional)
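
The attributes listed above correspond to the `rgid`, `rglb`, `rgsm`, `lane`, `read_1` and `read_2` keys shown in the inputs template. The sketch below builds one such row as a plain dictionary, omitting `read_2` for single-end data. The helper names are hypothetical. The type of `lane` is not stated on this page, so the value is passed through unchanged.

```python
def icav2_file(project_id: str, path: str) -> dict:
    """Build a CWL File object pointing at an ICAv2 location."""
    return {"class": "File", "location": f"icav2://{project_id}/{path.lstrip('/')}"}


def fastq_list_row(rgid, rglb, rgsm, lane, read_1, read_2=None) -> dict:
    """Build one fastq_list_rows entry; read_2 is left out when not supplied."""
    row = {
        "rgid": rgid,
        "rglb": rglb,
        "rgsm": rgsm,
        "lane": lane,  # passed through as given; the expected type is an assumption
        "read_1": read_1,
    }
    if read_2 is not None:
        row["read_2"] = read_2
    return row
```

A list of these dictionaries can then be assigned to `fastq_list_rows` in the inputs JSON.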

java mem

ID: java_mem

Optional: False
Type: string
Docs:
Set desired Java heap memory size

license instance id location

ID: lic_instance_id_location

Optional: True
Type: ['File', 'string']
Docs:
Location of the license instance identity file.
Optional value, default set to /opt/instance-identity,
which is a path inside the dragen container. Supply your own if needed.

output file prefix

ID: output_prefix

Optional: False
Type: string
Docs:
The prefix given to all output files

protein domains

ID: protein_domains

Optional: False
Type: File
Docs:
GFF3 file containing the genomic coordinates of protein domains.

qc reference samples

ID: qc_reference_samples

Optional: False
Type: Directory[]
Docs:
Reference samples for multiQC report

read trimming

ID: read_trimmers

Optional: True
Type: string
Docs:
To enable trimming filters in hard-trimming mode, set to a comma-separated list of the trimmer tools
you would like to use. To disable trimming, set to none. During mapping, artifacts are removed from all reads.
Read trimming is disabled by default.

reference Fasta

ID: reference_fasta

Optional: False
Type: File
Docs:
FastA file with genome sequence

reference tar

ID: reference_tar

Optional: False
Type: File
Docs:
Path to ref data tarball

soft read trimming

ID: soft_read_trimmers

Optional: True
Type: string
Docs:
To enable trimming filters in soft-trimming mode, set to a comma-separated list of the trimmer tools
you would like to use. To disable soft trimming, set to none. During mapping, reads are aligned as if trimmed,
and bases are not removed from the reads. Soft-trimming is enabled for the polyg filter by default.

trim adapter r1 5prime

ID: trim_adapter_r1_5prime

Optional: True
Type: File
Docs:
Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 1.
NB: the sequences should be in reverse order (with respect to their appearance in the FASTQ) but not complemented.

trim adapter read1

ID: trim_adapter_read1

Optional: True
Type: File
Docs:
Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 1.

trim adapter read2

ID: trim_adapter_read2

Optional: True
Type: File
Docs:
Specify the FASTA file that contains adapter sequences to trim from the 3' end of Read 2.

trim adapter stringency

ID: trim_adapter_stringency

Optional: True
Type: int
Docs:
Specify the minimum number of adapter bases required for trimming

trim adapter r2 5prime

ID: trim_dapter_r2_5prime

Optional: True
Type: File
Docs:
Specify the FASTA file that contains adapter sequences to trim from the 5' end of Read 2.
NB: the sequences should be in reverse order (with respect to their appearance in the FASTQ) but not complemented.

trim r1 3prime

ID: trim_r1_3prime

Optional: True
Type: int
Docs:
Specify the minimum number of bases to trim from the 3' end of Read 1 (default: 0).

trim r1 5prime

ID: trim_r1_5prime

Optional: True
Type: int
Docs:
Specify the minimum number of bases to trim from the 5' end of Read 1 (default: 0).

trim r2 3prime

ID: trim_r2_3prime

Optional: True
Type: int
Docs:
Specify the minimum number of bases to trim from the 3' end of Read 2 (default: 0).

trim r2 5prime

ID: trim_r2_5prime

Optional: True
Type: int
Docs:
Specify the minimum number of bases to trim from the 5' end of Read 2 (default: 0).

Steps

arriba drawing step

ID: dragen-transcriptome-pipeline--4.2.4/arriba_drawing_step

Step Type: tool
Docs:

Run Arriba drawing script for fusions predicted by previous step.

arriba fusion step

ID: dragen-transcriptome-pipeline--4.2.4/arriba_fusion_step

Step Type: tool
Docs:

Runs Arriba fusion calling on the bam file produced by Dragen.

create arriba output directory

ID: dragen-transcriptome-pipeline--4.2.4/create_arriba_output_directory

Step Type: tool
Docs:

Create an output directory to contain the arriba files

Create dummy file

ID: dragen-transcriptome-pipeline--4.2.4/create_dummy_file_step

Step Type: tool
Docs:

Intermediate step for letting multiqc-interop be placed in stream mode

dragen qc step

ID: dragen-transcriptome-pipeline--4.2.4/dragen_qc_step

Step Type: tool
Docs:

The dragen qc step - this takes in an array of dirs

run dragen transcriptome step

ID: dragen-transcriptome-pipeline--4.2.4/run_dragen_transcriptome_step

Step Type: tool
Docs:

Runs the dragen transcriptome workflow on the FPGA.
Takes in a fastq list and corresponding mount paths from the predefined_mount_paths.
All other options are available at the top of the workflow.

run qualimap step

ID: dragen-transcriptome-pipeline--4.2.4/run_qualimap_step

Step Type: tool
Docs:

Run qualimap step to generate additional QC metrics

Outputs

arriba output directory

ID: dragen-transcriptome-pipeline--4.2.4/arriba_output_directory

Optional: False
Output Type: Directory
Docs:
The directory containing output files from arriba

dragen transcriptome output directory

ID: dragen-transcriptome-pipeline--4.2.4/dragen_transcriptome_output_directory

Optional: False
Output Type: Directory
Docs:
The output directory containing all transcriptome output files

multiqc output directory

ID: dragen-transcriptome-pipeline--4.2.4/multiqc_output_directory

Optional: False
Output Type: Directory
Docs:
The output directory for multiqc

qualimap output directory

ID: dragen-transcriptome-pipeline--4.2.4/qualimap_output_directory

Optional: False
Output Type: Directory
Docs:
The output directory containing all qualimap output files