Configuration
The following sections discuss the parameters in config.yml, which consists of several blocks. By default, ScanNeo2 uses the file located at config/config.yml. The option --configfile allows specifying a custom config file instead. Note that this merely overrides the values in config/config.yml, so the custom file should include all parameters to avoid mixing settings from multiple config files.
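Because a custom file is expected to be complete, it can help to compare it against config/config.yml before starting a run. The following is a minimal sketch and not part of ScanNeo2: missing_keys is a hypothetical helper, and the two dictionaries are stand-ins for the parsed contents of the two YAML files.

```python
def missing_keys(default, custom, prefix=""):
    """Return dotted paths of keys present in `default` but absent from `custom`."""
    missing = []
    for key, value in default.items():
        path = f"{prefix}{key}"
        if not isinstance(custom, dict) or key not in custom:
            missing.append(path)
        elif isinstance(value, dict):
            # Descend into nested blocks such as preproc or indel.
            missing.extend(missing_keys(value, custom[key], prefix=f"{path}."))
    return missing


# Stand-ins for the parsed config/config.yml and a custom config file
# (assumption: both were loaded into plain dictionaries beforehand).
default_cfg = {
    "threads": 30,
    "mapq": 30,
    "basequal": 20,
    "preproc": {"activate": True, "minlen": 10},
}
custom_cfg = {"threads": 8, "preproc": {"activate": False}}

print(missing_keys(default_cfg, custom_cfg))
```

Any path reported here would fall back to the value in config/config.yml, which is exactly the mixing of settings that a complete custom file avoids.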
Top-level parameters are applied system-wide wherever applicable. These include the number of cores (threads), the mapping quality (mapq), and the average Phred base quality (basequal).
threads: 30
mapq: 30
basequal: 20
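Both mapq and basequal are Phred-scaled, so each value corresponds to an error probability via Q = -10 * log10(P). The short sketch below converts the values shown above. The function name phred_to_error_prob is introduced here for illustration only, and how ScanNeo2 applies the thresholds internally is not shown.

```python
def phred_to_error_prob(q):
    """Convert a Phred-scaled quality Q into an error probability P = 10^(-Q/10)."""
    return 10 ** (-q / 10)


# Values taken from the top-level block above.
for name, q in {"mapq": 30, "basequal": 20}.items():
    print(f"{name}={q}: error probability {phred_to_error_prob(q):.4f}")
```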
The data block specifies the sample and its sequencing reads through the indented entries name, dnaseq, and rnaseq.
data:
  name: <name/of/sample>
  dnaseq:
    <group1>: <path/to/dnaseq/reads1> [path/to/dnaseq/reads2]
    <group2>: <path/to/dnaseq/reads1> [path/to/dnaseq/reads2]
  rnaseq:
    <group1>: <path/to/rnaseq/reads1> [path/to/rnaseq/reads2]
  normal: <group2>
  custom:
    variants:
    hlatyping:
      MHC-I:
      MHC-II:
    readgroups:
The name key-value pair contains the name of the sample. This is also the name of the folder in which the analysis results are stored (e.g., results/<name/of/sample>). The blocks dnaseq and rnaseq specify the paths to the sequencing reads. The <group1>: <path/to/dnaseq/data> key-value pair defines the path to the DNA-seq data, which can be in either .bam or .fastq format. For paired-end reads, the forward and reverse read files must be separated by a space. Similarly, the entries in the rnaseq block define the RNA-seq data. ScanNeo2 allows specifying multiple samples at the same indentation level within the dnaseq or rnaseq blocks (e.g., <group1> and <group2>). These can correspond to readgroups or conditions, but their names must be unique. In addition, normal allows specifying normal samples, although it is currently not used. Multiple normal samples can be separated by spaces. ScanNeo2 operates on both RNA-seq and DNA-seq data, but in principle also works with only one of the two. However, with only DNA-seq data, detection is restricted to indels and SNVs.
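The conventions for the data block (one or two space-separated files per group, .bam or .fastq input, space-separated normal samples) can be checked with a few lines of code. This is a hedged sketch rather than ScanNeo2's own parsing logic: parse_reads and parse_data_block are hypothetical helpers, the sample name and paths are made up, and compressed inputs are deliberately not handled.

```python
def parse_reads(entry):
    """Split a group entry into its read files and validate their format."""
    files = entry.split()
    if len(files) not in (1, 2):
        raise ValueError(f"expected one or two files, got {len(files)}: {entry!r}")
    for path in files:
        if not path.endswith((".bam", ".fastq")):
            raise ValueError(f"unsupported file type: {path}")
    # Two files are interpreted as forward and reverse reads.
    return {"files": files, "two_files": len(files) == 2}


def parse_data_block(data):
    """Parse the dnaseq/rnaseq groups and the normal entry of a data block."""
    parsed = {}
    for seqtype in ("dnaseq", "rnaseq"):
        groups = data.get(seqtype) or {}  # an empty YAML block is parsed as None
        parsed[seqtype] = {group: parse_reads(entry) for group, entry in groups.items()}
    parsed["normal"] = (data.get("normal") or "").split()
    return parsed


# Hypothetical data block; names and paths are placeholders.
data = {
    "name": "sample01",
    "dnaseq": {
        "tumor": "reads/dna_tumor_R1.fastq reads/dna_tumor_R2.fastq",
        "blood": "reads/dna_blood.bam",
    },
    "rnaseq": {"tumor": "reads/rna_tumor.bam"},
    "normal": "blood",
}

print(parse_data_block(data))
```

Since the groups of each block end up as dictionary keys, duplicate group names within a block cannot be represented, which mirrors the requirement that they be unique.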
In addition, the custom block allows the (optional) specification of user-defined data. In variants, predefined variants in VCF format can be provided. The hlatyping block accepts user-defined HLA typing data under the MHC-I and MHC-II keys.
Note: The custom block allows specifying additional information for the analysis. In other words, ScanNeo2 uses these files to augment its own analysis, unless the corresponding step (e.g., hlatyping) is deactivated.
preproc:
  activate: true
  minlen: 10
  slidingwindow:
    activate: true
    wsize: 3
ScanNeo2 provides an optional pre-processing procedure that is only applied to raw sequencing data. Here, activate: true enables the pre-processing, which can be combined with sliding-window trimming from the 3'-end using a defined window size (wsize). Other parameters include the minimum length of the sequencing reads (minlen). Note: the globally defined base quality (basequal) is also applied here.
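To illustrate how wsize, basequal, and minlen interact, the sketch below implements a generic sliding-window trim from the 3'-end on a list of per-base Phred scores. It is an assumption-laden illustration of the general technique, not the procedure ScanNeo2 actually runs; trim_3prime and the example reads are hypothetical.

```python
def trim_3prime(quals, wsize=3, basequal=20, minlen=10):
    """Return the number of bases kept, or None if the read is discarded."""
    end = len(quals)
    # Move the window inwards from the 3'-end until its mean quality passes.
    while end >= wsize:
        window = quals[end - wsize:end]
        if sum(window) / wsize >= basequal:
            break
        end -= 1  # drop the last base and evaluate the next window
    else:
        end = 0  # no window reached the threshold, so nothing is kept
    # Reads shorter than minlen after trimming are discarded.
    return end if end >= minlen else None


read_a = [30] * 12 + [5] * 3  # low-quality tail of three bases
read_b = [30] * 8 + [2] * 4   # too short once the tail is removed

print(trim_3prime(read_a))
print(trim_3prime(read_b))
```

Note that a window-based criterion can leave a single low-quality base at the end of the read, because the decision depends on the window mean rather than on individual bases.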
align:
  minovlps: 10
  chimsegmin: 20
  chimoverhang: 10
  chimmax: 50
  chimmaxdrop: 30
altsplicing:
  activate: true
  confidence: 3
  iterations: 5
  edgelimit: 250
exitronsplicing:
  activate: true
  ao: 3
  pso: 0.05
genefusion:
  activate: true
  maxevalue: 0.3
  suppreads: 2
  maxsuppreads: 1000
  maxidentity: 0.3
  hpolymerlen: 6
  readthroughdist: 10000
  minanchorlen: 20
  splicedevents: 4
  maxkmer: 0.6
  fraglen: 200
  maxmismatch: 0.01
indel:
  activate: true
  mode: BOTH
  strategy: OPTIMAL_F_SCORE # OPTIMAL_F_SCORE, FALSE_DISCOVERY_RATE, CONSTANT
  fscorebeta: 1.0
  fdr: 0.05
  sliplen: 8
  sliprate: 0.1
hlatyping:
  mode: RNA # DNA, RNA or BOTH