Configuration

Richard A. Schäfer edited this page Feb 26, 2024 · 39 revisions

Config File

This page describes the parameters in config.yml, which is organized into blocks. By default, ScanNeo2 uses the file located at config/config.yml. The option --configfile allows specifying a custom config file. Note that a custom config file merely overwrites the settings of config/config.yml; it should therefore include all parameters, so that settings are not mixed from multiple config files.

GENERAL

Top-level parameters are applied globally wherever applicable. These are the number of cores (threads), the mapping quality (mapq), and the average Phred score (basequal).

threads: 30
mapq: 30  
basequal: 20

DATA

The data block specifies the sample name and the sequencing reads in the indented entries name, dnaseq, and rnaseq.

data:
  name: <name/of/sample> 
  dnaseq:
    <group1>: <path/to/dnaseq/reads1> [path/to/dnaseq/reads2]
    <group2>: <path/to/dnaseq/reads1> [path/to/dnaseq/reads2]
  rnaseq:
    <group1>: <path/to/rnaseq/reads1> [path/to/rnaseq/reads2]
  normal: <group2>

  custom:
    variants:
    hlatyping:
      MHC-I:
      MHC-II:
    readgroups:

The name key-value pair contains the name of the sample. This is also the name of the folder in which the analysis results are stored (e.g., results/<name/of/sample>).

The blocks dnaseq and rnaseq specify the paths to the sequencing reads. In dnaseq, each <group>: <path/to/dnaseq/reads1> [path/to/dnaseq/reads2] key-value pair defines the path to the DNA-seq data, which can be in either .bam or .fastq format. For paired-end reads, the forward and reverse reads need to be separated by a space. The rnaseq block defines the RNA-seq data in the same way. ScanNeo2 allows specifying multiple samples by adding entries at the same indentation within the dnaseq or rnaseq blocks (e.g., <group1> and <group2>). These can correspond to readgroups or conditions, but the group names need to be unique. In addition, normal allows specifying normal samples but is currently not used. Multiple normal samples can be separated by spaces.

ScanNeo2 is designed to operate on both RNA-seq and DNA-seq data, but in principle also works with only one of the two. With DNA-seq data alone, however, the analysis is restricted to detecting indels and SNVs.
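As an illustration, a filled-in data block might look like the following. This is only a sketch: the sample name, the group names (dna_tumor, dna_normal, rna_tumor), and all file paths are hypothetical placeholders, not values shipped with ScanNeo2.

```yaml
data:
  name: patient01            # hypothetical; results would be stored in results/patient01
  dnaseq:
    # paired-end .fastq: forward and reverse reads separated by a space
    dna_tumor: /data/patient01/dna_tumor_R1.fastq /data/patient01/dna_tumor_R2.fastq
    # a single .bam file per group
    dna_normal: /data/patient01/dna_normal.bam
  rnaseq:
    rna_tumor: /data/patient01/rna_tumor_R1.fastq /data/patient01/rna_tumor_R2.fastq
  normal: dna_normal         # refers to a group name defined above
```

All group names are kept distinct here to satisfy the uniqueness requirement.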

In addition, the custom block allows the (optional) specification of user-defined data. In variants, predefined variants can be provided in VCF format. The hlatyping entry takes user-defined HLA typing data for MHC-I and MHC-II.

Note: The custom block supplies additional information for the analysis. In other words, ScanNeo2 uses these files to augment the actual analysis, unless the corresponding step (e.g., hlatyping) is deactivated.
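A minimal sketch of a custom block with a user-provided VCF file is shown here. The sample name and paths are hypothetical placeholders; the remaining keys are left empty as in the template, since their expected value format is not described on this page.

```yaml
data:
  name: patient01                                  # hypothetical sample name
  rnaseq:
    rna_tumor: /data/patient01/rna_tumor.bam       # hypothetical path
  custom:
    variants: /data/patient01/known_variants.vcf   # hypothetical VCF with predefined variants
    hlatyping:
      MHC-I:
      MHC-II:
    readgroups:
```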

PRE-PROCESSING

preproc: 
  activate: true  
  minlen: 10
  slidingwindow:
    activate: true
    wsize: 3

ScanNeo2 provides an optional pre-processing step that is applied only to raw sequencing data. Setting activate: true enables the pre-processing, which can be combined with sliding-window trimming from the 3'-end using a defined window size (wsize). The minimum length of the sequencing reads is set with minlen. Note: the globally defined base quality (basequal) is also applied here.
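For example, to keep pre-processing enabled but skip the window trimming, the nested activate flag could be set to false. This sketch assumes that the two activate flags are evaluated independently; the minlen value of 20 is an arbitrary illustration, not a recommendation.

```yaml
preproc:
  activate: true      # run pre-processing on raw reads
  minlen: 20          # illustrative value for the minimum read length
  slidingwindow:
    activate: false   # assumed to disable only the window trimming
    wsize: 3
```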

ALIGNMENT

align:
  minovlps: 10
  chimsegmin: 20
  chimoverhang: 10
  chimmax: 50
  chimmaxdrop: 30

VARIANT CALLING

ALTERNATIVE SPLICING

altsplicing:
  activate: true 
  confidence: 3  
  iterations: 5 
  edgelimit: 250  

EXITRON SPLICING

exitronsplicing:
  activate: true 
  ao: 3  
  pso: 0.05  

GENE FUSION

genefusion:
  activate: true 
  maxevalue: 0.3
  suppreads: 2  
  maxsuppreads: 1000
  maxidentity: 0.3  
  hpolymerlen: 6  
  readthroughdist: 10000  
  minanchorlen: 20  
  splicedevents: 4  
  maxkmer: 0.6  
  fraglen: 200 
  maxmismatch: 0.01

INDELs/SNVs

indel:
  activate: true 
  mode: BOTH  
  strategy: OPTIMAL_F_SCORE # OPTIMAL_F_SCORE, FALSE_DISCOVERY_RATE, CONSTANT 
  fscorebeta: 1.0  
  fdr: 0.05  
  sliplen: 8  
  sliprate: 0.1  

HLA GENOTYPING

hlatyping:
  mode: RNA  # DNA, RNA or BOTH