How can I fix config file when I only have DNA data? #42

Donbbit · 2025-02-13T07:49:43Z

Hi, I'm using ScanNeo2 with only dna_normal and dna_tumor data (no RNA-seq). I changed the config like this:

Reference

General settings

reference:
release: 111
nonchr: false
threads: 30
mapq: 30 # overall required mapping quality
basequal: 20 # overall required base quality

data:
name: D1
dnaseq:
dna_normal: /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/DT2411609481-1/250210_SEQ081_FP500002421_L01_SP2501130808/FP500002421_L01_375_1.fq.gz /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/DT2411609481-1/250210_SEQ081_FP500002421_L01_SP2501130808/FP500002421_L01_375_2.fq.gz
dna_tumor1: /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320266/250210_SEQ082_FP500002422_L01_SP2501130799/FP500002422_L01_492_1.fq.gz /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320266/250210_SEQ082_FP500002422_L01_SP2501130799/FP500002422_L01_492_2.fq.gz
dna_tumor2: /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320262/250210_SEQ082_FP500002422_L01_SP2501130795/FP500002422_L01_488_1.fq.gz /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320262/250210_SEQ082_FP500002422_L01_SP2501130795/FP500002422_L01_488_2.fq.gz
rnaseq:
rna_tumor:
normal: dna_normal

custom:
variants:
hlatyping:
MHC-I:
MHC-II:

pre-processing (only applied on fastq reads)

preproc:
activate: true # whether (=true) or not (=false) to include pre-processing
minlen: 10
slidingwindow:
activate: true
wsize: 3
wqual: 20

alignment

align:
chimSegmentMin: 20
chimScoreMin: 10
chimJunctionOverhangMin: 10
chimScoreDropMax: 30
chimScoreSeparation: 10

variant calling

alternative splicing

altsplicing:
activate: true # whether (=true) or not (=false) to include alternative splicing events
confidence: 3 # confidence level (1,2 or 3) - filtering of input alignments
iterations: 5 # number of iterations (when adding intron edges) - increases sensitivity
edgelimit: 250 # limit max number of edges in graph - affects the runtime

exitron splicing

exitronsplicing:
activate: true # whether (=true) or not (=false) to include exitron-splicing events
ao: 3 # allele observation
pso: 0.05 # percent spliced out
#strand: 1 # strand specificity of library (0=unstranded, 1=forward, 2=reverse)
strand: XS # strand specificity of library (0=XS, 1=RF, 2=FR)

gene fusion

genefusion:
activate: true # whether (=true) or not (=false) to include gene fusion events
maxevalue: 0.3
suppreads: 2 # all fusions with less than suppreads are discarded
maxsuppreads: 1000
maxidentity: 0.3 # genes with fraction of identity are discarded (homologs)
hpolymerlen: 6 # removes breakpoints adjacent to homopolymers of length
readthroughdist: 10000 # distance between breakpoints with less than distance
minanchorlen: 20 # removes fusions whose segments are less than minchimlen
splicedevents: 4 # fusions between genes need at least this many spliced breakpoints
maxkmer: 0.6 # remove reads with repetitive 3-mer that make up more than maxkmer
fraglen: 200 # mean fragment length
maxmismatch: 0.01

indel

indel:
activate: true # whether (=true) or not (=false) to include indels
type: all # long, short, all
mode: DNA # DNA, RNA or BOTH -

strategy for optimizing posterior probability threshold

strategy: OPTIMAL_F_SCORE # OPTIMAL_F_SCORE, FALSE_DISCOVERY_RATE, CONSTANT
fscorebeta: 1.0 # rel. weight of recall to precision (when OPTIMAL_F_SCORE is selected)
fdr: 0.05 # false discovery rate (when FALSE_DISCOVERY_RATE is selected)
sliplen: 8 # min number of reference bases to suspect slippage event
sliprate: 0.1 # frequency of slippage when it is suspected

quantification:
mode: DNA # DNA, RNA or BOTH

hlatyping:
class: BOTH # I, II or BOTH

specific path for class II hlatyping (only required when class: II, or BOTH)

MHC-I_mode: DNA # DNA, RNA, or custom (if empty alleles have to be specified in custom)
MHC-II_mode: DNA # DNA, RNA, or custom (if empty alleles have to be specified in custom)

specific path for class II hlatyping (only required when class: II, or BOTH)

freqdata: /hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/soft/hlahd.1.7.0/freq_data/
split: /hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/soft/hlahd.1.7.0/HLA_gene.split.txt
dict: /hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/soft/hlahd.1.7.0/dictionary/

prioritization:
class: I # I, II or BOTH
lengths:
MHC-I: 8,9,10,11
MHC-II: 13,14,15

And I got this error:
Config file /hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/config.yaml is extended by additional config specified via the command line.
Traceback (most recent call last):
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/cli.py", line 1898, in args_to_api
dag_api = workflow_api.dag(
^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/api.py", line 326, in dag
return DAGApi(
^^^^^^^
File "<string>", line 6, in __init__
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/api.py", line 436, in __post_init__
self.workflow_api._workflow.dag_settings = self.dag_settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/api.py", line 383, in _workflow
workflow.include(
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/workflow.py", line 1382, in include
exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/Snakefile", line 27, in <module>
include: "rules/custom.smk"
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/workflow.py", line 1382, in include
exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/rules/common.smk", line 126, in <module>
config['data'] = data_structure(config['data'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/rules/common.smk", line 11, in data_structure
config['data']['rnaseq'], filetype, readtype = handle_seqfiles(config['data']['rnaseq'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/rules/common.smk", line 64, in handle_seqfiles
return mod_seqdata, filetype[0], readtype[0]
^^^^^^^^^^^^^^
IndexError: list index out of range

What should I do when I don't have RNA data?

riasc · 2025-02-13T16:40:29Z

Hello,

Thanks for opening this issue. You have to leave rnaseq empty, meaning:

rnaseq:
normal: dna_normal

Just remove rna_tumor and this should work. I'm currently working on an update that handles this case more gracefully.
Thanks
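
For readers who hit the same traceback: its last frame indexes `filetype[0]`, which raises `IndexError` when no sequencing files were collected for the group. The sketch below is not ScanNeo2's actual code. `collect_filetypes` and `first_filetype` are hypothetical names, the file-type rules are invented for illustration, and the dictionaries only assume how the config in this thread would look once loaded (a key without a value loads as `None`). It shows the failure and one way a guard could avoid it.

```python
def collect_filetypes(group):
    """Hypothetical helper: classify every file listed in one data group."""
    filetypes = []
    for paths in (group or {}).values():    # group is None when the key is left empty
        for path in (paths or "").split():  # an entry without a value loads as None
            if path.endswith((".fq", ".fq.gz", ".fastq", ".fastq.gz")):
                filetypes.append("fastq")
            elif path.endswith(".bam"):
                filetypes.append("bam")
            else:
                filetypes.append("unknown")
    return filetypes


def first_filetype(group):
    """Return the first file type, or None when the group lists no files."""
    filetypes = collect_filetypes(group)
    return filetypes[0] if filetypes else None


# Assumed shapes after loading the config (paths shortened for illustration).
rnaseq_with_empty_entry = {"rna_tumor": None}  # "rna_tumor:" present but without a value
rnaseq_left_empty = None                       # "rnaseq:" with nothing beneath it
dnaseq = {"dna_normal": "normal_1.fq.gz normal_2.fq.gz"}

# Unguarded indexing reproduces the error class seen in the traceback.
try:
    collect_filetypes(rnaseq_with_empty_entry)[0]
except IndexError as err:
    print("unguarded:", err)

# The guarded variant returns None instead of raising.
print(first_filetype(rnaseq_with_empty_entry))
print(first_filetype(rnaseq_left_empty))
print(first_filetype(dnaseq))
```

With a guard like this, both an empty `rna_tumor:` entry and an empty `rnaseq:` key are treated as "no RNA data" instead of raising.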

Donbbit · 2025-02-14T02:51:55Z

Thanks for your reply! I changed the config to:

rnaseq:
normal: dna_normal

When I tested the pipeline, I got this:
snakemake --dag | dot -Tpdf > pipe.pdf

Building DAG of jobs...
Error: <stdin>: syntax error in line 1 near 'rnaseq'
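
A note on the syntax error near 'rnaseq': that message is printed by `dot`, not by Snakemake, and it means the first line `dot` read from standard input was not DOT syntax. One possible cause, which is an assumption and not confirmed in this thread, is that some text was written to stdout ahead of the graph. The sketch below shows one way to check captured output for this, assuming the graph starts at a line beginning with `digraph`. `extract_dot` is a hypothetical helper and the sample text is made up.

```python
def extract_dot(output: str) -> str:
    """Return the part of `output` starting at the first line that begins with 'digraph'."""
    lines = output.splitlines()
    for index, line in enumerate(lines):
        if line.lstrip().startswith("digraph"):
            return "\n".join(lines[index:])
    raise ValueError("no 'digraph' line found in the captured output")


# Illustrative captured output: stray text first, then a tiny DOT graph.
captured = "rnaseq is empty\ndigraph snakemake_dag {\n    0 -> 1;\n}"

stray = captured[: captured.index("digraph")]
print("text before the graph:", repr(stray))
print(extract_dot(captured))
```

Redirecting the output of `snakemake --dag` to a file and reading its first lines gives the same information without any code.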
