How can I fix config file when I only have DNA data? #42

Open
Donbbit opened this issue Feb 13, 2025 · 2 comments
Comments

Donbbit commented Feb 13, 2025

Hi, I am running ScanNeo2 with only dna_normal and dna_tumor data. I changed the config like this:

Reference

General settings

reference:
release: 111
nonchr: false
threads: 30
mapq: 30 # overall required mapping quality
basequal: 20 # overall required base quality

data:
name: D1
dnaseq:
dna_normal: /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/DT2411609481-1/250210_SEQ081_FP500002421_L01_SP2501130808/FP500002421_L01_375_1.fq.gz /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/DT2411609481-1/250210_SEQ081_FP500002421_L01_SP2501130808/FP500002421_L01_375_2.fq.gz
dna_tumor1: /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320266/250210_SEQ082_FP500002422_L01_SP2501130799/FP500002422_L01_492_1.fq.gz /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320266/250210_SEQ082_FP500002422_L01_SP2501130799/FP500002422_L01_492_2.fq.gz
dna_tumor2: /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320262/250210_SEQ082_FP500002422_L01_SP2501130795/FP500002422_L01_488_1.fq.gz /hsfscqjf3/DIPSEQ/zfssz8/CNGB_DATA/BGISEQ01/DIPSEQ/DIPSEQT20/P24Z10200N0995_Temp/D2411320262/250210_SEQ082_FP500002422_L01_SP2501130795/FP500002422_L01_488_2.fq.gz
rnaseq:
rna_tumor:
normal: dna_normal

custom:
variants:
hlatyping:
MHC-I:
MHC-II:

pre-processing (only applied on fastq reads)

preproc:
activate: true # whether (=true) or not (=false) to include pre-processing
minlen: 10
slidingwindow:
activate: true
wsize: 3
wqual: 20

alignment

align:
chimSegmentMin: 20
chimScoreMin: 10
chimJunctionOverhangMin: 10
chimScoreDropMax: 30
chimScoreSeparation: 10

variant calling

alternative splicing

altsplicing:
activate: true # whether (=true) or not (=false) to include alternative splicing events
confidence: 3 # confidence level (1,2 or 3) - filtering of input alignments
iterations: 5 # number of iterations (when adding intro edges) - increases sensitivity
edgelimit: 250 # limit max number of edges in graph - affects the runtime

exitron splicing

exitronsplicing:
activate: true # whether (=true) or not (=false) to include exitron-splicing events
ao: 3 # allele observation
pso: 0.05 # percent spliced out
#strand: 1 # strand specificity of library (0=unstranded, 1=forward, 2=reverse)
strand: XS # strand specificity of library (0=XS, 1=RF, 2=FR)

gene fusion

genefusion:
activate: true # whether (=true) or not (=false) to include gene fusion events
maxevalue: 0.3
suppreads: 2 # all fusions with less than suppreads are discarded
maxsuppreads: 1000
maxidentity: 0.3 # genes with fraction of identity are discarded (homologs)
hpolymerlen: 6 # removes breakpoints adjacent to homopolymers of length
readthroughdist: 10000 # distance between breakpoints with less than distance
minanchorlen: 20 # removes fusions whose segments are less than minchimlen
splicedevents: 4 # fusions between genes need at least this many spliced breakpoints
maxkmer: 0.6 # remove reads with repetitive 3-mer that make up more than maxkmer
fraglen: 200 # mean fragment length
maxmismatch: 0.01

indel

indel:
activate: true # whether (=true) or not (=false) to include indels
type: all # long, short, all
mode: DNA # DNA, RNA or BOTH -

strategy for optimizing posterior probability threshold

strategy: OPTIMAL_F_SCORE # OPTIMAL_F_SCORE, FALSE_DISCOVERY_RATE, CONSTANT
fscorebeta: 1.0 # rel. weight of recall to precision (when OPTIMAL_F_SCORE is selected)
fdr: 0.05 # false discovery rate (when FALSE_DISCOVERY_RATE is selected)
sliplen: 8 # min number of reference bases to suspect slippage event
sliprate: 0.1 # frequency of slippage when it is suspected

quantification:
mode: DNA # DNA, RNA or BOTH

hlatyping:
class: BOTH # I, II or BOTH

specific path for class II hlatyping (only required when class: II, or BOTH)

MHC-I_mode: DNA # DNA, RNA, or custom (if empty alleles have to be specified in custom)
MHC-II_mode: DNA # DNA, RNA, or custom (if empty alleles have to be specified in custom)

specific path for class II hlatyping (only required when class: II, or BOTH)

freqdata: /hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/soft/hlahd.1.7.0/freq_data/
split: /hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/soft/hlahd.1.7.0/HLA_gene.split.txt
dict: /hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/soft/hlahd.1.7.0/dictionary/

prioritization:
class: I # I, II or BOTH
lengths:
MHC-I: 8,9,10,11
MHC-II: 13,14,15

And I got this error:
Config file /hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/config.yaml is extended by additional config specified via the command line.
Traceback (most recent call last):
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/cli.py", line 1898, in args_to_api
dag_api = workflow_api.dag(
^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/api.py", line 326, in dag
return DAGApi(
^^^^^^^
File "<string>", line 6, in __init__
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/api.py", line 436, in __post_init__
self.workflow_api._workflow.dag_settings = self.dag_settings
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/api.py", line 383, in _workflow
workflow.include(
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/workflow.py", line 1382, in include
exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/Snakefile", line 27, in <module>
include: "rules/custom.smk"
File "/hsfscqjf1/ST_CQ/P23Z32300N0005/lvmeiqi/software/miniconda3/envs/scanneo2/lib/python3.12/site-packages/snakemake/workflow.py", line 1382, in include
exec(compile(code, snakefile.get_path_or_uri(), "exec"), self.globals)
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/rules/common.smk", line 126, in <module>
config['data'] = data_structure(config['data'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/rules/common.smk", line 11, in data_structure
config['data']['rnaseq'], filetype, readtype = handle_seqfiles(config['data']['rnaseq'])
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/hsfscqjf1/ST_CQ/P24Z32300N0028/lvmeiqi/Project/1.ESCA/1.scanneo2/D1/workflow/rules/common.smk", line 64, in handle_seqfiles
return mod_seqdata, filetype[0], readtype[0]
^^^^^^^^^^^^^^
IndexError: list index out of range

What should I do when I don't have RNA data?

Collaborator

riasc commented Feb 13, 2025

Hello,

thanks for opening this issue. You have to leave rnaseq empty, meaning:

rnaseq:
normal: dna_normal

Just remove the rna_tumor key and it should work. I am currently working on an update that exits more gracefully in this case.
Thanks
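
To illustrate why removing the key matters, here is a minimal sketch of the failure mode, not the workflow's actual code. When YAML parses `rna_tumor:` with no value, `rnaseq` becomes `{'rna_tumor': None}`; when `rnaseq:` has nothing under it, it becomes `None`. The functions `collect_filetypes` and `first_filetype` are hypothetical stand-ins for `handle_seqfiles`, and the assumption is that the real function collects one file type per listed file and then indexes the first element, as the `filetype[0]` line in the traceback suggests.

```python
def collect_filetypes(seqdata):
    """Hypothetical stand-in for handle_seqfiles (assumed behavior, not the real code)."""
    filetypes = []
    for group, files in seqdata.items():
        if files is None:
            # Key present but empty, e.g. "rna_tumor:" with no value.
            continue
        for path in str(files).split():
            filetypes.append(".bam" if path.endswith(".bam") else ".fastq")
    return filetypes


def first_filetype(seqdata):
    """Return the first detected file type, or None when the section is empty."""
    if not seqdata:
        # "rnaseq:" with nothing under it parses to None, so nothing is processed.
        return None
    filetypes = collect_filetypes(seqdata)
    # With {'rna_tumor': None} the list is empty and this raises IndexError.
    return filetypes[0]
```

Under these assumptions, `first_filetype({'rna_tumor': None})` raises the same `IndexError: list index out of range`, while `first_filetype(None)` returns without error.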

Author

Donbbit commented Feb 14, 2025

Thanks for your reply! I changed the config to:

rnaseq:
normal: dna_normal

When I tested the pipeline, I got this:
snakemake --dag | dot -Tpdf > pipe.pdf

Building DAG of jobs...
Error: <stdin>: syntax error in line 1 near 'rnaseq'
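
A syntax error "in line 1 near 'rnaseq'" has the shape of a Graphviz `dot` parse error rather than a Snakemake error. One possible explanation, which is an assumption and not confirmed in this thread, is that the workflow prints some text beginning with `rnaseq` to standard output before the graph, so `dot` receives non-DOT input on its first line. The sketch below uses a hypothetical helper, `extract_dot`, that keeps only the part of the captured output starting at the `digraph` line that `snakemake --dag` emits.

```python
def extract_dot(mixed_output: str) -> str:
    """Return the DOT graph from output that may contain extra leading lines.

    Hypothetical helper: assumes the graph starts at the first line that
    begins with "digraph" and runs to the end of the output.
    """
    lines = mixed_output.splitlines()
    for i, line in enumerate(lines):
        if line.lstrip().startswith("digraph"):
            return "\n".join(lines[i:]) + "\n"
    raise ValueError("no 'digraph' line found in the output")
```

One way to use it would be to redirect `snakemake --dag` to a file, pass the file's contents through `extract_dot`, and feed the result to `dot -Tpdf`. Looking at the first line of the raw output should also show what the workflow printed.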
