Invalid SyntaxWarning: invalid escape sequence #3232

Open
erfanshekarriz opened this issue Dec 12, 2024 · 2 comments
Labels
bug Something isn't working

Comments


erfanshekarriz commented Dec 12, 2024

Snakemake Version: 8.25.5

When running Snakemake, I get the following "SyntaxWarning: invalid escape sequence" warnings:

~/workflow/rules/viral-identify.smk:273: SyntaxWarning: invalid escape sequence '\s'
  combined=relpath("identify/viral/output/combined.final.vOTUs.fa"),
~/workflow/rules/viral-identify.smk:299: SyntaxWarning: invalid escape sequence '\s'
~/workflow/rules/viral-identify.smk:606: SyntaxWarning: invalid escape sequence '\d'
~/workflow/rules/viral-identify.smk:635: SyntaxWarning: invalid escape sequence '\d'

In my .smk file's shell directives I'm using the "\s" and "\d" regex escapes, so I know this is where the warnings come from:

Rule 1

shell:
"""
...
seqkit grep {input.fna} -f {output.hits} | seqkit replace -p  "\s.*" -r "" | seqkit replace -p $ -r _{wildcards.sample_id}  > {params.tmpdir}/tmp.fa 2> {log}
...
"""

Rule 2

shell:
"""
...
seqkit replace {input.provirusfasta} --f-use-regexp -p "(.+)_\d\s.+$" -r '$1' | seqkit grep -f {input.provirushits} > {params.tmpdir}/tmp1.fa 2> {log}
seqkit replace {input.virusfasta} --f-use-regexp -p "(.+)_\d\s.+$" -r '$1' |  seqkit grep -f {input.provirushits} >> {params.tmpdir}/tmp1.fa 2> {log}
...
"""

But, when I run with the -p flag, it prints my commands fine and it also runs fine, so I don't want to change my code:

Rule 1

shell:
"""
...
seqkit grep sample/contigs/results/identify/viral/intermediate/scores/combined.viralcontigs.fa -f sample/contigs/results/identify/viral/output/derep/cluster_representatives.txt > sample/contigs/results/identify/viral/tmp/tmp.fa 2> sample/contigs/results/identify/viral/logs/clustering/filterderep.log
...
"""

Rule 2

shell:
"""
...
seqkit grep sample/contigs/results/identify/viral/output/checkv/viruses.fna -f sample/contigs/results/identify/viral/output/virus.list.txt > sample/contigs/results/identify/viral/tmp/tmp2.fa 2> sample/contigs/results/identify/viral/logs/vOTUs.log
seqkit grep sample/contigs/results/identify/viral/output/checkv/proviruses.fna -f sample/contigs/results/identify/viral/output/virus.list.txt >> sample/contigs/results/identify/viral/tmp/tmp2.fa 2> sample/contigs/results/identify/viral/logs/vOTUs.log
...
"""

I've double-checked and the shell command works as it should. It likely has to do with the way Snakemake parses the shell command.

I was not experiencing this with the older Snakemake version 7.28.1; it only popped up after I updated!

Is there a good way to bypass or suppress the warning?

Best,
Erfan

erfanshekarriz added the bug label Dec 12, 2024

rwanwork commented Jan 8, 2025

@erfanshekarriz Just a comment... I just encountered this as well on version 7.32.4 (the current version for Ubuntu 24.10). I noticed the warning goes away when double backslashes are used, i.e., \\d+.

I'm not sure if it is a bug or maybe we need to just escape backslashes from now on.
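
For context: Snakemake translates the Snakefile into Python before running it, so a shell string is an ordinary Python string literal. Python 3.12 started reporting unrecognized escapes such as \s and \d as a SyntaxWarning (earlier versions only raised a DeprecationWarning, which is hidden by default), which would explain why the warning follows the Python version rather than the Snakemake version. Below is a minimal plain-Python sketch, not Snakemake code; the helper name compile_and_report is made up for illustration. It suggests that the plain, doubled, and raw spellings all yield the same string, and that only the plain one warns:

```python
import warnings


def compile_and_report(source):
    """Compile and run `source`, returning the value of `pattern` and any warnings."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # make sure no warning is filtered out
        namespace = {}
        exec(compile(source, "<demo>", "exec"), namespace)
    messages = [str(w.message) for w in caught]
    return namespace["pattern"], messages


# Each argument is the *source text* of a one-line Python program.
plain_value, plain_msgs = compile_and_report(r'pattern = "\s.*"')      # as in the rule
doubled_value, doubled_msgs = compile_and_report(r'pattern = "\\s.*"')  # doubled backslash
raw_value, raw_msgs = compile_and_report(r'pattern = r"\s.*"')          # raw string

print(plain_value == doubled_value == raw_value)  # same regex reaches the shell
print(len(plain_msgs), len(doubled_msgs), len(raw_msgs))
```

Since the resulting string is identical, doubling the backslash should not change what seqkit receives. A raw-string prefix on the shell string (shell: r"""...""") is a commonly suggested alternative; I have only checked the plain-Python behavior here, so treat that part as an assumption.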


m-jahn commented Jan 13, 2025

It's not a bug; this is a known phenomenon. As a workaround you can escape the backslash, as you found out. Or you can use a lambda function and pass the problematic string as a parameter like this:

rule umi_extraction:
    params:
        pattern=lambda pt: config["pattern"],
    input:
        fastq="{sample}.fastq.gz",
    output:
        fastq="{sample}_extracted.fastq.gz",
    shell:
        "umi_tools extract --bc-pattern='{params.pattern}'  --stdin {input.fastq} --stdout {output.fastq}"

The config file can then contain arbitrary strings:

pattern: "^(?P<umi_0>.{2}).*(?P<umi_1>.{5})$"
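
A side benefit of passing the pattern as a parameter is that its braces ({2}, {5}) are not mistaken for placeholders. The sketch below uses plain str.format as a stand-in, on the assumption that Snakemake's {params.pattern} substitution behaves similarly; the file name and the simplified command are made up for illustration:

```python
# Hypothetical values standing in for config["pattern"] and the rule's input.
pattern = "^(?P<umi_0>.{2}).*(?P<umi_1>.{5})$"
template = "umi_tools extract --bc-pattern='{pattern}' --stdin {fastq}"

# Substituted values are inserted as-is; their own braces are not re-interpreted.
command = template.format(pattern=pattern, fastq="sample.fastq.gz")
print(command)

# Writing the regex directly into the template makes {2} look like a field.
inline = "umi_tools extract --bc-pattern='^(?P<umi_0>.{2}).*' --stdin {fastq}"
try:
    inline.format(fastq="sample.fastq.gz")
    inline_error = None
except (IndexError, KeyError) as exc:
    inline_error = type(exc).__name__
print(inline_error)
```

A string loaded from the config file is also never parsed as a Python literal, so backslash sequences in it cannot trigger the escape-sequence warning.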
