
4. How to run the pipeline

Verena Kutschera edited this page Feb 15, 2023 · 3 revisions

The pipeline has to be run inside a terminal multiplexer like tmux or screen so that the Snakemake process can be sent to the background and keep running after you log out (see e.g. this introduction to tmux).

If your cluster uses a workload manager such as slurm, the pipeline can submit jobs to the system automatically, one job per rule. Please check the wiki page with requirements under point 5 for information on how to install and, if necessary, adjust a Snakemake profile for cluster execution.
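A Snakemake profile is a directory containing a config.yaml in which command-line options are stored as key-value pairs. As a minimal sketch only (all values below are illustrative; follow the requirements page for the actual profile setup), such a file might look like this:

```yaml
# Hypothetical sketch of a profile's config.yaml; the real profile for this
# pipeline is set up as described under point 5 of the requirements page.
jobs: 100              # maximum number of jobs submitted at the same time
use-singularity: true  # run rules inside their Singularity containers
printshellcmds: true   # log the shell command of each job
```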

For more information on Snakemake, including a tutorial, check the official Snakemake documentation.

The following instructions assume that you have gone through all the pipeline requirements before starting a pipeline run.

How to run the pipeline with a Snakemake profile (here it is called slurm)

1) Activate the conda environment

(replace "generode" with the name you chose when creating the conda environment)

conda activate generode

2) Run the pipeline in dry mode to check each step

(adjust the --profile parameter if you named your profile anything other than slurm):

snakemake --profile slurm -npr &> YYMMDD_dry_run.out

Check the log file (YYMMDD_dry_run.out) to verify that everything works as it should.
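A quick way to inspect the dry-run log is to look at the job summary printed at the end of the file and to scan for error messages. A minimal sketch (the log file name is the example used above; the checks are skipped if the file does not exist):

```shell
LOG=YYMMDD_dry_run.out  # example name; use the file you redirected to
if [ -f "$LOG" ]; then
    tail -n 30 "$LOG"                # the job summary is printed at the end
    grep -ci "error" "$LOG" || true  # count lines mentioning errors (0 is good)
fi
echo "checked $LOG"
```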

3) Start the main run

(adjust the --profile parameter if you named your profile anything other than slurm):

snakemake --profile slurm &> YYMMDD_main_run.out

Check the log file (YYMMDD_main_run.out) regularly while the pipeline is running.
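Besides checking the log, it can be useful to watch the slurm queue for the jobs Snakemake submits. A small sketch (the tail command is shown as a comment because it follows the file indefinitely; squeue is skipped if slurm is not available on the machine):

```shell
# Follow the Snakemake log as it grows (Ctrl-C stops 'tail', not the pipeline):
#   tail -f YYMMDD_main_run.out

# List your pending and running slurm jobs, if slurm is available:
if command -v squeue >/dev/null 2>&1; then
    squeue -u "$USER"
fi
```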

How to run the pipeline without a Snakemake profile

The option --cluster-config is deprecated in favor of Snakemake profiles but is currently still available. The resources for each rule are provided in the cluster config file config/slurm/cluster.yaml. Start the pipeline as follows:
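The actual cluster.yaml ships with the pipeline; as a sketch of the general structure such a file takes (all values below are illustrative), the keys referenced in the sbatch command below look like this:

```yaml
# Hypothetical sketch of config/slurm/cluster.yaml; the file shipped with
# the pipeline contains the real per-rule resources.
__default__:                 # fallback for rules without their own entry
  account: my-project        # replace with your compute allocation
  partition: core
  ntasks: 1
  cpus-per-task: 1
  time: "01:00:00"
```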

1) Activate the conda environment

(replace "generode" with the name you chose when creating the conda environment)

conda activate generode

2) Run the pipeline in dry mode to check each step:

snakemake -j 100 --use-singularity --cluster-config config/slurm/cluster.yaml --cluster "sbatch -A {cluster.account} -p {cluster.partition} --ntasks {cluster.ntasks} --cpus-per-task {cluster.cpus-per-task} -t {cluster.time}" -npr &> YYMMDD_dry_run.out

Check the log file (YYMMDD_dry_run.out) to verify that everything works as it should.

3) Start the main run:

snakemake -j 100 --use-singularity --cluster-config config/slurm/cluster.yaml --cluster "sbatch -A {cluster.account} -p {cluster.partition} --ntasks {cluster.ntasks} --cpus-per-task {cluster.cpus-per-task} -t {cluster.time}" &> YYMMDD_main_run.out

Check the log file (YYMMDD_main_run.out) regularly while the pipeline is running.

Note that the parameter -j controls how many jobs may be submitted to slurm and run at the same time. For highly parallelized pipeline steps like GERP, it is recommended to increase it, e.g. to -j 500, if possible.

Please note that with the cluster configuration through --cluster-config, whenever a job is cancelled due to a timeout, you need to cancel the Snakemake process manually and restart the pipeline with the flag --rerun-incomplete (see also the FAQ).
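Such a restart can then look like the following sketch, which reuses the main-run command from above with --rerun-incomplete appended (the log file name is again just an example):

```shell
# Sketch: after cancelling the stuck Snakemake process manually, restart with
# the same command as the main run plus --rerun-incomplete. The guard makes
# this a no-op on machines where snakemake is not installed.
if command -v snakemake >/dev/null 2>&1; then
    snakemake -j 100 --use-singularity \
        --cluster-config config/slurm/cluster.yaml \
        --cluster "sbatch -A {cluster.account} -p {cluster.partition} --ntasks {cluster.ntasks} --cpus-per-task {cluster.cpus-per-task} -t {cluster.time}" \
        --rerun-incomplete &> YYMMDD_restart_run.out || true
    # '|| true': the exit status is ignored in this sketch; check the log instead
fi
```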