Merge pull request #508 from genomic-medicine-sweden/nanoq

Add nanoq to the pipeline
nf-core · Jul 26, 2024 · e1b367f
2 parents 656613e + 97c4c2b
commit e1b367f

Showing 19 changed files with 682 additions and 39 deletions.
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 - [#417](https://github.com/nf-core/taxprofiler/pull/417) - Added reference-free metagenome estimation with Nonpareil (added by @jfy133)
 - [#466](https://github.com/nf-core/taxprofiler/pull/466) - Input database sheets now require a `db_type` column to distinguish between short- and long-read databases (added by @LilyAnderssonLee)
 - [#505](https://github.com/nf-core/taxprofiler/pull/505) - Add small files to the file `tower.yml` (added by @LilyAnderssonLee)
+- [#508](https://github.com/nf-core/taxprofiler/pull/508) - Add `nanoq` as a filtering tool for nanopore reads (added by @LilyAnderssonLee)
 
 ### `Fixed`
 

diff --git a/CITATIONS.md b/CITATIONS.md
@@ -42,6 +42,10 @@
 
   > Wick R (2021) Filtlong, URL: https://github.com/rrwick/Filtlong
 
+- [nanoq](https://github.com/esteinig/nanoq)
+
+  > Steinig, E., & Coin, L. (2022). Nanoq: ultra-fast quality control for nanopore reads. Journal of Open Source Software, 7(69). https://doi.org/10.21105/joss.02991
+
 - [BBTools](http://sourceforge.net/projects/bbmap/)
 
   > Bushnell B. (2022) BBMap, URL: http://sourceforge.net/projects/bbmap/

diff --git a/README.md b/README.md
@@ -30,7 +30,7 @@
 1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) or [`falco`](https://github.com/smithlabcode/falco) as an alternative option)
 2. Performs optional read pre-processing
    - Adapter clipping and merging (short-read: [fastp](https://github.com/OpenGene/fastp), [AdapterRemoval2](https://github.com/MikkelSchubert/adapterremoval); long-read: [porechop](https://github.com/rrwick/Porechop))
-   - Low complexity and quality filtering (short-read: [bbduk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/), [PRINSEQ++](https://github.com/Adrian-Cantu/PRINSEQ-plus-plus); long-read: [Filtlong](https://github.com/rrwick/Filtlong))
+   - Low complexity and quality filtering (short-read: [bbduk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/), [PRINSEQ++](https://github.com/Adrian-Cantu/PRINSEQ-plus-plus); long-read: [Filtlong](https://github.com/rrwick/Filtlong), [Nanoq](https://github.com/esteinig/nanoq))
    - Host-read removal (short-read: [BowTie2](http://bowtie-bio.sourceforge.net/bowtie2/); long-read: [Minimap2](https://github.com/lh3/minimap2))
    - Run merging
 3. Supports statistics for metagenome coverage estimation ([Nonpareil](https://nonpareil.readthedocs.io/en/latest/)) and for host-read removal ([Samtools](http://www.htslib.org/))

diff --git a/conf/modules.config b/conf/modules.config
@@ -294,6 +294,36 @@ process {
         ]
     }
 
+    withName: NANOQ {
+        ext.args = [
+            "-vv",
+            "--min-len ${params.longread_qc_qualityfilter_minlength}",
+            "--min-qual ${params.longread_qc_qualityfilter_minquality}"
+        ]
+        .join(' ').trim()
+        ext.prefix = { "${meta.id}_${meta.run_accession}_filtered" }
+        publishDir = [
+            [
+                path: { "${params.outdir}/nanoq" },
+                mode: params.publish_dir_mode,
+                pattern: '*.fastq.gz',
+                enabled: params.save_preprocessed_reads
+            ],
+            [
+                path: { "${params.outdir}/nanoq" },
+                mode: params.publish_dir_mode,
+                pattern: '*.stats'
+            ],
+            [
+                path: { "${params.outdir}/analysis_ready_fastqs" },
+                mode: params.publish_dir_mode,
+                pattern: '*.fastq.gz',
+                enabled: params.save_analysis_ready_fastqs,
+                saveAs: { ( params.perform_runmerging == false || ( params.perform_runmerging && !meta.is_multirun ) ) && !params.perform_longread_hostremoval && !params.longread_qc_skipqualityfilter && params.perform_longread_qc && params.save_analysis_ready_fastqs ? it : null }
+            ]
+        ]
+    }
+
     withName: BBMAP_BBDUK {
         ext.args =  [
                 "entropy=${params.shortread_complexityfilter_entropy}",

diff --git a/docs/output.md b/docs/output.md
@@ -20,6 +20,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
 - [BBDuk](#bbduk) - Quality trimming and filtering for Illumina data
 - [PRINSEQ++](#prinseq) - Quality trimming and filtering for Illumina data
 - [Filtlong](#filtlong) - Quality trimming and filtering for Nanopore data
+- [Nanoq](#nanoq) - Quality trimming and filtering for Nanopore data
 - [Bowtie2](#bowtie2) - Host removal for Illumina reads
 - [minimap2](#minimap2) - Host removal for Nanopore reads
 - [SAMtools stats](#samtools-stats) - Statistics from host removal
@@ -238,6 +239,21 @@ You will only find the `.fastq` files in the results directory if you provide `
 We do _not_ recommend using Filtlong if you are performing filtering of low quality reads with ONT's basecaller Guppy.
 :::
 
+### Nanoq
+
+[nanoq](https://github.com/esteinig/nanoq) is an ultra-fast quality filtering tool for nanopore reads that also produces summary reports.
+
+<details markdown="1">
+<summary>Output files</summary>
+
+- `nanoq/`
+  - `<sample_id>_filtered.fastq.gz`: Reads remaining after length and quality filtering
+  - `<sample_id>_filtered.stats`: Summary statistics report
+
+</details>
+
+You will only find the `.fastq.gz` files in the results directory if you provide `--save_preprocessed_reads`. Alternatively, if you only want the 'final' reads that go into classification/profiling (i.e., reads that may have undergone additional processing), omit this flag and specify `--save_analysis_ready_fastqs` instead, in which case the reads will be in the folder `analysis_ready_fastqs`.
+
 ### Bowtie2
 
 [Bowtie 2](https://bowtie-bio.sourceforge.net/bowtie2/index.shtml) is an ultrafast and memory-efficient tool for aligning sequencing reads to long reference sequences. It is particularly good at aligning reads of about 50 up to 100s or 1,000s of characters, and particularly good at aligning to relatively long (e.g. mammalian) genomes.

diff --git a/docs/usage.md b/docs/usage.md
@@ -299,7 +299,7 @@ Complexity filtering is primarily a run-time optimisation step. It is not necess
 
 There are currently three options for short-read complexity filtering: [`bbduk`](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/), [`prinseq++`](https://github.com/Adrian-Cantu/PRINSEQ-plus-plus), and [`fastp`](https://github.com/OpenGene/fastp#low-complexity-filter).
 
-There is one option for long-read quality filtering: [`Filtlong`](https://github.com/rrwick/Filtlong)
+There are two options for long-read quality filtering: [`Filtlong`](https://github.com/rrwick/Filtlong) and [`nanoq`](https://github.com/esteinig/nanoq).
 
 The tools offer different algorithms and parameters for removing low complexity reads and quality filtering. We therefore recommend reviewing the pipeline's [parameter documentation](https://nf-co.re/taxprofiler/parameters) and the documentation of the tools (see links above) to decide on optimal methods and parameters for your dataset.
 

diff --git a/modules.json b/modules.json
@@ -195,6 +195,11 @@
                         "git_sha": "b7ebe95761cd389603f9cc0e0dc384c0f663815a",
                         "installed_by": ["modules"]
                     },
+                    "nanoq": {
+                        "branch": "master",
+                        "git_sha": "cf05b61191f5df35cbbf33d47bbf8f22ca0ae0ab",
+                        "installed_by": ["modules"]
+                    },
                     "nonpareil/curve": {
                         "branch": "master",
                         "git_sha": "729335dda8ba226323edc54dec80ae959079207e",

diff --git a/modules/nf-core/nanoq/environment.yml b/modules/nf-core/nanoq/environment.yml
diff --git a/modules/nf-core/nanoq/main.nf b/modules/nf-core/nanoq/main.nf
diff --git a/modules/nf-core/nanoq/meta.yml b/modules/nf-core/nanoq/meta.yml
diff --git a/modules/nf-core/nanoq/tests/main.nf.test b/modules/nf-core/nanoq/tests/main.nf.test
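
The `saveAs` closure in the `NANOQ` block of `conf/modules.config` above packs several conditions into one line. The following is a minimal Python sketch of that decision, not the pipeline's own code: the function name `nanoq_analysis_ready_name` and the dictionary-based `params` and `meta` are hypothetical stand-ins, and all flags are assumed to be plain booleans. Under that assumption, the run-merging clause `perform_runmerging == false || (perform_runmerging && !meta.is_multirun)` reduces to "run merging is off, or the sample has only one run".

```python
from typing import Optional


def nanoq_analysis_ready_name(filename: str, params: dict, meta: dict) -> Optional[str]:
    """Return `filename` if nanoq output is the final read set, else None.

    Hypothetical transliteration of the Groovy `saveAs` closure; assumes
    every flag is a plain boolean.
    """
    # No later run-merging step will replace these reads.
    no_merge_pending = (not params["perform_runmerging"]) or (not meta["is_multirun"])

    # nanoq is the last preprocessing step only if long-read QC ran,
    # quality filtering was not skipped, and host removal does not follow.
    is_final_step = (
        no_merge_pending
        and not params["perform_longread_hostremoval"]
        and not params["longread_qc_skipqualityfilter"]
        and params["perform_longread_qc"]
        and params["save_analysis_ready_fastqs"]
    )
    return filename if is_final_step else None


# Example settings (illustrative values only).
example_params = {
    "perform_runmerging": False,
    "perform_longread_hostremoval": False,
    "longread_qc_skipqualityfilter": False,
    "perform_longread_qc": True,
    "save_analysis_ready_fastqs": True,
}
example_meta = {"is_multirun": False}

published = nanoq_analysis_ready_name(
    "sample1_run1_filtered.fastq.gz", example_params, example_meta
)
```

With these example settings the file name is returned unchanged, so the file would be published to `analysis_ready_fastqs`. Enabling host removal, or run merging for a multi-run sample, makes the function return `None`, which mirrors how a `null` result from `saveAs` suppresses publication in that directory.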