Add porechop_abi to the pipeline #511

Merged: 10 commits merged on Aug 7, 2024

1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#466](https://github.com/nf-core/taxprofiler/pull/466) - Input database sheets now require a `db_type` column to distinguish between short- and long-read databases (added by @LilyAnderssonLee)
- [#505](https://github.com/nf-core/taxprofiler/pull/505) - Add small files to the file `tower.yml` (added by @LilyAnderssonLee)
- [#508](https://github.com/nf-core/taxprofiler/pull/508) - Add `nanoq` as a filtering tool for nanopore reads (added by @LilyAnderssonLee)
- [#511](https://github.com/nf-core/taxprofiler/pull/511) - Add `porechop_abi` as an alternative adapter removal tool for long-read Nanopore data (added by @LilyAnderssonLee)

### `Fixed`

4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -38,6 +38,10 @@

> Wick, R. R., Judd, L. M., Gorrie, C. L., & Holt, K. E. (2017). Completing bacterial genome assemblies with multiplex MinION sequencing. Microbial Genomics, 3(10), e000132. https://doi.org/10.1099/mgen.0.000132

- [Porechop_ABI](https://github.com/bonsai-team/Porechop_ABI)

> Bonenfant, Q., Noé, L., & Touzet, H. (2023). Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming. Bioinformatics Advances, 3(1), vbac085. https://doi.org/10.1093/bioadv/vbac085

- [Filtlong](https://github.com/rrwick/Filtlong)

> Wick R (2021) Filtlong, URL: https://github.com/rrwick/Filtlong
2 changes: 1 addition & 1 deletion README.md
@@ -29,7 +29,7 @@

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) or [`falco`](https://github.com/smithlabcode/falco) as an alternative option)
2. Performs optional read pre-processing
- Adapter clipping and merging (short-read: [fastp](https://github.com/OpenGene/fastp), [AdapterRemoval2](https://github.com/MikkelSchubert/adapterremoval); long-read: [porechop](https://github.com/rrwick/Porechop))
- Adapter clipping and merging (short-read: [fastp](https://github.com/OpenGene/fastp), [AdapterRemoval2](https://github.com/MikkelSchubert/adapterremoval); long-read: [porechop](https://github.com/rrwick/Porechop), [Porechop_ABI](https://github.com/bonsai-team/Porechop_ABI))
- Low complexity and quality filtering (short-read: [bbduk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/), [PRINSEQ++](https://github.com/Adrian-Cantu/PRINSEQ-plus-plus); long-read: [Filtlong](https://github.com/rrwick/Filtlong), [Nanoq](https://github.com/esteinig/nanoq))
- Host-read removal (short-read: [BowTie2](http://bowtie-bio.sourceforge.net/bowtie2/); long-read: [Minimap2](https://github.com/lh3/minimap2))
- Run merging
34 changes: 34 additions & 0 deletions assets/multiqc_config.yml
@@ -25,6 +25,8 @@ report_section_order:
order: 500
porechop:
order: 400
porechop_abi:
order: 450
bbduk:
order: 300
prinseqplusplus:
@@ -106,7 +108,21 @@ top_modules:
- "*raw*"
extra: "If used in this run, Falco is a drop-in replacement for FastQC producing the same output, written by Guilherme de Sena Brandine and Andrew D. Smith."
- "porechop":
name: "Porechop"
anchor: "porechop"
target: "Porechop"
path_filters:
- "*porechop.log"
extra: "ℹ️: if you get the error message 'Error - was not able to plot data.' this means that porechop did not detect any adapters and therefore no statistics were generated."
- "porechop":
name: "Porechop_ABI"
anchor: "porechop_abi"
target: "Porechop_ABI"
doi: "10.1093/bioadv/vbac085"
info: "find and remove adapters from Oxford Nanopore reads."
path_filters:
- "*porechop_abi.log"
extra: "ℹ️: if you get the error message 'Error - was not able to plot data.' this means that porechop_abi did not detect any adapters and therefore no statistics were generated."
- "bowtie2":
name: "bowtie2"
- "samtools":
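The two `porechop` entries above reuse the same MultiQC module and rely on `path_filters` to keep their inputs apart. As a rough check, the Python sketch below approximates MultiQC's file matching with shell-style globbing (`fnmatch.fnmatchcase`); that equivalence, and the example log names, are assumptions for illustration only.

```python
from fnmatch import fnmatchcase

# path_filters of the two MultiQC module instances configured above
PATH_FILTERS = {
    "porechop": "*porechop.log",
    "porechop_abi": "*porechop_abi.log",
}


def assign_logs(filenames):
    """Return, for each module anchor, the files its path filter would select."""
    return {
        anchor: [name for name in filenames if fnmatchcase(name, pattern)]
        for anchor, pattern in PATH_FILTERS.items()
    }


# Hypothetical log names built from the ext.prefix values in conf/modules.config
logs = ["sampleA_run1_porechop.log", "sampleA_run1_porechop_abi.log"]
print(assign_logs(logs))
# {'porechop': ['sampleA_run1_porechop.log'], 'porechop_abi': ['sampleA_run1_porechop_abi.log']}
```

Because `*porechop.log` requires the name to end in `porechop.log`, a `_porechop_abi.log` file is not picked up by the Porechop instance, and the reverse holds for the Porechop_ABI filter.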
@@ -177,6 +193,14 @@ table_columns_placement:
End Trimmed Percent: 440
Middle Split: 450
Middle Split Percent: 460
Porechop_ABI:
Input Reads: 400
Start Trimmed: 410
Start Trimmed Percent: 420
End Trimmed: 430
End Trimmed Percent: 440
Middle Split: 450
Middle Split Percent: 460
Filtlong:
Target bases: 500
BBDuk:
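The placement numbers above only matter relative to each other. Assuming MultiQC places columns with lower `placement` values further left, the Porechop_ABI columns would appear in the order computed by this small Python sketch. The dictionary is deliberately shuffled to show that, in this model, only the numbers decide the order.

```python
# Placement values for the Porechop_ABI columns (copied from the config above, shuffled)
PORECHOP_ABI_PLACEMENT = {
    "Middle Split Percent": 460,
    "Input Reads": 400,
    "End Trimmed": 430,
    "Start Trimmed": 410,
    "Middle Split": 450,
    "Start Trimmed Percent": 420,
    "End Trimmed Percent": 440,
}


def column_order(placement):
    """Sort column names left to right, assuming lower placement values sit further left."""
    return sorted(placement, key=placement.get)


print(column_order(PORECHOP_ABI_PLACEMENT))
# ['Input Reads', 'Start Trimmed', 'Start Trimmed Percent', 'End Trimmed',
#  'End Trimmed Percent', 'Middle Split', 'Middle Split Percent']
```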
@@ -250,6 +274,14 @@ table_columns_visible:
End Trimmed Percent: True
Middle Split: False
Middle Split Percent: True
porechop_abi:
Input reads: False
Start Trimmed: False
Start Trimmed Percent: True
End Trimmed: False
End Trimmed Percent: True
Middle Split: False
Middle Split Percent: True
fastp:
pct_adapter: True
pct_surviving: True
@@ -315,6 +347,8 @@ extra_fn_clean_exts:
- ".bbduk"
- ".unmapped"
- "_filtered"
- "porechop"
- "porechop_abi"
- type: remove
pattern: "_falco"
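As far as we understand MultiQC's documentation, a plain string in `extra_fn_clean_exts` acts as a `truncate` rule (the sample name is cut at the first occurrence of the string), while `type: remove` deletes the string wherever it appears. The Python sketch below is a simplified model of that behaviour, applied in list order to the entries visible in this hunk. It is not MultiQC's implementation, and the file names are hypothetical.

```python
def clean_sample_name(name, rules):
    """Apply MultiQC-style clean-up rules in list order (simplified model)."""
    for rule in rules:
        # A plain string is treated as a "truncate" rule
        if isinstance(rule, str):
            rule = {"type": "truncate", "pattern": rule}
        if rule["type"] == "truncate":
            # Cut the name at the first occurrence of the pattern
            idx = name.find(rule["pattern"])
            if idx != -1:
                name = name[:idx]
        elif rule["type"] == "remove":
            # Delete every occurrence of the pattern
            name = name.replace(rule["pattern"], "")
    return name


# Rules visible in the extra_fn_clean_exts hunk above (earlier entries omitted)
RULES = [
    ".bbduk",
    ".unmapped",
    "_filtered",
    "porechop",
    "porechop_abi",
    {"type": "remove", "pattern": "_falco"},
]

for fname in ["sampleA_run1_porechop.log", "sampleA_run1_porechop_abi.log", "sampleB_falco"]:
    print(fname, "->", repr(clean_sample_name(fname, RULES)))
```

In this model the `porechop` entry already truncates `..._porechop_abi.log` names, so the following `porechop_abi` entry never changes the result. The sketch does not strip the leftover trailing `_`.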

30 changes: 27 additions & 3 deletions conf/modules.config
@@ -241,12 +241,12 @@ process {
}

withName: PORECHOP_PORECHOP {
ext.prefix = { "${meta.id}_${meta.run_accession}" }
ext.prefix = { "${meta.id}_${meta.run_accession}_porechop" }
publishDir = [
[
path: { "${params.outdir}/porechop" },
mode: params.publish_dir_mode,
pattern: '*_porechopped.fastq.gz',
pattern: '*.fastq.gz',
enabled: params.save_preprocessed_reads
],
[
@@ -257,7 +257,31 @@ process {
[
path: { "${params.outdir}/analysis_ready_fastqs" },
mode: params.publish_dir_mode,
pattern: '*_porechopped.fastq.gz',
pattern: '*.fastq.gz',
enabled: params.save_analysis_ready_fastqs,
saveAs: { ( params.perform_runmerging == false || ( params.perform_runmerging && !meta.is_multirun ) ) && !params.perform_longread_hostremoval && params.longread_qc_skipqualityfilter && !params.longread_qc_skipadaptertrim && params.perform_longread_qc && params.save_analysis_ready_fastqs ? it : null }
]
]
}
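The switch from `'*_porechopped.fastq.gz'` to `'*.fastq.gz'` presumably follows from the new `_porechop` suffix in `ext.prefix`. The sketch below checks both globs against hypothetical output names, assuming the module writes `<prefix>.fastq.gz` and `<prefix>.log` (the module source is not rendered in this diff). Python's `fnmatch` stands in for Nextflow's glob matching and is assumed to behave the same for plain file names without directories.

```python
from fnmatch import fnmatchcase

OLD_PATTERN = "*_porechopped.fastq.gz"
NEW_PATTERN = "*.fastq.gz"


def published(files, pattern):
    """Return the files a publishDir entry with this glob pattern would pick up."""
    return [name for name in files if fnmatchcase(name, pattern)]


# Hypothetical outputs for prefix "sampleA_run1_porechop" (assumed module naming)
files = ["sampleA_run1_porechop.fastq.gz", "sampleA_run1_porechop.log"]
print(published(files, OLD_PATTERN))  # []
print(published(files, NEW_PATTERN))  # ['sampleA_run1_porechop.fastq.gz']
```

Under these assumptions the old pattern would publish nothing, while the new one matches the reads but not the log, which is handled by its own `'*.log'` entry.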

withName: PORECHOP_ABI {
ext.prefix = { "${meta.id}_${meta.run_accession}_porechop_abi" }
publishDir = [
[
path: { "${params.outdir}/porechop_abi" },
mode: params.publish_dir_mode,
pattern: '*.fastq.gz',
enabled: params.save_preprocessed_reads
],
[
path: { "${params.outdir}/porechop_abi" },
mode: params.publish_dir_mode,
pattern: '*.log'
],
[
path: { "${params.outdir}/analysis_ready_fastqs" },
mode: params.publish_dir_mode,
pattern: '*.fastq.gz',
enabled: params.save_analysis_ready_fastqs,
saveAs: { ( params.perform_runmerging == false || ( params.perform_runmerging && !meta.is_multirun ) ) && !params.perform_longread_hostremoval && params.longread_qc_skipqualityfilter && !params.longread_qc_skipadaptertrim && params.perform_longread_qc && params.save_analysis_ready_fastqs ? it : null }
]
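The `saveAs` closure for `analysis_ready_fastqs` is dense. Read literally, it publishes the adapter-trimmed reads only when adapter trimming ran and no later long-read step (quality filtering, host removal, or run merging of a multi-run sample, in the order listed in the README) will produce the final reads. A Python transcription of the same boolean, with a hypothetical `Params` dataclass standing in for Nextflow's `params`, makes the cases easier to check:

```python
from dataclasses import dataclass, replace


@dataclass
class Params:
    """Hypothetical subset of pipeline params used by the saveAs closure."""
    perform_runmerging: bool
    perform_longread_hostremoval: bool
    longread_qc_skipqualityfilter: bool
    longread_qc_skipadaptertrim: bool
    perform_longread_qc: bool
    save_analysis_ready_fastqs: bool


def is_analysis_ready(params: Params, is_multirun: bool) -> bool:
    """Transcription of the saveAs condition for the analysis_ready_fastqs entry."""
    # Run merging would produce the final reads, unless this sample has a single run
    not_merged_afterwards = (not params.perform_runmerging) or (
        params.perform_runmerging and not is_multirun
    )
    return (
        not_merged_afterwards
        and not params.perform_longread_hostremoval  # host removal would run afterwards
        and params.longread_qc_skipqualityfilter  # quality filtering would run afterwards
        and not params.longread_qc_skipadaptertrim  # adapter trimming must have run
        and params.perform_longread_qc
        and params.save_analysis_ready_fastqs
    )


base = Params(
    perform_runmerging=False,
    perform_longread_hostremoval=False,
    longread_qc_skipqualityfilter=True,
    longread_qc_skipadaptertrim=False,
    perform_longread_qc=True,
    save_analysis_ready_fastqs=True,
)
print(is_analysis_ready(base, is_multirun=False))  # True
print(is_analysis_ready(replace(base, perform_runmerging=True), is_multirun=True))  # False
```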
18 changes: 18 additions & 0 deletions docs/output.md
@@ -16,6 +16,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [fastp](#fastp) - Adapter trimming for Illumina data
- [AdapterRemoval](#adapterremoval) - Adapter trimming for Illumina data
- [Porechop](#porechop) - Adapter removal for Oxford Nanopore data
- [Porechop_ABI](#porechop_abi) - Adapter removal for Oxford Nanopore data
- [Nonpareil](#nonpareil) - Read redundancy and metagenome coverage estimation for short reads
- [BBDuk](#bbduk) - Quality trimming and filtering for Illumina data
- [PRINSEQ++](#prinseq) - Quality trimming and filtering for Illumina data
@@ -178,6 +179,23 @@ You will only find the `.fastq` files in the results directory if you provide `
We do **not** recommend using Porechop if you are already trimming the adapters with ONT's basecaller Guppy.
:::

### Porechop_ABI

[Porechop_ABI](https://github.com/bonsai-team/Porechop_ABI) is an extension of [Porechop](https://github.com/rrwick/Porechop). Porechop_ABI does not rely on any external knowledge or database of adapter sequences. Instead, adapters are discovered directly from the reads using approximate k-mer counting and assembly. These sequences can then be used for trimming with all standard Porechop options. The software can report a combination of distinct sequences if a mix of adapters was used. It can also be used to check whether a dataset has already been trimmed, or to find leftover adapters in datasets previously processed with Guppy.
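To give a feel for discovering adapters from the reads themselves, here is a toy Python sketch that counts exact k-mers at the start of each read: k-mers from a shared adapter become over-represented. This is a deliberate simplification and not Porechop_ABI's algorithm, which uses approximate k-mer counting followed by assembly of the adapter sequence. The adapter, reads, `k` and `window` values below are invented for illustration.

```python
from collections import Counter


def start_kmer_counts(reads, k=4, window=8):
    """Count exact k-mers in the first `window` bases of each read."""
    counts = Counter()
    for read in reads:
        prefix = read[:window]
        for i in range(len(prefix) - k + 1):
            counts[prefix[i:i + k]] += 1
    return counts


ADAPTER = "GGCCAATT"  # hypothetical adapter, not a real ONT sequence
reads = [ADAPTER + tail for tail in ("ACGTACGTAC", "TGCATGCATG", "CATGCATGCA")]
counts = start_kmer_counts(reads)
print(counts.most_common(3))
```

Every k-mer of the shared start appears once per read, which is the signal a real tool would then extend into a full adapter sequence.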

<details markdown="1">
<summary>Output files</summary>

- `porechop_abi/`
- `<sample_id>.log`: Log file containing trimming statistics
- `<sample_id>.fastq.gz`: Adapter-trimmed file

</details>
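The file names above are derived from the `ext.prefix` closure for `PORECHOP_ABI` in `conf/modules.config`. The Python sketch below renders that closure for a hypothetical sample. The `.fastq.gz` and `.log` suffixes assume the module writes `<prefix>.fastq.gz` and `<prefix>.log`, which is not visible in this diff.

```python
def porechop_abi_prefix(meta):
    """Python rendering of the PORECHOP_ABI ext.prefix closure in conf/modules.config."""
    return f"{meta['id']}_{meta['run_accession']}_porechop_abi"


meta = {"id": "sampleA", "run_accession": "run1"}  # hypothetical sample metadata
prefix = porechop_abi_prefix(meta)
# Assumed file naming: the module is taken to write "<prefix>.fastq.gz" and "<prefix>.log"
outputs = [f"{prefix}.fastq.gz", f"{prefix}.log"]
print(outputs)  # ['sampleA_run1_porechop_abi.fastq.gz', 'sampleA_run1_porechop_abi.log']
```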

The output logs are saved in the output folder and are included in the MultiQC report. You do not normally need to check these manually.

You will only find the `.fastq` files in the results directory if you provide `--save_preprocessed_reads`. Alternatively, if you only want the 'final' reads that go into classification/profiling (i.e., reads that may have undergone additional processing), do not specify this flag but rather specify `--save_analysis_ready_fastqs`, in which case the reads will be in the folder `analysis_ready_fastqs`.

### BBDuk

[BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/) stands for Decontamination Using Kmers. BBDuk was developed to combine most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool.
2 changes: 1 addition & 1 deletion docs/usage.md
@@ -252,7 +252,7 @@ By default, paired-end merging is not activated. In this case paired-end 'alignm
You can also turn off clipping and only perform paired-end merging, if requested. This can be useful when processing data downloaded from the ENA, SRA, or DDBJ (`--shortread_qc_skipadaptertrim`).
Both tools support length filtering of reads and can be tuned with `--shortread_qc_minlength`. Performing length filtering can be useful to remove short (often low sequencing complexity) sequences that result in unspecific classification and therefore slow down runtime during classification/profiling, with minimal gain.

There is currently one option for long-read Oxford Nanopore processing: [`porechop`](https://github.com/rrwick/Porechop).
There are currently two options for long-read Oxford Nanopore processing: [`porechop`](https://github.com/rrwick/Porechop) and [`porechop_abi`](https://github.com/bonsai-team/Porechop_ABI).

For both short-read and long-read preprocessing, you can optionally save the resulting processed reads with `--save_preprocessed_reads`.

6 changes: 6 additions & 0 deletions modules.json
@@ -215,6 +215,12 @@
"git_sha": "729335dda8ba226323edc54dec80ae959079207e",
"installed_by": ["modules"]
},
"porechop/abi": {
"branch": "master",
"git_sha": "870f9af2eaf0000c94d74910d762cf153752af98",
"installed_by": ["modules"],
"patch": "modules/nf-core/porechop/abi/porechop-abi.diff"
},
"porechop/porechop": {
"branch": "master",
"git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
9 changes: 9 additions & 0 deletions modules/nf-core/porechop/abi/environment.yml

55 changes: 55 additions & 0 deletions modules/nf-core/porechop/abi/main.nf

48 changes: 48 additions & 0 deletions modules/nf-core/porechop/abi/meta.yml

21 changes: 21 additions & 0 deletions modules/nf-core/porechop/abi/porechop-abi.diff

59 changes: 59 additions & 0 deletions modules/nf-core/porechop/abi/tests/main.nf.test
