Merge pull request #511 from genomic-medicine-sweden/porechop_abi
Add porechop_abi to the pipeline
jfy133 authored Aug 7, 2024
2 parents e1b367f + ac04c4a commit c779e73
Showing 21 changed files with 401 additions and 15 deletions.
1 change: 1 addition & 0 deletions CHANGELOG.md
@@ -11,6 +11,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
- [#466](https://github.com/nf-core/taxprofiler/pull/466) - Input database sheets now require a `db_type` column to distinguish between short- and long-read databases (added by @LilyAnderssonLee)
- [#505](https://github.com/nf-core/taxprofiler/pull/505) - Add small files to the file `tower.yml` (added by @LilyAnderssonLee)
- [#508](https://github.com/nf-core/taxprofiler/pull/508) - Add `nanoq` as a filtering tool for nanopore reads (added by @LilyAnderssonLee)
- [#511](https://github.com/nf-core/taxprofiler/pull/511) - Add `porechop_abi` as an alternative adapter removal tool for long-read Nanopore data (added by @LilyAnderssonLee)

### `Fixed`

4 changes: 4 additions & 0 deletions CITATIONS.md
@@ -38,6 +38,10 @@

> Wick, R. R., Judd, L. M., Gorrie, C. L., & Holt, K. E. (2017). Completing bacterial genome assemblies with multiplex MinION sequencing. Microbial Genomics, 3(10), e000132. https://doi.org/10.1099/mgen.0.000132
- [Porechop_ABI](https://github.com/bonsai-team/Porechop_ABI)

> Bonenfant, Q., Noé, L., & Touzet, H. (2023). Porechop_ABI: discovering unknown adapters in Oxford Nanopore Technology sequencing reads for downstream trimming. Bioinformatics Advances, 3(1):vbac085. https://doi.org/10.1093/bioadv/vbac085
- [Filtlong](https://github.com/rrwick/Filtlong)

> Wick R (2021) Filtlong, URL: https://github.com/rrwick/Filtlong
2 changes: 1 addition & 1 deletion README.md
@@ -29,7 +29,7 @@

1. Read QC ([`FastQC`](https://www.bioinformatics.babraham.ac.uk/projects/fastqc/) or [`falco`](https://github.com/smithlabcode/falco) as an alternative option)
2. Performs optional read pre-processing
- Adapter clipping and merging (short-read: [fastp](https://github.com/OpenGene/fastp), [AdapterRemoval2](https://github.com/MikkelSchubert/adapterremoval); long-read: [porechop](https://github.com/rrwick/Porechop))
- Adapter clipping and merging (short-read: [fastp](https://github.com/OpenGene/fastp), [AdapterRemoval2](https://github.com/MikkelSchubert/adapterremoval); long-read: [porechop](https://github.com/rrwick/Porechop), [Porechop_ABI](https://github.com/bonsai-team/Porechop_ABI))
- Low complexity and quality filtering (short-read: [bbduk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/), [PRINSEQ++](https://github.com/Adrian-Cantu/PRINSEQ-plus-plus); long-read: [Filtlong](https://github.com/rrwick/Filtlong), [Nanoq](https://github.com/esteinig/nanoq))
- Host-read removal (short-read: [BowTie2](http://bowtie-bio.sourceforge.net/bowtie2/); long-read: [Minimap2](https://github.com/lh3/minimap2))
- Run merging
34 changes: 34 additions & 0 deletions assets/multiqc_config.yml
@@ -25,6 +25,8 @@ report_section_order:
    order: 500
  porechop:
    order: 400
  porechop_abi:
    order: 450
  bbduk:
    order: 300
  prinseqplusplus:
@@ -106,7 +108,21 @@ top_modules:
        - "*raw*"
      extra: "If used in this run, Falco is a drop-in replacement for FastQC producing the same output, written by Guilherme de Sena Brandine and Andrew D. Smith."
  - "porechop":
      name: "Porechop"
      anchor: "porechop"
      target: "Porechop"
      path_filters:
        - "*porechop.log"
      extra: "ℹ️: if you get the error message 'Error - was not able to plot data.' this means that porechop did not detect any adapters and therefore no statistics generated."
  - "porechop":
      name: "Porechop_ABI"
      anchor: "porechop_abi"
      target: "Porechop_ABI"
      doi: "10.1093/bioadv/vbac085"
      info: "find and remove adapters from Oxford Nanopore reads."
      path_filters:
        - "*porechop_abi.log"
      extra: "ℹ️: if you get the error message 'Error - was not able to plot data.' this means that porechop_abi did not detect any adapters and therefore no statistics generated."
  - "bowtie2":
      name: "bowtie2"
  - "samtools":
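The two `path_filters` entries above only keep the Porechop and Porechop_ABI logs in separate report sections if the globs do not overlap. A minimal sketch of that check, assuming the filters behave like shell-style patterns as implemented by Python's `fnmatch`, and that log names follow the `ext.prefix` values set in `conf/modules.config`. The sample names are hypothetical.

```python
from fnmatch import fnmatch

# Hypothetical log names, built from the ext.prefix suffixes
# "_porechop" and "_porechop_abi" in conf/modules.config.
logs = [
    "sampleA_run1_porechop.log",
    "sampleA_run1_porechop_abi.log",
]

# Glob patterns copied from the two "porechop" entries under top_modules.
filters = {
    "Porechop": "*porechop.log",
    "Porechop_ABI": "*porechop_abi.log",
}


def route(log_name):
    """Return the report sections whose path filter matches this log name."""
    return [
        section
        for section, pattern in filters.items()
        if fnmatch(log_name, pattern)
    ]


for name in logs:
    print(name, "->", route(name))
```

Each hypothetical log matches exactly one pattern, which is why the `ext.prefix` of `PORECHOP_PORECHOP` also gains a `_porechop` suffix in this commit.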
@@ -177,6 +193,14 @@ table_columns_placement:
    End Trimmed Percent: 440
    Middle Split: 450
    Middle Split Percent: 460
  Porechop_ABI:
    Input Reads: 400
    Start Trimmed: 410
    Start Trimmed Percent: 420
    End Trimmed: 430
    End Trimmed Percent: 440
    Middle Split: 450
    Middle Split Percent: 460
  Filtlong:
    Target bases: 500
  BBDuk:
@@ -250,6 +274,14 @@ table_columns_visible:
    End Trimmed Percent: True
    Middle Split: False
    Middle Split Percent: True
  porechop_abi:
    Input reads: False
    Start Trimmed:
    Start Trimmed Percent: True
    End Trimmed: False
    End Trimmed Percent: True
    Middle Split: False
    Middle Split Percent: True
  fastp:
    pct_adapter: True
    pct_surviving: True
@@ -315,6 +347,8 @@ extra_fn_clean_exts:
  - ".bbduk"
  - ".unmapped"
  - "_filtered"
  - "porechop"
  - "porechop_abi"
  - type: remove
    pattern: "_falco"
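
A rough model of what the two new `extra_fn_clean_exts` entries do to sample names in the report. It assumes that plain-string entries truncate the name at the first match and that `type: remove` entries delete only the matched text; the final trimming of leftover separators is a simplification. The sample names are hypothetical, and this is not MultiQC's actual implementation.

```python
def clean_sample_name(name, rules):
    """Apply a simplified version of the cleaning rules, in order."""
    for rule in rules:
        if isinstance(rule, str):
            # Plain string: keep only the text before the first occurrence.
            name = name.split(rule, 1)[0]
        elif rule.get("type") == "remove":
            # "remove" rule: delete the substring, keep what follows it.
            name = name.replace(rule["pattern"], "")
    # Assumption: separators left dangling at either end are trimmed.
    return name.strip("_.-")


# Entries shown in the hunk above, in file order.
rules = [
    ".bbduk",
    ".unmapped",
    "_filtered",
    "porechop",
    "porechop_abi",
    {"type": "remove", "pattern": "_falco"},
]

for raw in ("sampleA_run1_porechop", "sampleA_run1_porechop_abi"):
    print(raw, "->", clean_sample_name(raw, rules))
```

Under this model both suffixes collapse to the same sample name, so rows from either tool line up with the other modules in the general statistics table.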

30 changes: 27 additions & 3 deletions conf/modules.config
@@ -241,12 +241,12 @@ process {
    }

    withName: PORECHOP_PORECHOP {
        ext.prefix = { "${meta.id}_${meta.run_accession}" }
        ext.prefix = { "${meta.id}_${meta.run_accession}_porechop" }
        publishDir = [
            [
                path: { "${params.outdir}/porechop" },
                mode: params.publish_dir_mode,
                pattern: '*_porechopped.fastq.gz',
                pattern: '*.fastq.gz',
                enabled: params.save_preprocessed_reads
            ],
            [
@@ -257,7 +257,31 @@ process {
            [
                path: { "${params.outdir}/analysis_ready_fastqs" },
                mode: params.publish_dir_mode,
                pattern: '*_porechopped.fastq.gz',
                pattern: '*.fastq.gz',
                enabled: params.save_analysis_ready_fastqs,
                saveAs: { ( params.perform_runmerging == false || ( params.perform_runmerging && !meta.is_multirun ) ) && !params.perform_longread_hostremoval && params.longread_qc_skipqualityfilter && !params.longread_qc_skipadaptertrim && params.perform_longread_qc && params.save_analysis_ready_fastqs ? it : null }
            ]
        ]
    }

    withName: PORECHOP_ABI {
        ext.prefix = { "${meta.id}_${meta.run_accession}_porechop_abi" }
        publishDir = [
            [
                path: { "${params.outdir}/porechop_abi" },
                mode: params.publish_dir_mode,
                pattern: '*.fastq.gz',
                enabled: params.save_preprocessed_reads
            ],
            [
                path: { "${params.outdir}/porechop_abi" },
                mode: params.publish_dir_mode,
                pattern: '*.log'
            ],
            [
                path: { "${params.outdir}/analysis_ready_fastqs" },
                mode: params.publish_dir_mode,
                pattern: '*.fastq.gz',
                enabled: params.save_analysis_ready_fastqs,
                saveAs: { ( params.perform_runmerging == false || ( params.perform_runmerging && !meta.is_multirun ) ) && !params.perform_longread_hostremoval && params.longread_qc_skipqualityfilter && !params.longread_qc_skipadaptertrim && params.perform_longread_qc && params.save_analysis_ready_fastqs ? it : null }
            ]
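
The `saveAs` closure in the hunk above is dense, so here is a sketch of its boolean condition in Python. It assumes every parameter is a plain boolean; the `Params` class and the `is_analysis_ready` function are hypothetical names for illustration, and the values in `example` are not the pipeline's defaults.

```python
from dataclasses import dataclass, replace


@dataclass
class Params:
    """Subset of pipeline parameters read by the saveAs closure."""
    perform_runmerging: bool
    perform_longread_hostremoval: bool
    longread_qc_skipqualityfilter: bool
    longread_qc_skipadaptertrim: bool
    perform_longread_qc: bool
    save_analysis_ready_fastqs: bool


def is_analysis_ready(p: Params, is_multirun: bool) -> bool:
    """Mirror the condition under which the closure returns the file name."""
    # Either run merging is off, or this sample has only a single run.
    no_merge_pending = (not p.perform_runmerging) or (
        p.perform_runmerging and not is_multirun
    )
    return (
        no_merge_pending
        and not p.perform_longread_hostremoval   # no host removal afterwards
        and p.longread_qc_skipqualityfilter      # no quality filtering afterwards
        and not p.longread_qc_skipadaptertrim    # adapter trimming actually ran
        and p.perform_longread_qc
        and p.save_analysis_ready_fastqs
    )


# Illustrative settings: adapter trimming is the last preprocessing step.
example = Params(
    perform_runmerging=False,
    perform_longread_hostremoval=False,
    longread_qc_skipqualityfilter=True,
    longread_qc_skipadaptertrim=False,
    perform_longread_qc=True,
    save_analysis_ready_fastqs=True,
)

print(is_analysis_ready(example, is_multirun=False))
# With quality filtering enabled, a later step produces the final reads.
with_filter = replace(example, longread_qc_skipqualityfilter=False)
print(is_analysis_ready(with_filter, is_multirun=False))
```

In words: the adapter-trimmed reads are published to `analysis_ready_fastqs` only when no later step (quality filtering, host removal, run merging) would produce the final reads instead.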
18 changes: 18 additions & 0 deletions docs/output.md
@@ -16,6 +16,7 @@ The pipeline is built using [Nextflow](https://www.nextflow.io/) and processes d
- [fastp](#fastp) - Adapter trimming for Illumina data
- [AdapterRemoval](#adapterremoval) - Adapter trimming for Illumina data
- [Porechop](#porechop) - Adapter removal for Oxford Nanopore data
- [Porechop_ABI](#porechop_abi) - Adapter removal for Oxford Nanopore data
- [Nonpareil](#nonpareil) - Read redundancy and metagenome coverage estimation for short reads
- [BBDuk](#bbduk) - Quality trimming and filtering for Illumina data
- [PRINSEQ++](#prinseq) - Quality trimming and filtering for Illumina data
@@ -178,6 +179,23 @@ You will only find the `.fastq` files in the results directory if you provide `
We do **not** recommend using Porechop if you are already trimming the adapters with ONT's basecaller Guppy.
:::

### Porechop_ABI

[Porechop_ABI](https://github.com/bonsai-team/Porechop_ABI) is an extension of [Porechop](https://github.com/rrwick/Porechop). Porechop_ABI does not rely on any external knowledge or adapter database. Instead, adapters are discovered directly from the reads using approximate k-mer counting and assembly. These sequences can then be used for trimming with all standard Porechop options. The software can report a combination of distinct sequences if a mix of adapters was used. It can also be used to check whether a dataset has already been trimmed, or to find leftover adapters in datasets previously processed with Guppy.

<details markdown="1">
<summary>Output files</summary>

- `porechop_abi/`
  - `<sample_id>.log`: Log file containing trimming statistics
  - `<sample_id>.fastq.gz`: Adapter-trimmed file

</details>

The output logs are saved in the output folder and are part of the MultiQC report. You do not normally need to check these manually.

You will only find the `.fastq` files in the results directory if you provide `--save_preprocessed_reads`. Alternatively, if you only want the 'final' reads that go into classification/profiling (i.e., reads that may have had additional processing), do not specify this flag but rather specify `--save_analysis_ready_fastqs`, in which case the reads will be in the folder `analysis_ready_fastqs`.

### BBDuk

[BBDuk](https://jgi.doe.gov/data-and-tools/software-tools/bbtools/bb-tools-user-guide/bbduk-guide/) stands for Decontamination Using Kmers. BBDuk was developed to combine most common data-quality-related trimming, filtering, and masking operations into a single high-performance tool.
2 changes: 1 addition & 1 deletion docs/usage.md
@@ -252,7 +252,7 @@ By default, paired-end merging is not activated. In this case paired-end 'alignm
You can also turn off clipping and only perform paired-end merging, if requested. This can be useful when processing data downloaded from the ENA, SRA, or DDBJ (`--shortread_qc_skipadaptertrim`).
Both tools support length filtering of reads and can be tuned with `--shortread_qc_minlength`. Performing length filtering can be useful to remove short (often low sequencing complexity) sequences that result in unspecific classification and therefore slow down runtime during classification/profiling, with minimal gain.

There is currently one option for long-read Oxford Nanopore processing: [`porechop`](https://github.com/rrwick/Porechop).
There are currently two options for long-read Oxford Nanopore processing: [`porechop`](https://github.com/rrwick/Porechop) and [`porechop_abi`](https://github.com/bonsai-team/Porechop_ABI).

For both short-read and long-read preprocessing, you can optionally save the resulting processed reads with `--save_preprocessed_reads`.

6 changes: 6 additions & 0 deletions modules.json
@@ -215,6 +215,12 @@
            "git_sha": "729335dda8ba226323edc54dec80ae959079207e",
            "installed_by": ["modules"]
          },
          "porechop/abi": {
            "branch": "master",
            "git_sha": "870f9af2eaf0000c94d74910d762cf153752af98",
            "installed_by": ["modules"],
            "patch": "modules/nf-core/porechop/abi/porechop-abi.diff"
          },
          "porechop/porechop": {
            "branch": "master",
            "git_sha": "911696ea0b62df80e900ef244d7867d177971f73",
9 changes: 9 additions & 0 deletions modules/nf-core/porechop/abi/environment.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

55 changes: 55 additions & 0 deletions modules/nf-core/porechop/abi/main.nf

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

48 changes: 48 additions & 0 deletions modules/nf-core/porechop/abi/meta.yml

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

21 changes: 21 additions & 0 deletions modules/nf-core/porechop/abi/porechop-abi.diff

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

59 changes: 59 additions & 0 deletions modules/nf-core/porechop/abi/tests/main.nf.test

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.
