First complete draft version of the V6 Amplicon Analysis pipeline #7

Merged on Jun 6, 2024 (109 commits)


chrisAta
Member

This pull request merges the first full draft version of the MGnify V6 Amplicon Analysis pipeline.

The pipeline has been fully refactored to use Nextflow, and contains all of the features we envisaged for it in a decent state of polish. Apart from the re-implementation of existing features, the major changes to the pipeline include:

  • Simplification of read QC subworkflow using fastp
  • Added amplified region inference for 16S and 18S rRNA using a newly developed method
  • Added automatic primer identification, trimming, and validation using a newly developed method
  • Added ASV calling using DADA2
  • Added taxonomic classification of ASVs and visualisation using MAPseq and Krona
  • Added the PR2 reference database for taxonomic classification in both the closed-reference method and the new ASV method

More technical changes to the pipeline include:

  • Fully refactored into Nextflow
  • Uses nf-core (both core and ebi-metagenomics) modules and subworkflows wherever it makes sense
  • Uses the newly developed mgnify-pipelines-toolkit as a backbone for many of the processing scripts the pipeline uses
  • Reference databases used in V5 have been updated to their newest versions (SILVA 138.1, UNITE 9.0, ITSoneDB 1.141, Rfam 14.10)
  • Uses nf-validation for validation of inputs
  • Uses nf-test for unit testing (with most modules and subworkflows currently having at least one unit test)
  • Uses nf-prov for provenance information
  • Can be run on SLURM
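
As a rough illustration of the last few points, a `nextflow.config` fragment along these lines would declare the plugins and add a SLURM profile. This is only a sketch, not the pipeline's actual configuration: the profile name, the queue name, and the queue size are all assumptions.

```nextflow
// Illustrative nextflow.config fragment (not copied from this repository).

// Plugins named in the list above; version pins are omitted here.
plugins {
    id 'nf-validation'
    id 'nf-prov'
}

profiles {
    // Hypothetical profile name; select it with `-profile slurm`.
    slurm {
        process {
            executor = 'slurm'
            queue    = 'standard'   // hypothetical partition name, site-specific
        }
        executor {
            queueSize = 50          // hypothetical cap on jobs queued at once
        }
    }
}
```

With a profile like this, the run would be launched with `nextflow run <pipeline> -profile slurm`, and each process is then submitted as its own SLURM job.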

chrisAta and others added 30 commits September 14, 2023 09:40
…support for both single-end and paired-end reads
…an get cleaned fastp reads for paired end reads without merging them
… with more than one detected HV region. fixed now
…eq nf-core module. Needed to patch some changes to add variable region output
@mberacochea
Member

We should probably rename that repo to mgnify-amplicon-v6 or something ...

Yeah, it should be ebi-metagenomics/amplicon-pipeline. I wouldn't include the version in the repo name.

Member

@mberacochea mberacochea left a comment


I've submitted another PR as review of this one -> #8.

chrisAta and others added 14 commits May 20, 2024 10:05
The structure of the pipeline looks good to me, amazing work :).

I've removed the nf-core/ampliconpipeline mentions in the code;
this pipeline is not part of nf-core (it could be in the future,
but not now).
Tweaked the parameters; some of them were missing from the nextflow.config file.
I've added `checkIfExists: true` to the file() invocations; most of them should already be
handled by the nf-schema validation, but it's good practice to keep the checks.
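
For reference, the pattern looks roughly like this. It is a minimal sketch rather than code from this PR, and `params.reference_fasta` is a hypothetical parameter name.

```nextflow
// Minimal sketch; `reference_fasta` is a hypothetical parameter,
// supplied on the command line as --reference_fasta <path>.
workflow {
    // With checkIfExists: true, file() fails immediately if the path is missing,
    // instead of handing a non-existent path to a downstream process.
    reference = file(params.reference_fasta, checkIfExists: true)
    println "Using reference: ${reference}"
}
```

The point of keeping the check is that it still fires when a path reaches `file()` without having gone through schema validation.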
Renamed ITS_SWF to MASK_FASTA_SWF; it's not ITS-specific, even though that is how it is
used in this pipeline.
Every module needs a container and a VERSIONS output file.
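
A module following that rule could look something like the sketch below, which uses the usual nf-core heredoc convention for `versions.yml`. The process name, the container image, and the choice of gzip as the tool are placeholders for illustration, not modules from this pipeline.

```nextflow
// Illustrative module; name, image, and tool are placeholders.
process EXAMPLE_GZIP {
    // Every module declares the container it runs in.
    container 'ubuntu:22.04'

    input:
    path fasta

    output:
    path "${fasta}.gz" , emit: gz
    path "versions.yml", emit: versions   // tool versions reported by the module

    script:
    """
    gzip -c ${fasta} > ${fasta}.gz

    cat <<-END_VERSIONS > versions.yml
    "${task.process}":
        gzip: \$(gzip --version | head -n 1 | sed 's/^gzip //')
    END_VERSIONS
    """
}
```

Because every module emits a `versions` output, a workflow can mix those channels together and write a single collated file at the end of the run.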
…modules that use the MPT (commented for now as I'm waiting to release a fix for it until later)
…ows return a version.yml, collate all the versions at the end of the pipeline and output them, refactor modules that were using shell to just use script
Member

@mberacochea mberacochea left a comment


Well. Impressive work @chrisAta. I left some comments, but I think it's good to go

@chrisAta chrisAta merged commit 2034de1 into main Jun 6, 2024
@chrisAta chrisAta deleted the dev branch June 6, 2024 11:31
@chrisAta chrisAta restored the dev branch June 6, 2024 11:31
@chrisAta chrisAta deleted the dev branch June 6, 2024 11:33

3 participants