-
Notifications
You must be signed in to change notification settings - Fork 22
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add InterProScan to Pipeline and integrate in AMPcombi #428
Conversation
|
Warning Newer version of the nf-core template is available. Your pipeline is using an old version of the nf-core template: 3.1.0. For more documentation on how to update your pipeline, please see the nf-core documentation and Synchronisation documentation. |
@nf-core-bot fix linting |
@nf-core-bot fix linting |
Also fixes issue number #434 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Main issue is I don't like the use of function
, we already use functon
in funcscan
in a broad sense... can you refine what exactly we are using interproscan for and then we can adjust the naming
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Almost there!
Also missing README update
@@ -9,6 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0 | |||
|
|||
- [#421](https://github.com/nf-core/funcscan/pull/421) Updated to nf-core template 3.0.2. (by @jfy133) | |||
- [#427](https://github.com/nf-core/funcscan/pull/427) AMPcombi now can use multiple other databases for classifications. (by @darcy220606) | |||
- [#428](https://github.com/nf-core/funcscan/pull/428) Added InterProScan annotation workflow to the pipeline. The results are coupled to AMPcombi final table. (by @darcy220606) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- [#428](https://github.com/nf-core/funcscan/pull/428) Added InterProScan annotation workflow to the pipeline. The results are coupled to AMPcombi final table. (by @darcy220606) | |
- [#428](https://github.com/nf-core/funcscan/pull/428) Added InterProScan protein annotation workflow to the pipeline. (by @darcy220606) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work overall 💪
Now that you introduce the protein_annotation
workflow, I wonder if we should rename the DNA-level annotation
workflow (of pyrodigal, bakta etc.). Maybe to contig_annotation
, cds_annotation
, or orf_annotation
?
withName: SEQKIT_SEQ_FILTER { | ||
ext.prefix = { "${meta.id}_cleaned.faa" } | ||
publishDir = [ | ||
path: { "${params.outdir}/protein_annotation/interproscan/" }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are we sure we want the output in ${params.outdir}/protein_annotation/interproscan/
and not in ${params.outdir}/annotation/interproscan/
? I'd prefer the latter, to have it all in one place regardless of DNA (pyrodigal etc.) or protein annotation (interproscan). I think it's more intuitive to search for any annotation results in a single folder.
If not, what do you think of renaming the annotation
output folder to contig_annotation
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The other annotation tools in the annotation workflow are annotating CDS which also can be proteins. I would leave the annotation workflow as is because thats the baseline annotation step of the pipeline. This annotation step is more of an accessory annotation to the pipeline if the user wants more information from diff DB and (1) its technically not correct protein_annotation because those are nnot necessary proteins and (2) the plan is that we add more to this workflow (e.g. the functional annotation of those CDS) so protein annotation will no longer be valid.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The ouput im not sure if its a good idea to add it in the same folder because those are two diff workflows
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Okay. No strong opinion from my side. I still don't find the naming super intuitive, but it can be. Or why not cds_annotation
for the CDS tools (prokka etc.) versus protein_annotation
(interproscan)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'll check if it's feasible to test the most recent interproscan db on our server in order to update it here. If not, then go ahead as is. 👍
nextflow_schema.json
Outdated
}, | ||
"protein_annotation_interproscan_disableresidueannottsv": { | ||
"type": "boolean", | ||
"help_text": "This disables the addition of the annotations assigned according to the databases activated to the final `<sample>_interproscan.tsv` file. Turning this on will remove the annotations from the final table. It is not recommended to use this option, as it will cause a run failure because the format of the resulting files will no longer be adequate for integration in the final summary tables. Currently, only applicable for AMPcombi2. \n\nFor more information about this flag see the tool [documentation](https://interproscan-docs.readthedocs.io/en/latest/HowToRun.html).\n\n> Modifies tool parameter(s):\n> - InterProScan: `--enable-tsv-residue-annot`\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They cannot add it by ext.arg :D Everything that should be ext.arg is a pipeline parameter. If we don't offer it, the user cannot use it. That's why we add as much as necessary and leave out everything that shouldn't be touched (or is too irrelevant).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Things to do before merge:
- address the database update comments from my previous review since I tested the new interpro scan db version today (see comment here)
- either fix ampcombi/parsetables error as reported in AMP screening doesn't run when only a single tool is enabled #444 or outsource this to a different PR
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a few things. I will pull the updated ampcombi module as well so we are ready to go :)
nextflow_schema.json
Outdated
}, | ||
"protein_annotation_interproscan_disableresidueannottsv": { | ||
"type": "boolean", | ||
"help_text": "This disables the addition of the annotations assigned according to the databases activated to the final `<sample>_interproscan.tsv` file. Turning this on will remove the annotations from the final table. It is not recommended to use this option, as it will cause a run failure because the format of the resulting files will no longer be adequate for integration in the final summary tables. Currently, only applicable for AMPcombi2. \n\nFor more information about this flag see the tool [documentation](https://interproscan-docs.readthedocs.io/en/latest/HowToRun.html).\n\n> Modifies tool parameter(s):\n> - InterProScan: `--enable-tsv-residue-annot`\n", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We just discussed on Slack (here), that it might be safer to throw a warning in the AMP workflow, if the user changes this interproscan flag. Because we shouldn't expect them to read the whole params page or give them the pipeline-supported option to switch this flag and thereby crash the pipeline 🙈
PR checklist
This PR adds InterProScan to FUNCSCAN. It also integrates it into AMPcombi v2.0.1, which can parse its output as an optional flag.
This PR also closes issue #434
🚨 🚨 As interproscan requires a large database, i have not added it to any of the CI tests as that would require 4 hours for just downloading the database!!!!
👀 👀 👀 👀 👀 👀 Still TODO once AMPcombi 2.0.1 is updated in nf-core: DONE!!nf-core lint
).nextflow run . -profile test,docker --outdir <OUTDIR>
).nextflow run . -profile debug,test,docker --outdir <OUTDIR>
).docs/usage.md
is updated.docs/output.md
is updated.CHANGELOG.md
is updated.README.md
is updated (including new tool citations and authors/contributors).