From f756160836a2b7dda3d8bc87d02caf6354ba083b Mon Sep 17 00:00:00 2001 From: FriederikeHanssen Date: Fri, 1 Dec 2023 09:53:26 +0000 Subject: [PATCH 1/9] work on modules addition in basic training --- .../docs/contributing/nf_core_basic_training.md | 14 +++++++++----- 1 file changed, 9 insertions(+), 5 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 9f9e04e6b2..d9f01da16e 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -5,7 +5,7 @@ subtitle: A guide to create Nextflow pipelines using nf-core tools # Introduction -This training course aims to demonstrate how to build an nf-core pipeline using the nf-core pipeline template and nf-core modules as well as custom, local modules. Be aware that we are not going to explain any fundamental Nextflow concepts, as such we advise anyone taking this course to have completed the [Basic Nextflow Training Workshop](https://training.nextflow.io/). +This training course aims to demonstrate how to build an nf-core pipeline using the nf-core pipeline template and nf-core modules and subworkflows as well as custom, local modules. Be aware that we are not going to explain any fundamental Nextflow concepts, as such we advise anyone taking this course to have completed the [Basic Nextflow Training Workshop](https://training.nextflow.io/). ```md During this course we are going to build a Simple RNA-Seq workflow. @@ -514,7 +514,9 @@ nf-core lint [...] ``` -# Adding Modules to a pipeline +# Building a pipeline from (existing) components + +Nextflow pipelines can be build in a very modular fashion. In nf-core, we have simple building blocks available: nf-core/modules. They are wrappers around usually individual tools. In addition, we have subworkflows: smaller pre-build pipeline chunks. You can think about the modules as Lego bricks and subworkflows as pre-build chunks that can be added to various sets. These components are centrally available for all Nextflow pipelines. To make working with them easy, we have can use `nf-core/tools` ## Adding an existing nf-core module @@ -584,9 +586,11 @@ You can list all of the modules available on nf-core/modules via the command bel nf-core modules list remote ``` +In addition, all modules are listed on the website: [https://nf-co.re/modules](https://nf-co.re/modules) + ### Install a remote nf-core module -To install a remote nf-core module from the website, you can first get information about a tool, including the installation command by executing: +To install a remote nf-core module, you can first get information about a tool, including the installation command by executing: ```bash nf-core modules info salmon/index @@ -687,13 +691,13 @@ comparison to simple nextflow pipeline from the basic Nextflow training would be ## Adding a remote module -If there is no nf-core module available for the software you want to include, the nf-core tools package can also aid in the generation of a remote module that is specific for your pipeline. To add a remote module run the following: +If there is no nf-core module available for the software you want to include, you can add the module to the nf-core/modules repository. It will then become available to the wider Nextflow Community. See how to [here](https://nf-co.re/docs/contributing/tutorials/dsl2_modules_tutorial). If the module is very pipeline specific, you can also add a local module. 
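Both kinds of modules end up inside your pipeline repository, just in different folders. An illustrative excerpt of the resulting layout, using the modules that appear in this training:

```
modules/
├── local/
│   └── demo/
│       └── module.nf      <- pipeline-specific local module
└── nf-core/
    └── salmon/
        └── index/
            └── main.nf    <- installed from nf-core/modules
```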
The nf-core tools package can aid in the generation of a module template. To add a bare-bone local module run the following: ``` nf-core modules create ``` -Open ./modules/local/demo/module.nf and start customising this to your needs whilst working your way through the extensive TODO comments! +Open ./modules/local/demo/module.nf and start customising this to your needs whilst working your way through the extensive TODO comments! For further help and guidelines for the modules code, check out the [modules specific documentation](https://nf-co.re/docs/contributing/tutorials/dsl2_modules_tutorial). ### Making a remote module for a custom script From cd23eff6ff359f2914e6aef73c0ae806dcf525c6 Mon Sep 17 00:00:00 2001 From: FriederikeHanssen Date: Fri, 1 Dec 2023 12:06:23 +0000 Subject: [PATCH 2/9] text --- .../contributing/nf_core_basic_training.md | 25 ++++++++++++++++++- 1 file changed, 24 insertions(+), 1 deletion(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index d9f01da16e..0692039603 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -629,7 +629,7 @@ nf-core modules info salmon/index 💻 Installation command: nf-core modules install salmon/index ``` -The out put from the info command will among other things give you the nf-core/tools installation command, lets see what it is doing: +The output from the info command will among other things give you the nf-core/tools installation command, lets see what it is doing: ```bash nf-core modules install salmon/index @@ -685,6 +685,29 @@ INFO Use the following statement to include this module: include { SALMON_INDEX } from '../modules/nf-core/salmon/index/main' ``` +The module is now installed into the folder `modules/nf-core`. Now open the file `workflow/demotest.nf`. You will find already several `include` statements there from the installed modules (`MultiQC` and `FastQC`): + +```bash + +include { FASTQC } from '../modules/nf-core/fastqc/main' +include { MULTIQC } from '../modules/nf-core/multiqc/main' +``` + +Now add the above line underneath it: + +```bash + +include { FASTQC } from '../modules/nf-core/fastqc/main' +include { MULTIQC } from '../modules/nf-core/multiqc/main' +include { SALMON_INDEX } from '../modules/nf-core/salmon/index/main' + +``` + +This makes the module now available in the workflow script and it can be called with the right input data. + + + + (lots of steps missing here) exercise to add a different module would be nice! => salmon/quant! comparison to simple nextflow pipeline from the basic Nextflow training would be nice!) From 01342feaea383d71a3f545c5f210d747f8df877f Mon Sep 17 00:00:00 2001 From: FriederikeHanssen Date: Fri, 1 Dec 2023 12:18:52 +0000 Subject: [PATCH 3/9] add todo --- src/content/docs/contributing/nf_core_basic_training.md | 1 + 1 file changed, 1 insertion(+) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 0692039603..bc9ccbaf33 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -705,6 +705,7 @@ include { SALMON_INDEX } from '../modules/nf-core/salmon/index/main' This makes the module now available in the workflow script and it can be called with the right input data. 
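At this point it can be useful to double-check what is actually installed in your pipeline. nf-core/tools can list the modules recorded in `modules.json` for you:

```bash
nf-core modules list local
```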
+ From 24e9891ba52e1c201ab0f5b2b5ad3606e97f980a Mon Sep 17 00:00:00 2001 From: FriederikeHanssen Date: Fri, 1 Dec 2023 12:44:24 +0000 Subject: [PATCH 4/9] how to add a module --- .../contributing/nf_core_basic_training.md | 38 ++++++++++++++++++- 1 file changed, 37 insertions(+), 1 deletion(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index bc9ccbaf33..0d5900dbc1 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -705,8 +705,44 @@ include { SALMON_INDEX } from '../modules/nf-core/salmon/index/main' This makes the module now available in the workflow script and it can be called with the right input data. - + +We can now call the module in our workflow. Let's place it after FastQC: + +```bash + +workflow DEMOTEST { + + ... + // + // MODULE: Run FastQC + // + FASTQC ( + INPUT_CHECK.out.reads + ) + ch_versions = ch_versions.mix(FASTQC.out.versions.first()) + + BWA_INDEX() +``` + +Now we are still missing an input for our module. In order to build an index, we require the reference fasta. Luckily, the template pipeline has this already all configured, and we can access it by just using `params.fasta` and `view` it to insppect the channel content. (We will see later how to add more input files.) + + +```bash + fasta = Channel.fromPath(params.fasta) + + fasta.view() + + BWA_INDEX( + fasta.map{it -> [id:it.getName(), it]} + ) + ch_versions = ch_versions.mix(BWA_INDEX.out.versions.first()) + +``` + +Now what is happening here: + +To pass over our input FastA file, we need to do a small channel manipulation. nf-core/modules typically take the input together with a `meta` map. This is just a hashmap that contains relevant information for the analysis, that should be passed around the pipeline. There are a couple of keys that we share across all modules, such as `id`. So in order, to have a valid input for our module, we just use the fasta file name (`it.getName()`) as our `id`. In addition, we collect the versions of the tools that are run in the module. This will allow us later to track all tools and all versions allow us to generate a report. (lots of steps missing here) From e54486b5dcafd8f6f70a3739168f3e168d331a41 Mon Sep 17 00:00:00 2001 From: FriederikeHanssen Date: Fri, 1 Dec 2023 12:46:56 +0000 Subject: [PATCH 5/9] run pipeline --- .../docs/contributing/nf_core_basic_training.md | 13 ++++++++++--- 1 file changed, 10 insertions(+), 3 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 0d5900dbc1..7b2b9c717e 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -722,7 +722,7 @@ workflow DEMOTEST { ) ch_versions = ch_versions.mix(FASTQC.out.versions.first()) - BWA_INDEX() + SALMON_INDEX() ``` Now we are still missing an input for our module. In order to build an index, we require the reference fasta. Luckily, the template pipeline has this already all configured, and we can access it by just using `params.fasta` and `view` it to insppect the channel content. (We will see later how to add more input files.) @@ -733,10 +733,10 @@ Now we are still missing an input for our module. 
In order to build an index, we fasta.view() - BWA_INDEX( + SALMON_INDEX( fasta.map{it -> [id:it.getName(), it]} ) - ch_versions = ch_versions.mix(BWA_INDEX.out.versions.first()) + ch_versions = ch_versions.mix(SALMON_INDEX.out.versions.first()) ``` @@ -744,6 +744,13 @@ Now what is happening here: To pass over our input FastA file, we need to do a small channel manipulation. nf-core/modules typically take the input together with a `meta` map. This is just a hashmap that contains relevant information for the analysis, that should be passed around the pipeline. There are a couple of keys that we share across all modules, such as `id`. So in order, to have a valid input for our module, we just use the fasta file name (`it.getName()`) as our `id`. In addition, we collect the versions of the tools that are run in the module. This will allow us later to track all tools and all versions allow us to generate a report. +How test your pipeline: + +```bash +nextflow run main.nf -profile test,docker --outdir results +``` + +You should now see that `SALMON_INDEX` is run. (lots of steps missing here) exercise to add a different module would be nice! => salmon/quant! From 438f8b60adc6d3c1416d10ed0d271126a214d9d1 Mon Sep 17 00:00:00 2001 From: scheckley <2563006+scheckley@users.noreply.github.com> Date: Tue, 19 Mar 2024 11:23:05 +0000 Subject: [PATCH 6/9] Update subworkflows.md added link to meta maps in the "what is the meta map" section. --- src/content/docs/contributing/subworkflows.md | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/src/content/docs/contributing/subworkflows.md b/src/content/docs/contributing/subworkflows.md index 82028e57d6..d10f025287 100644 --- a/src/content/docs/contributing/subworkflows.md +++ b/src/content/docs/contributing/subworkflows.md @@ -388,9 +388,7 @@ nextflow run tests/subworkflows/nf-core/ -entry test_ +The meta variable can be passed down to processes as a tuple of the channel containing the actual samples, e.g. FastQ files, and the meta variable. The `meta map` is a [groovy map](https://www.tutorialspoint.com/groovy/groovy_maps.htm), which is like a python dictionary. Additional documentation on meta maps is available [here](https://nf-co.re/docs/contributing/modules#what-is-the-meta-map). ## Help From 13a2a8c159e94a9205c65349901d47a6851bdc72 Mon Sep 17 00:00:00 2001 From: nf-core-bot Date: Tue, 19 Mar 2024 12:58:16 +0000 Subject: [PATCH 7/9] [automated] Fix code linting --- src/content/docs/contributing/nf_core_basic_training.md | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index c9a01dd2c0..139636f75f 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -1217,8 +1217,7 @@ A `tag` is simple a user provided identifier associated to the task. In our process example, the input is a tuple comprising a hash of metadata for the maf file called `meta` and the path to the `maf` file. It may look -similar to: `[[id:'123', data_type:'maf'], -/path/to/file/example.maf]`. Hence, when nextflow makes +similar to: `[[id:'123', data_type:'maf'], /path/to/file/example.maf]`. Hence, when nextflow makes the call and `$meta.id` is `123` name of the job will be "CONVERT_MAF2BED(123)". If `meta` does not have `id` in its hash, then this will be literally `null`. 
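To make this concrete, here is a purely illustrative snippet (hypothetical sample values, not part of the template pipeline) showing what feeding such a tuple into the process looks like inside a workflow:

```
// Illustrative channel content: one element = [ meta map, file ]
ch_maf = Channel.of(
    [ [ id:'123', data_type:'maf' ], file('/path/to/file/example.maf') ]
)

// With `tag "$meta.id"` in the process definition, this task is reported as
// CONVERT_MAF2BED (123); if the meta map had no `id` key, it would show null.
CONVERT_MAF2BED ( ch_maf )
```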
From a53654abbfc0294d5e589fc8777715b497a7cd41 Mon Sep 17 00:00:00 2001 From: Franziska Bonath <41994400+FranBonath@users.noreply.github.com> Date: Tue, 19 Mar 2024 15:45:11 +0100 Subject: [PATCH 8/9] Apply suggestions from code review --- .../docs/contributing/nf_core_basic_training.md | 10 +++++----- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 7b2b9c717e..0ac03efd0c 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -516,7 +516,7 @@ nf-core lint # Building a pipeline from (existing) components -Nextflow pipelines can be build in a very modular fashion. In nf-core, we have simple building blocks available: nf-core/modules. They are wrappers around usually individual tools. In addition, we have subworkflows: smaller pre-build pipeline chunks. You can think about the modules as Lego bricks and subworkflows as pre-build chunks that can be added to various sets. These components are centrally available for all Nextflow pipelines. To make working with them easy, we have can use `nf-core/tools` +Nextflow pipelines can be build in a very modular fashion. In nf-core, we have simple building blocks available: nf-core/modules. Usually, they are wrappers around individual tools. In addition, we have subworkflows: smaller pre-build pipeline chunks. You can think about the modules as Lego bricks and subworkflows as pre-build chunks that can be added to various sets. These components are centrally available for all Nextflow pipelines. To make working with them easy, you can use `nf-core/tools`. ## Adding an existing nf-core module @@ -629,7 +629,7 @@ nf-core modules info salmon/index 💻 Installation command: nf-core modules install salmon/index ``` -The output from the info command will among other things give you the nf-core/tools installation command, lets see what it is doing: +The output from the info command will, among other things, give you the nf-core/tools installation command. Lets see what it is doing: ```bash nf-core modules install salmon/index @@ -687,7 +687,7 @@ INFO Use the following statement to include this module: The module is now installed into the folder `modules/nf-core`. Now open the file `workflow/demotest.nf`. You will find already several `include` statements there from the installed modules (`MultiQC` and `FastQC`): -```bash +```bash title="workflow/demotest.nf" include { FASTQC } from '../modules/nf-core/fastqc/main' include { MULTIQC } from '../modules/nf-core/multiqc/main' @@ -695,7 +695,7 @@ include { MULTIQC } from '../modules/nf-core/multiqc/main' Now add the above line underneath it: -```bash +```bash title="workflow/demotest.nf" include { FASTQC } from '../modules/nf-core/fastqc/main' include { MULTIQC } from '../modules/nf-core/multiqc/main' @@ -709,7 +709,7 @@ This makes the module now available in the workflow script and it can be called We can now call the module in our workflow. 
Let's place it after FastQC: -```bash +```bash title="workflow/demotest.nf" workflow DEMOTEST { From 458b651e33a09c3f6003c88762d00c6ef01ebd1a Mon Sep 17 00:00:00 2001 From: nf-core-bot Date: Tue, 19 Mar 2024 15:34:38 +0000 Subject: [PATCH 9/9] [automated] Fix code linting --- .../contributing/nf_core_basic_training.md | 33 ++++++++++--------- 1 file changed, 18 insertions(+), 15 deletions(-) diff --git a/src/content/docs/contributing/nf_core_basic_training.md b/src/content/docs/contributing/nf_core_basic_training.md index 06e893c71c..bcf25f83be 100644 --- a/src/content/docs/contributing/nf_core_basic_training.md +++ b/src/content/docs/contributing/nf_core_basic_training.md @@ -938,8 +938,7 @@ workflow DEMOTEST { SALMON_INDEX() ``` -Now we are still missing an input for our module. In order to build an index, we require the reference fasta. Luckily, the template pipeline has this already all configured, and we can access it by just using `params.fasta` and `view` it to insppect the channel content. (We will see later how to add more input files.) - +Now we are still missing an input for our module. In order to build an index, we require the reference fasta. Luckily, the template pipeline has this already all configured, and we can access it by just using `params.fasta` and `view` it to insppect the channel content. (We will see later how to add more input files.) ```bash fasta = Channel.fromPath(params.fasta) @@ -1087,7 +1086,7 @@ In the directory `exercise_6` you will find the custom script `print_hello.py`, To generate a module for a custom script you need to follow the same steps when adding a remote module. Then, you can supply the command for your script in the `script` block but your script needs to be present -and *executable* in the `bin` +and _executable_ in the `bin` folder of the pipeline. In the nf-core pipelines, this folder is in the main directory and you can see in [`rnaseq`](https://github.com/nf-core/rnaseq). @@ -1097,7 +1096,7 @@ This is an Rscript present in the [`bin`](https://github.com/nf-core/rnaseq/tree We can find the module that runs this script in [`modules/local/tximport`](https://github.com/nf-core/rnaseq/blob/master/modules/local/tximport/main.nf). As we can see the script is being called in the `script` block, note that `tximport.r` is -being executed as if it was called from the command line and therefore needs to be *executable*. +being executed as if it was called from the command line and therefore needs to be _executable_.
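In practice this boils down to two things for any script you ship in `bin/`: it starts with a shebang and it carries the executable bit. For a hypothetical `bin/my_script.py` (the name is only for illustration) you could check and fix this with:

```bash
head -n 1 bin/my_script.py   # should print a shebang such as #!/usr/bin/env python
chmod +x bin/my_script.py    # set the executable bit
```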
@@ -1113,11 +1112,12 @@ being executed as if it was called from the command line and therefore needs to
_Tip: Try to follow best practices when writing a script for - reproducibility and maintenance purposes: add the - shebang (e.g. `#!/usr/bin/env python`), and a header - with description and type of license._ +reproducibility and maintenance purposes: add the +shebang (e.g. `#!/usr/bin/env python`), and a header +with description and type of license._ ### 1. Write your script + Let's create a simple custom script that converts a MAF file to a BED file called `maf2bed.py` and place it in the bin directory of our nf-core-testpipeline:: ``` @@ -1157,6 +1157,7 @@ if __name__ == "__main__": ``` ### 2. Make sure your script is in the right folder + Now, let's move it to the correct directory: ``` @@ -1165,13 +1166,13 @@ chmod +x /path/where/pipeline/is/bin/maf2bed.py ``` ### 3. Create your custom module + Then, let's write our module. We will call the process "CONVERT_MAF2BED" and add any tags or/and labels that are appropriate (this is optional) and directives (via conda and/or container) for the definition of dependencies. -
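Before looking at the individual directives, here is a minimal sketch of what `modules/local/convert_maf2bed/main.nf` could look like once finished. The `conda` and `container` values are placeholders that you would replace with the real dependencies of your script; the info boxes below explain each part in more detail:

```
process CONVERT_MAF2BED {
    tag "$meta.id"
    label 'process_single'

    // Placeholders: point these at the software your script actually needs
    conda "conda-forge::python=3.9"
    container "docker.io/library/python:3.9"

    input:
    tuple val(meta), path(maf)

    output:
    tuple val(meta), path("*.bed"), emit: bed

    script:
    def prefix = "${meta.id}"
    """
    maf2bed.py --mafin $maf --bedout ${prefix}.bed
    """
}
```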
More info on labels A `label` will @@ -1186,6 +1187,7 @@ withLabel:process_single { time = { check_max( 1.h * task.attempt, 'time' ) } } ``` +
@@ -1195,8 +1197,7 @@ A `tag` is simple a user provided identifier associated to the task. In our process example, the input is a tuple comprising a hash of metadata for the maf file called `meta` and the path to the `maf` file. It may look -similar to: `[[id:'123', data_type:'maf'], -/path/to/file/example.maf]`. Hence, when nextflow makes +similar to: `[[id:'123', data_type:'maf'], /path/to/file/example.maf]`. Hence, when nextflow makes the call and `$meta.id` is `123` name of the job will be "CONVERT_MAF2BED(123)". If `meta` does not have `id` in its hash, then this will be literally `null`. @@ -1218,6 +1219,7 @@ process foo { ''' } ``` + Multiple packages can be specified separating them with a blank space e.g. `bwa=0.7.15 samtools=1.15.1`. The name of the channel from where a specific package needs to be downloaded can be specified using the usual Conda notation i.e. prefixing the package with the channel name as shown here `bioconda::bwa=0.7.15` ``` @@ -1230,6 +1232,7 @@ process foo { ''' } ``` + Similarly, we can apply the `container` directive to execute the process script in a [Docker](http://docker.io/) or [Singularity](https://docs.sylabs.io/guides/3.5/user-guide/introduction.html) container. When running Docker, it requires the Docker daemon to be running in machine where the pipeline is executed, i.e. the local machine when using the local executor or the cluster nodes when the pipeline is deployed through a grid executor. ``` @@ -1259,6 +1262,7 @@ process foo { ''' } ``` +
Since `maf2bed.py` is in the `bin` directory we can directory call it in the script block of our new module `CONVERT_MAF2BED`. You only have to be careful with how you call variables (some explanations on when to use `${variable}` vs. `$variable`): @@ -1296,9 +1300,8 @@ maf2bed.py --mafin $maf --bedout ${prefix}.bed More on nextflow's process components in the [docs](https://www.nextflow.io/docs/latest/process.html). - - ### Include your module in the workflow + In general, we will call out nextflow module `main.nf` and save it in the `modules` folder under another folder called `conver_maf2bed`. If you believe your custom script could be useful for others and it is potentially reusable or calling a tool that is not yet present in nf-core modules you can start the process of making it official adding a `meta.yml` [explained above](#adding-modules-to-a-pipeline). In the `meta.yml` The overall tree for the pipeline skeleton will look as follows: ``` @@ -1317,7 +1320,7 @@ pipeline/ ... ``` -To use our custom module located in `./modules/local/convert_maf2bed` within our workflow, we use a module inclusions command as follows (this has to be done before we invoke our workflow): +To use our custom module located in `./modules/local/convert_maf2bed` within our workflow, we use a module inclusions command as follows (this has to be done before we invoke our workflow): ``` include { CONVERT_MAF2BED } from './modules/local/convert_maf2bed/main' @@ -1328,6 +1331,7 @@ workflow { ``` ### Other notes + #### What happens in I want to use containers but there is no image created with the packages I need? No worries, this can be done fairly easy thanks to [BioContainers](https://biocontainers-edu.readthedocs.io/en/latest/what_is_biocontainers.html), see instructions [here](https://github.com/BioContainers/multi-package-containers). If you see the combination that you need in the repo, you can also use [this website](https://midnighter.github.io/mulled) to find out the "mulled" name of this container. @@ -1336,10 +1340,9 @@ No worries, this can be done fairly easy thanks to [BioContainers](https://bioco You are in luck, we have more documentation [here](https://nf-co.re/docs/contributing/modules#software-requirements) - #### I want to know more about modules! -See more info about modules in the nextflow docs [here](https://nf-co.re/docs/contributing/modules#software-requirements.) +See more info about modules in the nextflow docs [here](https://nf-co.re/docs/contributing/modules#software-requirements.) ## Lint all modules
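In the same way that `nf-core lint` checks the pipeline template, the modules themselves can be linted. A minimal example, assuming a recent version of nf-core/tools:

```bash
nf-core modules lint --all
```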