Merge pull request #39 from nf-core/dev

PR `dev` -> `master` for first release.
nf-core · Apr 27, 2021 · 4eda7a8
2 parents 78a2c83 + 0fdc242
commit 4eda7a8

Showing 38 changed files with 51,519 additions and 437 deletions.
diff --git a/.github/.dockstore.yml b/.github/.dockstore.yml
@@ -3,3 +3,4 @@ version: 1.2
 workflows:
   - subclass: nfl
     primaryDescriptorPath: /nextflow.config
+    publish: True
diff --git a/.github/CONTRIBUTING.md b/.github/CONTRIBUTING.md
@@ -69,7 +69,7 @@ If you wish to contribute a new step, please use the following coding standards:
 2. Write the process block (see below).
 3. Define the output channel if needed (see below).
 4. Add any new flags/options to `nextflow.config` with a default (see below).
-5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`)
+5. Add any new flags/options to `nextflow_schema.json` with help text (with `nf-core schema build .`).
 6. Add any new flags/options to the help message (for integer/text parameters, print to help the corresponding `nextflow.config` parameter).
 7. Add sanity checks for all relevant parameters.
 8. Add any new software to the `scrape_software_versions.py` script in `bin/` and the version command to the `scrape_software_versions` process in `main.nf`.
@@ -87,7 +87,7 @@ Once there, use `nf-core schema build .` to add to `nextflow_schema.json`.
 
 ### Default processes resource requirements
 
-Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/%7B%7Bcookiecutter.name_noslash%7D%7D/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.
+Sensible defaults for process resource requirements (CPUs / memory / time) for a process should be defined in `conf/base.config`. These should generally be specified generic with `withLabel:` selectors so they can be shared across multiple processes/steps of the pipeline. A nf-core standard set of labels that should be followed where possible can be seen in the [nf-core pipeline template](https://github.com/nf-core/tools/blob/master/nf_core/pipeline-template/conf/base.config), which has the default process as a single core-process, and then different levels of multi-core configurations for increasingly large memory requirements defined with standardised labels.
 
 The process resources can be passed on to the tool dynamically within the process with the `${task.cpus}` and `${task.memory}` variables in the `script:` block.
 

diff --git a/.github/ISSUE_TEMPLATE/bug_report.md b/.github/ISSUE_TEMPLATE/bug_report.md
@@ -55,7 +55,7 @@ Have you provided the following extra information/files:
 
 ## Container engine
 
-- Engine: <!-- [e.g. Conda, Docker, Singularity or Podman] -->
+- Engine: <!-- [e.g. Conda, Docker, Singularity, Podman, Shifter or Charliecloud] -->
 - version: <!-- [e.g. 1.0.0] -->
 - Image tag: <!-- [e.g. nfcore/pgdb:1.0.0] -->
 

diff --git a/.github/ISSUE_TEMPLATE/feature_request.md b/.github/ISSUE_TEMPLATE/feature_request.md
@@ -1,6 +1,6 @@
 ---
 name: Feature request
-about: Suggest an idea for the nf-core website
+about: Suggest an idea for the nf-core/pgdb pipeline
 labels: enhancement
 ---
 

diff --git a/.github/PULL_REQUEST_TEMPLATE.md b/.github/PULL_REQUEST_TEMPLATE.md
@@ -15,9 +15,9 @@ Learn more about contributing: [CONTRIBUTING.md](https://github.com/nf-core/pgdb
 
 - [ ] This comment contains a description of changes (with reason).
 - [ ] If you've fixed a bug or added code that should be tested, add tests!
- - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`
- - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/pgdb/tree/master/.github/CONTRIBUTING.md)
- - [ ] If necessary, also make a PR on the nf-core/pgdb _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
+  - [ ] If you've added a new tool - add to the software_versions process and a regex to `scrape_software_versions.py`
+  - [ ] If you've added a new tool - have you followed the pipeline conventions in the [contribution docs](https://github.com/nf-core/pgdb/tree/master/.github/CONTRIBUTING.md)
+  - [ ] If necessary, also make a PR on the nf-core/pgdb _branch_ on the [nf-core/test-datasets](https://github.com/nf-core/test-datasets) repository.
 - [ ] Make sure your code lints (`nf-core lint .`).
 - [ ] Ensure the test suite passes (`nextflow run . -profile test,docker`).
 - [ ] Usage Documentation in `docs/usage.md` is updated.

diff --git a/.github/workflows/awsfulltest.yml b/.github/workflows/awsfulltest.yml
@@ -9,6 +9,16 @@ on:
     types: [completed]
   workflow_dispatch:
 
+
+env:
+  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+  TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
+  AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
+  AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
+  AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
+
+
 jobs:
   run-awstest:
     name: Run AWS full tests
@@ -23,21 +33,13 @@ jobs:
       - name: Install awscli
         run: conda install -c conda-forge awscli
       - name: Start AWS batch job
-        # TODO nf-core: You can customise AWS full pipeline tests as required
         # Add full size test data (but still relatively small datasets for few samples)
         # on the `test_full.config` test runs with only one set of parameters
         # Then specify `-profile test_full` instead of `-profile test` on the AWS batch command
-        env:
-          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
-          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-          TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
-          AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
-          AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
-          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
         run: |
           aws batch submit-job \
             --region eu-west-1 \
             --job-name nf-core-pgdb \
             --job-queue $AWS_JOB_QUEUE \
             --job-definition $AWS_JOB_DEFINITION \
-            --container-overrides '{"command": ["nf-core/pgdb", "-r '"${GITHUB_SHA}"' -profile test --outdir s3://'"${AWS_S3_BUCKET}"'/pgdb/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/pgdb/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}'
+            --container-overrides '{"command": ["nf-core/pgdb", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://'"${AWS_S3_BUCKET}"'/pgdb/results-'"${GITHUB_SHA}"' -w s3://'"${AWS_S3_BUCKET}"'/pgdb/work-'"${GITHUB_SHA}"' -with-tower"], "environment": [{"name": "TOWER_ACCESS_TOKEN", "value": "'"$TOWER_ACCESS_TOKEN"'"}]}'
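
Note on the `--container-overrides` argument above: it is assembled by alternating single-quoted JSON fragments with double-quoted variable expansions, which the shell joins into a single word. A minimal sketch of that quoting pattern follows. The values of `GITHUB_SHA` and `AWS_S3_BUCKET` are hypothetical stand-ins, the command string is shortened, and the helper name `build_overrides` is an assumption for illustration only, not part of the workflow.

```shell
# Hypothetical stand-ins for the values GitHub Actions would provide.
GITHUB_SHA="abc123"
AWS_S3_BUCKET="example-bucket"

# Print a shortened version of the JSON payload. Single-quoted pieces are
# literal JSON; each '"${VAR}"' closes the single quotes, expands the
# variable inside double quotes, then reopens the single quotes.
build_overrides() {
  printf '%s\n' '{"command": ["nf-core/pgdb", "-r '"${GITHUB_SHA}"' -profile test_full --outdir s3://'"${AWS_S3_BUCKET}"'/pgdb/results-'"${GITHUB_SHA}"'"]}'
}

build_overrides
```

Because the expansions sit inside double quotes, a value containing spaces would still end up in one argument, although it would not be JSON-escaped.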
diff --git a/.github/workflows/awstest.yml b/.github/workflows/awstest.yml
@@ -6,6 +6,16 @@ name: nf-core AWS test
 on:
   workflow_dispatch:
 
+
+env:
+  AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
+  AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
+  TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
+  AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
+  AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
+  AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
+
+
 jobs:
   run-awstest:
     name: Run AWS tests
@@ -20,16 +30,8 @@ jobs:
       - name: Install awscli
         run: conda install -c conda-forge awscli
       - name: Start AWS batch job
-        # TODO nf-core: You can customise CI pipeline run tests as required
         # For example: adding multiple test runs with different parameters
         # Remember that you can parallelise this by using strategy.matrix
-        env:
-          AWS_ACCESS_KEY_ID: ${{ secrets.AWS_ACCESS_KEY_ID }}
-          AWS_SECRET_ACCESS_KEY: ${{ secrets.AWS_SECRET_ACCESS_KEY }}
-          TOWER_ACCESS_TOKEN: ${{ secrets.AWS_TOWER_TOKEN }}
-          AWS_JOB_DEFINITION: ${{ secrets.AWS_JOB_DEFINITION }}
-          AWS_JOB_QUEUE: ${{ secrets.AWS_JOB_QUEUE }}
-          AWS_S3_BUCKET: ${{ secrets.AWS_S3_BUCKET }}
         run: |
           aws batch submit-job \
           --region eu-west-1 \

diff --git a/.github/workflows/branch.yml b/.github/workflows/branch.yml
@@ -13,7 +13,7 @@ jobs:
       - name: Check PRs
         if: github.repository == 'nf-core/pgdb'
         run: |
-          { [[ ${{github.event.pull_request.head.repo.full_name}} == nf-core/pgdb ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
+          { [[ ${{github.event.pull_request.head.repo.full_name }} == nf-core/pgdb ]] && [[ $GITHUB_HEAD_REF = "dev" ]]; } || [[ $GITHUB_HEAD_REF == "patch" ]]
 
 
       # If the above check failed, post a comment on the PR explaining the failure
@@ -23,13 +23,22 @@ jobs:
         uses: mshick/add-pr-comment@v1
         with:
           message: |
+            ## This PR is against the `master` branch :x:
+
+            * Do not close this PR
+            * Click _Edit_ and change the `base` to `dev`
+            * This CI test will remain failed until you push a new commit
+
+            ---
+
             Hi @${{ github.event.pull_request.user.login }},
 
-            It looks like this pull-request is has been made against the ${{github.event.pull_request.head.repo.full_name}} `master` branch.
+            It looks like this pull-request has been made against the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `master` branch.
             The `master` branch on nf-core repositories should always contain code from the latest release.
-            Because of this, PRs to `master` are only allowed if they come from the ${{github.event.pull_request.head.repo.full_name}} `dev` branch.
+            Because of this, PRs to `master` are only allowed if they come from the [${{github.event.pull_request.head.repo.full_name }}](https://github.com/${{github.event.pull_request.head.repo.full_name }}) `dev` branch.
 
             You do not need to close this PR, you can change the target branch to `dev` by clicking the _"Edit"_ button at the top of this page.
+            Note that even after this, the test will continue to show as failing until you push a new commit.
 
             Thanks again for your contribution!
           repo-token: ${{ secrets.GITHUB_TOKEN }}
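
Note on the `Check PRs` step above: the condition accepts a PR into `master` when it comes from the `dev` branch of `nf-core/pgdb` itself, or from any branch named `patch`. A minimal sketch of that logic as a standalone function is given below. It uses POSIX `[ ]` tests rather than the workflow's bash `[[ ]]`, and the function name `pr_to_master_allowed` and the sample inputs are hypothetical.

```shell
# Return 0 when a PR into `master` is allowed, 1 otherwise.
# $1: full name of the repository the PR comes from (e.g. nf-core/pgdb)
# $2: name of the PR's head branch
pr_to_master_allowed() {
  repo="$1"
  head_ref="$2"
  { [ "$repo" = "nf-core/pgdb" ] && [ "$head_ref" = "dev" ]; } || [ "$head_ref" = "patch" ]
}

# Example calls with hypothetical inputs.
for args in "nf-core/pgdb dev" "someuser/pgdb dev" "someuser/pgdb patch"; do
  # Intentional word splitting: each entry holds "<repo> <branch>".
  set -- $args
  if pr_to_master_allowed "$1" "$2"; then
    echo "$1:$2 allowed"
  else
    echo "$1:$2 rejected"
  fi
done
```

The braces group the repository and branch tests so that the `patch` alternative is evaluated independently of where the PR comes from.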

diff --git a/.github/workflows/ci.yml b/.github/workflows/ci.yml
@@ -17,41 +17,35 @@ jobs:
     env:
       NXF_VER: ${{ matrix.nxf_ver }}
       NXF_ANSI_LOG: false
+
     strategy:
       matrix:
         # Nextflow versions: check pipeline minimum and current latest
-        nxf_ver: ['20.04.0', '']
+        nxf_ver: ['20.04.0', '21.03.0-edge']
     steps:
       - name: Check out pipeline code
         uses: actions/checkout@v2
-
       - name: Check if Dockerfile or Conda environment changed
         uses: technote-space/get-diff-action@v4
         with:
           FILES: |
             Dockerfile
             environment.yml
-
       - name: Build new docker image
         if: env.MATCHED_FILES
-        run: docker build --no-cache . -t nfcore/pgdb:dev
-
+        run: docker build --no-cache . -t nfcore/pgdb:1.0.0
       - name: Pull docker image
         if: ${{ !env.MATCHED_FILES }}
         run: |
           docker pull nfcore/pgdb:dev
-          docker tag nfcore/pgdb:dev nfcore/pgdb:dev
-
+          docker tag nfcore/pgdb:dev nfcore/pgdb:1.0.0
       - name: Install Nextflow
         env:
           CAPSULE_LOG: none
         run: |
           wget -qO- get.nextflow.io | bash
           sudo mv nextflow /usr/local/bin/
-
       - name: Run pipeline with test data
-        # TODO nf-core: You can customise CI pipeline run tests as required
         # For example: adding multiple test runs with different parameters
         # Remember that you can parallelise this by using strategy.matrix
-        run: |
-          nextflow run ${GITHUB_WORKSPACE} -profile test,docker
+        run: nextflow run ${GITHUB_WORKSPACE} -profile test,docker
diff --git a/.github/workflows/linting.yml b/.github/workflows/linting.yml
@@ -19,6 +19,34 @@ jobs:
         run: npm install -g markdownlint-cli
       - name: Run Markdownlint
         run: markdownlint ${GITHUB_WORKSPACE} -c ${GITHUB_WORKSPACE}/.github/markdownlint.yml
+
+      # If the above check failed, post a comment on the PR explaining the failure
+      - name: Post PR comment
+        if: failure()
+        uses: mshick/add-pr-comment@v1
+        with:
+          message: |
+            ## Markdown linting is failing
+
+            To keep the code consistent with lots of contributors, we run automated code consistency checks.
+            To fix this CI test, please run:
+
+            * Install `markdownlint-cli`
+                * On Mac: `brew install markdownlint-cli`
+                * Everything else: [Install `npm`](https://www.npmjs.com/get-npm) then [install `markdownlint-cli`](https://www.npmjs.com/package/markdownlint-cli) (`npm install -g markdownlint-cli`)
+            * Fix the markdown errors
+                * Automatically: `markdownlint . --config .github/markdownlint.yml --fix`
+                * Manually resolve anything left from `markdownlint . --config .github/markdownlint.yml`
+
+            Once you push these changes the test should pass, and you can hide this comment :+1:
+
+            We highly recommend setting up markdownlint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!
+
+            Thanks again for your contribution!
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          allow-repeats: false
+
+
   YAML:
     runs-on: ubuntu-latest
     steps:
@@ -29,7 +57,34 @@ jobs:
       - name: Install yaml-lint
         run: npm install -g yaml-lint
       - name: Run yaml-lint
-        run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml")
+        run: yamllint $(find ${GITHUB_WORKSPACE} -type f -name "*.yml" -o -name "*.yaml")
+
+      # If the above check failed, post a comment on the PR explaining the failure
+      - name: Post PR comment
+        if: failure()
+        uses: mshick/add-pr-comment@v1
+        with:
+          message: |
+            ## YAML linting is failing
+
+            To keep the code consistent with lots of contributors, we run automated code consistency checks.
+            To fix this CI test, please run:
+
+            * Install `yaml-lint`
+                * [Install `npm`](https://www.npmjs.com/get-npm) then [install `yaml-lint`](https://www.npmjs.com/package/yaml-lint) (`npm install -g yaml-lint`)
+            * Fix the YAML errors
+                * Run the test locally: `yamllint $(find . -type f -name "*.yml" -o -name "*.yaml")`
+                * Fix any reported errors in your YAML files
+
+            Once you push these changes the test should pass, and you can hide this comment :+1:
+
+            We highly recommend setting up yaml-lint in your code editor so that this formatting is done automatically on save. Ask about it on Slack for help!
+
+            Thanks again for your contribution!
+          repo-token: ${{ secrets.GITHUB_TOKEN }}
+          allow-repeats: false
+
+
   nf-core:
     runs-on: ubuntu-latest
     steps:
@@ -69,7 +124,7 @@ jobs:
         if: ${{ always() }}
         uses: actions/upload-artifact@v2
         with:
-          name: linting-log-file
+          name: linting-logs
           path: |
             lint_log.txt
             lint_results.md
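
Note on the `find` expression in the `Run yaml-lint` step above: in `find`, the implicit "and" between `-type f` and the first `-name` binds more tightly than `-o`, so `-type f` does not apply to the `*.yaml` pattern. Whether this matters for this repository is not known from the diff; it would only show up if a directory name ended in `.yaml`. The sketch below demonstrates the difference on a throwaway directory, where all file and directory names are hypothetical.

```shell
# Build a throwaway tree: two YAML files, one unrelated file, and one
# decoy directory whose name ends in .yaml.
demo_dir="$(mktemp -d)"
touch "$demo_dir/a.yml" "$demo_dir/b.yaml" "$demo_dir/notes.txt"
mkdir "$demo_dir/decoy.yaml"

# As written in the workflow: -type f binds only to the first -name,
# so the directory decoy.yaml is matched as well.
ungrouped="$(find "$demo_dir" -type f -name "*.yml" -o -name "*.yaml" | sort)"

# Grouped with escaped parentheses: -type f applies to both patterns.
grouped="$(find "$demo_dir" -type f \( -name "*.yml" -o -name "*.yaml" \) | sort)"

echo "ungrouped:"; echo "$ungrouped"
echo "grouped:";   echo "$grouped"
```

The grouped form lists only the two regular files, while the ungrouped form also lists the decoy directory.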

diff --git a/.gitignore b/.gitignore
@@ -7,3 +7,5 @@ tests/
 testing/
 testing*
 *.pyc
+
+.idea/
diff --git a/.nf-core-lint.yml b/.nf-core-lint.yml
@@ -0,0 +1,5 @@
+## NOTE - after nf-core/tools release 1.14 delete this line and
+## uncomment the ones below. See https://github.com/nf-core/tools/pull/1019
+nextflow_config: False
+# nextflow_config:
+#  - params.input
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -3,14 +3,19 @@
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/)
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 
-## v1.0dev - [date]
+## 1.0.0
 
 Initial release of nf-core/pgdb, created with the [nf-core](https://nf-co.re/) template.
 
 ### `Added`
 
-### `Fixed`
+The initial version of the pipeline features the following steps:
 
-### `Dependencies`
+- _(optional)_ ENSEMBL Reference proteomes included in final proteome
+- Convert a Variant genome database like COSMIC or CBioPortal to proteomes
+- Convert provided VCF to proteome database
+- _(optional)_ Generate the decoy database and attach it to the final proteome
 
-### `Deprecated`
+### `Known issues`
+
+If you experience Nextflow running forever after a failed step, try setting `errorStrategy = 'terminate'`. See the corresponding [Nextflow issue](https://github.com/nextflow-io/nextflow/issues/1457).
diff --git a/CITATIONS.md b/CITATIONS.md
@@ -0,0 +1,34 @@
+# nf-core/pgdb: Citations
+
+## Pipeline tools
+
+* [Nextflow](https://www.ncbi.nlm.nih.gov/pubmed/28398311/)
+  > Di Tommaso P, Chatzou M, Floden EW, Barja PP, Palumbo E, Notredame C. Nextflow enables reproducible computational workflows. Nat Biotechnol. 2017 Apr 11;35(4):316-319. doi: 10.1038/nbt.3820. PubMed PMID: 28398311.
+
+* [pypgatk](https://zenodo.org/record/4651319)
+  > Yasset Perez-Riverol, & Husen M. Umer. (2021, March 31). py-pgatk: Pre-release v0.0.19 (Version v0.0.19). Zenodo.
+
+## Data sources
+
+* [ENSEMBL](https://pubmed.ncbi.nlm.nih.gov/31691826/)
+  > Yates, A. D., Achuthan, P., Akanni, W., Allen, J., Allen, J., Alvarez-Jarreta, J., ... & Flicek, P. (2020). Ensembl 2020. Nucleic acids research, 48(D1), D682-D688.
+
+* [COSMIC](https://pubmed.ncbi.nlm.nih.gov/15188009/)
+  > Bamford, S., Dawson, E., Forbes, S., Clements, J., Pettett, R., Dogan, A., ... & Wooster, R. (2004). The COSMIC (Catalogue of Somatic Mutations in Cancer) database and website. British journal of cancer, 91(2), 355-358.
+
+* [cBioPortal](https://pubmed.ncbi.nlm.nih.gov/23550210/)
+  > Gao, J., Aksoy, B. A., Dogrusoz, U., Dresdner, G., Gross, B., Sumer, S. O., ... & Schultz, N. (2013). Integrative analysis of complex cancer genomics and clinical profiles using the cBioPortal. Science signaling, 6(269), pl1-pl1.
+
+## Software packaging/containerisation tools
+
+* [BioContainers](https://www.ncbi.nlm.nih.gov/pubmed/28379341/)
+  > da Veiga Leprevost F, Grüning BA, Alves Aflitos S, Röst HL, Uszkoreit J, Barsnes H, Vaudel M, Moreno P, Gatto L, Weber J, Bai M, Jimenez RC, Sachsenberg T, Pfeuffer J, Vera Alvarez R, Griss J, Nesvizhskii AI, Perez-Riverol Y. BioContainers: an open-source and community-driven framework for software standardization. Bioinformatics. 2017 Aug 15;33(16):2580-2582. doi: 10.1093/bioinformatics/btx192. PubMed PMID: 28379341.
+
+* [Singularity](https://www.ncbi.nlm.nih.gov/pubmed/28494014/)
+  > Kurtzer GM, Sochat V, Bauer MW. Singularity: Scientific containers for mobility of compute. PLoS One. 2017 May 11;12(5):e0177459. doi: 10.1371/journal.pone.0177459. eCollection 2017. PubMed PMID: 28494014; PubMed Central PMCID: PMC5426675.
+
+* [Conda](https://www.ncbi.nlm.nih.gov/pubmed/29967506/)
+  > Grüning B., Dale R., Sjödin A., Chapman BA., Rowe J., Tomkins-Tinch CH., Valieris R., Köster J., Bioconda Team (2018). Bioconda: sustainable and comprehensive software distribution for the life sciences. Nature methods, 15(7), 475–476. doi: 10.1038/s41592-018-0046-7. PubMed PMID: 29967506.
+
+* [Docker](https://www.docker.com/)
+  > Merkel D. (2014). Docker: lightweight Linux containers for consistent development and deployment. Linux journal, 2014(239), 2.