From 14f8ab8bcde3e80c7afec8fa4c80d93b1ec2f44e Mon Sep 17 00:00:00 2001 From: David Laehnemann Date: Wed, 31 Jul 2024 22:15:52 +0200 Subject: [PATCH] docs: major overhaul of contribution guidelines and documentation (#3076) This pull request is a major overhaul of the contribution guidelines. It updates it to the current best practices and also cleans up the `I confirm that:` list in the pull request template. Hopefully, this makes contributing (and improving) wrappers a little bit easier. ### QC * [x] I confirm that: For all wrappers added by this PR, * there is a test case which covers any introduced changes, * `input:` and `output:` file paths in the resulting rule can be changed arbitrarily, * either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default, * rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads), * all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071), * the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine, * wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`), * all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.), * `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool, * temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works), * the `meta.yaml` contains a link to the documentation of the respective tool or command, * `Snakefile`s pass the linting (`snakemake --lint`), * `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt), * Python wrapper scripts are formatted with [black](https://black.readthedocs.io). * Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays). --------- Co-authored-by: Filipe G. Vieira <1151762+fgvieira@users.noreply.github.com> --- .github/pull_request_template.md | 26 ++-- docs/contributing.rst | 211 +++++++++++++++++++++++++------ 2 files changed, 182 insertions(+), 55 deletions(-) diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md index 2d3457e81c..159e7f1e2c 100644 --- a/.github/pull_request_template.md +++ b/.github/pull_request_template.md @@ -6,22 +6,12 @@ ### QC -* [ ] I confirm that: +* [ ] I confirm that I have followed the [documentation for contributing to `snakemake-wrappers`](https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html). -For all wrappers added by this PR, - -* there is a test case which covers any introduced changes, -* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily, -* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default, -* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads), -* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071), -* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine, -* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`), -* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.), -* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool, -* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works), -* the `meta.yaml` contains a link to the documentation of the respective tool or command, -* `Snakefile`s pass the linting (`snakemake --lint`), -* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt), -* Python wrapper scripts are formatted with [black](https://black.readthedocs.io). -* Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays). +While the contributions guidelines are more extensive, please particularly ensure that: +* [ ] `test.py` was updated to call any added or updated example rules in a `Snakefile` +* [ ] `input:` and `output:` file paths in the rules can be chosen arbitrarily +* [ ] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`) +* [ ] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to +* [ ] the `meta.yaml` contains a link to the documentation of the respective tool or command under `url:` +* [ ] conda environments use a minimal amount of channels and packages, in recommended ordering diff --git a/docs/contributing.rst b/docs/contributing.rst index 4d30670c74..64800c7aea 100644 --- a/docs/contributing.rst +++ b/docs/contributing.rst @@ -14,31 +14,60 @@ If you want to contribute we suggest the following procedure: The pull request will be reviewed and included as fast as possible. If your pull request does not get a review quickly, you can `@mention ` previous contributors to a particular wrapper (``git blame``) or regular contributors that you think might be able to give a review. -Contributions should follow the coding style of the already present examples, i.e.: - -* provide a ``meta.yaml`` that describes the wrapper (see the `meta.yaml documentation below `_) -* provide an ``environment.yaml`` which lists all required software packages and follows - `the respective best practices `_. The - packages should be available for installation via the - `default anaconda channels `_ or via the - `conda`_ channels - `bioconda `_ or - `conda-forge `_. - Other sustainable community maintained channels are possible as well. -* add a ``wrapper.py`` or ``wrapper.R`` file that can deal with arbitrary ``input:`` and ``output:`` paths. -* provide a minimal test case in a subfolder called ``test``, with an example - ``Snakefile`` that shows how to use the wrapper (rule names should be descriptive and written in `snake_case `_), some minimal testing data - (also check existing wrappers for suitable data) and add an invocation of the - test in ``test.py`` -* ensure consistent `formatting`_ of Python files and `linting`_ of Snakefiles. + +In general, always take inspiration from existing wrappers. +And then, contributions should: + +* provide the following files: + + * ``meta.yaml`` (wrapper description), see :ref:`meta` + * ``environment.yaml`` (required software), see :ref:`environment` + * ``environment.linux-64.pin.yaml`` (autogenerated pinning of the software), see :ref:`environment` + * ``wrapper.py`` or ``wrapper.R`` (actual wrapper code), see :ref:`wrapper` + * ``test/Snakefile`` (minimal test cases and copy-pasteable examples), see :ref:`Snakefile` + +* amend ``test.py`` to call all of the testing rules provided in ``test/Snakefile``, see :ref:`test` +* ensure consistent: + + * `formatting`_ of Python files + * `linting`_ of Snakefiles + + +.. _development environment: + +``conda``/``mamba`` environment for development +----------------------------------------------- + +To have all the tools you need for developing and testing wrappers in one single ``conda``/``mamba`` environment: + +1. `Install miniforge `_. +2. Set up the channels as `described for bioconda `_. +3. Create an environment with the necessary dependencies: + +.. code-block:: bash + + mamba create -n snakemake-wrappers-development -c conda-forge -c bioconda snakemake snakefmt snakedeploy black mamba pytest + +4. Activate the environment with: + +.. code-block:: bash + + mamba activate snakemake-wrappers-development + .. _meta: ``meta.yaml`` file ------------------- -The following fields are available to use in the wrapper ``meta.yaml`` file. All, except -those marked optional, should be provided. +This file describes the wrapper and how to use it. + +The general file syntax is `YAML`_. +Text / strings (values in a `YAML`_ `mapping`_ or `sequence`_), can use `reStructuredText`_ syntax. + +The following fields are available to use in the wrapper ``meta.yaml`` file. +All, except those marked optional, should be provided. +Especially make sure to include a URL of the respective tool's documentation. * **name**: The name of the wrapper. * **description**: a description of what the wrapper does. @@ -49,7 +78,8 @@ those marked optional, should be provided. * **params** (optional): A `mapping`_ of parameters that can be used in the wrapper's ``params`` directive. If no parameters are used for the wrapper, this field can be omitted. * **notes** (optional): Anything of note that does not fit into the scope of the other fields. -You can add a newline to the rendered text in these fields with the addition of ``|nl|`` +You can add a newline to the rendered text in these fields with the addition of ``|nl|``. + Example ^^^^^^^ @@ -77,9 +107,117 @@ Example notes: Multiple threads can be used during compression of the output file with ``pigz``. +.. _YAML: https://yaml.org/spec/1.2.2/ +.. _sequence: https://yaml.org/spec/1.2.2/#21-collections +.. _mapping: https://yaml.org/spec/1.2.2/#21-collections +.. _reStructuredText: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html + + +.. _environment: + +``environment.yaml`` file +------------------------- + +This file needs to list all the software that the wrapper code needes to run successfully. + +For all software following `semantic versioning `_ conventions, specify (and thus pin) the major and minor version, but leave the patch version unspecified. +Also, unless this is needed to work around version incompatibilities not properly handled by the conda packages themselves, only specify the actual software needed and let ``conda``/``mamba`` determine the dependencies. + +To make sure that ``conda``/``mamba`` knows where to look for the package, include a list of all of the conda channels that the software and its dependencies require. +This will usually include `conda-forge `_, as it contains many essential libraries that other packages and tools depend on. +This channel should usually be specified first, to make sure it takes precedence (``snakemake`` asks users to ``conda config --set channel_priority strict``). +In addition, you may need to include other sustainable community maintained channels (like `bioconda `_). +And as the last channel specification, always include ``nodefaults``. +This avoids software dependency conflicts between the ``conda-forge`` channel and the ``default`` channels that should not be needed nowadays. + +Finally, make sure to run ``snakedeploy pin-conda-envs environment.yaml`` on the finished environment specification. +This will generate a file called ``environment.linux-64.pin.txt`` with all the dependency versions determined by ``conda``/``mamba``, ensuring that a particular wrapper version will always generate the exact same environment with the exact package versions from this file. +You should include this pinning file in the pull request for your wrapper. + +Example +^^^^^^^ + +.. code-block:: yaml + + channels: + - conda-forge + - bioconda + - nodefaults + dependencies: + - bioconductor-biomart =2.58 + - r-nanoparquet =0.3 + - r-tidyverse = 2.0 + + +.. _wrapper: + +``wrapper.py`` or ``wrapper.R`` file +------------------------------------ + +This is the actual code that the wrapper executes. +It is handled like an `external script in snakemake `_, so you have the respective `snakemake` objects available. + +Please ensure that the wrapper: + +* can deal with arbitrary ``input:`` and ``output:`` paths and filenames +* redirects `stdout` and `stderr` to log files specified by the `log:` directive (typical boilerplate code can for example be found in `this knowledge base `_) +* automatically infers command line arguments wherever possible (for example based on file extensions in ``input:`` and ``output:``) +* passes on the `threads` value, if the used tool(s) allow(s) it +* writes any temporary files to a unique hidden folder in the working directory, or (better) stores them where the Python function `tempfile.gettempdir() `_ points (this also means that using any Python tempfile default behavior works) +* is formatted according to the language's standards (for Python, format it with `black `_: ``black wrapper.py``) + +For repeatedly needed functionality you can use the `snakemake-wrapper-utils `_. +Use what is available or create new functionality there, whenever you start repeating functions across wrappers. +Examples of this are: + +1. The command line argument parsing for a software tool like ``samtools`` where you create one wrapper each for a number of different subcommands that share the main arguments. See the `samtools.py `_ utility functions for the respective functionality. +2. The handling of recurring Java options, for example things like memory handling. See `java.py `_ for the respective functionality. + +To use ``snakemake-wrapper-utils``, you have to include them as a depenency in your :ref:`environment` definition file and import the respective function(s) in your :ref:`wrapper` script (for example ``from snakemake_wrapper_utils.java import get_java_opts``). + + +.. _snakefile: + +``test/Snakefile`` file +----------------------- + +In a subfolder called ``test``, create a ``Snakefile`` with example invocations of the wrapper. +These examples should comprehensively showcase the available functionality of the wrapper, as they serve as both the copy-pasteable examples rendered in the documentation, and the test cases run in the continuous integration testing (make sure to include calls to the rules in ``test.py``, see :ref:`test`). +If these rules need any input data, you can also include minimal (small) testing data in the ``test/`` folder (also check existing wrappers for suitable data). + +When writing the ``Snakefile``, please ensure that: + +* rule names in the examples are in `snake_case `_ and descriptive (they should explain what the rule is does, or match the tool's purpose or name; for example ``map_reads`` for a step that maps reads) +* it is formatted correctly by running `snakefmt `_ (``snakefmt Snakefile``) +* it also passes linting, see :ref:`linting` +* all example rules in your ``test/Snakefile`` have an invocation as a test case in ``test.py``, see :ref:`test` +* wherever you can do this with a short comment, explain possible settings for all keywords like ``input:``, ``output:``, ``params:``, ``threads:``, etc. (provide longer explanations in the :ref:`meta` file) +* provide a sensible default for ``threads:``, if more than one thread can be used by the wrapper + +.. _test: + +``test.py`` tests file +---------------------- + +Every example rule listed in a :ref:`snakefile`, should be included as a test case in ``test.py``. +The easiest way is usually to duplicate an existing test and adapt it to your newly added example rule. + +When done editing, make sure that ``test.py`` :ref:`formatting` still follows |black|_ standards. + +Example +^^^^^^^ + +.. code-block:: python + + @skip_if_not_modified + def test_bcftools_sort(): + run( + "bio/bcftools/sort", + ["snakemake", "--cores", "1", "--use-conda", "-F", "a.sorted.bcf"], + ) + + -.. _sequence: https://yaml.org/spec/1.2/spec.html#id2759963 -.. _mapping: https://yaml.org/spec/1.2/spec.html#id2759963 .. _formatting: @@ -108,35 +246,34 @@ Please `lint`_ your test ``Snakefile`` with:: Testing locally --------------- -If you want to debug your contribution locally (before creating a pull request), you -can install all dependencies with |mamba|_ (or |conda|_). `Install miniconda with the -channels as described for bioconda `_ and -set up an environment with the necessary dependencies and activate it:: - - mamba create -n test-snakemake-wrappers snakemake pytest conda snakefmt black - conda activate test-snakemake-wrappers +If you want to debug your contribution locally before creating a pull request, ensure you have the :ref:`development environment` installed and activated. Afterwards, from the main directory of the repo, you can run the test(s) for your contribution by `specifying an expression `_ -that matches the name(s) of your test(s) via the ``-k`` option of ``pytest``:: +that matches the name(s) of your test(s) via the ``-k`` option of ``pytest``: + +.. code-block:: bash pytest test.py -v -k your_test If you also want to test the docs generation locally, create another environment -and activate it:: +and activate it: + +.. code-block:: bash + + mamba create -n test-snakemake-wrapper-docs -c conda-forge sphinx sphinx_rtd_theme pyyaml sphinx-copybutton sphinxawesome_theme myst-parser + mamba activate test-snakemake-wrapper-docs - mamba create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml sphinx-copybutton - conda activate test-snakemake-wrapper-docs +Then, enter the respective directory and build the docs: -Then, enter the respective directory and build the docs:: +.. code-block:: bash cd docs make html -If it runs through, you can open the main page at ``docs/_build/html/index.html`` -in a web browser. If you want to start fresh, you can clean up the build -with ``make clean``. +If it runs through, you can open the main page at ``docs/_build/html/index.html`` in a web browser. +If you want to start fresh, you can clean up the build with ``make clean``. .. |mamba| replace:: ``mamba``