docs: major overhaul of contribution guidelines and documentation (#3076

) This pull request is a major overhaul of the contribution guidelines. It updates it to the current best practices and also cleans up the `I confirm that:` list in the pull request template. Hopefully, this makes contributing (and improving) wrappers a little bit easier. ### QC  * [x] I confirm that: For all wrappers added by this PR, * there is a test case which covers any introduced changes, * `input:` and `output:` file paths in the resulting rule can be changed arbitrarily, * either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default, * rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads), * all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071), * the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine, * wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`), * all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.), * `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool, * temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works), * the `meta.yaml` contains a link to the documentation of the respective tool or command, * `Snakefile`s pass the linting (`snakemake --lint`), * `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt), * Python wrapper scripts are formatted with [black](https://black.readthedocs.io). * Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays). --------- Co-authored-by: Filipe G. Vieira <[email protected]>
snakemake · Jul 31, 2024 · 14f8ab8 · 14f8ab8
1 parent a192628
commit 14f8ab8
Show file tree

Hide file tree

Showing 2 changed files with 182 additions and 55 deletions.
diff --git a/.github/pull_request_template.md b/.github/pull_request_template.md
@@ -6,22 +6,12 @@
 ### QC
 <!-- Make sure that you can tick the boxes below. -->
 
-* [ ] I confirm that:
+* [ ] I confirm that I have followed the [documentation for contributing to `snakemake-wrappers`](https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html).
 
-For all wrappers added by this PR, 
-
-* there is a test case which covers any introduced changes,
-* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
-* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
-* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads),
-* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
-* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine,
-* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
-* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
-* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
-* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
-* the `meta.yaml` contains a link to the documentation of the respective tool or command,
-* `Snakefile`s pass the linting (`snakemake --lint`),
-* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
-* Python wrapper scripts are formatted with [black](https://black.readthedocs.io).
-* Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).
+While the contributions guidelines are more extensive, please particularly ensure that:
+* [ ] `test.py` was updated to call any added or updated example rules in a `Snakefile`
+* [ ] `input:` and `output:` file paths in the rules can be chosen arbitrarily
+* [ ] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`)
+* [ ] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to 
+* [ ] the `meta.yaml` contains a link to the documentation of the respective tool or command under `url:`
+* [ ] conda environments use a minimal amount of channels and packages, in recommended ordering
diff --git a/docs/contributing.rst b/docs/contributing.rst
@@ -14,31 +14,60 @@ If you want to contribute we suggest the following procedure:
 
 The pull request will be reviewed and included as fast as possible.
 If your pull request does not get a review quickly, you can `@mention <https://github.blog/2011-03-23-mention-somebody-they-re-notified/>` previous contributors to a particular wrapper (``git blame``) or regular contributors that you think might be able to give a review.
-Contributions should follow the coding style of the already present examples, i.e.:
-
-* provide a ``meta.yaml`` that describes the wrapper (see the `meta.yaml documentation below <meta>`_)
-* provide an ``environment.yaml`` which lists all required software packages and follows
-  `the respective best practices <https://stackoverflow.com/a/64594513/2352071>`_. The
-  packages should be available for installation via the
-  `default anaconda channels <https://anaconda.org/anaconda>`_ or via the
-  `conda`_ channels
-  `bioconda <https://bioconda.github.io/recipes.html>`_ or
-  `conda-forge <https://conda-forge.org/feedstocks/>`_.
-  Other sustainable community maintained channels are possible as well.
-* add a ``wrapper.py`` or ``wrapper.R`` file that can deal with arbitrary ``input:`` and ``output:`` paths.
-* provide a minimal test case in a subfolder called ``test``, with an example
-  ``Snakefile`` that shows how to use the wrapper (rule names should be descriptive and written in `snake_case <https://en.wikipedia.org/wiki/Snake_case>`_), some minimal testing data
-  (also check existing wrappers for suitable data) and add an invocation of the
-  test in ``test.py``
-* ensure consistent `formatting`_ of Python files and `linting`_ of Snakefiles.
+
+In general, always take inspiration from existing wrappers.
+And then, contributions should:
+
+* provide the following files:
+
+  * ``meta.yaml`` (wrapper description), see :ref:`meta`
+  * ``environment.yaml`` (required software), see :ref:`environment`
+  * ``environment.linux-64.pin.yaml`` (autogenerated pinning of the software), see :ref:`environment`
+  * ``wrapper.py`` or ``wrapper.R`` (actual wrapper code), see :ref:`wrapper`
+  * ``test/Snakefile`` (minimal test cases and copy-pasteable examples), see :ref:`Snakefile`
+
+* amend ``test.py`` to call all of the testing rules provided in ``test/Snakefile``, see :ref:`test`
+* ensure consistent:
+
+  * `formatting`_ of Python files
+  * `linting`_ of Snakefiles
+
+
+.. _development environment:
+
+``conda``/``mamba`` environment for development
+-----------------------------------------------
+
+To have all the tools you need for developing and testing wrappers in one single ``conda``/``mamba`` environment:
+
+1. `Install miniforge <https://github.com/conda-forge/miniforge?tab=readme-ov-file#install>`_.
+2. Set up the channels as `described for bioconda <https://bioconda.github.io/#using-bioconda>`_.
+3. Create an environment with the necessary dependencies:
+
+.. code-block:: bash
+
+  mamba create -n snakemake-wrappers-development -c conda-forge -c bioconda snakemake snakefmt snakedeploy black mamba pytest
+
+4. Activate the environment with:
+
+.. code-block:: bash
+
+  mamba activate snakemake-wrappers-development
+
 
 .. _meta:
 
 ``meta.yaml`` file
 -------------------
 
-The following fields are available to use in the wrapper ``meta.yaml`` file. All, except
-those marked optional, should be provided.
+This file describes the wrapper and how to use it.
+
+The general file syntax is `YAML`_.
+Text / strings (values in a `YAML`_ `mapping`_ or `sequence`_), can use `reStructuredText`_ syntax.
+
+The following fields are available to use in the wrapper ``meta.yaml`` file.
+All, except those marked optional, should be provided.
+Especially make sure to include a URL of the respective tool's documentation.
 
 * **name**: The name of the wrapper.
 * **description**: a description of what the wrapper does.
@@ -49,7 +78,8 @@ those marked optional, should be provided.
 * **params** (optional): A `mapping`_ of parameters that can be used in the wrapper's ``params`` directive. If no parameters are used for the wrapper, this field can be omitted.
 * **notes** (optional): Anything of note that does not fit into the scope of the other fields.
 
-You can add a newline to the rendered text in these fields with the addition of ``|nl|``
+You can add a newline to the rendered text in these fields with the addition of ``|nl|``.
+
 
 Example
 ^^^^^^^
@@ -77,9 +107,117 @@ Example
     notes: Multiple threads can be used during compression of the output file with ``pigz``.
 
 
+.. _YAML: https://yaml.org/spec/1.2.2/
+.. _sequence: https://yaml.org/spec/1.2.2/#21-collections
+.. _mapping: https://yaml.org/spec/1.2.2/#21-collections
+.. _reStructuredText: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html
+
+
+.. _environment:
+
+``environment.yaml`` file
+-------------------------
+
+This file needs to list all the software that the wrapper code needes to run successfully.
+
+For all software following `semantic versioning <https://semver.org/>`_ conventions, specify (and thus pin) the major and minor version, but leave the patch version unspecified.
+Also, unless this is needed to work around version incompatibilities not properly handled by the conda packages themselves, only specify the actual software needed and let ``conda``/``mamba`` determine the dependencies.
+
+To make sure that ``conda``/``mamba`` knows where to look for the package, include a list of all of the conda channels that the software and its dependencies require.
+This will usually include `conda-forge <https://conda-forge.org/>`_, as it contains many essential libraries that other packages and tools depend on.
+This channel should usually be specified first, to make sure it takes precedence (``snakemake`` asks users to ``conda config --set channel_priority strict``).
+In addition, you may need to include other sustainable community maintained channels (like `bioconda <https://bioconda.github.io/>`_).
+And as the last channel specification, always include ``nodefaults``.
+This avoids software dependency conflicts between the ``conda-forge`` channel and the ``default`` channels that should not be needed nowadays.
+
+Finally, make sure to run ``snakedeploy pin-conda-envs environment.yaml`` on the finished environment specification.
+This will generate a file called ``environment.linux-64.pin.txt`` with all the dependency versions determined by ``conda``/``mamba``, ensuring that a particular wrapper version will always generate the exact same environment with the exact package versions from this file.
+You should include this pinning file in the pull request for your wrapper.
+
+Example
+^^^^^^^
+
+.. code-block:: yaml
+
+    channels:
+      - conda-forge
+      - bioconda
+      - nodefaults
+    dependencies:
+      - bioconductor-biomart =2.58
+      - r-nanoparquet =0.3
+      - r-tidyverse = 2.0
+
+
+.. _wrapper:
+
+``wrapper.py`` or ``wrapper.R`` file
+------------------------------------
+
+This is the actual code that the wrapper executes.
+It is handled like an `external script in snakemake <https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#external-scripts>`_, so you have the respective `snakemake` objects available.
+
+Please ensure that the wrapper:
+
+* can deal with arbitrary ``input:`` and ``output:`` paths and filenames
+* redirects `stdout` and `stderr` to log files specified by the `log:` directive (typical boilerplate code can for example be found in `this knowledge base <https://koesterlab.github.io/data-science-for-bioinfo/workflows/snakemake.html#language-specific-debugging>`_)
+* automatically infers command line arguments wherever possible (for example based on file extensions in ``input:`` and ``output:``)
+* passes on the `threads` value, if the used tool(s) allow(s) it
+* writes any temporary files to a unique hidden folder in the working directory, or (better) stores them where the Python function `tempfile.gettempdir() <https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir>`_ points (this also means that using any Python tempfile default behavior works)
+* is formatted according to the language's standards (for Python, format it with `black <https://black.readthedocs.io/>`_: ``black wrapper.py``)
+
+For repeatedly needed functionality you can use the `snakemake-wrapper-utils <https://github.com/snakemake/snakemake-wrapper-utils>`_.
+Use what is available or create new functionality there, whenever you start repeating functions across wrappers.
+Examples of this are:
+
+1. The command line argument parsing for a software tool like ``samtools`` where you create one wrapper each for a number of different subcommands that share the main arguments. See the `samtools.py <https://github.com/snakemake/snakemake-wrapper-utils/blob/master/snakemake_wrapper_utils/samtools.py>`_ utility functions for the respective functionality.
+2. The handling of recurring Java options, for example things like memory handling. See `java.py <https://github.com/snakemake/snakemake-wrapper-utils/blob/master/snakemake_wrapper_utils/java.py>`_ for the respective functionality.
+
+To use ``snakemake-wrapper-utils``, you have to include them as a depenency in your :ref:`environment` definition file and import the respective function(s) in your :ref:`wrapper` script (for example ``from snakemake_wrapper_utils.java import get_java_opts``).
+
+
+.. _snakefile:
+
+``test/Snakefile`` file
+-----------------------
+
+In a subfolder called ``test``, create a ``Snakefile`` with example invocations of the wrapper.
+These examples should comprehensively showcase the available functionality of the wrapper, as they serve as both the copy-pasteable examples rendered in the documentation, and the test cases run in the continuous integration testing (make sure to include calls to the rules in ``test.py``, see :ref:`test`).
+If these rules need any input data, you can also include minimal (small) testing data in the ``test/`` folder (also check existing wrappers for suitable data).
+
+When writing the ``Snakefile``, please ensure that:
+
+* rule names in the examples are in `snake_case <https://en.wikipedia.org/wiki/Snake_case>`_ and descriptive (they should explain what the rule is does, or match the tool's purpose or name; for example ``map_reads`` for a step that maps reads)
+* it is formatted correctly by running `snakefmt <https://github.com/snakemake/snakefmt>`_ (``snakefmt Snakefile``)
+* it also passes linting, see :ref:`linting`
+* all example rules in your ``test/Snakefile`` have an invocation as a test case in ``test.py``, see :ref:`test`
+* wherever you can do this with a short comment, explain possible settings for all keywords like ``input:``, ``output:``, ``params:``, ``threads:``, etc. (provide longer explanations in the :ref:`meta` file)
+* provide a sensible default for ``threads:``, if more than one thread can be used by the wrapper
+
+.. _test:
+
+``test.py`` tests file
+----------------------
+
+Every example rule listed in a :ref:`snakefile`, should be included as a test case in ``test.py``.
+The easiest way is usually to duplicate an existing test and adapt it to your newly added example rule.
+
+When done editing, make sure that ``test.py`` :ref:`formatting` still follows |black|_ standards.
+
+Example
+^^^^^^^
+
+.. code-block:: python
+
+    @skip_if_not_modified
+    def test_bcftools_sort():
+        run(
+            "bio/bcftools/sort",
+            ["snakemake", "--cores", "1", "--use-conda", "-F", "a.sorted.bcf"],
+        )
+
+
 
-.. _sequence: https://yaml.org/spec/1.2/spec.html#id2759963
-.. _mapping: https://yaml.org/spec/1.2/spec.html#id2759963
 
 .. _formatting:
 
@@ -108,35 +246,34 @@ Please `lint`_ your test ``Snakefile`` with::
 Testing locally
 ---------------
 
-If you want to debug your contribution locally (before creating a pull request), you
-can install all dependencies with |mamba|_ (or |conda|_). `Install miniconda with the
-channels as described for bioconda <https://bioconda.github.io/#using-bioconda>`_ and
-set up an environment with the necessary dependencies and activate it::
-
-  mamba create -n test-snakemake-wrappers snakemake pytest conda snakefmt black
-  conda activate test-snakemake-wrappers
+If you want to debug your contribution locally before creating a pull request, ensure you have the :ref:`development environment` installed and activated.
 
 Afterwards, from the main directory of the repo, you can run the test(s) for your
 contribution by `specifying an expression <https://docs.pytest.org/en/stable/usage.html#specifying-tests-selecting-tests>`_
-that matches the name(s) of your test(s) via the ``-k`` option of ``pytest``::
+that matches the name(s) of your test(s) via the ``-k`` option of ``pytest``:
+
+.. code-block:: bash
 
   pytest test.py -v -k your_test
 
 
 If you also want to test the docs generation locally, create another environment
-and activate it::
+and activate it:
+
+.. code-block:: bash
+
+  mamba create -n test-snakemake-wrapper-docs -c conda-forge sphinx sphinx_rtd_theme pyyaml sphinx-copybutton sphinxawesome_theme myst-parser
+  mamba activate test-snakemake-wrapper-docs
 
-  mamba create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml sphinx-copybutton
-  conda activate test-snakemake-wrapper-docs
+Then, enter the respective directory and build the docs:
 
-Then, enter the respective directory and build the docs::
+.. code-block:: bash
 
   cd docs
   make html
 
-If it runs through, you can open the main page at ``docs/_build/html/index.html``
-in a web browser. If you want to start fresh, you can clean up the build
-with ``make clean``.
+If it runs through, you can open the main page at ``docs/_build/html/index.html`` in a web browser.
+If you want to start fresh, you can clean up the build with ``make clean``.
 
 
 .. |mamba| replace:: ``mamba``