Skip to content

Commit

Permalink
docs: major overhaul of contribution guidelines and documentation (#3076
Browse files Browse the repository at this point in the history
)

This pull request is a major overhaul of the contribution guidelines. It
updates it to the current best practices and also cleans up the `I
confirm that:` list in the pull request template. Hopefully, this makes
contributing (and improving) wrappers a little bit easier.

### QC
<!-- Make sure that you can tick the boxes below. -->

* [x] I confirm that:

For all wrappers added by this PR, 

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed
arbitrarily,
* either the wrapper can only use a single core, or the example rule
contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in
[snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell
what the rule is about or match the tools purpose or name (e.g.,
`map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best
practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running
`snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set
automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries
are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on
the wrapped tool,
* temporary files are either written to a unique hidden folder in the
working directory, or (better) stored where the Python function
`tempfile.gettempdir()` points to (see
[here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir);
this also means that using any Python `tempfile` default behavior
works),
* the `meta.yaml` contains a link to the documentation of the respective
tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with
[snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with
[black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended
ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as
conda-forge should have highest priority and defaults channels are
usually not needed because most packages are in conda-forge nowadays).

---------

Co-authored-by: Filipe G. Vieira <[email protected]>
  • Loading branch information
dlaehnemann and fgvieira authored Jul 31, 2024
1 parent a192628 commit 14f8ab8
Show file tree
Hide file tree
Showing 2 changed files with 182 additions and 55 deletions.
26 changes: 8 additions & 18 deletions .github/pull_request_template.md
Original file line number Diff line number Diff line change
Expand Up @@ -6,22 +6,12 @@
### QC
<!-- Make sure that you can tick the boxes below. -->

* [ ] I confirm that:
* [ ] I confirm that I have followed the [documentation for contributing to `snakemake-wrappers`](https://snakemake-wrappers.readthedocs.io/en/stable/contributing.html).

For all wrappers added by this PR,

* there is a test case which covers any introduced changes,
* `input:` and `output:` file paths in the resulting rule can be changed arbitrarily,
* either the wrapper can only use a single core, or the example rule contains a `threads: x` statement with `x` being a reasonable default,
* rule names in the test case are in [snake_case](https://en.wikipedia.org/wiki/Snake_case) and somehow tell what the rule is about or match the tools purpose or name (e.g., `map_reads` for a step that maps reads),
* all `environment.yaml` specifications follow [the respective best practices](https://stackoverflow.com/a/64594513/2352071),
* the `environment.yaml` pinning has been updated by running `snakedeploy pin-conda-envs environment.yaml` on a linux machine,
* wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`),
* all fields of the example rules in the `Snakefile`s and their entries are explained via comments (`input:`/`output:`/`params:` etc.),
* `stderr` and/or `stdout` are logged correctly (`log:`), depending on the wrapped tool,
* temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to (see [here](https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir); this also means that using any Python `tempfile` default behavior works),
* the `meta.yaml` contains a link to the documentation of the respective tool or command,
* `Snakefile`s pass the linting (`snakemake --lint`),
* `Snakefile`s are formatted with [snakefmt](https://github.com/snakemake/snakefmt),
* Python wrapper scripts are formatted with [black](https://black.readthedocs.io).
* Conda environments use a minimal amount of channels, in recommended ordering. E.g. for bioconda, use (conda-forge, bioconda, nodefaults, as conda-forge should have highest priority and defaults channels are usually not needed because most packages are in conda-forge nowadays).
While the contributions guidelines are more extensive, please particularly ensure that:
* [ ] `test.py` was updated to call any added or updated example rules in a `Snakefile`
* [ ] `input:` and `output:` file paths in the rules can be chosen arbitrarily
* [ ] wherever possible, command line arguments are inferred and set automatically (e.g. based on file extensions in `input:` or `output:`)
* [ ] temporary files are either written to a unique hidden folder in the working directory, or (better) stored where the Python function `tempfile.gettempdir()` points to
* [ ] the `meta.yaml` contains a link to the documentation of the respective tool or command under `url:`
* [ ] conda environments use a minimal amount of channels and packages, in recommended ordering
211 changes: 174 additions & 37 deletions docs/contributing.rst
Original file line number Diff line number Diff line change
Expand Up @@ -14,31 +14,60 @@ If you want to contribute we suggest the following procedure:

The pull request will be reviewed and included as fast as possible.
If your pull request does not get a review quickly, you can `@mention <https://github.blog/2011-03-23-mention-somebody-they-re-notified/>` previous contributors to a particular wrapper (``git blame``) or regular contributors that you think might be able to give a review.
Contributions should follow the coding style of the already present examples, i.e.:

* provide a ``meta.yaml`` that describes the wrapper (see the `meta.yaml documentation below <meta>`_)
* provide an ``environment.yaml`` which lists all required software packages and follows
`the respective best practices <https://stackoverflow.com/a/64594513/2352071>`_. The
packages should be available for installation via the
`default anaconda channels <https://anaconda.org/anaconda>`_ or via the
`conda`_ channels
`bioconda <https://bioconda.github.io/recipes.html>`_ or
`conda-forge <https://conda-forge.org/feedstocks/>`_.
Other sustainable community maintained channels are possible as well.
* add a ``wrapper.py`` or ``wrapper.R`` file that can deal with arbitrary ``input:`` and ``output:`` paths.
* provide a minimal test case in a subfolder called ``test``, with an example
``Snakefile`` that shows how to use the wrapper (rule names should be descriptive and written in `snake_case <https://en.wikipedia.org/wiki/Snake_case>`_), some minimal testing data
(also check existing wrappers for suitable data) and add an invocation of the
test in ``test.py``
* ensure consistent `formatting`_ of Python files and `linting`_ of Snakefiles.

In general, always take inspiration from existing wrappers.
And then, contributions should:

* provide the following files:

* ``meta.yaml`` (wrapper description), see :ref:`meta`
* ``environment.yaml`` (required software), see :ref:`environment`
* ``environment.linux-64.pin.yaml`` (autogenerated pinning of the software), see :ref:`environment`
* ``wrapper.py`` or ``wrapper.R`` (actual wrapper code), see :ref:`wrapper`
* ``test/Snakefile`` (minimal test cases and copy-pasteable examples), see :ref:`Snakefile`

* amend ``test.py`` to call all of the testing rules provided in ``test/Snakefile``, see :ref:`test`
* ensure consistent:

* `formatting`_ of Python files
* `linting`_ of Snakefiles


.. _development environment:

``conda``/``mamba`` environment for development
-----------------------------------------------

To have all the tools you need for developing and testing wrappers in one single ``conda``/``mamba`` environment:

1. `Install miniforge <https://github.com/conda-forge/miniforge?tab=readme-ov-file#install>`_.
2. Set up the channels as `described for bioconda <https://bioconda.github.io/#using-bioconda>`_.
3. Create an environment with the necessary dependencies:

.. code-block:: bash
mamba create -n snakemake-wrappers-development -c conda-forge -c bioconda snakemake snakefmt snakedeploy black mamba pytest
4. Activate the environment with:

.. code-block:: bash
mamba activate snakemake-wrappers-development
.. _meta:

``meta.yaml`` file
-------------------

The following fields are available to use in the wrapper ``meta.yaml`` file. All, except
those marked optional, should be provided.
This file describes the wrapper and how to use it.

The general file syntax is `YAML`_.
Text / strings (values in a `YAML`_ `mapping`_ or `sequence`_), can use `reStructuredText`_ syntax.

The following fields are available to use in the wrapper ``meta.yaml`` file.
All, except those marked optional, should be provided.
Especially make sure to include a URL of the respective tool's documentation.

* **name**: The name of the wrapper.
* **description**: a description of what the wrapper does.
Expand All @@ -49,7 +78,8 @@ those marked optional, should be provided.
* **params** (optional): A `mapping`_ of parameters that can be used in the wrapper's ``params`` directive. If no parameters are used for the wrapper, this field can be omitted.
* **notes** (optional): Anything of note that does not fit into the scope of the other fields.

You can add a newline to the rendered text in these fields with the addition of ``|nl|``
You can add a newline to the rendered text in these fields with the addition of ``|nl|``.


Example
^^^^^^^
Expand Down Expand Up @@ -77,9 +107,117 @@ Example
notes: Multiple threads can be used during compression of the output file with ``pigz``.
.. _YAML: https://yaml.org/spec/1.2.2/
.. _sequence: https://yaml.org/spec/1.2.2/#21-collections
.. _mapping: https://yaml.org/spec/1.2.2/#21-collections
.. _reStructuredText: https://www.sphinx-doc.org/en/master/usage/restructuredtext/basics.html


.. _environment:

``environment.yaml`` file
-------------------------

This file needs to list all the software that the wrapper code needes to run successfully.

For all software following `semantic versioning <https://semver.org/>`_ conventions, specify (and thus pin) the major and minor version, but leave the patch version unspecified.
Also, unless this is needed to work around version incompatibilities not properly handled by the conda packages themselves, only specify the actual software needed and let ``conda``/``mamba`` determine the dependencies.

To make sure that ``conda``/``mamba`` knows where to look for the package, include a list of all of the conda channels that the software and its dependencies require.
This will usually include `conda-forge <https://conda-forge.org/>`_, as it contains many essential libraries that other packages and tools depend on.
This channel should usually be specified first, to make sure it takes precedence (``snakemake`` asks users to ``conda config --set channel_priority strict``).
In addition, you may need to include other sustainable community maintained channels (like `bioconda <https://bioconda.github.io/>`_).
And as the last channel specification, always include ``nodefaults``.
This avoids software dependency conflicts between the ``conda-forge`` channel and the ``default`` channels that should not be needed nowadays.

Finally, make sure to run ``snakedeploy pin-conda-envs environment.yaml`` on the finished environment specification.
This will generate a file called ``environment.linux-64.pin.txt`` with all the dependency versions determined by ``conda``/``mamba``, ensuring that a particular wrapper version will always generate the exact same environment with the exact package versions from this file.
You should include this pinning file in the pull request for your wrapper.

Example
^^^^^^^

.. code-block:: yaml
channels:
- conda-forge
- bioconda
- nodefaults
dependencies:
- bioconductor-biomart =2.58
- r-nanoparquet =0.3
- r-tidyverse = 2.0
.. _wrapper:

``wrapper.py`` or ``wrapper.R`` file
------------------------------------

This is the actual code that the wrapper executes.
It is handled like an `external script in snakemake <https://snakemake.readthedocs.io/en/stable/snakefiles/rules.html#external-scripts>`_, so you have the respective `snakemake` objects available.

Please ensure that the wrapper:

* can deal with arbitrary ``input:`` and ``output:`` paths and filenames
* redirects `stdout` and `stderr` to log files specified by the `log:` directive (typical boilerplate code can for example be found in `this knowledge base <https://koesterlab.github.io/data-science-for-bioinfo/workflows/snakemake.html#language-specific-debugging>`_)
* automatically infers command line arguments wherever possible (for example based on file extensions in ``input:`` and ``output:``)
* passes on the `threads` value, if the used tool(s) allow(s) it
* writes any temporary files to a unique hidden folder in the working directory, or (better) stores them where the Python function `tempfile.gettempdir() <https://docs.python.org/3/library/tempfile.html#tempfile.gettempdir>`_ points (this also means that using any Python tempfile default behavior works)
* is formatted according to the language's standards (for Python, format it with `black <https://black.readthedocs.io/>`_: ``black wrapper.py``)

For repeatedly needed functionality you can use the `snakemake-wrapper-utils <https://github.com/snakemake/snakemake-wrapper-utils>`_.
Use what is available or create new functionality there, whenever you start repeating functions across wrappers.
Examples of this are:

1. The command line argument parsing for a software tool like ``samtools`` where you create one wrapper each for a number of different subcommands that share the main arguments. See the `samtools.py <https://github.com/snakemake/snakemake-wrapper-utils/blob/master/snakemake_wrapper_utils/samtools.py>`_ utility functions for the respective functionality.
2. The handling of recurring Java options, for example things like memory handling. See `java.py <https://github.com/snakemake/snakemake-wrapper-utils/blob/master/snakemake_wrapper_utils/java.py>`_ for the respective functionality.

To use ``snakemake-wrapper-utils``, you have to include them as a depenency in your :ref:`environment` definition file and import the respective function(s) in your :ref:`wrapper` script (for example ``from snakemake_wrapper_utils.java import get_java_opts``).


.. _snakefile:

``test/Snakefile`` file
-----------------------

In a subfolder called ``test``, create a ``Snakefile`` with example invocations of the wrapper.
These examples should comprehensively showcase the available functionality of the wrapper, as they serve as both the copy-pasteable examples rendered in the documentation, and the test cases run in the continuous integration testing (make sure to include calls to the rules in ``test.py``, see :ref:`test`).
If these rules need any input data, you can also include minimal (small) testing data in the ``test/`` folder (also check existing wrappers for suitable data).

When writing the ``Snakefile``, please ensure that:

* rule names in the examples are in `snake_case <https://en.wikipedia.org/wiki/Snake_case>`_ and descriptive (they should explain what the rule is does, or match the tool's purpose or name; for example ``map_reads`` for a step that maps reads)
* it is formatted correctly by running `snakefmt <https://github.com/snakemake/snakefmt>`_ (``snakefmt Snakefile``)
* it also passes linting, see :ref:`linting`
* all example rules in your ``test/Snakefile`` have an invocation as a test case in ``test.py``, see :ref:`test`
* wherever you can do this with a short comment, explain possible settings for all keywords like ``input:``, ``output:``, ``params:``, ``threads:``, etc. (provide longer explanations in the :ref:`meta` file)
* provide a sensible default for ``threads:``, if more than one thread can be used by the wrapper

.. _test:

``test.py`` tests file
----------------------

Every example rule listed in a :ref:`snakefile`, should be included as a test case in ``test.py``.
The easiest way is usually to duplicate an existing test and adapt it to your newly added example rule.

When done editing, make sure that ``test.py`` :ref:`formatting` still follows |black|_ standards.

Example
^^^^^^^

.. code-block:: python
@skip_if_not_modified
def test_bcftools_sort():
run(
"bio/bcftools/sort",
["snakemake", "--cores", "1", "--use-conda", "-F", "a.sorted.bcf"],
)
.. _sequence: https://yaml.org/spec/1.2/spec.html#id2759963
.. _mapping: https://yaml.org/spec/1.2/spec.html#id2759963
.. _formatting:

Expand Down Expand Up @@ -108,35 +246,34 @@ Please `lint`_ your test ``Snakefile`` with::
Testing locally
---------------

If you want to debug your contribution locally (before creating a pull request), you
can install all dependencies with |mamba|_ (or |conda|_). `Install miniconda with the
channels as described for bioconda <https://bioconda.github.io/#using-bioconda>`_ and
set up an environment with the necessary dependencies and activate it::

mamba create -n test-snakemake-wrappers snakemake pytest conda snakefmt black
conda activate test-snakemake-wrappers
If you want to debug your contribution locally before creating a pull request, ensure you have the :ref:`development environment` installed and activated.

Afterwards, from the main directory of the repo, you can run the test(s) for your
contribution by `specifying an expression <https://docs.pytest.org/en/stable/usage.html#specifying-tests-selecting-tests>`_
that matches the name(s) of your test(s) via the ``-k`` option of ``pytest``::
that matches the name(s) of your test(s) via the ``-k`` option of ``pytest``:

.. code-block:: bash
pytest test.py -v -k your_test
If you also want to test the docs generation locally, create another environment
and activate it::
and activate it:

.. code-block:: bash
mamba create -n test-snakemake-wrapper-docs -c conda-forge sphinx sphinx_rtd_theme pyyaml sphinx-copybutton sphinxawesome_theme myst-parser
mamba activate test-snakemake-wrapper-docs
mamba create -n test-snakemake-wrapper-docs sphinx sphinx_rtd_theme pyyaml sphinx-copybutton
conda activate test-snakemake-wrapper-docs
Then, enter the respective directory and build the docs:

Then, enter the respective directory and build the docs::
.. code-block:: bash
cd docs
make html
If it runs through, you can open the main page at ``docs/_build/html/index.html``
in a web browser. If you want to start fresh, you can clean up the build
with ``make clean``.
If it runs through, you can open the main page at ``docs/_build/html/index.html`` in a web browser.
If you want to start fresh, you can clean up the build with ``make clean``.


.. |mamba| replace:: ``mamba``
Expand Down

0 comments on commit 14f8ab8

Please sign in to comment.