Python is increasingly common across The Data Shed's engineering work and is often the first choice of language for a new project.
The purpose of this page is to outline some best practices for Python development across t'Shed in order to facilitate standardisation and whatnot.
We should push these standards where we own the project.
For projects we don't own, we should aim to follow whatever standards are already in place 🤷♂️
The default version you should use is the latest version currently supported by all three major cloud providers (Azure, AWS and GCP). At the time of writing, this is 3.9.
If your project does not currently use this version, schedule a work item to upgrade. The version should be kept current as per the above, and this should be accommodated in every project roadmap.
Never 2. Shedders don't let Shedders use Python 2.
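A project can also make its minimum version explicit so that it fails fast on an out-of-date interpreter. The sketch below is a minimal, hypothetical guard. It assumes 3.9 as the minimum, as stated above. `MINIMUM_PYTHON` and `check_python_version` are illustrative names, not part of any standard tooling.

```python
import sys
from typing import Tuple

# Hypothetical project-wide minimum; keep in step with the cloud providers.
MINIMUM_PYTHON: Tuple[int, int] = (3, 9)


def check_python_version(
    version: Tuple[int, ...] = tuple(sys.version_info[:3]),
    minimum: Tuple[int, int] = MINIMUM_PYTHON,
) -> None:
    """Raise RuntimeError if ``version`` is older than ``minimum``."""
    # Tuples compare element by element, so (3, 8) < (3, 9) is True.
    if tuple(version[: len(minimum)]) < minimum:
        found = ".".join(str(part) for part in version)
        required = ".".join(str(part) for part in minimum)
        raise RuntimeError(f"Python {required}+ is required, found {found}")
```

Calling `check_python_version()` near the top of an entry point turns a confusing failure deep in the code into an immediate, clearly worded error.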
While Python is pre-installed or easily available on many operating systems, we recommend pyenv.
Note: it should be installed using the pyenv-installer tool.
See the pyenv documentation on managing multiple Python versions for different projects.
For managing virtual environments with pyenv, use the pyenv-virtualenv plugin.
Note: this is installed by default when using the above pyenv-installer tool.
See the pyenv-virtualenv documentation on managing multiple virtual environments for different projects.
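When juggling several environments, it can help to confirm which one is actually active. The following is a minimal, standard-library-only sketch; the function names are hypothetical. It assumes that environments created by `venv` (and by `virtualenv` 20 or later) leave `sys.base_prefix` pointing at the base interpreter while `sys.prefix` points at the environment. Older `virtualenv` releases set `sys.real_prefix` instead, so the sketch checks that too.

```python
import sys


def in_virtualenv() -> bool:
    """Return True if the running interpreter belongs to a virtual environment."""
    # venv (and virtualenv 20+) point sys.prefix at the environment while
    # sys.base_prefix still points at the base interpreter.
    if sys.prefix != sys.base_prefix:
        return True
    # Older virtualenv releases (before 20) set sys.real_prefix instead.
    return hasattr(sys, "real_prefix")


def describe_interpreter() -> str:
    """Summarise the active interpreter, e.g. for a startup log line."""
    kind = "virtualenv" if in_virtualenv() else "base interpreter"
    return f"{sys.executable} ({kind}, Python {sys.version.split()[0]})"
```

A plain pyenv-managed interpreter, used without pyenv-virtualenv, reports `base interpreter` here.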
We should aim for consistency and predictability across our Python code.
Python code should be formatted using black, using the latest stable version. Flake8 can be a useful complementary tool for catching style and lint issues that black does not fix.
Imports should be ordered using isort (see also the black compatibility guide). In practice, this means passing `--profile black` to isort, for example in your editor config.
Stick to black's default line length of 88 characters. It's easier to maintain across projects, and when you split a window or terminal for editing you can still see all the text in both files. In conjunction with isort, we go from:
```python
from statistics import mean
import typing as t
numbers = [1, 2, 4,
    2,10, 12]
def calculate_mean(numbers:t.List[t.Union[int,float]]) :
    return mean(numbers)
```
To:

```python
import typing as t
from statistics import mean

numbers = [1, 2, 4, 2, 10, 12]


def calculate_mean(numbers: t.List[t.Union[int, float]]):
    return mean(numbers)
```
When introducing black to a new project (or upgrading to a more recent version), make any formatting changes in an isolated merge/pull request. Don't mix formatting changes with changes in logic.
- Use Sphinx/reST style docstrings.
    - darglint is a useful tool for verifying that docstrings match the expected style (it works with both Sphinx and Google style docstrings). It can also run as a Flake8 plugin to lint docstrings; see its documentation for how to enable this.
- Follow PEP-257.
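
The following illustration documents the `calculate_mean` function from the formatting example in the Sphinx/reST style. It follows PEP-257's shape: a one-line summary ending in a period, a blank line, then `:param:`, `:returns:` and `:raises:` fields. The exact set of fields is a project convention, so treat this as a sketch rather than a mandated template.

```python
import typing as t
from statistics import mean


def calculate_mean(numbers: t.List[t.Union[int, float]]) -> float:
    """Calculate the arithmetic mean of a list of numbers.

    :param numbers: The values to average; must not be empty.
    :returns: The arithmetic mean of ``numbers``.
    :raises statistics.StatisticsError: If ``numbers`` is empty.
    """
    return mean(numbers)
```

A docstring checker such as darglint is designed to catch drift between fields like `:param numbers:` and the actual signature.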
safety checks whether a Python project's dependencies have any known vulnerabilities.

- Note: the free version of safety is only updated monthly and as such carries some risk. Alternative suggestions are encouraged.
- If a project is hosted on GitHub, GitHub's own Dependabot should be used for the same purpose.
pytest should be used for all new projects.
Existing projects should plan to migrate to pytest. This can be eased by running any existing unittest tests with pytest, which supports them out of the box.
All new tests, however, should use pytest.
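The sketch below shows a minimal version of that migration path. pytest would collect both test styles from the same file, assuming it is named to match pytest's default `test_*.py` discovery pattern (for example, a hypothetical `test_stats.py`). Legacy `unittest` classes keep running unchanged, while new tests are written as plain functions with bare `assert` statements.

```python
import typing as t
import unittest
from statistics import mean


def calculate_mean(numbers: t.List[t.Union[int, float]]) -> float:
    # In a real project this would be imported from the package under test.
    return mean(numbers)


class TestCalculateMeanLegacy(unittest.TestCase):
    """Existing unittest-style test: pytest runs it without changes."""

    def test_mean_of_integers(self) -> None:
        self.assertEqual(calculate_mean([1, 2, 3]), 2)


def test_mean_of_mixed_numbers() -> None:
    """New pytest-style test: a plain function with a bare assert."""
    assert calculate_mean([2, 4.0, 6]) == 4.0


def test_example_from_formatting_guide() -> None:
    # The mean of [1, 2, 4, 2, 10, 12] is 31 / 6.
    assert abs(calculate_mean([1, 2, 4, 2, 10, 12]) - 31 / 6) < 1e-9
```

Running `python3 -m pytest` from the project root would pick up all three tests.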
- The pytest plugin pytest-cov should be used to measure code coverage, ideally on all projects. For existing projects, diff_cover is suitable for enforcing the required degree of test coverage on only the lines changed.
- bandit is a tool for identifying common security issues in Python code and should be adopted into all relevant assurance pipelines.
- pytest-mypy can be used for static type-checking via mypy.
- flake8 can be used to quickly catch common errors in your code.
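
Here is an illustration of the kind of issue bandit reports. Passing a command string to `subprocess` with `shell=True` is flagged, because any interpolated value is interpreted by the shell. The minimal sketch below uses the safer pattern of passing an argument list. `interpreter_version` is a hypothetical example function. bandit may still emit a low-severity note simply for importing `subprocess`.

```python
import subprocess
import sys


def interpreter_version() -> str:
    """Return the version string reported by the current interpreter."""
    # Flagged by bandit (shell=True lets the shell interpret the string):
    #     subprocess.run(f"{sys.executable} --version", shell=True)
    # Safer: pass an argument list so no shell is involved.
    result = subprocess.run(
        [sys.executable, "--version"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()
```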
Use a .gitignore file as provided by gitignore.io.
It should be used as the starting point, with any updates reflected in the shared template for all to use.
Although GitHub provides similar templates, they are more narrowly scoped and miss many of the ancillary files created by operating systems, IDEs, etc.
Use pip for package management, making sure to update it regularly to the latest version.
Specifically, use the `python3 -m pip` form, which runs pip as a module of that exact interpreter.
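The benefit of `python3 -m pip` is that packages are installed into the same interpreter (or virtual environment) that runs your code, rather than into whichever `pip` happens to be first on `PATH`. When pip is invoked from Python, the same idea is expressed with `sys.executable`. The following is a minimal sketch; `pip_command` is a hypothetical helper.

```python
import sys
from typing import List


def pip_command(*args: str) -> List[str]:
    """Build a pip invocation bound to the running interpreter (hypothetical helper)."""
    # Equivalent to running `python3 -m pip <args>` with this exact interpreter,
    # so packages land in the active environment, not a stray pip on PATH.
    return [sys.executable, "-m", "pip", *args]
```

For example, `subprocess.run(pip_command("install", "--upgrade", "pip"), check=True)` would upgrade pip for the interpreter running the script.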
Some of the following fall outside the scope of this document but are included until such time as similar documents exist to replace these next headings.
Others, such as the use of type hints in Python code, are currently only recommendations, though this may change as tooling gains more adoption.
All projects must include a README, formatted as per the CommonMark specification and validated using markdownlint (or markdownlint-cli).
pre-commit is a framework for managing pre-commit hooks in a variety of languages, including pre-configured hooks for many of the above tools. Its use is highly encouraged.
Use of type hints via Python's typing module is highly encouraged.
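Here is a short sketch of type hints using the `typing` module, in the same style as the formatting example. The `Number` alias and both functions are hypothetical. A static checker such as mypy (run, for example, via pytest-mypy) would reject the commented-out call before the code ever runs. Without such a checker, the mistake only surfaces at runtime, when `sum` reaches the strings.

```python
import typing as t

# Hypothetical alias for "any real number we accept".
Number = t.Union[int, float]


def total(values: t.Sequence[Number]) -> Number:
    """Sum a sequence of ints and/or floats."""
    return sum(values)


def first_or_none(values: t.Sequence[Number]) -> t.Optional[Number]:
    """Return the first value, or None for an empty sequence."""
    return values[0] if values else None


# mypy would reject this call: str is not compatible with Number.
# total(["1", "2"])
```

On Python 3.9, the built-in generic syntax (`list[int]`, from PEP 585) is also available. `typing.Union` is still needed for unions until the `int | float` syntax arrived in 3.10.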
Common IDEs include:

- Visual Studio Code.
    - See also the Pylance extension.
- PyCharm.
- Neovim. For an IntelliSense/IDE-like environment, check out Michael Park's Neovim config.
If all of this feels overwhelming, with too many things to manage, you can use a Cookiecutter template curated to include the above. It can be found here:
Data Shed Cookie Cutter Python
It also includes a Data Shed badge.
Install the latest Cookiecutter if you haven't installed it yet (this requires Cookiecutter 1.4.0 or higher):

```shell
python3 -m pip install -U cookiecutter
```

Then generate a new project from the template:

```shell
cookiecutter https://github.com/TheDataShed/cookiecutter-pypackage.git
```
--8<-- "includes/acronyms.md"