-
Notifications
You must be signed in to change notification settings - Fork 15
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Initial commit of paper and bibliography.
- Loading branch information
Showing
2 changed files
with
180 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,30 @@ | ||
@book{martin2008, | ||
author = {Martin, Robert C.}, | ||
title = {Clean Code: A Handbook of Agile Software Craftsmanship}, | ||
year = {2008}, | ||
isbn = {0132350882}, | ||
publisher = {Prentice Hall PTR}, | ||
address = {USA}, | ||
edition = {1} | ||
} | ||
|
||
@article{perkel2022, | ||
author = {Perkel, Jeffrey M.}, | ||
title = {How to fix your scientific coding errors}, | ||
year = {2022}, | ||
month = {02}, | ||
pages = {172-173}, | ||
title = {How to fix your scientific coding errors}, | ||
volume = {602}, | ||
journal = {Nature}, | ||
doi = {10.1038/d41586-022-00217-0} | ||
} | ||
|
||
@misc{copier, | ||
author = {Juan-Pablo Scaletti and Ben Felder and Jairo Llopis and Timothée Mazzucotelli and Sigurd Spieckermann and Copier Contributors}, | ||
title = {Copier}, | ||
year = {2024}, | ||
publisher = {GitHub}, | ||
journal = {GitHub repository}, | ||
url = {https://github.com/copier-org/copier} | ||
} |
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,150 @@ | ||
--- | ||
title: 'A Python Project Template for Healthy Scientific Software' | ||
tags: | ||
- Python | ||
- template | ||
authors: | ||
- name: Drew Oldag | ||
orcid: asdfasdf | ||
equal-contrib: true | ||
affiliation: "1, 2" | ||
- name: Melissa DeLucchi | ||
orcid: asdfasdf | ||
equal-contrib: true | ||
affiliation: "1, 3" | ||
- name: Wilson Beebe | ||
orcid: asdfasdf | ||
affiliation: "1, 2" | ||
- name: Doug Branton | ||
orcid: asdfasdf | ||
affiliation: "1, 2" | ||
- name: Sandro Campos | ||
orcid: asdfasdf | ||
affiliation: "1, 3" | ||
- name: Carl Christofferson | ||
orcid: asdfasdf | ||
affiliation: "1, 2" | ||
- name: Andrew Connolly <-----------???? | ||
orcid: asdfasdf | ||
affiliation: "1, 2" | ||
- name: Jeremy Kubica | ||
orcid: asdfasdf | ||
affiliation: "1, 3" | ||
- name: Olivia Lynn | ||
orcid: asdfasdf | ||
affiliation: "1, 3" | ||
- name: Kostya Malanchev | ||
orcid: asdfasdf | ||
affiliation: "1, 3" | ||
- name: Alex Malz <----------???? | ||
orcid: asdfasdf | ||
affiliation: "1, 3" | ||
- name: Sean McGuire | ||
orcid: asdfasdf | ||
affiliation: "1, 3" | ||
- name: Chris Wenneman | ||
orcid: asdfasdf | ||
affiliation: "1, 2" | ||
affiliations: | ||
- name: LINCC-Frameworks | ||
index: 1 | ||
- name: University of Washington | ||
index: 2 | ||
- name: Carnegie Mellon University | ||
index: 3 | ||
date: 28 February 2024 | ||
bibliography: paper.bib | ||
--- | ||
|
||
# Summary | ||
|
||
The creation of healthy code is vital for its successful long term use in scientific research. To maximize impact throughout the community, software packages must be accurate, usable, and maintainable. Here we discuss several engineering processes that are important for developing healthy software. Unfortunately these processes often require configuration leading to short-term overhead for new projects. We introduce the LINCC-Frameworks Python Project Template, a configurable template designed for scientific software projects that greatly simplifies adopting such practices by automating the setup and configuration of important code health tools. | ||
|
||
# Statement of need | ||
|
||
Software has long played a vital role in analyzing scientific data and driving new discoveries. As the scientific community continues to build upon established projects and create novel algorithms, there is a need to ensure the accuracy, repeatability, usability, and maintainability of the software. The first factor is required for the validity of the scientific results, while the latter factors enable the software to have a broad, sustained impact. | ||
|
||
The LINCC-Frameworks Python Project Template (LF-PPT) was originally created with the needs of astronomers in mind, however, it became apparent that it is broadly applicable for many scientific use cases. The LF-PPT codifies our best engineering practices because we were unable to find existing tooling that met our needs. Many other templates exist for specific applications that include non-trivial amounts of code, but we wanted a template that 1) is agnostic to specific applications 2) includes tooling needed for healthy software 3) doesn’t preclude use of other application-specific templates and 4) is updatable as best practices and tooling evolve. | ||
|
||
|
||
# Code health processes | ||
|
||
There exists a wide variety of literature on best practices for developing healthy code (see for example [@martin2008] or [@perkel2022]). In this section we discuss several such practices that benefit from the existence of automated tooling. This list is not meant to be exhaustive, but each item should be considered important for developing a trustworthy and impactful software package. | ||
|
||
## Automated testing | ||
|
||
Automated testing is critical to ensuring the correctness of code during initial implementation and through ongoing changes. Code should be validated by a comprehensive suite of tests including unit tests, which confirm the accuracy of individual functions, and integration tests that ensure end-to-end functionality for supported use cases. A comprehensive and automated test suite is an indicator of a trustworthy software package. | ||
|
||
The LF-PPT supports automated testing by configuring several different continuous integration GitHub workflows, that run test suites automatically: | ||
- at each push or pull request to ensure that changes to the current branch do not break the program’s behavior and | ||
- in regularly scheduled smoke tests (usually daily) to ensure that the code has not broken due to a change of the project’s code or the behavior of its dependencies. | ||
|
||
## Documentation | ||
|
||
Thorough documentation of code allows others to use the package more readily and encourages maintainability. The LF-PPT supports the use of sphinx[^1] for integration with ReadTheDocs[^2] to automatically render Python docstrings into well formatted documentation alongside manually written documentation. Additionally, to demonstrate the use of the package in context, the LF-PPT provides automatic rendering of example Jupyter notebooks within the documentation. | ||
|
||
## Distribution | ||
|
||
Software is only useful to the broader community if other users can find, install, and update it. Package management systems such as pip[^3] or conda[^4] provide simple tools for a user to download a new software package and its dependencies from public repositories such as PyPI[^7] or conda-forge[^8]. To enable easy code installation with pip or conda the LF-PPT provides support for automatic distribution through both PyPI and conda-forge when the user applies a new git tag via the GitHub UI. | ||
|
||
## Additional code health tools | ||
|
||
The LF-PPT supports many more features than those listed here including static code analysis, performance benchmarking, code testing coverage and more. For a complete list, please see the documentation at http://lincc-ppt.readthedocs.io/. | ||
|
||
The most notable exclusion from the LF-PPT is code. Aside from a few optional stub source and test files, the LF-PPT allows the user focus on scientific code, while supporting them with the industry best practices in maintainable software engineering. | ||
|
||
# Usage | ||
|
||
## General usage | ||
|
||
The LINCC-Frameworks Python Project Template automates the setup of the above processes for Python projects hosted on GitHub. The only direct dependency is copier[^5] [@copier], which is used as the engine to generate new projects from the LF-PPT with a specific directory structure, the requested configuration and stub files. To hydrate a new or existing project with the LF-PPT via copier the user calls: | ||
|
||
> copier copy gh:lincc-frameworks/python-project-template <new/project/directory> | ||
A questionnaire is presented to configure the project and establish the various features of the template to include. After the directory structure and files have been generated, the user should run the included initialization script to configure the local git repository and install the new package in the virtual environment: | ||
|
||
> cd <new/project/directory> | ||
> bash .initialize-new-project.sh | ||
The process is designed so as not to require significant time, and, thus, if the user is unhappy with the generated project, they can simply delete and recreate it. Depending on the options selected, some additional configuration may be required, such as registering with ReadTheDocs or PyPI. To assist the user, a customized, post-creation checklist is generated. | ||
|
||
## Existing projects | ||
|
||
In addition to creating new projects from scratch, the LF-PPT can be applied to existing projects to incorporate the features it provides. Often a more established project will require more effort to apply the LF-PPT. However, LINCC-Frameworks has had success applying the template to multiple legacy projects. A collection of tips for applying the template to existing projects is included in the documentation http://lincc-ppt.readthedocs.io/. | ||
|
||
## Updating projects | ||
|
||
As best practices evolve and new tools are introduced, the LF-PPT will incorporate those into the template. Updates to the template can be applied to existing projects with minimal effort, allowing the users to focus on science and not software maintenance. | ||
|
||
> copier update | ||
## Hibernating projects | ||
|
||
Scientific projects often go into periods of hibernation when not under active development or use, and are often challenging to revive. With the LF-PPT, hibernating projects should be much easier to reactivate. Automatically scheduled smoke tests and dependabot[^6] integration ensure that the code and dependencies continue to function correctly without significant interaction from the maintainers. | ||
|
||
## Recent applications | ||
|
||
The LF-PPT has been applied to multiple LINCC Frameworks projects including LSDB[^11], Hipscat[^12], and TAPE[^13], short term collaborations such as Sorcha[14], SuperPhot+[^10], DeepDISC[^15], and MacCauff[^9], and external project including kcorrect[^16] and FlexCode[^17] Additionally a project specific version was forked from the project and has been applied to all RAIL packages[^18]. | ||
|
||
# Acknowledgments | ||
|
||
This project is supported by Schmidt Sciences. | ||
|
||
[^1]: https://www.sphinx-doc.org/ | ||
[^2]: https://about.readthedocs.com/ | ||
[^3]: https://pip.pypa.io/en/latest/ | ||
[^4]: https://docs.conda.io/en/latest/ | ||
[^5]: https://copier.readthedocs.io/en/stable/ | ||
[^6]: https://docs.github.com/en/code-security/dependabot | ||
[^7]: https://pypi.org/ | ||
[^8]: https://conda-forge.org/docs/ | ||
[^9]: https://something-for-maccauff??? | ||
[^10]: https://github.com/VTDA-Group/superphot-plus | ||
[^11]: https://github.com/astronomy-commons/lsdb | ||
[^12]: https://github.com/astronomy-commons/hipscat | ||
[^13]: https://github.com/lincc-frameworks/tape | ||
[^14]: https://github.com/dirac-institute/sorcha | ||
[^15]: https://github.com/lincc-frameworks/deepdisc | ||
[^16]: https://github.com/blanton144/kcorrect | ||
[^17]: https://github.com/lee-group-cmu/FlexCode | ||
[^18]: https://github.com/LSSTDESC/RAIL-project-template |