Material for the workshop at the AGU 2018 Fall Meeting.
Conveners: Leonardo Uieda, Lindsey Heagy, Lion Krischer, Florian M. Wagner
Info | |
---|---|
Location | Grand Hyatt / Room: Penn Quarter |
Time | Wednesday, 12 December 2018 / 13:40 - 18:00 |
Workshop ID | WS24 |
Shared notes | Google Docs |
Modern science increasingly relies on code, ranging from small scripts to workflows with many interacting parts. Reproducing and extending studies that use this code requires that the code be accessible. The open-source community has established best practices for making software available, usable, and maintainable. In this workshop, we will demonstrate a workflow for publishing research code following these best practices. We will cover open-source licenses, version control, automated testing, documentation, and continuous integration. The workshop will be hands-on: participants will work to set up a project using sample code provided by the instructors. By the end, participants will have the knowledge needed to continue learning independently and apply these practices to their own research code. These resources can be applied to any programming language or scientific discipline.
Our aim with this workshop is for participants to:
- Gain awareness of tools available to researchers within the open-source ecosystem, including Jupyter, git, ReadTheDocs, continuous integration services (for testing), etc.
- Learn modern best practices for structuring a research software repository in a way that promotes accessibility, reusability, and reproducibility
- Learn about the tools available for testing, publishing documentation, and versioning that can be immediately applied to their own codes
During the workshop, we'll introduce these topics by working through an example. The goal is to convert a notebook (or script) that does some data analysis into a Python library that is tested, documented, and can be reused. The final version of the library and a history of each step in the conversion process can be found at https://github.com/opengeophysics/2018-agu-oss-example-repo
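As a rough illustration of that conversion, the sketch below shows what a notebook cell might look like once it has been moved into a module as a documented function. The module name `analysis.py` and the function `linear_trend` are hypothetical and are not taken from the example repository; the point is only the shape of the result: a function with explicit inputs, a docstring, and input checks instead of loose notebook cells.

```python
"""Hypothetical module ``analysis.py``: code moved out of a notebook."""


def linear_trend(times, values):
    """
    Fit a straight line to data by ordinary least squares.

    Parameters
    ----------
    times : list of float
        Sample times (the independent variable).
    values : list of float
        Observed values at each time.

    Returns
    -------
    slope, intercept : float
        Coefficients of the fitted line ``values = slope * times + intercept``.
    """
    n = len(times)
    if n != len(values):
        raise ValueError("times and values must have the same length")
    if n < 2:
        raise ValueError("at least two samples are needed to fit a line")
    mean_t = sum(times) / n
    mean_v = sum(values) / n
    # Sums of products of deviations from the mean
    covariance = sum((t - mean_t) * (v - mean_v) for t, v in zip(times, values))
    variance = sum((t - mean_t) ** 2 for t in times)
    if variance == 0:
        raise ValueError("times must contain at least two distinct values")
    slope = covariance / variance
    intercept = mean_v - slope * mean_t
    return slope, intercept


if __name__ == "__main__":
    # Made-up data that lies exactly on the line values = 2 * times + 1
    print(linear_trend([0, 1, 2, 3], [1, 3, 5, 7]))
```

Once the code lives in a function like this, it can be imported from a notebook, tested automatically, and documented from its docstring.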
Duration (min) | Topic | Tools |
---|---|---|
15 | Introduce the motivations and problems that the workshop will address | Example data analysis in Jupyter |
30 | Describe the example we will be working through and provide an overview of the Jupyter notebook | Jupyter notebook |
45 | Overview of version control with git and setting up an online repository | GitHub, Slides |
45 | How to set up a small Python library (although the example is in Python, participants are encouraged to use their own research code in whichever language they prefer) | Python packaging guide |
15 | Discussion on choosing an open-source license | Choose a license, OSI Licenses |
15 | Including a Code of Conduct and Contributing Guidelines | Contributor Covenant |
30 | How to write automated tests in Python | Pytest |
30 | Set up continuous integration services to check that the code is tested on every update | TravisCI |
30 | Write and publish documentation on ReadTheDocs, a free hosting service for open source software projects | ReadTheDocs |
15 | Overview of other resources available within the open source community | |
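To give a feel for the automated testing step, here is a minimal sketch of tests written in the style that pytest collects: plain functions whose names start with `test_`, placed in a file named like `test_*.py`, using bare `assert` statements. The function `rms` is a hypothetical stand-in for a library function and is defined inline only to keep the sketch self-contained; in a real project it would be imported from the package.

```python
import math


def rms(values):
    """Root mean square of a sequence (hypothetical function under test)."""
    if not values:
        raise ValueError("values must not be empty")
    return math.sqrt(sum(v * v for v in values) / len(values))


def test_rms_known_value():
    # Hand-computed case: sqrt((3**2 + 4**2) / 2) = sqrt(12.5)
    assert math.isclose(rms([3, 4]), math.sqrt(12.5))


def test_rms_constant_input():
    # The RMS of a constant sequence is the absolute value of that constant
    assert math.isclose(rms([2, 2, 2]), 2.0)


def test_rms_rejects_empty_input():
    # Invalid input should fail loudly instead of returning a wrong number
    try:
        rms([])
    except ValueError:
        return
    raise AssertionError("expected ValueError for empty input")
```

Running `pytest` in the directory containing such a file executes every `test_*` function and reports any failed assertion. A continuous integration service can run the same command on every update.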
If you would like to follow along interactively during the course, please do the following before the course starts:
- Download and install Anaconda. Use the latest version of Python 3 and be sure to check the box that says "Add Anaconda to my PATH environment variable" if on Windows.
- Download and install git. The easiest way to do this is to follow the instructions on the Software Carpentry website.
- If you do not already have one, set up a free GitHub account.
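One way to check the installation afterwards is a short script like the sketch below. It is not part of the workshop material; it only assumes that `git` and `conda` are expected to be reachable on the `PATH` and that the script is run with the Python interpreter you intend to use. The function name `check_environment` is made up for this example.

```python
import shutil
import sys


def check_environment():
    """Return a dict mapping each requirement to True if it appears to be met."""
    return {
        # The interpreter running this script must be Python 3
        "python3": sys.version_info >= (3,),
        # shutil.which returns None when the executable is not on the PATH
        "git": shutil.which("git") is not None,
        "conda": shutil.which("conda") is not None,
    }


if __name__ == "__main__":
    for name, found in check_environment().items():
        print(f"{name}: {'found' if found else 'NOT found'}")
```

If `git` or `conda` is reported as not found even though it is installed, the most likely cause is that its install directory is missing from the `PATH`.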
Since the time allocated for the workshop does not allow us to cover scientific software development in its entirety, we provide links to some alternatives and guides that extend and deepen the concepts taught.
- Version control with Git
- Software packaging
- Version control
  - GitLab: Version control with private repositories and for your own server.
- Continuous integration
  - CircleCI: Alternative to Travis (https://circleci.com/circleci-versus-travisci/).
  - Jenkins: Continuous integration on your own server. Might come in handy for computationally more demanding software tests.
- Documentation
  - MkDocs: Fast and simple project documentation using Markdown.
Workshop - AGU 2018
This work is licensed under a Creative Commons Attribution 4.0 International License.