
Commit

Fix references
utf committed Oct 5, 2023
1 parent d46f13f commit af11ca0
Showing 1 changed file with 9 additions and 9 deletions: paper/paper.md
@@ -83,11 +83,11 @@ We present Jobflow, a domain-agnostic Python package for writing computational w

# Statement of Need

- The current era of big data and high-performance computing has emphasized the significant need for robust, flexible, and scalable workflow management solutions that can be used to efficiently orchestrate scientific calculations `[@ben2020workflows; @da2023workflows]`. To date, a wide variety of workflow systems have been developed, and it has become clear that there is no one-size-fits-all solution due to the diverse needs of the computational community `[@wflowsystems; @al2021exaworks]`. While several popular software packages in this space have emerged over the last decade, many of them require the user to tailor their domain-specific code with the underlying workflow management framework closely in mind. This can be a barrier to entry for many users and puts significant constraints on the portability of the underlying workflows.
+ The current era of big data and high-performance computing has emphasized the significant need for robust, flexible, and scalable workflow management solutions that can be used to efficiently orchestrate scientific calculations [@ben2020workflows; @da2023workflows]. To date, a wide variety of workflow systems have been developed, and it has become clear that there is no one-size-fits-all solution due to the diverse needs of the computational community [@wflowsystems; @al2021exaworks]. While several popular software packages in this space have emerged over the last decade, many of them require the user to tailor their domain-specific code with the underlying workflow management framework closely in mind. This can be a barrier to entry for many users and puts significant constraints on the portability of the underlying workflows.

- Here, we introduce Jobflow: a free, open-source Python library that makes it simple to transform collections of functions into complex workflows that can be executed either locally or across distributed computing environments. Jobflow has been intentionally designed to act as middleware between the user’s domain-specific routines that they wish to execute and the workflow “manager” that ultimately orchestrates the calculations across different computing environments. Jobflow uses a simple decorator-based syntax that is similar to that of other recently developed workflow tools `[@babuji2019parsl; @prefect; @covalent; @redun]`. This approach makes it possible to turn virtually any function into a Jobflow `Job` instance (i.e., a discrete unit of work) with minimal changes to the underlying code itself.
+ Here, we introduce Jobflow: a free, open-source Python library that makes it simple to transform collections of functions into complex workflows that can be executed either locally or across distributed computing environments. Jobflow has been intentionally designed to act as middleware between the user’s domain-specific routines that they wish to execute and the workflow “manager” that ultimately orchestrates the calculations across different computing environments. Jobflow uses a simple decorator-based syntax that is similar to that of other recently developed workflow tools [@babuji2019parsl; @prefect; @covalent; @redun]. This approach makes it possible to turn virtually any function into a Jobflow `Job` instance (i.e., a discrete unit of work) with minimal changes to the underlying code itself.

- Jobflow has grown out of a need to carry out high-throughput computational materials science workflows at scale as part of the Materials Project `[@materialsproject]`. As the kinds of calculations — from _ab initio_ to semi-empirical to those based on machine learning — continue to evolve and the resulting data streams continue to diversify, it was necessary to rethink how we managed an increasingly diverse range of computational workflows. Going forward, Jobflow will become the computational backbone of the Materials Project, which we hope will inspire additional confidence in the readiness of Jobflow for production-quality scientific computing applications.
+ Jobflow has grown out of a need to carry out high-throughput computational materials science workflows at scale as part of the Materials Project [@materialsproject]. As the kinds of calculations — from _ab initio_ to semi-empirical to those based on machine learning — continue to evolve and the resulting data streams continue to diversify, it was necessary to rethink how we managed an increasingly diverse range of computational workflows. Going forward, Jobflow will become the computational backbone of the Materials Project, which we hope will inspire additional confidence in the readiness of Jobflow for production-quality scientific computing applications.

# Features and Implementation

@@ -155,7 +155,7 @@ responses = run_locally(flow)

## Data Management

- Jobflow has first-class support for a variety of data stores through an interface with the `maggma` Python package `[@maggma]`. This makes it possible to easily store the results of workflows in a manner that is independent of the choice of storage medium and that is entirely decoupled from the workflow logic itself. Additionally, it is possible within Jobflow to specify multiple types of data stores for specific Python objects (e.g., primitive types vs. large binary blobs) created by a given workflow, which is often useful for storing a combination of metadata (e.g., in a NoSQL database like MongoDB or file-system based store like MontyDB `[@montydb]`) and raw data (e.g., in a cloud object store like Amazon S3 or Microsoft Azure).
+ Jobflow has first-class support for a variety of data stores through an interface with the `maggma` Python package [@maggma]. This makes it possible to easily store the results of workflows in a manner that is independent of the choice of storage medium and that is entirely decoupled from the workflow logic itself. Additionally, it is possible within Jobflow to specify multiple types of data stores for specific Python objects (e.g., primitive types vs. large binary blobs) created by a given workflow, which is often useful for storing a combination of metadata (e.g., in a NoSQL database like MongoDB or file-system based store like MontyDB [@montydb]) and raw data (e.g., in a cloud object store like Amazon S3 or Microsoft Azure).

## Promoting Code Reuse

@@ -190,7 +190,7 @@ responses = run_locally(flow)

Unlike many other workflow packages, one of the major benefits of Jobflow is that it decouples the details related to workflow execution from the workflow definitions themselves. The simplest way to execute a workflow is to run it directly on the machine where the workflow is defined using the `run_locally(...)` function, as shown in the examples above. This makes it possible to quickly test even complex workflows without needing to rely on a database or configure remote resources.

- When deploying production calculations, workflows often need to be dispatched to large supercomputers through a remote execution engine. Jobflow has an interface with the FireWorks package `[@fireworks]` via a one-line command to convert a `Flow` and its underlying `Job` objects into the analogous FireWorks `Workflow` and `Firework` objects that enable execution on high-performance computing machines. The logic behind the `Job` and `Flow` objects is not tied to FireWorks in any direct way, such that the two packages are fully decoupled.
+ When deploying production calculations, workflows often need to be dispatched to large supercomputers through a remote execution engine. Jobflow has an interface with the FireWorks package [@fireworks] via a one-line command to convert a `Flow` and its underlying `Job` objects into the analogous FireWorks `Workflow` and `Firework` objects that enable execution on high-performance computing machines. The logic behind the `Job` and `Flow` objects is not tied to FireWorks in any direct way, such that the two packages are fully decoupled.

Additionally, a remote mode of execution built solely around Jobflow is currently under active development. With this approach, workflows can be executed across multiple “workers” (e.g., a simple computer, a supercomputer, or a cloud-based service) and managed through a modern command-line interface without relying on an external workflow execution engine. The Jobflow remote mode of execution has been designed such that no inbound connection from the workers to the database of jobs and results is needed, thus ensuring data and network security for professional usage.

@@ -204,10 +204,10 @@ Jobflow has been designed with robustness in mind. The Jobflow codebase has 100%

While domain-agnostic, Jobflow has been used in several materials science Python packages at the time of writing, including but not limited to:

- - Atomate2 `[@atomate2]`, Quacc `[@quacc]`: Libraries of computational chemistry and materials science workflows.
- - NanoParticleTools `[@nptools]`: Workflows for Monte Carlo simulations of nanoparticles.
- - Reaction Network `[@rxnnetwork; @mcdermott2021graph]`: Workflows for constructing and analyzing inorganic chemical reaction networks.
- - WFacer `[@wfacer]`: Workflows for modeling the statistical thermodynamics of solids via automated cluster expansion.
+ - Atomate2 [@atomate2], Quacc [@quacc]: Libraries of computational chemistry and materials science workflows.
+ - NanoParticleTools [@nptools]: Workflows for Monte Carlo simulations of nanoparticles.
+ - Reaction Network [@rxnnetwork; @mcdermott2021graph]: Workflows for constructing and analyzing inorganic chemical reaction networks.
+ - WFacer [@wfacer]: Workflows for modeling the statistical thermodynamics of solids via automated cluster expansion.

# Additional Details

