Edits for JOSS paper (#799)
* Edits for JOSS paper

* [pre-commit.ci] auto fixes from pre-commit.com hooks
for more information, see https://pre-commit.ci

* update orcid

* incorporate discussed changes to summary

* edit statement of need to put more focus on GMSO

* quick fix

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Co Quach <[email protected]>
3 people authored Feb 2, 2024
1 parent 27733b0 commit 40b0277
Showing 2 changed files with 27 additions and 6 deletions.
17 changes: 16 additions & 1 deletion paper/paper.bib
@@ -113,7 +113,7 @@ @article{thompson2022lammps
}

# HOOMD-Blue
@article{anderson2010hoomd,
@article{anderson2020hoomd,
title = {HOOMD-blue: A Python package for high-performance molecular dynamics and hard particle Monte Carlo simulations},
volume = {173},
ISSN = {0927-0256},
@@ -412,3 +412,18 @@ @article{marrink2019computational
month = jan,
pages = {6184–6226}
}

# nanoparticle paper
@article{craven2021examining,
author = {Craven, Nicholas C and Gilmer, Justin B and Spindel, Caroline J and Summers, Andrew Z and Iacovella, Christopher R and McCabe, Clare},
doi = {10.1063/5.0032658},
issn = {0021-9606},
journal = {The Journal of Chemical Physics},
month = {jan},
number = {3},
pages = {34903},
title = {{Examining the self-assembly of patchy alkane-grafted silica nanoparticles using molecular simulation}},
url = {https://doi.org/10.1063/5.0032658},
volume = {154},
year = {2021}
}
16 changes: 11 additions & 5 deletions paper/paper.md
@@ -6,6 +6,8 @@ tags:
- molecular-simulations
- data-structure
- MoSDeF
- interoperability
- force fields

authors:
- name: Co D. Quach
@@ -35,6 +37,7 @@ authors:
orcid: 0000-0002-6196-5274
affiliation: "5"
- name: Ryan S. DeFever
orcid: 0000-0001-5311-6718
affiliation: "6"
- name: Brad Crawford
orcid: 0000-0003-0638-7333
@@ -44,11 +47,12 @@ authors:
affiliation: "1, 2"
- name: Jeffrey Potoff
affiliation: "7"
orcid: "0000-0002-4421-8787"
orcid: 0000-0002-4421-8787
- name: Eric Jankowski
affiliation: "5"
orcid: "0000-0002-3267-1410"
orcid: 0000-0002-3267-1410
- name: Edward J. Maginn
orcid: 0000-0002-6309-1347
affiliation: "6"
- name: Clare McCabe
orcid: 0000-0002-8552-9135
@@ -86,18 +90,19 @@ bibliography: paper.bib


# Summary
The General Molecular Simulation Object, or GMSO, stands as an open-source Python data structure, offering a versatile and expandable framework for handling chemical and biomolecular topologies. This library is an integral component of the Molecular Simulation Design Framework (MoSDeF), dedicated to streamlining the creation, parameterization, and representation of systems for molecule simulations. The GMSO library serves as a dynamic repository for storing chemical/biomolecular structures, encompassing metadata, coordinates, and interaction potentials. Moreover, the library includes routines for editing and exporting stored structures into various file formats, which can be used with other software for visualization (e.g., VMD[@humphrey1996vmd] and OVITO[@]) or conducting molecular simulations (e.g., GROMACS [@abraham2015gromacs], LAMMPS[@thompson2022lammps], GOMC[@nejahi2021update]).
The General Molecular Simulation Object, or GMSO, is an open-source Python package designed to supplement molecular simulation workflows. The library offers versatile and extensible data structures for storing chemical and biomolecular topologies, along with utilities for editing and outputting these systems. GMSO is a core component of the Molecular Simulation Design Framework (MoSDeF), which is dedicated to streamlining the creation, parameterization, and representation of systems for molecular simulations. The GMSO library serves as a dynamic repository for storing chemical/biomolecular structures, encompassing metadata, coordinates, and interaction potentials. Moreover, the library includes routines for editing and exporting stored structures into various file formats, which can be used with other software for visualization (e.g., VMD [@humphrey1996vmd] and OVITO[@]) or for conducting molecular simulations (e.g., GROMACS [@abraham2015gromacs], LAMMPS [@thompson2022lammps], GOMC [@nejahi2021update], and HOOMD-blue [@anderson2020hoomd]).


# Statement of need

The Molecular Simulation Design Framework (MoSDeF) is a suite of software tailored to facilitate the initialization of chemical and biomolecular systems for computational simulations [@cummings2021opena]. These tools were developed to specifically address a critical aspect of the (ir)reproducibility issue within the molecular simulation community — namely, the insufficient documentation of the structure preparation process and force field parameter implementation [@thompson2020towards]. The initialization step, often performed through Graphical User Interfaces (GUI) or via the use of ad-hoc, unpublished, and unreviewed code, poses the risk of introducing irreproducible and untraceable errors[@baker2016reproducibility]. By providing general-purposed and standardized tools that build and parameterize molecular systems for molecular simulations, directly support various molecular dynamics (MD) and Monte Carlo (MC) engines, MoSDeF aims to trivialize the describing and disseminating such processes without creating extra burdens for computational simulation researchers [@cummings2021opena].
The General Molecular Simulation Object (GMSO), a component of the Molecular Simulation Design Framework (MoSDeF), provides a framework and utilities for storing, manipulating, and outputting molecular systems. MoSDeF is a suite of software tailored to facilitate the initialization of chemical and biomolecular systems for computational simulations [@cummings2021opena]. These tools were developed to specifically address a critical aspect of the (ir)reproducibility issue within the molecular simulation community, namely the insufficient documentation of the structure preparation process and force field parameter implementation [@thompson2020towards]. The initialization step, often performed through graphical user interfaces (GUIs) or via ad hoc, unpublished, and unreviewed code, poses the risk of introducing irreproducible and untraceable errors [@baker2016reproducibility]. By providing general-purpose, standardized tools that build and parameterize molecular systems and directly support various molecular dynamics and Monte Carlo engines, MoSDeF aims to make describing and disseminating such processes trivial without creating extra burdens for computational simulation researchers [@cummings2021opena].


The initialization of chemical/biomolecular systems comprises three key steps:
1. Constructing structures: Loading and/or creating molecules/structures that mirror the phenomena under investigation.
2. Parameterizing: Assigning interaction parameters to all particles and connections within the structures.
3. Storing structures and generating output: Storing parameterized structures and outputting them to file formats compatible with various simulation software.

Each of these steps necessitates distinct routines and is therefore addressed by a specialized library: mBuild [@klein2016hierarchical], Foyer [@klein2019formalizing], and GMSO, which is introduced in this work. mBuild functions as a molecular builder, equipped with extensive utilities for creating, loading, and manipulating the positions of atoms and molecules, along with managing their connectivity through bonds [@klein2016hierarchical]. Foyer parameterizes the created structures by identifying and assigning interaction parameters to each atom or group of atoms and their associated connections (e.g., bonds, angles, and dihedrals) [@klein2019formalizing]. This process entails matching the connectivity (bond graph) of the provided structure against the SMARTS pattern of each atom type, which in turn defines the interaction parameters [@klein2019formalizing]. The use of graph matching, rather than the traditional approach of matching via atom indices, allows for more flexible parameterization. This feature proves particularly advantageous in the study of functionalized polymers, whose structures consistently deviate slightly from the standard polymer [@summers2020mosdef; @quach2022high]. These utilities have been used in various projects to explore a wide range of structures and applications [@thompson2019scalable; @summers2020mosdef; @quach2022high; @ma2022dynamics], and have been integrated into other scientific libraries [@albooyeh2023flowermd; @defever2021mosdef; @crawford2023mosdefgomc].
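
The graph-matching idea described above can be illustrated with a minimal, self-contained sketch. It does not use Foyer or real SMARTS parsing; the rule table, the atom type names, and the `atom_types` function are hypothetical stand-ins in which each "pattern" is simply an element plus the elements of its bonded neighbors. The point it shows is that types are assigned from the bond graph, so renumbering the atoms does not change the result.

```python
from collections import Counter

# Hypothetical rule table (not Foyer's format): element of the atom,
# required neighbor elements with counts, and the atom type to assign.
RULES = [
    ("C", {"C": 1, "H": 3}, "CH3_carbon"),
    ("C", {"C": 1, "H": 2, "O": 1}, "CH2_alcohol_carbon"),
    ("O", {"C": 1, "H": 1}, "hydroxyl_oxygen"),
]


def atom_types(elements, bonds):
    """Assign a type to every atom whose bonded environment matches a rule."""
    neighbors = {i: [] for i in range(len(elements))}
    for i, j in bonds:
        neighbors[i].append(j)
        neighbors[j].append(i)

    types = {}
    for i, element in enumerate(elements):
        environment = Counter(elements[j] for j in neighbors[i])
        for rule_element, rule_environment, name in RULES:
            if element == rule_element and environment == Counter(rule_environment):
                types[i] = name
                break  # first matching rule wins
    return types


# Ethanol, CH3-CH2-OH: atoms 0 and 1 are carbons, 2 is oxygen, 3-8 are hydrogens.
elements = ["C", "C", "O", "H", "H", "H", "H", "H", "H"]
bonds = [(0, 1), (1, 2), (0, 3), (0, 4), (0, 5), (1, 6), (1, 7), (2, 8)]
ethanol_types = atom_types(elements, bonds)

# The same molecule with every atom index reversed (new index = 8 - old index).
elements_reversed = elements[::-1]
bonds_reversed = [(8 - i, 8 - j) for i, j in bonds]
reversed_types = atom_types(elements_reversed, bonds_reversed)
```

In this sketch the hydrogens stay untyped because no rule covers them, and the reversed molecule receives the same three types on its renumbered heavy atoms. An index-based lookup would need a separate table for each numbering.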


@@ -106,10 +111,11 @@ The parameterization step introduces additional information, requiring a more so
- Providing flexibility for exotic potentials and unit systems
- Being compatible with existing community tools
- Being extensible (to support new simulation models/engines/workflows)

Existing data structures, such as ParmEd and OpenMM [@shirts2016lessons; @eastmann2017openmm], provide much of this functionality and are widely adopted [@elenareal2023real; @kehrein2023unravel; @tesei2021accurate; @marrink2019computational]. However, their underlying structures are tailored to specific subsets of simulation workflows and ecosystems, as well as to particular force field equation forms, sacrificing generality and broad applicability. This limitation includes hard-coding of, and assumptions about, potential expressions and units. They lack the generality that MoSDeF and its users seek, such as the ability to define and store arbitrary potential expressions or unit systems. Integrating these features into existing software would, unfortunately, require a major overhaul that could disrupt existing simulation workflows and would not appeal to current project stakeholders.


Hence, we developed the General Molecular Simulation Object (GMSO) library, which is a lightweight, extensible data structure encapsulating chemical/biomolecular systems and their associated interaction parameters, i.e., force fields, to cater to MoSDeF ecosystem. The library is designed to accommodate a wide range of chemical/biomolecular models, offering the capability to support arbitrary potential expressions and unit systems. Generalizing these potential (force field) expressions allows users to enter the force field in its native form and units, minimizing user error when setting up the force field file while providing the ability to easily auto-convert the potential form and units to the molecular engine's required form. GMSO satisfies the broader community's need for a general, extensible, and reproducible method of setting up molecular simulations. In addition to core data classes, the library includes routines for interacting/converting to and from other ecosystems, including ParmEd and OpenMM, enhancing interoperability without reinventing functionalities. GMSO supports output to multiple molecular simulation engine-specific file formats, currently including , including GROMACS, LAMMPS, HOOMD-Blue, NAMD, Cassandra, and GOMC, with plans for future expansion[@abraham2015gromacs; @thompson2022lammps; @anderson2010hoomd, @phillips2020scalable, @shah2017cassandra, @nejahi2021update]. When integrated with other MoSDeF software and workflow manager like Signac [@adorf2018simple], GMSO facilitates large-scale automated molecular screening for diverse molecules/structures and state points, which is critical for developing new materials, chemicals and drugs [@quach2022high, @thompson2019scalable].
Hence, we developed the General Molecular Simulation Object (GMSO) library, a lightweight and extensible data structure that encapsulates chemical/biomolecular systems and their associated interaction parameters, i.e., force fields, in a form that accommodates general force fields. The library is designed to accommodate a wide range of chemical/biomolecular models, offering the capability to support arbitrary potential expressions and unit systems. Generalizing these potential (force field) expressions allows users to enter the force field in its native form and units, minimizing user error when setting up the force field file while providing the ability to auto-convert the potential form and units to those required by the molecular simulation engine. GMSO satisfies the broader community's need for a general, extensible, and reproducible method of setting up molecular simulations. In addition to core data classes, the library includes routines for converting to and from other ecosystems, including ParmEd and OpenMM, enhancing interoperability without reinventing functionality. GMSO supports output to multiple molecular simulation engine-specific file formats, currently including GROMACS [@abraham2015gromacs], LAMMPS [@thompson2022lammps], HOOMD-blue [@anderson2020hoomd], NAMD [@phillips2020scalable], Cassandra [@shah2017cassandra], and GOMC [@nejahi2021update], with plans for future expansion. When integrated with other MoSDeF software and a workflow manager such as Signac [@adorf2018simple], GMSO facilitates large-scale automated molecular screening of diverse molecules/structures and state points, which is critical for developing new materials, chemicals, and drugs [@craven2021examining; @quach2022high; @thompson2019scalable].
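
To make the idea of storing a potential in its native form and units and converting it afterwards more concrete, here is a minimal sketch in plain Python. It is not the GMSO API: the `Potential` class, the `TO_TARGET` table, and the target unit system (kJ/mol and nm) are assumptions made for illustration only. The expression is kept as a string, and the parameters carry their own units.

```python
from dataclasses import dataclass

# Hypothetical conversion table: source unit -> (target unit, multiplicative factor).
TO_TARGET = {
    "kcal/mol": ("kJ/mol", 4.184),
    "angstrom": ("nm", 0.1),
    "kJ/mol": ("kJ/mol", 1.0),
    "nm": ("nm", 1.0),
}


@dataclass
class Potential:
    expression: str   # functional form written in terms of the parameters and r
    parameters: dict  # parameter name -> (value, unit)

    def converted(self):
        """Return a copy with every parameter expressed in the target units."""
        new_parameters = {}
        for name, (value, unit) in self.parameters.items():
            target_unit, factor = TO_TARGET[unit]
            new_parameters[name] = (value * factor, target_unit)
        return Potential(self.expression, new_parameters)

    def energy(self, r):
        """Evaluate the expression at separation r (same length unit as sigma)."""
        names = {name: value for name, (value, _) in self.parameters.items()}
        names["r"] = r
        # eval keeps the sketch dependency-free; a symbolic math library
        # would be the safer choice outside of a toy example.
        return eval(self.expression, {"__builtins__": {}}, names)


# Lennard-Jones potential entered in kcal/mol and angstrom.
lj = Potential(
    "4*epsilon*((sigma/r)**12 - (sigma/r)**6)",
    {"epsilon": (0.1, "kcal/mol"), "sigma": (3.5, "angstrom")},
)
lj_target = lj.converted()

# Energy at the minimum, r = 2**(1/6) * sigma, which equals -epsilon in either unit system.
sigma_nm = lj_target.parameters["sigma"][0]
minimum_energy = lj_target.energy(2 ** (1 / 6) * sigma_nm)
```

Because the functional form is data rather than hard-coded logic, a different expression (for example a Mie or Buckingham form) could be stored the same way, provided its parameters have entries in the conversion table.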


# Acknowledgements