[REVIEW]: A course on the setup, running, and analysis of biomolecular simulations #265

Open
editorialbot opened this issue Oct 16, 2024 · 32 comments

@editorialbot

editorialbot commented Oct 16, 2024

Submitting author: @degiacom (Matteo Degiacomi)
Repository: https://github.com/CCPBioSim/BioSim-analysis-workshop
Branch with paper.md (empty if default branch):
Version: v1.0
Editor: @arm61
Reviewers: @raquellrios, @djcole56
Archive: Pending
Paper kind: learning module

Status


Status badge code:

HTML: <a href="https://jose.theoj.org/papers/085047c344a15394568f262b57eb920f"><img src="https://jose.theoj.org/papers/085047c344a15394568f262b57eb920f/status.svg"></a>
Markdown: [![status](https://jose.theoj.org/papers/085047c344a15394568f262b57eb920f/status.svg)](https://jose.theoj.org/papers/085047c344a15394568f262b57eb920f)

Reviewers and authors:

Please avoid lengthy details of difficulties in the review thread. Instead, please create a new issue in the target repository and link to those issues (especially acceptance-blockers) by leaving comments in the review thread below. (For completists: if the target issue tracker is also on GitHub, linking the review thread in the issue or vice versa will create corresponding breadcrumb trails in the link target.)

Reviewer instructions & questions

@raquellrios & @djcole56, your review will be checklist based. Each of you will have a separate checklist that you should update when carrying out your review.
First of all you need to run this command in a separate comment to create the checklist:

@editorialbot generate my checklist

The reviewer guidelines are available here: https://openjournals.readthedocs.io/en/jose/reviewer_guidelines.html. Any questions/concerns please let @arm61 know.

Please start on your review when you are able, and be sure to complete your review within the next six weeks at the very latest.

Checklists

📝 Checklist for @raquellrios

📝 Checklist for @djcole56

@editorialbot

Hello humans, I'm @editorialbot, a robot that can help you with some common editorial tasks.

For a list of things I can do to help you, just type:

@editorialbot commands

For example, to regenerate the paper pdf after making changes in the paper's md or bib files, type:

@editorialbot generate pdf

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.5281/zenodo.13155316 is OK
- 10.1038/s41586-021-03819-2 is OK
- 10.25080/Majora-629e541a-00e is OK
- 10.1038/253694a0 is OK
- 10.1126/science.1208351 is OK
- 10.1021/acs.jchemed.1c00022 is OK
- 10.1140/epjb/s10051-021-00249-x is OK
- 10.1016/j.neuron.2018.08.011 is OK
- 10.1016/j.bpj.2022.11.2277 is OK
- 10.1002/jcc.21787 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Scikit-Learn: Machine Learning in Python

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

@editorialbot

Software report:

github.com/AlDanial/cloc v 1.90  T=0.36 s (61.3 files/s, 26524.5 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
Jupyter Notebook                 9              0           7142           1112
Python                           1            103            263            223
Markdown                         5             88              0            195
TeX                              1             10              0            161
YAML                             3              1             10             56
Bourne Shell                     1             14             63             35
CSV                              1              0              0             26
XML                              1              0              0             18
-------------------------------------------------------------------------------
SUM:                            22            216           7478           1826
-------------------------------------------------------------------------------

Commit count by author:

    24	Matteo Degiacomi
    17	DEGIACOMI
    10	ppxasjsm
     3	Toni Mey
     1	Micaela Matta

@editorialbot

Paper file info:

📄 Wordcount for paper.md is 1437

✅ The paper includes a Statement of need section

@editorialbot

License info:

🟡 License found: Other (Check here for OSI approval)

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@ppxasjsm

ppxasjsm commented Nov 1, 2024

Hi @raquellrios, @djcole56 anything I could help with to get you started with the review?

@djcole56

djcole56 commented Nov 1, 2024

Review checklist for @djcole56

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source for this learning module available at https://github.com/CCPBioSim/BioSim-analysis-workshop?
  • License: Does the repository contain a plain-text LICENSE file with the contents of a standard license? (OSI-approved for code, Creative Commons for content)
  • Version: Does the release version given match the repository release?
  • Authorship: Has the submitting author (@degiacom) made visible contributions to the module? Does the full list of authors seem appropriate and complete?

Documentation

  • A statement of need: Do the authors clearly state the need for this module and who the target audience is?
  • Installation instructions: Is there a clearly stated list of dependencies?
  • Usage: Does the documentation explain how someone would adopt the module, and include examples of how to use it?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the module 2) Report issues or problems with the module 3) Seek support

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

  • Learning objectives: Does the module make the learning objectives plainly clear? (We don't require explicitly written learning objectives; only that they be evident from content and design.)
  • Content scope and length: Is the content substantial for learning a given topic? Is the length of the module appropriate?
  • Pedagogy: Does the module seem easy to follow? Does it observe guidance on cognitive load? (working memory limits of 7 +/- 2 chunks of information)
  • Content quality: Is the writing of good quality, concise, engaging? Are the code components well crafted? Does the module seem complete?
  • Instructional design: Is the instructional design deliberate and apparent? For example, exploit worked-example effects; effective multi-media use; low extraneous cognitive load.

JOSE paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Does the paper clearly state the need for this module and who the target audience is?
  • Description: Does the paper describe the learning materials and sequence?
  • Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?
  • Could someone else teach with this module, given the right expertise?
  • Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

@djcole56

djcole56 commented Nov 5, 2024

Hi, I'll drop any issues with the notebooks in here as I'm going along.

In 4_Simulation_Setup, only this method works:

from openff.toolkit import Molecule

lig27_smiles = "[H:7][C@:6]1([C:13](=[C:11]([C:9](=[O:10])[O:8]1)[O:12][H:19])[O:14][H:20])[C@:3]([H:4])([C:2]([H:16])([H:17])[O:1][H:15])[O:5][H:18]"
ligand = Molecule.from_mapped_smiles(lig27_smiles)

i.e., reading from file does not (I guess because it's not a mapped SMILES).

More generally, it would be good to include instructions on how to make a mapped SMILES if this is necessary input.
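As a minimal sketch of what "mapped" means here: as far as we understand the OpenFF Toolkit, `Molecule.from_mapped_smiles` needs every atom, hydrogens included, written in brackets with a unique map index from 1 to N. The standard-library check below could flag a non-mapped SMILES before it is passed in. `atom_map_indices` and `is_fully_mapped` are hypothetical helper names, not part of the toolkit or the workshop notebooks.

```python
import re

# A bracket atom such as [C@:6] or [H:7]; its map index follows the last colon.
BRACKET_ATOM = re.compile(r"\[([^\]]*)\]")


def atom_map_indices(smiles: str) -> list[int]:
    """Hypothetical helper: return each atom's map index, in order of appearance.

    Raises ValueError if any atom lacks a map index. Atoms written outside
    brackets (e.g. a bare "C") cannot carry a map index at all.
    """
    indices = []
    for body in BRACKET_ATOM.findall(smiles):
        _, colon, tail = body.rpartition(":")
        if not colon or not tail.isdigit():
            raise ValueError(f"atom [{body}] has no map index")
        indices.append(int(tail))
    # Outside brackets only bonds, ring-closure digits, branches and dots
    # should remain; any leftover letter is an unbracketed, unmapped atom.
    if re.search(r"[A-Za-z]", BRACKET_ATOM.sub("", smiles)):
        raise ValueError("unbracketed atoms cannot carry a map index")
    return indices


def is_fully_mapped(smiles: str) -> bool:
    """True if every atom is mapped and the indices are exactly 1..N."""
    try:
        indices = atom_map_indices(smiles)
    except ValueError:
        return False
    return bool(indices) and sorted(indices) == list(range(1, len(indices) + 1))


lig27_smiles = "[H:7][C@:6]1([C:13](=[C:11]([C:9](=[O:10])[O:8]1)[O:12][H:19])[O:14][H:20])[C@:3]([H:4])([C:2]([H:16])([H:17])[O:1][H:15])[O:5][H:18]"
print(is_fully_mapped(lig27_smiles))  # True
print(is_fully_mapped("OC[C@H](O)[C@H]1OC(=O)C(O)=C1O"))  # False: no map indices
```

The check is purely syntactic: it does not validate the chemistry, only that the string has the shape a mapped-SMILES reader would expect.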

@djcole56

@ppxasjsm @degiacom Finished going through the material. The course comprises a series of around 8 lectures and 8 workshops, with the first half devoted to setting up and running biomolecular simulations, and the second half to data analysis, including introductions to machine learning. The lectures are very clear and could be adopted by other teachers for their own courses. The workshops are presented through Jupyter notebooks, and I followed them on Google Colab (as I imagine most students would). These are again very clearly presented and at a suitable level for a graduate or advanced undergraduate class. Worked examples and problems are used throughout to hold attention, and the material usually progresses nicely from toy models to realistic simulation data. This course is commended for focusing on the fundamentals of simulation and analysis, which are mostly agnostic to the underlying MD codes, and for not requiring any expensive licenses.

The material has clearly been used for teaching on several occasions, so I found only the one issue (above). Also, nglview is not always able to run the trajectories on Colab, but I think this is a recurring problem and not the authors' issue (it might still be worth warning students that they can use their favourite PDB viewer if this happens).

If I understand the journal requirements correctly, the GitHub page needs a 'statement of need' and 'community guidelines'; once these are added, I'm happy to sign it off.

@arm61

arm61 commented Nov 14, 2024

The statement of need is typically for the paper not the GitHub repo.

@raquellrios

raquellrios commented Nov 24, 2024

Review checklist for @raquellrios

Conflict of interest

Code of Conduct

General checks

  • Repository: Is the source for this learning module available at https://github.com/CCPBioSim/BioSim-analysis-workshop?
  • License: Does the repository contain a plain-text LICENSE file with the contents of a standard license? (OSI-approved for code, Creative Commons for content)
  • Version: Does the release version given match the repository release?
  • Authorship: Has the submitting author (@degiacom) made visible contributions to the module? Does the full list of authors seem appropriate and complete?

Documentation

  • A statement of need: Do the authors clearly state the need for this module and who the target audience is?
  • Installation instructions: Is there a clearly stated list of dependencies?
  • Usage: Does the documentation explain how someone would adopt the module, and include examples of how to use it?
  • Community guidelines: Are there clear guidelines for third parties wishing to 1) Contribute to the module 2) Report issues or problems with the module 3) Seek support

Pedagogy / Instructional design (Work-in-progress: reviewers, please comment!)

  • Learning objectives: Does the module make the learning objectives plainly clear? (We don't require explicitly written learning objectives; only that they be evident from content and design.)
  • Content scope and length: Is the content substantial for learning a given topic? Is the length of the module appropriate?
  • Pedagogy: Does the module seem easy to follow? Does it observe guidance on cognitive load? (working memory limits of 7 +/- 2 chunks of information)
  • Content quality: Is the writing of good quality, concise, engaging? Are the code components well crafted? Does the module seem complete?
  • Instructional design: Is the instructional design deliberate and apparent? For example, exploit worked-example effects; effective multi-media use; low extraneous cognitive load.

JOSE paper

  • Authors: Does the paper.md file include a list of authors with their affiliations?
  • A statement of need: Does the paper clearly state the need for this module and who the target audience is?
  • Description: Does the paper describe the learning materials and sequence?
  • Does it describe how it has been used in the classroom or other settings, and how someone might adopt it?
  • Could someone else teach with this module, given the right expertise?
  • Does the paper tell the "story" of how the authors came to develop it, or what their expertise is?
  • References: Do all archival references that should have a DOI list one (e.g., papers, datasets, software)?

@raquellrios

raquellrios commented Nov 24, 2024

@raquellrios

@raquellrios

raquellrios commented Nov 25, 2024

Review:

The course "Molecular Dynamics Simulation and Analysis Workshop", developed by @ppxasjsm and @degiacom, covers the theoretical foundations of MD simulations of protein systems and their analysis, including advanced analysis topics such as dimensionality reduction and supervised machine learning (ML) classification. The course targets graduate students with a basic understanding of MD simulations and Python.

The course is divided into two units:

  • Unit 1 (Lectures 1–4): focuses on protein simulation preparation, including essential steps such as protein structure preparation and protein-ligand docking.
  • Unit 2 (Lectures 5–8): centers around simulation analysis, incorporating modern techniques like dimensionality reduction and ML-based classification.

Each lecture includes a PDF document with slides. I found the content in all lectures clear and easy to follow, enhanced by numerous visual aids that will help students understand complex topics. Such visual representations will be of great help to students following this material on their own. Furthermore, the analysis lectures (Unit 2) provide examples that demonstrate the practical application of the described methods (PCA, random forests, etc.) to molecular simulations.

In addition to the slides, the course offers practical sessions (except for Lecture 1, which serves as an introduction to proteins). Most of these sessions are presented as Jupyter Notebooks that run seamlessly on Google Colab. These notebooks are well-structured, with clear code explanations. Furthermore, the practicals include questions for students. These questions foster active learning, preventing students from just pressing "run" on every cell without thinking about what they are doing. The inclusion of solutions to these questions ensures that students can complete the exercises independently. The use of Colab eliminates the need for students to download software or have access to high-performance computing resources, significantly improving the usability and accessibility of the course. Also, a key strength of the course is its exclusive reliance on open-source software, aligning with the principles of open science and ensuring accessibility for all students.

Furthermore, the course manages to go from an introduction to proteins all the way to ML, addressing both fundamental and cutting-edge topics like AlphaFold and supervised classification. I believe Lectures 6–8 are particularly noteworthy, as they cover advanced topics for which there are few well-established learning resources (compared to docking or MD simulations) presenting the material this clearly. Also, the modular design of the material will allow teachers to adopt individual lectures or practical sessions independently, except for Practical 8, which builds upon Practical 7. This flexibility increases the course's applicability across different educational settings.

Overall I only have a few comments on the material, which I have reported as Issues in the target repository. Some of these comments are classified as "optional", which means that I do not think they are absolutely necessary but would improve explainability. Regarding the JOSE checklist, I do not see "community guidelines" in the documentation, but I assume this is a self-contained module so it may not need them.

This course provides a robust, well-designed learning process that spans foundational to advanced topics in MD simulations. Its focus on modern methods makes it a valuable resource for students and teachers. I highly recommend this material for publication after addressing the submitted comments.

@arm61

arm61 commented Nov 25, 2024

Okay, it looks like @raquellrios is happy. However, @degiacom and @ppxasjsm, please let me know when you have addressed their issues, and when @djcole56's comment about the statement of need has been addressed.

@ppxasjsm

Hi @djcole56,

In 4_Simulation_Setup, only this method works:

lig27_smiles = "[H:7][C@:6]1([C:13](=[C:11]([C:9](=[O:10])[O:8]1)[O:12][H:19])[O:14][H:20])[C@:3]([H:4])([C:2]([H:16])([H:17])[O:1][H:15])[O:5][H:18]"
ligand = Molecule.from_mapped_smiles(lig27_smiles)
i.e., reading from file does not (I guess because it's not a mapped SMILES).

More generally, it would be good to include instructions on how to make a mapped SMILES if this is necessary input.

I cannot recreate your issue. Maybe you could try again? I have also added a link to further explanations in another resource on how to read in molecules. Please take a look at this commit to see if everything has been adequately addressed: CCPBioSim/BioSim-analysis-workshop@cd8c7f7
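On the reviewer's question of how to make a mapped SMILES: to our understanding, the OpenFF Toolkit's `Molecule.to_smiles(mapped=True)` can write one from an already-loaded molecule. The sketch below only illustrates the idea with the standard library by numbering every atom 1..N in order of appearance. `add_atom_maps` is a hypothetical helper, and it assumes all hydrogens are already explicit and every atom has its normal valence; otherwise, bracketing a bare atom silently drops its implicit hydrogens.

```python
import itertools
import re

# One SMILES atom: a bracket atom, or a bare organic-subset atom.
# "Br" and "Cl" are listed first so they are not split into "B"+"r" or "C"+"l".
ATOM_TOKEN = re.compile(r"\[[^\]]*\]|Br|Cl|[BCNOPSFIbcnosp]")


def add_atom_maps(smiles: str) -> str:
    """Hypothetical helper: give every atom a map index 1..N in string order.

    Assumes all hydrogens are already explicit and every atom has its normal
    valence: a bare "C" carries implicit hydrogens but "[C:3]" carries none,
    so on other input this rewrite would silently change the molecule.
    """
    counter = itertools.count(1)

    def number(match: re.Match) -> str:
        token = match.group(0)
        if token.startswith("["):
            # Strip any existing map index before adding the new one.
            body = re.sub(r":\d+$", "", token[1:-1])
        else:
            body = token
        return f"[{body}:{next(counter)}]"

    return ATOM_TOKEN.sub(number, smiles)


# Methanol with explicit hydrogens.
print(add_atom_maps("[H]OC([H])([H])[H]"))  # [H:1][O:2][C:3]([H:4])([H:5])[H:6]
```

This numbering simply follows string order; a toolkit may assign indices differently, so the sketch is meant to show what a mapped SMILES looks like rather than to replace the toolkit's own writer.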

@ppxasjsm

ppxasjsm commented Dec 22, 2024

Hi @raquellrios and @djcole56, thank you so much for your nice comments and thoughtful reviews. We have not added a statement of need to the GitHub repository, as it is covered in the paper. We have added some community guidelines in this commit: CCPBioSim/BioSim-analysis-workshop@d381349. I believe we have now addressed all the concerns raised by issues on the repo and in this thread. If you agree, then we hope that @arm61 can proceed with the submission. Have a lovely holiday break, everyone!

@djcole56

  • I can no longer reproduce the issue in workshop 4, and further help has been added in any case.
  • Community guidelines have been added.

@arm61 all good to go from my side.

@raquellrios

Same here! Everything looks great!

@arm61

arm61 commented Jan 20, 2025

@editorialbot generate pdf

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@arm61

arm61 commented Jan 20, 2025

@editorialbot check references

@editorialbot

Reference check summary (note 'MISSING' DOIs are suggestions that need verification):

✅ OK DOIs

- 10.5281/zenodo.13155316 is OK
- 10.1038/s41586-021-03819-2 is OK
- 10.25080/Majora-629e541a-00e is OK
- 10.1038/253694a0 is OK
- 10.1126/science.1208351 is OK
- 10.1021/acs.jchemed.1c00022 is OK
- 10.1140/epjb/s10051-021-00249-x is OK
- 10.1016/j.neuron.2018.08.011 is OK
- 10.1016/j.bpj.2022.11.2277 is OK
- 10.1002/jcc.21787 is OK

🟡 SKIP DOIs

- No DOI given, and none found for title: Scikit-Learn: Machine Learning in Python

❌ MISSING DOIs

- None

❌ INVALID DOIs

- None

@arm61

arm61 commented Jan 20, 2025

Sorry for the delay all (in short -- teaching). @ppxasjsm and @degiacom, is there a nicer way to format the "Open in Colab" parts in unit breakdowns, perhaps more in keeping with the style of the pdf? Just a thought.

@degiacom

Hello @arm61, thank you for having a look at our paper. Perhaps we could leave the Colab badges in the repo's README file, but replace them in the paper with hyperlinked text ("Open in Colab")?

@arm61

arm61 commented Jan 21, 2025

Yeah, I think that would look better (small things 😄).

@degiacom

Done! I replaced the "Open in Colab" badges with a "Notebook" plain text.

@arm61

arm61 commented Jan 22, 2025

@editorialbot generate pdf

@editorialbot

👉📄 Download article proof 📄 View article proof on GitHub 📄 👈

@arm61

arm61 commented Jan 22, 2025

Great, the next step is for you to proofread it and ensure there are no typos, etc. Let me know when you are happy with the text.

@degiacom

We have now checked the grammar and spelling of the paper, and we are happy with it!

6 participants