
Implement the Alchemical Model from torch_alchemical repo #66

Merged
merged 40 commits into main from alchemical-model
Feb 20, 2024

Conversation

@abmazitov (Contributor) commented Feb 8, 2024

Current status

  • Model is implemented
  • Tests are implemented
  • Tests are passing
  • Fixed the "Failed to downcast a Function to a GraphFunction" issue
  • Docs prototype is ready
  • WIP example is ready
  • Found a way to pre-compute the neighbor lists before the training loop
  • Found a place to compute the basis_normalization_factor (ideally make it a method of the model)
  • Fixed the example
  • Checked if training works on CUDA and MPS: it doesn't work on MPS, because sphericart is implemented for CPU and CUDA only
  • Figured out whether float32 is an option for the splines in torch_spex; waiting for an update of the splining routine
  • Updated to main, including the new Dataset and Dataloader
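The neighbor-list pre-computation mentioned above could be sketched roughly as follows. This is a hypothetical illustration, not the PR's actual code: `compute_neighbor_list` is a stand-in for whatever NL routine the engine provides.

```python
# Hypothetical sketch: build every neighbor list once, before the training
# loop, instead of recomputing it on every epoch. `compute_neighbor_list`
# is a stand-in for the engine's actual NL routine.
def precompute_neighbor_lists(structures, cutoff, compute_neighbor_list):
    return [compute_neighbor_list(s, cutoff) for s in structures]


# Toy usage with a dummy NL routine:
def dummy_nl(structure, cutoff):
    return {"structure": structure, "cutoff": cutoff}


cache = precompute_neighbor_lists(["s1", "s2", "s3"], 5.0, dummy_nl)
print(len(cache))  # 3
```

The cached lists can then be attached to the dataset samples so the training loop never touches the NL routine.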

Status of tests (local run)

  • Functionality
  • Invariance
  • Regression
  • TorchScript

TODO

  • Implement the domain decomposition integration - TBD in the next PR

📚 Documentation preview 📚: https://metatensor-models--66.org.readthedocs.build/en/66/

pyproject.toml (outdated; resolved)
@frostedoyster (Collaborator)

Should this model (and PET) go to an experimental folder for now?

@PicoCentauri @abmazitov @serfg @Luthaf

@Luthaf (Member) commented Feb 8, 2024

I'd put all models including SOAP-BPNN in an experimental folder for now!

@PicoCentauri (Contributor)

I agree. Let's discuss next week at the meeting how we implement this.

@PicoCentauri (Contributor) left a comment:

Please don't forget to add the model to the docs: one entry in the README plus a page under docs/src/architectures.

learning_rate: 0.001
log_interval: 10
checkpoint_interval: 25
device: 'cpu'
Contributor:

This should not be here but will be passed directly from the train function.

Contributor Author:

These lines are somehow needed for the training step in test_regression.py.

Contributor:

Yes, these lines, but the device is a global parameter and will be provided via the train function.
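The design being suggested here — device passed as an argument to the train function rather than stored as a model hyperparameter — could look roughly like this. The signatures below are hypothetical, not the actual metatensor-models API.

```python
# Hypothetical sketch: the device is a global run parameter passed to
# train(), not an entry in the model's hyperparameter file.
class DummyModel:
    """Minimal stand-in for a torch.nn.Module, recording its device."""

    def __init__(self):
        self.device = None

    def to(self, device):
        self.device = device
        return self


def train(model, dataset, device, learning_rate=1e-3,
          log_interval=10, checkpoint_interval=25):
    model = model.to(device)  # device applied once, for the whole run
    # ... the actual training loop would go here ...
    return model


model = train(DummyModel(), dataset=[], device="cpu")
print(model.device)  # cpu
```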

@abmazitov abmazitov marked this pull request as ready for review February 13, 2024 21:46
@frostedoyster (Collaborator) left a comment:

Amazing work, just some small details.
Regarding the large file with structures, I think it's too large (@PicoCentauri): it doesn't train on my laptop (it runs out of memory), and we already have 100 structures from the original alchemical dataset in the repo. Can we re-use them?

Bonus question: @abmazitov have you tried learning from the stresses?

docs/src/architectures/alchemical-model.rst (resolved)
================

This is an implementation of Alchemical Model: a Behler-Parrinello neural network
with SOAP features and Alchemical Compression of the composition space. This model
Collaborator:

Suggested change
with SOAP features and Alchemical Compression of the composition space. This model
with SOAP features and alchemical compression of the composition space. This model

examples/alchemical_model/README.rst (resolved)
@@ -0,0 +1,35 @@
#!/bin/bash
Collaborator:

We don't want this for each model in the docs, right @PicoCentauri? However, maybe we can leave the example here and run it from the alchemical model's test CI.

Member:

yeah, I would not duplicate this file. If we want this to be a test, let's make it a test and not an example.

Contributor Author:

Fine, then I'll delete this file

src/metatensor/models/utils/compute_loss.py (outdated; resolved)
for i in range(len(dataset)):
    structure = dataset[i].structure
    if not isinstance(structure, torch.ScriptObject):
        raise RuntimeError
Collaborator:

We need a small description of the error
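Such a description could be added along these lines. This is a self-contained sketch of the requested fix, not the PR's final code: `ScriptObject` and `Sample` are local stand-ins for `torch.ScriptObject` and the dataset's sample type, so the snippet runs without torch.

```python
# Sketch: raise RuntimeError with a message saying what was found and what
# was expected, instead of a bare RuntimeError. `ScriptObject` is a
# stand-in for torch.ScriptObject.
class ScriptObject:
    pass


class Sample:
    def __init__(self, structure):
        self.structure = structure


def check_structures(dataset):
    for i in range(len(dataset)):
        structure = dataset[i].structure
        if not isinstance(structure, ScriptObject):
            raise RuntimeError(
                f"structure at index {i} has type "
                f"{type(structure).__name__}, expected a TorchScript object"
            )


check_structures([Sample(ScriptObject())])  # valid dataset: no error raised
```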

Contributor Author:

Thanks, this is no longer needed.

for i in range(len(dataset)):
    structure = dataset[i].structure
    if not isinstance(structure, torch.ScriptObject):
        raise RuntimeError
Collaborator:

We need a small description of the error

Contributor Author:

Same

@PicoCentauri (Contributor)

Very nice work @abmazitov !!! I will take a look after @frostedoyster comments are done.

> Regarding the large file with structures, I think it's too large (@PicoCentauri): it doesn't train on my laptop (it runs out of memory), and we already have 100 structures from the original alchemical dataset in the repo. Can we re-use them?

Yes, try to use the file we already have. If you cannot test the model on a laptop, it is not good xD

@abmazitov (Contributor Author) commented Feb 14, 2024

What do you think of adding a model_requires_neighborlists flag to ModelCapabilities, to the forward-pass keywords, or as a model class attribute? Since the Alchemical Model doesn't work without NLs, there are a few places in the code (e.g. the metatensor-models eval step) where the workflow breaks because the NLs are missing.

What I would like to see is a model check performed any time the forward pass is called. In other words, if the model requires NLs and the systems don't have them, we compute the missing NLs first and then do the forward pass.

@Luthaf @PicoCentauri @frostedoyster Let me know what you think

@Luthaf (Member) commented Feb 14, 2024

> What do you think of adding a model_requires_neighborlists flag to ModelCapabilities, to the forward-pass keywords, or as a model class attribute? Since the Alchemical Model doesn't work without NLs, there are a few places in the code (e.g. the metatensor-models eval step) where the workflow breaks because the NLs are missing.

If your model requires NLs, it should define, somewhere in the module tree, the corresponding requested_neighbors_lists(self) -> List[NeighborsListOptions] method. This is done automatically by rascaline calculators.

Then MetatensorAtomisticModel will use these functions to assemble the full list of NLs, and the simulation engine (here, the eval script) is responsible for computing the NLs and storing them in the systems.
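The pattern described here could be sketched as follows. Note the hedges: `NeighborsListOptions` below is a local stand-in for the class in metatensor.torch.atomistic, whose real constructor may differ, and `AlchemicalModelSketch` is hypothetical.

```python
from typing import List


# Stand-in for metatensor's NeighborsListOptions; the real class lives in
# metatensor.torch.atomistic and its exact signature may differ.
class NeighborsListOptions:
    def __init__(self, cutoff: float, full_list: bool):
        self.cutoff = cutoff
        self.full_list = full_list


class AlchemicalModelSketch:
    """A module advertising the neighbor lists it needs. The engine (or the
    eval script) collects these options, computes the lists, and stores
    them in the systems before forward() is ever called, so the model
    never builds NLs itself."""

    def __init__(self, cutoff: float):
        self.cutoff = cutoff

    def requested_neighbors_lists(self) -> List[NeighborsListOptions]:
        return [NeighborsListOptions(cutoff=self.cutoff, full_list=True)]


options = AlchemicalModelSketch(cutoff=5.0).requested_neighbors_lists()
print(options[0].cutoff, options[0].full_list)  # 5.0 True
```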

@Luthaf (Member) commented Feb 16, 2024

At the very least for tests. For train/eval, it depends on how much slower check_consistency=True makes the code.

@frostedoyster (Collaborator) left a comment:

@Luthaf could you please check this?

docs/src/architectures/alchemical-model.rst (outdated; resolved)
docs/src/architectures/alchemical-model.rst (resolved)
examples/alchemical_model/README.rst (resolved)
examples/alchemical_model/alchemical_reduced_10.xyz (outdated; resolved)
@@ -0,0 +1,35 @@
#!/bin/bash
Member:

yeah, I would not duplicate this file. If we want this to be a test, let's make it a test and not an example.

src/metatensor/models/utils/normalize.py (resolved)
src/metatensor/models/utils/normalize.py (outdated; resolved)
src/metatensor/models/utils/normalize.py (outdated; resolved)
tox.ini (outdated; resolved)
pyproject.toml (outdated; resolved)
@abmazitov (Contributor Author)

Ready for merging

@frostedoyster (Collaborator) commented Feb 20, 2024

I'm having a look and I will add my comments here before we merge:

  • Documentation: it's there. A small guide on how to change the parameters wouldn't hurt, but it can be left for later; I'm going to add some basic comments next to each hyperparameter. I really dislike the citation format we're using, which doesn't even make it easy for the user to cite the relevant papers: it looks really cool, but it doesn't do what we want in our context. BibTeX needs to be visible and ready to be copied. I'll open an issue.
  • Model: only some small changes (4 lines) were needed for it to work on different devices and dtypes. However, the handling of the composition weights is wrong (I assume due to the internal rescaling of targets in the model), and the energy errors are very high as a result.

EDIT: the problem was the order in which the rescaling and the composition weights were applied. Changing the order to rescaling first, then applying the composition weights, fixed the issue. I'm merging.
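A toy numerical illustration of why the order matters. The semantics here are assumed for illustration (network output rescaled by a target scale factor, per-structure composition energies added in physical units) and the numbers are made up; this is not the actual metatensor-models code.

```python
# Toy numbers: composition energies live in physical units, so they must
# be added AFTER the network output is rescaled back to those units.
scale = 2.0          # e.g. std of the (composition-subtracted) targets
nn_output = 1.5      # raw network prediction, in rescaled units
composition = 10.0   # per-structure composition energy, physical units

correct = scale * nn_output + composition   # rescale first, then add composition
wrong = scale * (nn_output + composition)   # composition wrongly rescaled too

print(correct, wrong)  # 13.0 23.0
```

With the wrong order, the composition contribution is multiplied by the scale factor, which inflates every predicted energy and would explain the large errors reported above.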

@frostedoyster frostedoyster merged commit 06ca9b4 into main Feb 20, 2024
9 checks passed
@frostedoyster frostedoyster deleted the alchemical-model branch February 20, 2024 04:49
@PicoCentauri (Contributor)

Very nice work!!! 🥳

5 participants