pet architecture description #295

Merged 3 commits on Jul 12, 2024
39 changes: 39 additions & 0 deletions docs/src/architectures/pet.rst
@@ -175,6 +175,45 @@ layers in each message-passing block (see more details in the `PET paper
about `1.5` times more lightweight and faster, with an expected minimal deterioration in
accuracy.


Description of the Architecture
-------------------------------
This section gives a simplified description of the architecture, covering the
most important aspects of its macro-organization without all the details and
nuances.

PET is a graph neural network (GNN) architecture featuring
``N_GNN_LAYERS`` message-passing layers. At each layer, messages are exchanged
between all atoms within a distance ``R_CUT`` from each other. The functional
form of each layer is an arbitrarily deep transformer applied individually to
each atom. Atomic environments are constructed around each atom, defined by all
neighbors within ``R_CUT``. Each neighbor sends a message to the central atom,
with each message being a token of fixed size ``TRANSFORMER_D_MODEL``.
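
As a rough, hypothetical illustration (not the actual PET code), the sketch below
shows how one token of width ``TRANSFORMER_D_MODEL`` could be formed for each
neighbor within ``R_CUT``; the embedding module and all sizes are placeholder
assumptions.

.. code-block:: python

    import torch

    R_CUT = 5.0                # cutoff radius defining the atomic environment
    TRANSFORMER_D_MODEL = 128  # fixed token width

    # placeholder for the learned embedding that turns an incoming message
    # (here simply the neighbor's species) into a token
    embed_species = torch.nn.Embedding(100, TRANSFORMER_D_MODEL)

    def neighbor_tokens(positions, species, center):
        """Return one token per neighbor of ``center`` within ``R_CUT``."""
        r_ij = positions - positions[center]  # displacements from the central atom
        dist = r_ij.norm(dim=-1)
        mask = (dist < R_CUT) & (dist > 0.0)  # neighbors, excluding the center itself
        return embed_species(species[mask])   # shape: (num_neighbors, d_model)

    positions = torch.randn(10, 3)
    species = torch.randint(0, 100, (10,))
    tokens = neighbor_tokens(positions, species, center=0)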

These tokens are processed by a transformer, which performs a permutationally
equivariant sequence-to-sequence transformation. The output sequence is then
treated as outbound messages from the central atom to all neighbors. Consequently,
for a model with ``N_GNN_LAYERS`` layers and a system with ``N`` atoms, there are
``N_GNN_LAYERS`` individual transformers with distinct weights, each independently
invoked ``N`` times, resulting in ``N_GNN_LAYERS * N`` transformer runs. The
number of input tokens for each transformer run is determined by the number of
neighbors of the central atom.
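
A minimal sketch of this layout, assuming the token construction above and a
hypothetical ``exchange_messages`` routine that routes each outbound message to
the corresponding neighbor for the next layer, could look as follows (the
transformer depth and head count are arbitrary placeholders):

.. code-block:: python

    import torch

    N_GNN_LAYERS = 2
    TRANSFORMER_D_MODEL = 128

    # N_GNN_LAYERS transformers with distinct weights, one per message-passing layer
    transformers = torch.nn.ModuleList([
        torch.nn.TransformerEncoder(
            torch.nn.TransformerEncoderLayer(
                d_model=TRANSFORMER_D_MODEL, nhead=4, batch_first=True
            ),
            num_layers=2,  # the "arbitrarily deep" transformer inside each layer
        )
        for _ in range(N_GNN_LAYERS)
    ])

    def run_message_passing(tokens_per_atom, exchange_messages):
        # tokens_per_atom[i]: (num_neighbors_i, d_model) input sequence of atom i,
        # so each layer performs N independent transformer runs
        outputs_per_layer = []
        for transformer in transformers:                     # N_GNN_LAYERS layers
            layer_outputs = [
                transformer(tokens.unsqueeze(0)).squeeze(0)  # one run per atom
                for tokens in tokens_per_atom
            ]
            outputs_per_layer.append(layer_outputs)          # outbound messages
            # hypothetical routing step: each outbound message becomes the
            # incoming message of the corresponding neighbor in the next layer
            tokens_per_atom = exchange_messages(layer_outputs)
        return outputs_per_layer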

In addition to an input message from a neighboring atom, geometric information
about the displacement vector ``r_ij`` from the central atom to the corresponding
neighbor is incorporated into the token. After each message-passing layer, all
output messages are fed into a head (a separate one for each message-passing
layer), implemented as a shallow MLP, to produce a contribution to the total
prediction. The total prediction is computed as the sum of all head outputs over
all message-passing layers and all messages.
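
Keeping the same hypothetical naming, the sketch below shows how the geometric
information could be folded into each token and how the per-layer heads could be
summed into the total prediction; the layer sizes are placeholder assumptions.

.. code-block:: python

    import torch

    N_GNN_LAYERS = 2
    TRANSFORMER_D_MODEL = 128

    # placeholder: geometric information about r_ij is added to each incoming message
    embed_displacement = torch.nn.Linear(3, TRANSFORMER_D_MODEL)

    def augment_tokens(message_tokens, r_ij):
        # message_tokens: (num_neighbors, d_model), r_ij: (num_neighbors, 3)
        return message_tokens + embed_displacement(r_ij)

    # one shallow MLP head per message-passing layer
    heads = torch.nn.ModuleList([
        torch.nn.Sequential(
            torch.nn.Linear(TRANSFORMER_D_MODEL, 64),
            torch.nn.SiLU(),
            torch.nn.Linear(64, 1),
        )
        for _ in range(N_GNN_LAYERS)
    ])

    def total_prediction(outputs_per_layer):
        # outputs_per_layer[l][i]: (num_neighbors_i, d_model) output messages of
        # atom i after message-passing layer l
        total = torch.zeros(())
        for head, layer_outputs in zip(heads, outputs_per_layer):
            for messages in layer_outputs:            # loop over atoms
                total = total + head(messages).sum()  # sum over all messages
        return total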

This architecture is rigorously invariant with respect to translations because it
uses displacement vectors that do not change if both the central atom and a neighbor
are rigidly shifted. It is invariant with respect to permutations of identical atoms
because the transformer defines a permutationally covariant sequence-to-sequence
transformation, and the sum over the contributions from all edges yields an overall
invariant energy prediction. However, it is not rotationally invariant since it
operates with the raw Cartesian components of displacement vectors.
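
The toy example below is not PET itself, but it mirrors the edge-wise structure
described above (an arbitrary MLP applied to raw displacement vectors and summed
over all edges) and illustrates these symmetry statements numerically.

.. code-block:: python

    import math
    import torch

    # arbitrary edge function standing in for the full per-edge machinery;
    # double precision so that round-off does not obscure the exact invariance
    edge_mlp = torch.nn.Sequential(
        torch.nn.Linear(3, 8), torch.nn.Tanh(), torch.nn.Linear(8, 1)
    ).double()

    def toy_energy(positions, r_cut=5.0):
        r_ij = positions[None, :, :] - positions[:, None, :]  # all displacement vectors
        dist = r_ij.norm(dim=-1)
        mask = (dist < r_cut) & (dist > 0.0)                  # edges within the cutoff
        return edge_mlp(r_ij[mask]).sum()                     # sum over all edges

    positions = torch.randn(10, 3, dtype=torch.float64)

    # a rigid translation leaves every r_ij, and hence the prediction, unchanged
    shift = torch.tensor([1.0, -2.0, 0.5], dtype=torch.float64)
    assert torch.allclose(toy_energy(positions), toy_energy(positions + shift))

    # a rotation changes the raw Cartesian components of r_ij, so the prediction
    # is, in general, not rotationally invariant
    c, s = math.cos(0.3), math.sin(0.3)
    rotation = torch.tensor(
        [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]], dtype=torch.float64
    )
    print(toy_energy(positions), toy_energy(positions @ rotation.T))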

Architecture Hyperparameters
----------------------------
.. warning::