v3.0.0a0 (pre-release) · released by @njzjz on 03 Mar
DeePMD-kit v3: A multiple-backend framework for deep potentials

We are excited to announce the first alpha version of DeePMD-kit v3. DeePMD-kit v3 allows you to train and run deep potential models on top of TensorFlow or PyTorch. DeePMD-kit v3 also supports the DPA-2 model, a novel architecture for large atomic models.

Highlights

Multiple-backend framework

[Figure: overview of the multiple-backend framework]

DeePMD-kit v3 adds a pluggable multiple-backend framework to provide consistent training and inference experiences between different backends. You can:

  • Use the same training data and input script to train a deep potential model with different backends. Switch backends based on efficiency, functionality, or convenience:
# Training a model using the TensorFlow backend
dp --tf train input.json
dp --tf freeze

# Training a model using the PyTorch backend
dp --pt train input.json
dp --pt freeze
  • Use any model to perform inference via any existing interface, including dp test, the Python/C++/C interfaces, and third-party packages (dpdata, ASE, LAMMPS, AMBER, GROMACS, i-PI, CP2K, OpenMM, ABACUS, etc.). For example, in LAMMPS (a Python inference sketch is given after this list):
# run LAMMPS with a TensorFlow backend model
pair_style deepmd frozen_model.pb
# run LAMMPS with a PyTorch backend model
pair_style deepmd frozen_model.pth
# Calculate model deviation using both models
pair_style deepmd frozen_model.pb frozen_model.pth out_file md.out out_freq 100
  • Convert models between backends, using dp convert-backend, if both backends support a model:
dp convert-backend frozen_model.pb frozen_model.pth
dp convert-backend frozen_model.pth frozen_model.pb
  • Add a new backend to DeePMD-kit with much less effort, if you want to contribute to DeePMD-kit.
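As an illustration of the Python inference interface, here is a minimal sketch using deepmd.infer.DeepPot; the model filename, the two-atom system, and the array values are placeholder assumptions:

import numpy as np
from deepmd.infer import DeepPot

# Load a frozen model; passing "frozen_model.pth" would select the PyTorch backend
dp = DeepPot("frozen_model.pb")
coords = np.array([[0.0, 0.0, 0.0, 0.0, 0.0, 1.0]])  # (nframes, natoms * 3)
cells = np.array([[10.0, 0.0, 0.0, 0.0, 10.0, 0.0, 0.0, 0.0, 10.0]])  # (nframes, 9)
atom_types = [0, 1]  # per-atom type indices
e, f, v = dp.eval(coords, cells, atom_types)  # energy, force, virial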

PyTorch backend: a backend designed for large atomic models and new research

We added the PyTorch backend in DeePMD-kit v3 to support the development of new models, especially for large atomic models.

DPA-2 model: Towards a universal large atomic model for molecular and material simulation

The DPA-2 model is a novel architecture for a large atomic model (LAM): it can accurately represent a diverse range of chemical systems and materials, enabling high-quality simulations and predictions with significantly reduced effort compared to traditional methods. The DPA-2 model is implemented only in the PyTorch backend. An example configuration is in the examples/water/dpa2 directory.
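To train it, one might run the command below; the input file name is an assumption, so check the example directory for the actual name:

# Train the DPA-2 example with the PyTorch backend (file name assumed)
dp --pt train examples/water/dpa2/input_torch.json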

The DPA-2 descriptor includes two primary components: repinit and repformer. The detailed architecture is shown in the following figure.

[Figure: DPA-2 descriptor architecture, showing the repinit and repformer components]

Training strategies for large atomic models

The PyTorch backend supports several training strategies for developing large atomic models.

Parallel training: Large atomic models have many parameters and a complex architecture, so training on multiple GPUs is often necessary. Benefiting from the PyTorch community ecosystem, parallel training with the PyTorch backend can be driven by torchrun, PyTorch's launcher for distributed data parallel (DDP) training.

torchrun --nproc_per_node=4 --no-python dp --pt train input.json

Multi-task training: Large atomic models are trained against data with a wide scope and at different DFT levels, which requires multi-task training. The PyTorch backend supports multi-task training, sharing the descriptor between different tasks. An example is given in examples/water_multi_task/pytorch_example/input_torch.json.

Fine-tuning: Fine-tuning is useful for training a pre-trained large model on a smaller, task-specific dataset. The PyTorch backend supports the --finetune argument in the dp --pt train command line, as shown below.
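A minimal invocation might look as follows, where pretrained.pt is a placeholder name for the pre-trained checkpoint:

# Fine-tune a pre-trained model on a task-specific dataset (checkpoint name assumed)
dp --pt train input.json --finetune pretrained.pt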

Developing new models using Python and dynamic graphs

Researchers may find the static graph and the custom C++ OPs of the TensorFlow backend painful: they sacrifice research convenience for computational performance. The PyTorch backend has a well-designed code structure built on the dynamic graph and is currently written 100% in Python, making it easier to extend and debug new deep potential models than with the static graph.

Supporting traditional deep potential models

Users may still want to run the traditional models already supported by the TensorFlow backend in the PyTorch backend, or compare the same model across backends. We have rewritten almost all of the traditional models in the PyTorch backend; the current support status is listed below:

  • Features supported:
    • Descriptor: se_e2_a, se_e2_r, se_atten, hybrid
    • Fitting: energy, dipole, polar, fparam/aparam support
    • Model: standard, DPRc
    • Python inference interface
    • C++ inference interface for energy only
    • TensorBoard
  • Features not supported yet:
    • Descriptor: se_e3, se_atten_v2, se_e2_a_mask
    • Fitting: dos
    • Model: linear_ener, DPLR, pairtab, frozen, pairwise_dprc, ZBL, Spin
    • Model compression
    • Python inference interface for DPLR
    • C++ inference interface for tensors and DPLR
    • Parallel training using Horovod
  • Features not planned:
    • Descriptor: loc_frame, se_e2_a + type embedding, se_a_ebd_v2
    • NVNMD

Warning

As this is an alpha release, the PyTorch backend's API and user input arguments may change before the first stable version.

DP backend and format: reference backend for other backends

DP is a reference backend for development. It implements models in pure NumPy, without any heavy deep-learning framework, and can be used only for Python inference, not training. As a reference backend, it aims at correct results rather than the best performance. The DP backend uses HDF5 to store serialized model data, which is backend-independent.
The DP backend and the serialized data are used in the unit tests to ensure that different backends give consistent results and can be converted into each other.
In the current version, the DP backend has a support status similar to that of the PyTorch backend, except that DPA-1 and DPA-2 are not supported yet.
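For example, a TensorFlow model can be converted into the backend-independent DP format; the .dp extension for the HDF5-based DP format is assumed here:

# Convert a TensorFlow model into the reference DP format (.dp extension assumed)
dp convert-backend frozen_model.pb frozen_model.dp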

Authors

The above highlights were mainly contributed by:

Breaking changes

  • Python 3.7 support is dropped. by @njzjz in #3185
  • All interfaces now require model files to have the correct filename extension so that the corresponding backend can load them. TensorFlow model files must end with the .pb extension.
  • The Python class DeepTensor (including DeepDipole and DeepPolar) now returns atomic tensors with the dimension natoms instead of nsel_atoms. by @njzjz in #3390
  • For developers: the Python module structure is fully refactored. The old deepmd module was moved to deepmd.tf, and deepmd_utils was moved to deepmd, in both cases without other API changes (see the sketch after this list). by @njzjz in #3177, #3178
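For example, downstream code importing TensorFlow-specific modules needs updated import paths; deepmd.env is used below as an illustrative submodule:

# v2: TensorFlow-specific modules lived at the top level of the deepmd package
from deepmd.env import tf
# v3: the same module now lives under deepmd.tf
from deepmd.tf.env import tf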

Other changes

Enhancement

  • Neighbor stat for the TensorFlow backend is accelerated by 80x. by @njzjz in #3275
  • i-PI: remove normalize_coord by @njzjz in #3257
  • LAMMPS: fix_dplr.cpp delete redundant setup and set atom->image when pre_force by @shiruosong in #3344, #3345
  • Bump scikit-build-core to 0.8 by @njzjz in #3369
  • Bump LAMMPS to stable_2Aug2023_update3 by @njzjz in #3399
  • Add fparam/aparam support for fine-tune by @njzjz in #3313
  • TF: remove freeze warning for optional nodes by @njzjz in #3381

CI/CD

Bugfix

  • Fix TF 2.16 compatibility by @njzjz in #3343
  • Detect version in advance before building deepmd-kit-cu11 by @njzjz in #3172
  • C API: change the required shape of electric field to nloc * 3 by @njzjz in #3237

New Contributors

Full Changelog: v2.2.8...v3.0.0a0