-
Notifications
You must be signed in to change notification settings - Fork 8
Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
- Loading branch information
1 parent
2e7c48e
commit 906e232
Showing
1 changed file
with
63 additions
and
0 deletions.
There are no files selected for viewing
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,63 @@ | ||
--- | ||
title: "What Can DP Do too? | Data Dimensionality Reduction Method Facilitates the Construction of DeePMD Force Field Feature Datasets" | ||
date: 2025-01-23 | ||
categories: | ||
- DeePMD-kit | ||
--- | ||
|
||
In machine learning potential development, it is challenging to sample complex chemical spaces with a small amount of data. The research group led by Dongping Chen from Beijing Institute of Technology constructed a feature dataset with low redundancy and low data requirements based on a data dimensionality reduction method. This dataset was combined with the DeePMD method to develop a machine learning potential with high accuracy, high efficiency, and wide application conditions, aiming to describe the interactions in aluminum-lithium (Al-Li) alloys and ammonium perchlorate (AP) interfaces. The relevant research results were published in the journal Journal of Chemical Theory and Computation under the title "Minimizing Redundancy and Data Requirements of Machine Learning Potential: A Case Study in Interface Combustion" (DOI: 10.1021/acs.jctc.4c00587)[1]. Xiaoya Chang, a doctoral student, is the first author. | ||
|
||
|
||
<!-- more --> | ||
|
||
## Research Background | ||
|
||
Machine learning potentials balance the accuracy and efficiency of molecular simulations, enabling large-scale molecular dynamics simulations with first-principles accuracy. The quality of the training set directly determines the performance and extrapolation ability of the force field model. Currently, most research relies on AIMD sampling to cover the reaction potential energy surfaces in target scenarios, but this process often incurs high computational costs. It remains a significant challenge in force field development to accurately describe the characteristics of complex reaction potential energy surfaces with a small amount of data. | ||
|
||
Compared with simple substances, interface systems have higher sampling dimensions, especially in complex interface combustion reaction scenarios. This study focuses on novel aluminum-lithium alloy propellants. By combining a data dimensionality reduction method, a degenerate representative dataset was constructed, and a Deep Potential (DP) force field was developed to describe the interactions between Al-Li and AP. | ||
|
||
## Method Introduction | ||
|
||
<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/DeePMD_23_01_2025/pic1.jpeg pic_center width="60%" height="60%" /></center> | ||
|
||
*Figure 1 Workflow of force field development based on the data dimensionality reduction method.* | ||
|
||
Firstly, the target system is fully pre-sampled using an empirical force field or a published machine learning potential. Subsequently, the SOAP descriptor is used to represent the structural information of the pre-sampled configurations, and the principal component analysis (PCA) method is employed to simplify the high-dimensional structural features into two-dimensional features. Configurations with similar structural features are distributed closely on the PCA plane. Grids are divided on the two-dimensional plane, and one configuration is randomly selected from each grid to represent all the configurations within the grid. First-principles calculations are performed on the selected structures to establish a feature dataset and develop a DP force field. Finally, active learning is carried out based on the DP-GEN framework to supplement and improve the force field model. | ||
|
||
## Method Validation | ||
|
||
<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/DeePMD_23_01_2025/pic2.png pic_center width="60%" height="60%" /></center> | ||
|
||
*Figure 2 (a) PCA distribution and (b) grid division of the AP pyrolysis dataset. (c) Dataset size and (d) predicted AP decomposition curves for different grid sizes.* | ||
|
||
Based on the strategies of extracting structural features using the SOAP descriptor, PCA dimensionality reduction, and grid division for selection, the method was first validated using the AP monomer pyrolysis dataset[2]. The original dataset contained 11,826 data structures. As the size of the divided grids increased, the size of the dataset decreased continuously. When the grid size was 15, the dataset only contained 331 configurations (accounting for 2.7% of the total dataset), and the energy prediction accuracy only decreased by 14%. The trained DP force field could still accurately predict the AP pyrolysis curve. The above tests verified the feasibility of the workflow and emphasized the significance of the feature dataset for force field development. | ||
|
||
## Method Application | ||
|
||
<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/DeePMD_23_01_2025/pic3.jpeg pic_center width="60%" height="60%" /></center> | ||
|
||
*Figure 3 PCA distribution of the Al - Li and AP interface dataset.* | ||
|
||
For the Al-Li alloy and AP interface system, the PCA distribution of its feature dataset is shown in Figure 3. The PC1 component represents the pressure of the system, ranging from gas-phase combustion to solid-phase combustion; the PC2 component represents the lithium content in the system, ranging from pure aluminum to 20 wt% lithium doping. The configurations obtained from the active learning of DP-GEN are still within the distribution range of the training set, supplementing and improving the feature dataset. The final dataset only contains 11,251 structures and can be used to study interface reactions under a wide range of temperatures, pressures, and lithium contents. | ||
|
||
<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/DeePMD_23_01_2025/pic4.png pic_center width="60%" height="60%" /></center> | ||
|
||
*Figure 4 PCA distribution of the training set (gray), validation set (red solid dots), and surface adsorption configurations (red hollow dots).* | ||
|
||
The mean absolute error (MAE) of the energy prediction of the developed DP force field is 7.54 meV/atom. To characterize the extrapolation ability of this force field, a validation set was additionally constructed in the article. Its PCA distribution is within the training set, and the MAE of the energy prediction of the DP force field for the validation set is 15.25 meV/atom. For the surface small molecule adsorption configurations, their PCA distribution is far from the training set range, that is, far beyond the training range and prediction ability of the force field. Therefore, the deviation between the energy prediction and the DFT calculation results is large, reaching 547.56 meV/atom. The above results indicate that the PCA distribution can be used to measure the prediction range and extrapolation ability of the DP force field. | ||
|
||
<center><img src=https://dp-public.oss-cn-beijing.aliyuncs.com/community/Blog%20Files/DeePMD_23_01_2025/pic5.png pic_center width="60%" height="60%" /></center> | ||
|
||
*Figure 5 Influence of lithium doping on mass diffusion at the Al - AP interface.* | ||
|
||
Using the developed DP force field, the article compared the heat and mass transfer and reaction processes at the Al-AP interface and the AlLi-AP interface. The mass diffusion process is shown in Figure 5. Lithium atoms have strong reactivity. As the temperature rises, the lithium atoms at both ends of the alloy gradually diffuse into the AP region, and their diffusion coefficient is three times that of aluminum atoms. At the same time, the diffusion of lithium atoms also promotes the decomposition of AP, advancing it by 8 ps. This result shows that adding an Al-Li alloy helps to improve the combustion efficiency of the propellant. | ||
|
||
## Summary | ||
|
||
Based on the data dimensionality reduction method, this article proposes an innovative strategy for constructing a feature dataset: First, an existing force field is used to conduct extensive pre-sampling of complex potential energy surfaces; then, the PCA method is used to screen feature data; finally, an active learning strategy is adopted to perform local fine-tuning and optimization of the DP force field. This strategy can not only be effectively applied to the sampling of complex potential energy surfaces but also provides important reference and inspiration for the force field development of complex interface reaction systems. | ||
|
||
## References | ||
|
||
[1] Chang, X., Zhang, D., Chu, Q., & Chen, D. Minimizing Redundancy and Data Requirements of Machine Learning Potential: A Case Study in Interface Combustion. Journal of Chemical Theory and Computation, 2024, 20(15), 6813 - 6825. | ||
|
||
[2] Chu, Q., Wen, M., Fu, X., Eslami, A., & Chen, D. Reaction Network of Ammonium Perchlorate (AP) Decomposition: The Missing Piece from Atomic Simulations. The Journal of Physical Chemistry C, 2023, 127(27), 12976 - 12982. |