Skip to content

Commit

Permalink
Use doped structure matching (~2 order of magnitude speed increase)
Browse files Browse the repository at this point in the history
  • Loading branch information
kavanase committed Nov 2, 2024
1 parent 080989d commit e3cce49
Show file tree
Hide file tree
Showing 10 changed files with 146 additions and 217 deletions.
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -135,13 +135,13 @@ Automatic testing is run on the master and develop branches using Github Actions
- B. E. Murdock et al. **_Li-Site Defects Induce Formation of Li-Rich Impurity Phases: Implications for Charge Distribution and Performance of LiNi<sub>0.5-x</sub>M<sub>x</sub>Mn<sub>1.5</sub>O<sub>4</sub> Cathodes (M = Fe and Mg; x = 0.05–0.2)_** [_Advanced Materials_](https://doi.org/10.1002/adma.202400343) 2024
- Y. Fu & H. Lohan et al. **_Factors Enabling Delocalized Charge-Carriers in Pnictogen-Based
Solar Absorbers: In-depth Investigation into CuSbSe<sub>2</sub>_** [_arXiv_](https://doi.org/10.48550/arXiv.2401.02257) 2024
- S. Hachmioune et al. **_Exploring the Thermoelectric Potential of MgB4: Electronic Band Structure, Transport Properties, and Defect Chemistry_** [_Chemistry of Materials_](https://doi.org/10.1021/acs.chemmater.4c00584) 2024
- J. Hu et al. **_Enabling ionic transport in Li3AlP2 the roles of defects and disorder_** [_ChemRxiv_](https://doi.org/10.26434/chemrxiv-2024-3s0kh) 2024
- A. G. Squires et al. **_Oxygen dimerization as a defect-driven process in bulk LiNiO2<sub>2</sub>_** [_ACS Energy Letters_](https://pubs.acs.org/doi/10.1021/acsenergylett.4c01307) 2024
- X. Wang et al. **_Upper efficiency limit of Sb<sub>2</sub>Se<sub>3</sub> solar cells_** [_Joule_](https://doi.org/10.1016/j.joule.2024.05.004) 2024
- I. Mosquera-Lois et al. **_Machine-learning structural reconstructions for accelerated point defect calculations_** [_npj Computational Materials_](https://doi.org/10.1038/s41524-024-01303-9) 2024
- S. R. Kavanagh et al. **_doped: Python toolkit for robust and repeatable charged defect supercell calculations_** [_Journal of Open Source Software_](https://doi.org/10.21105/joss.06433) 2024
- K. Li et al. **_Computational Prediction of an Antimony-based n-type Transparent Conducting Oxide: F-doped Sb<sub>2</sub>O<sub>5</sub>_** [_Chemistry of Materials_](https://doi.org/10.1021/acs.chemmater.3c03257) 2024
- S. Hachmioune et al. **_Exploring the Thermoelectric Potential of MgB4: Electronic Band Structure, Transport Properties, and Defect Chemistry_** [_Chemistry of Materials_](https://doi.org/10.1021/acs.chemmater.4c00584) 2024
- J. Hu et al. **_Enabling ionic transport in Li3AlP2 the roles of defects and disorder_** [_ChemRxiv_](https://doi.org/10.26434/chemrxiv-2024-3s0kh) 2024
- X. Wang et al. **_Four-electron negative-U vacancy defects in antimony selenide_** [_Physical Review B_](https://journals.aps.org/prb/abstract/10.1103/PhysRevB.108.134102) 2023
- Y. Kumagai et al. **_Alkali Mono-Pnictides: A New Class of Photovoltaic Materials by Element Mutation_** [_PRX Energy_](http://dx.doi.org/10.1103/PRXEnergy.2.043002) 2023
- A. T. J. Nicolson et al. **_Cu<sub>2</sub>SiSe<sub>3</sub> as a promising solar absorber: harnessing cation dissimilarity to avoid killer antisites_** [_Journal of Materials Chemistry A_](https://doi.org/10.1039/D3TA02429F) 2023
Expand Down
6 changes: 3 additions & 3 deletions docs/index.rst
Original file line number Diff line number Diff line change
Expand Up @@ -218,13 +218,13 @@ Studies using ``ShakeNBreak``
- Z\. Yuan & G. Hautier **First-principles study of defects and doping limits in CaO** `Applied Physics Letters <https://doi.org/10.1063/5.0211707>`_ 2024
- B\. E. Murdock et al. **Li-Site Defects Induce Formation of Li-Rich Impurity Phases: Implications for Charge Distribution and Performance of LiNi** :sub:`0.5-x` **M** :sub:`x` **Mn** :sub:`1.5` **O₄ Cathodes (M = Fe and Mg; x = 0.05–0.2)** `Advanced Materials <https://doi.org/10.1002/adma.202400343>`_ 2024
- A\. G. Squires et al. **Oxygen dimerization as a defect-driven process in bulk LiNiO₂** `ACS Energy Letters <https://pubs.acs.org/doi/10.1021/acsenergylett.4c01307>`__ 2024
- Y\. Fu & H. Lohan et al. **Factors Enabling Delocalized Charge-Carriers in Pnictogen-Based Solar Absorbers: In-depth Investigation into CuSbSe<sub>2</sub>** `arXiv <https://doi.org/10.48550/arXiv.2401.02257>`_ 2024
- S\. Hachmioune et al. **Exploring the Thermoelectric Potential of MgB₄: Electronic Band Structure, Transport Properties, and Defect Chemistry** `Chemistry of Materials <https://doi.org/10.1021/acs.chemmater.4c00584>`_ 2024
- J\. Hu et al. **Enabling ionic transport in Li₃AlP₂ the roles of defects and disorder** `ChemRxiv <https://doi.org/10.26434/chemrxiv-2024-3s0kh>`_ 2024
- Y\. Fu & H. Lohan et al. **Factors Enabling Delocalized Charge-Carriers in Pnictogen-Based Solar Absorbers: In-depth Investigation into CuSbSe₂** `arXiv <https://doi.org/10.48550/arXiv.2401.02257>`_ 2024
- X\. Wang et al. **Upper efficiency limit of Sb₂Se₃ solar cells** `Joule <https://doi.org/10.1016/j.joule.2024.05.004>`_ 2024
- I\. Mosquera-Lois et al. **Machine-learning structural reconstructions for accelerated point defect calculations** `npj Computational Materials <https://doi.org/10.1038/s41524-024-01303-9>`_ 2024
- S\. R. Kavanagh et al. **doped: Python toolkit for robust and repeatable charged defect supercell calculations** `Journal of Open Source Software <https://doi.org/10.21105/joss.06433>`_ 2024
- K\. Li et al. **Computational Prediction of an Antimony-based n-type Transparent Conducting Oxide: F-doped Sb₂O₅** `Chemistry of Materials <https://doi.org/10.1021/acs.chemmater.3c03257>`_ 2024
- S\. Hachmioune et al. **Exploring the Thermoelectric Potential of MgB₄: Electronic Band Structure, Transport Properties, and Defect Chemistry** `Chemistry of Materials <https://doi.org/10.1021/acs.chemmater.4c00584>`_ 2024
- J\. Hu et al. **Enabling ionic transport in Li₃AlP₂ the roles of defects and disorder** `ChemRxiv <https://doi.org/10.26434/chemrxiv-2024-3s0kh>`_ 2024
- X\. Wang et al. **Four-electron negative-U vacancy defects in antimony selenide** `Physical Review B <https://journals.aps.org/prb/abstract/10.1103/PhysRevB.108.134102>`_ 2023
- Y\. Kumagai et al. **Alkali Mono-Pnictides: A New Class of Photovoltaic Materials by Element Mutation** `PRX Energy <http://dx.doi.org/10.1103/PRXEnergy.2.043002>`__ 2023
- J\. Willis, K. B. Spooner, D. O. Scanlon. **On the possibility of p-type doping in barium stannate** `Applied Physics Letters <https://doi.org/10.1063/5.0170552>`__ 2023
Expand Down
149 changes: 37 additions & 112 deletions shakenbreak/analysis.py
Original file line number Diff line number Diff line change
Expand Up @@ -12,10 +12,10 @@

import numpy as np
import pandas as pd
from doped.utils.configurations import _scan_sm_stol_till_match
from doped.utils.parsing import get_outcar
from monty.serialization import loadfn
from pymatgen.analysis.local_env import CrystalNN
from pymatgen.analysis.structure_matcher import StructureMatcher
from pymatgen.core.composition import Composition, Element
from pymatgen.core.structure import IStructure, PeriodicSite, Structure
from pymatgen.io.vasp.outputs import Outcar
Expand Down Expand Up @@ -545,80 +545,48 @@ def get_energies(
def _cached_calculate_atomic_disp(
struct1: Structure,
struct2: Structure,
stol: float = 0.5,
ltol: float = 0.3,
angle_tol: float = 5,
) -> tuple:
"""
``_calculate_atomic_disp`` but with lru-caching.
Should only really be used internally in ``ShakeNBreak``
as it relies on the ``Structure`` hashes, which in ``pymatgen``
is just the composition which is unusable here, but this is
monkey-patched in ``calculate_struct_comparison`` below for
fast internal usage.
"""
return _calculate_atomic_disp(struct1, struct2, stol, ltol, angle_tol)


def _calculate_atomic_disp(
struct1: Structure,
struct2: Structure,
stol: float = 0.5,
ltol: float = 0.3,
angle_tol: float = 5,
**sm_kwargs,
) -> tuple:
"""
Calculate root-mean-square displacement and atomic displacements,
normalized by the free length per atom ((Vol/Nsites)^(1/3)) between
two structures.
Should only really be used internally in ``ShakeNBreak``
as the caching in this function relies on the ``Structure`` hashes,
which in ``pymatgen`` is just the composition which is unusable here,
but this is monkey-patched in ``calculate_struct_comparison`` in
``shakenbreak.analysis`` for fast internal usage.
Args:
struct1 (:obj:`Structure`):
Structure to compare to struct2.
struct2 (:obj:`Structure`):
Structure to compare to struct1.
stol (:obj:`float`):
Site tolerance used for structural comparison (via
`pymatgen`'s `StructureMatcher`), as a fraction of the
average free length per atom := ( V / Nsites ) ** (1/3). If
output contains too many 'NaN' values, this likely needs to
be increased.
(Default: 0.5)
ltol (:obj:`float`):
Length tolerance used for structural comparison (via
`pymatgen`'s `StructureMatcher`).
(Default: 0.3)
angle_tol (:obj:`float`):
Angle tolerance used for structural comparison (via
`pymatgen`'s `StructureMatcher`).
(Default: 5)
**sm_kwargs:
Additional keyword arguments to pass to ``_scan_sm_stol_till_match``
in ``doped`` (used for ultra-fast structure matching), such as
``min_stol``, ``max_stol``, ``stol_factor`` etc.
Returns:
:obj:`tuple`:
Tuple of normalized root mean squared displacements and
normalized displacements between the two structures.
"""
# StructureMatcher._cart_dists() is the performance bottleneck for large supercells here. It could
# ``StructureMatcher._cart_dists()`` is the performance bottleneck for large supercells here. It could
# likely be made faster using Cython/numba (see
# https://github.com/materialsproject/pymatgen/issues/2593), caching and/or multiprocessing
sm = StructureMatcher(ltol=ltol, stol=stol, angle_tol=angle_tol, primitive_cell=False, scale=True)
struct1, struct2 = sm._process_species([struct1, struct2])
struct1, struct2, fu, s1_supercell = sm._preprocess(struct1, struct2)
match = sm._match(struct1, struct2, fu, s1_supercell, use_rms=True, break_on_match=False)

return None if match is None else (match[0], match[1])
# https://github.com/materialsproject/pymatgen/issues/2593), caching and/or multiprocessing, but
# ``doped`` efficiency improvements have made it multiple orders of magnitude faster:
return _scan_sm_stol_till_match(struct1, struct2, func_name="_get_atomic_disps", **sm_kwargs)


def calculate_struct_comparison(
defect_structures_dict: dict,
metric: str = "max_dist",
ref_structure: Union[str, float, Structure] = "Unperturbed",
stol: float = 0.5,
ltol: float = 0.3,
angle_tol: float = 5,
min_dist: float = 0.1,
verbose: bool = False,
**sm_kwargs,
) -> dict:
"""
Calculate either the summed atomic displacement, with metric = "disp",
Expand Down Expand Up @@ -646,26 +614,15 @@ def calculate_struct_comparison(
or a pymatgen Structure object (to compare with a specific external
structure).
(Default: "Unperturbed")
stol (:obj:`float`):
Site tolerance used for structural comparison (via
`pymatgen`'s `StructureMatcher`), as a fraction of the
average free length per atom := ( V / Nsites ) ** (1/3). If
output contains too many 'NaN' values, this likely needs to
be increased.
(Default: 0.5)
ltol (:obj:`float`):
Length tolerance used for structural comparison (via
`pymatgen`'s `StructureMatcher`).
(Default: 0.3)
angle_tol (:obj:`float`):
Angle tolerance used for structural comparison (via
`pymatgen`'s `StructureMatcher`).
(Default: 5)
min_dist (:obj:`float`):
Minimum atomic displacement threshold to include in atomic
displacements sum (in Å, default 0.1 Å).
verbose (:obj:`bool`):
Whether to print information message about structures being compared.
**sm_kwargs:
Additional keyword arguments to pass to ``_scan_sm_stol_till_match``
in ``doped`` (used for ultra-fast structure matching), such as
``min_stol``, ``max_stol``, ``stol_factor`` etc.
Returns:
:obj:`dict`:
Expand Down Expand Up @@ -724,9 +681,7 @@ def calculate_struct_comparison(
_, norm_dist = _cached_calculate_atomic_disp(
struct1=ref_structure,
struct2=defect_structures_dict[distortion],
stol=stol,
ltol=ltol,
angle_tol=angle_tol,
**sm_kwargs,
)
if metric == "disp":
disp_dict[distortion] = (
Expand All @@ -737,9 +692,9 @@ def calculate_struct_comparison(
disp_dict[distortion] = max(norm_dist) / normalization # Remove normalization
else:
raise ValueError(f"Invalid metric '{metric}'. Must be one of 'disp' or 'max_dist'.")
except TypeError:
disp_dict[distortion] = None # algorithm couldn't match lattices. Set comparison
# metric to None
except TypeError as exc:
raise exc
disp_dict[distortion] = None # algorithm couldn't match lattices. Set metric to None
# warnings.warn(
# f"pymatgen StructureMatcher could not match lattices between "
# f"{ref_name} and {distortion} structures."
Expand All @@ -752,11 +707,11 @@ def compare_structures(
defect_structures_dict: dict,
defect_energies_dict: dict,
ref_structure: Union[str, float, Structure] = "Unperturbed",
stol: float = 0.5,
units: str = "eV",
min_dist: float = 0.1,
display_df: bool = True,
verbose: bool = True,
**sm_kwargs,
) -> Union[None, pd.DataFrame]:
"""
Compare final bond-distorted structures with either 'Unperturbed' or
Expand All @@ -777,13 +732,6 @@ def compare_structures(
or a pymatgen Structure object (to compare with a specific external
structure).
(Default: "Unperturbed")
stol (:obj:`float`):
Site tolerance used for structural comparison (via
`pymatgen`'s `StructureMatcher`), as a fraction of the
average free length per atom := ( V / Nsites ) ** (1/3). If
structure comparison output contains too many 'NaN' values,
this likely needs to be increased.
(Default: 0.5)
units (:obj:`str`):
Energy units label for outputs (either 'eV' or 'meV').
Should be the same as the units in `defect_energies_dict`,
Expand All @@ -797,6 +745,10 @@ def compare_structures(
interactively in Jupyter/Ipython (Default: True).
verbose (:obj:`bool`):
Whether to print information message about structures being compared.
**sm_kwargs:
Additional keyword arguments to pass to ``_scan_sm_stol_till_match``
in ``doped`` (used for ultra-fast structure matching), such as
``min_stol``, ``max_stol``, ``stol_factor`` etc.
Returns:
:obj:`pd.DataFrame`:
Expand All @@ -808,44 +760,17 @@ def compare_structures(
warnings.warn("All structures in defect_structures_dict are not converged. Returning None.")
return None
df_list = []
disp_dict = calculate_struct_comparison(
defect_structures_dict,
metric="disp",
ref_structure=ref_structure,
stol=stol,
min_dist=min_dist,
verbose=verbose,
)
max_dist_dict = calculate_struct_comparison(
defect_structures_dict,
metric="max_dist",
ref_structure=ref_structure,
stol=stol,
verbose=False, # only print "Comparing to..." once
)
# Check if too many 'NaN' values in disp_dict, if so, try with higher stol
number_of_nan = len([value for value in disp_dict.values() if value is None])
if number_of_nan > len(disp_dict.values()) // 3:
warnings.warn(
f"The specified tolerance {stol} seems to be too tight as"
" too many lattices could not be matched. Will retry with"
f" larger tolerance ({stol+0.4})."
)
max_dist_dict = calculate_struct_comparison(
disp_dict, max_dist_dict = (
calculate_struct_comparison( # cached to avoid redundant calculation time
defect_structures_dict,
metric="max_dist",
metric=i,
ref_structure=ref_structure,
stol=stol + 0.4,
verbose=False,
)
disp_dict = calculate_struct_comparison(
defect_structures_dict,
metric="disp",
ref_structure=ref_structure,
stol=stol + 0.4,
min_dist=min_dist,
verbose=False,
verbose=verbose if i == "disp" else False, # only print "Comparing to..." once
**sm_kwargs,
)
for i in ["disp", "max_dist"]
)

for distortion in defect_energies_dict["distortions"]:
try:
Expand Down
Loading

0 comments on commit e3cce49

Please sign in to comment.