From f881951b95d54e29b8c9aa00b188affb8608339e Mon Sep 17 00:00:00 2001
From: frostedoyster
Date: Tue, 5 Nov 2024 14:44:12 +0100
Subject: [PATCH] Add dataset information overview to the dev docs

---
 docs/src/dev-docs/dataset-information.rst | 38 +++++++++++++++++++++++
 docs/src/dev-docs/index.rst               |  1 +
 docs/src/dev-docs/utils/data/index.rst    |  2 ++
 3 files changed, 41 insertions(+)
 create mode 100644 docs/src/dev-docs/dataset-information.rst

diff --git a/docs/src/dev-docs/dataset-information.rst b/docs/src/dev-docs/dataset-information.rst
new file mode 100644
index 000000000..1db07402e
--- /dev/null
+++ b/docs/src/dev-docs/dataset-information.rst
@@ -0,0 +1,38 @@
+Dataset Information
+===================
+
+When working with ``metatrain``, you will most likely need to interact with some core
+classes which are responsible for storing some information about datasets. All these
+classes belong to the ``metatrain.utils.data`` module which can be found in the
+:ref:`Data <data>` section of the developer documentation.
+
+These classes are:
+
+- :py:class:`metatrain.utils.data.DatasetInfo`: This class is responsible for storing
+  information about a dataset. It contains the length unit used in the dataset, the
+  atomic types present, as well as information about the dataset's targets as a
+  ``Dict[str, TargetInfo]`` object. The keys of this dictionary are the names of the
+  targets in the dataset (e.g., ``energy``, ``mtt::dipole``, etc.).
+
+- :py:class:`metatrain.utils.data.TargetInfo`: This class is responsible for storing
+  information about a target in a dataset. It contains the target's physical quantity,
+  the unit in which the target is expressed, and the ``layout`` of the target. The
+  ``layout`` is a ``TensorMap`` object with zero samples, which is used to
+  exemplify the metadata of each target.
+
+At the moment, only three types of layouts are supported:
+
+- Scalar: This type of layout is used when the target is a scalar quantity. The
+  ``layout`` ``TensorMap`` object corresponding to a scalar must have one
+  ``TensorBlock`` and no ``components``.
+- Cartesian tensor: This type of layout is used when the target is a Cartesian tensor.
+  The ``layout`` ``TensorMap`` object corresponding to a Cartesian tensor must have
+  one ``TensorBlock`` and as many ``components`` as the tensor's rank. These
+  components are named ``xyz`` for a tensor of rank 1 and ``xyz_1``, ``xyz_2``, and
+  so on for higher ranks.
+- Spherical tensor: This type of layout is used when the target is a spherical tensor.
+  The ``layout`` ``TensorMap`` object corresponding to a spherical tensor can have
+  multiple blocks corresponding to different irreps (irreducible representations) of
+  the target. The ``keys`` of the ``TensorMap`` object must have the ``o3_lambda``
+  and ``o3_sigma`` names, and each ``TensorBlock`` must have a single component named
+  ``o3_mu``.
diff --git a/docs/src/dev-docs/index.rst b/docs/src/dev-docs/index.rst
index 9dd337c6d..8dd3d91da 100644
--- a/docs/src/dev-docs/index.rst
+++ b/docs/src/dev-docs/index.rst
@@ -12,5 +12,6 @@ module.
    getting-started
    architecture-life-cycle
    new-architecture
+   dataset-information
    cli/index
    utils/index
diff --git a/docs/src/dev-docs/utils/data/index.rst b/docs/src/dev-docs/utils/data/index.rst
index 5f7f80970..a3c3c44c3 100644
--- a/docs/src/dev-docs/utils/data/index.rst
+++ b/docs/src/dev-docs/utils/data/index.rst
@@ -1,6 +1,8 @@
 Data
 ====
 
+.. _data:
+
 API for handling data in ``metatrain``.
 
 .. toctree::
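
The three layout rules documented in ``dataset-information.rst`` in this patch can be sketched in plain Python. This is only an illustration under stated assumptions: ``Block``, ``Layout`` and ``classify_layout`` are hypothetical stand-ins that record names only. They are not part of ``metatrain`` or ``metatensor``, and the ``"_"`` key name used for the scalar and Cartesian cases is a placeholder, since the documentation does not specify one.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Block:
    """Hypothetical stand-in for a ``TensorBlock``; keeps component names only."""

    component_names: List[str]


@dataclass
class Layout:
    """Hypothetical stand-in for a layout ``TensorMap``; keeps key names and blocks."""

    key_names: List[str]
    blocks: List[Block]


def classify_layout(layout: Layout) -> str:
    """Return "scalar", "cartesian" or "spherical" following the documented rules.

    Raises ValueError when the layout matches none of the three supported types.
    """
    # Spherical: keys carry o3_lambda and o3_sigma, and every block (one per
    # irrep) has exactly one component named o3_mu.
    if "o3_lambda" in layout.key_names and "o3_sigma" in layout.key_names:
        if layout.blocks and all(
            block.component_names == ["o3_mu"] for block in layout.blocks
        ):
            return "spherical"
        raise ValueError("spherical layouts need a single 'o3_mu' component per block")

    # Scalar and Cartesian layouts must both have exactly one block.
    if len(layout.blocks) != 1:
        raise ValueError("scalar and Cartesian layouts must have exactly one block")

    names = layout.blocks[0].component_names

    # Scalar: no components at all.
    if not names:
        return "scalar"

    # Cartesian: as many components as the rank, named "xyz" for rank 1
    # and "xyz_1", "xyz_2", ... for higher ranks.
    rank = len(names)
    expected = ["xyz"] if rank == 1 else [f"xyz_{i}" for i in range(1, rank + 1)]
    if names == expected:
        return "cartesian"

    raise ValueError(f"unsupported component names: {names}")
```

For instance, ``classify_layout(Layout(["_"], [Block(["xyz_1", "xyz_2"])]))`` classifies a rank-2 Cartesian layout, while a block whose only component is named ``abc`` is rejected with a ``ValueError``.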