Commit

Add dataset information overview to the dev docs
frostedoyster committed Nov 5, 2024
1 parent f7355dd commit f881951
Showing 3 changed files with 41 additions and 0 deletions.
38 changes: 38 additions & 0 deletions docs/src/dev-docs/dataset-information.rst
@@ -0,0 +1,38 @@
Dataset Information
===================

When working with ``metatrain``, you will most likely need to interact with the core
classes that store information about datasets. These classes belong to the
``metatrain.utils.data`` module, which is documented in the :ref:`Data <data>` section
of the developer documentation.

These classes are:

- :py:class:`metatrain.utils.data.DatasetInfo`: This class stores information about
  a dataset: the length unit used in the dataset, the atomic types present, and
  information about the dataset's targets as a ``Dict[str, TargetInfo]`` object. The
  keys of this dictionary are the names of the targets in the dataset (e.g.,
  ``energy``, ``mtt::dipole``).

- :py:class:`metatrain.utils.data.TargetInfo`: This class stores information about a
  single target in a dataset: the target's physical quantity, the unit in which the
  target is expressed, and the ``layout`` of the target. The ``layout`` is a
  ``TensorMap`` object with zero samples, which serves as an example of the metadata
  of each target.
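
As a rough illustration of how the two classes above relate, the sketch below uses
plain Python dataclasses as stand-ins. ``SketchTargetInfo`` and ``SketchDatasetInfo``
are hypothetical names, not the ``metatrain`` classes, and the ``layout`` is reduced to
a dictionary of component names instead of a real ``TensorMap``. The units, quantities,
and atomic types are made-up example values.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchTargetInfo:
    """Hypothetical stand-in for TargetInfo (not the metatrain class)."""

    quantity: str  # physical quantity of the target, e.g. "energy"
    unit: str  # unit in which the target is expressed
    # Stand-in for the zero-sample TensorMap: only component names are kept.
    layout: Dict[str, List[str]] = field(default_factory=dict)


@dataclass
class SketchDatasetInfo:
    """Hypothetical stand-in for DatasetInfo (not the metatrain class)."""

    length_unit: str
    atomic_types: List[int]
    targets: Dict[str, SketchTargetInfo]


# Example values below are assumptions chosen for illustration only.
energy = SketchTargetInfo(quantity="energy", unit="eV", layout={"components": []})
dipole = SketchTargetInfo(quantity="dipole", unit="", layout={"components": ["xyz"]})

dataset_info = SketchDatasetInfo(
    length_unit="angstrom",
    atomic_types=[1, 6, 8],
    targets={"energy": energy, "mtt::dipole": dipole},
)

# Targets are looked up by name, mirroring the Dict[str, TargetInfo] described above.
for name, target in dataset_info.targets.items():
    print(name, target.quantity, target.unit, target.layout["components"])
```

The point of the sketch is only the shape of the data: one dataset-level object that
holds a name-to-target mapping, with each target carrying its own quantity, unit, and
layout metadata.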

At the moment, only three types of layouts are supported:

- scalar: This type of layout is used when the target is a scalar quantity. The
  ``layout`` ``TensorMap`` object corresponding to a scalar must have one
  ``TensorBlock`` and no ``components``.
- Cartesian tensor: This type of layout is used when the target is a Cartesian tensor.
  The ``layout`` ``TensorMap`` object corresponding to a Cartesian tensor must have
  one ``TensorBlock`` and as many ``components`` as the tensor's rank. These
  components are named ``xyz`` for a tensor of rank 1 and ``xyz_1``, ``xyz_2``, and
  so on for higher ranks.
- Spherical tensor: This type of layout is used when the target is a spherical tensor.
  The ``layout`` ``TensorMap`` object corresponding to a spherical tensor can have
  multiple blocks corresponding to different irreps (irreducible representations) of
  the target. The ``keys`` of the ``TensorMap`` object must have the ``o3_lambda``
  and ``o3_sigma`` names, and each ``TensorBlock`` must have a single component named
  ``o3_mu``.
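
The three layout rules above can be sketched as a small classifier. This is a minimal
illustration in plain Python, not the validation code used by ``metatrain``:
``classify_layout`` is a hypothetical helper, and it takes the key names and the
per-block component names as plain lists instead of inspecting a real ``TensorMap``.
The ``"_"`` key name in the usage lines is a placeholder chosen for the example.

```python
from typing import Sequence


def classify_layout(
    key_names: Sequence[str],
    block_components: Sequence[Sequence[str]],
) -> str:
    """Return "scalar", "cartesian" or "spherical" for a layout description.

    key_names: names of the keys of the layout.
    block_components: for each block, the names of its components.
    Raises ValueError if none of the three rules matches.
    """
    n_blocks = len(block_components)

    # Spherical: o3_lambda and o3_sigma keys, a single "o3_mu" component per block.
    if "o3_lambda" in key_names and "o3_sigma" in key_names:
        if n_blocks >= 1 and all(list(c) == ["o3_mu"] for c in block_components):
            return "spherical"
        raise ValueError("spherical blocks need a single 'o3_mu' component")

    # Scalar and Cartesian layouts both require exactly one block.
    if n_blocks != 1:
        raise ValueError("scalar and Cartesian layouts need exactly one block")

    components = list(block_components[0])
    if not components:
        return "scalar"  # one block, no components

    # Cartesian: "xyz" for rank 1, "xyz_1", "xyz_2", ... for higher ranks.
    rank = len(components)
    expected = ["xyz"] if rank == 1 else [f"xyz_{i}" for i in range(1, rank + 1)]
    if components == expected:
        return "cartesian"
    raise ValueError(f"unsupported component names: {components}")


# Usage: "_" is a placeholder key name (an assumption for this example).
print(classify_layout(["_"], [[]]))
print(classify_layout(["_"], [["xyz_1", "xyz_2"]]))
print(classify_layout(["o3_lambda", "o3_sigma"], [["o3_mu"], ["o3_mu"]]))
```

Checking the spherical rule first keeps the one-block requirement local to the scalar
and Cartesian cases, since only spherical layouts may contain several blocks.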
1 change: 1 addition & 0 deletions docs/src/dev-docs/index.rst
@@ -12,5 +12,6 @@ module.
    getting-started
    architecture-life-cycle
    new-architecture
    dataset-information
    cli/index
    utils/index
2 changes: 2 additions & 0 deletions docs/src/dev-docs/utils/data/index.rst
@@ -1,6 +1,8 @@
Data
====

.. _data:

API for handling data in ``metatrain``.

.. toctree::