From f881951b95d54e29b8c9aa00b188affb8608339e Mon Sep 17 00:00:00 2001
From: frostedoyster <bigi.f@libero.it>
Date: Tue, 5 Nov 2024 14:44:12 +0100
Subject: [PATCH] Add dataset information overview to the dev docs

---
 docs/src/dev-docs/dataset-information.rst | 38 +++++++++++++++++++++++
 docs/src/dev-docs/index.rst               |  1 +
 docs/src/dev-docs/utils/data/index.rst    |  2 ++
 3 files changed, 41 insertions(+)
 create mode 100644 docs/src/dev-docs/dataset-information.rst

diff --git a/docs/src/dev-docs/dataset-information.rst b/docs/src/dev-docs/dataset-information.rst
new file mode 100644
index 000000000..1db07402e
--- /dev/null
+++ b/docs/src/dev-docs/dataset-information.rst
@@ -0,0 +1,38 @@
+Dataset Information
+===================
+
+When working with ``metatrain``, you will most likely need to interact with the core
+classes that store information about datasets. These classes belong to the
+``metatrain.utils.data`` module, which is documented in the :ref:`Data <data>`
+section of the developer documentation.
+
+These classes are:
+
+- :py:class:`metatrain.utils.data.DatasetInfo`: This class is responsible for storing
+  information about a dataset. It contains the length unit used in the dataset, the
+  atomic types present, as well as information about the dataset's targets as a
+  ``Dict[str, TargetInfo]`` object. The keys of this dictionary are the names of the
+  targets in the dataset (e.g., ``energy`` or ``mtt::dipole``).
+
+- :py:class:`metatrain.utils.data.TargetInfo`: This class is responsible for storing
+  information about a target in a dataset. It contains the target's physical quantity,
+  the unit in which the target is expressed, and the ``layout`` of the target. The
+  ``layout`` is a ``TensorMap`` object with zero samples that exemplifies the
+  metadata of each target.
+
+At the moment, only three types of layouts are supported:
+
+- Scalar: This type of layout is used when the target is a scalar quantity. The
+  ``layout`` ``TensorMap`` object corresponding to a scalar must have one
+  ``TensorBlock`` and no ``components``.
+- Cartesian tensor: This type of layout is used when the target is a Cartesian tensor.
+  The ``layout`` ``TensorMap`` object corresponding to a Cartesian tensor must have
+  one ``TensorBlock`` and as many ``components`` as the tensor's rank. These
+  components are named ``xyz`` for a tensor of rank 1 and ``xyz_1``, ``xyz_2``, and
+  so on for higher ranks.
+- Spherical tensor: This type of layout is used when the target is a spherical tensor.
+  The ``layout`` ``TensorMap`` object corresponding to a spherical tensor can have
+  multiple blocks corresponding to different irreps (irreducible representations) of
+  the target. The ``keys`` of the ``TensorMap`` object must have the ``o3_lambda``
+  and ``o3_sigma`` names, and each ``TensorBlock`` must have a single component named
+  ``o3_mu``.
diff --git a/docs/src/dev-docs/index.rst b/docs/src/dev-docs/index.rst
index 9dd337c6d..8dd3d91da 100644
--- a/docs/src/dev-docs/index.rst
+++ b/docs/src/dev-docs/index.rst
@@ -12,5 +12,6 @@ module.
    getting-started
    architecture-life-cycle
    new-architecture
+   dataset-information
    cli/index
    utils/index
diff --git a/docs/src/dev-docs/utils/data/index.rst b/docs/src/dev-docs/utils/data/index.rst
index 5f7f80970..a3c3c44c3 100644
--- a/docs/src/dev-docs/utils/data/index.rst
+++ b/docs/src/dev-docs/utils/data/index.rst
@@ -1,6 +1,8 @@
 Data
 ====
 
+.. _data:
+
 API for handling data in ``metatrain``.
 
 .. toctree::