metatensor · frostedoyster · Feb 26, 2024 · Feb 26, 2024
diff --git a/docs/src/dev-docs/utils/data/readers/index.rst b/docs/src/dev-docs/utils/data/readers/index.rst
@@ -1,10 +1,10 @@
-Structure and Target data Readers
+System and Target data Readers
 =================================
 
-The main entry point for reading structure and target information are the two reader
+The main entry points for reading system and target information are the two reader
 functions
 
-.. autofunction:: metatensor.models.utils.data.read_structures
+.. autofunction:: metatensor.models.utils.data.read_systems
 .. autofunction:: metatensor.models.utils.data.read_targets
 
 Target type specific readers
@@ -28,5 +28,5 @@ these refer to their documentation
 .. toctree::
    :maxdepth: 1
 
-   structure
+   systems
    targets
diff --git a/docs/src/dev-docs/utils/data/readers/structure.rst b/docs/src/dev-docs/utils/data/readers/structure.rst
diff --git a/docs/src/dev-docs/utils/data/readers/systems.rst b/docs/src/dev-docs/utils/data/readers/systems.rst
@@ -0,0 +1,13 @@
+System Readers
+#################
+
+Parsers for obtaining information from systems. All readers return a :py:class:`list`
+of :py:class:`metatensor.torch.atomistic.System`. The mapping from file type to reader
+is stored in
+
+.. autodata:: metatensor.models.utils.data.readers.systems.SYSTEM_READERS
+
+Implemented Readers
+-------------------
+
+.. autofunction:: metatensor.models.utils.data.readers.systems.read_systems_ase
diff --git a/docs/src/getting-started/custom_dataset_conf.rst b/docs/src/getting-started/custom_dataset_conf.rst
@@ -12,7 +12,7 @@ parsing data for training. Mandatory sections in the `options.yaml` file include
 - ``test_set``
 - ``validation_set``
 
 Each section can follow a similar structure, with shorthand methods available to
 simplify dataset definitions.
 
 Minimal Configuration Example
@@ -36,7 +36,7 @@ format, which is also valid for initial input:
 .. code-block:: yaml
 
     training_set:
-        structures:
+        systems:
             read_from: dataset.xyz
             file_format: .xyz
             length_unit: null
@@ -61,13 +61,13 @@ format, which is also valid for initial input:
 
 Understanding the YAML Block
 ----------------------------
-The ``training_set`` is divided into sections ``structures`` and ``targets``:
+The ``training_set`` is divided into sections ``systems`` and ``targets``:
 
-Structures Section
-^^^^^^^^^^^^^^^^^^
-Describes the structure data like positions and cell information.
+Systems Section
+^^^^^^^^^^^^^^^
+Describes the system data like positions and cell information.
 
-:param read_from: The file containing structure data.
+:param read_from: The file containing system data.
 :param file_format: The file format, guessed from the suffix if ``null`` or not
     provided.
 :param length_unit: The unit of lengths, optional but recommended for simulations.
@@ -93,7 +93,7 @@ Target section parameters include:
 
 :param quantity: The target's quantity (e.g., ``energy``, ``dipole``). Currently only
     ``energy`` is supported.
-:param read_from: The file for target data, defaults to the ``structures.read_from``
+:param read_from: The file for target data, defaults to the ``systems.read_from``
   file if not provided.
 :param file_format: The file format, guessed from the suffix if not provided.
 :param key: The key for reading from the file, defaulting to the target section's name
@@ -135,15 +135,15 @@ starting with a ``"- "`` (a dash and a space)
 .. code-block:: yaml
 
     training_set:
-        - structures:
+        - systems:
               read_from: dataset_0.xyz
               length_unit: angstrom
           targets:
               energy:
                   quantity: energy
                   key: my_energy_label0
                   unit: eV
-        - structures:
+        - systems:
               read_from: dataset_1.xyz
               length_unit: angstrom
           targets:

diff --git a/docs/src/getting-started/override.rst b/docs/src/getting-started/override.rst
@@ -35,7 +35,7 @@ hyperparameters. The adjustments for ``num_epochs`` and ``cutoff`` look like thi
          num_epochs: 200
 
    training_set:
-   structures: "qm9_reduced_100.xyz"
+   systems: "qm9_reduced_100.xyz"
    targets:
       energy:
          key: "U0"

diff --git a/docs/src/getting-started/usage.rst b/docs/src/getting-started/usage.rst
@@ -34,7 +34,7 @@ The sub-command to start a model training is
     metatensor-models train
 
 To train a model you have to define your options. This includes the specific
-architecture you want to use and the data including the training structures and target
+architecture you want to use and the data including the training systems and target
 values
 
 The default model and training hyperparameter for each model are listed in their
@@ -67,7 +67,7 @@ The sub-command to evaluate an already trained model is
     metatensor-models eval
 
 Besides the trained `model`, you will also have to provide a file containing the
-structure and possible target values for evaluation. The structure of this ``eval.yaml``
+system and possible target values for evaluation. The structure of this ``eval.yaml``
 is exactly the same as for a dataset in the ``options.yaml`` file.
 
 .. literalinclude:: ../../static/qm9/eval.yaml

diff --git a/docs/static/qm9/eval.yaml b/docs/static/qm9/eval.yaml
@@ -1,4 +1,4 @@
-structures: "qm9_reduced_100.xyz" # file where the positions are stored
+systems: "qm9_reduced_100.xyz" # file where the positions are stored
 targets:
   energy:
     key: "U0" # name of the target value
diff --git a/docs/static/qm9/options.yaml b/docs/static/qm9/options.yaml
@@ -2,10 +2,10 @@
 architecture:
   name: experimental.soap_bpnn
 
-# Mandatory section defining the parameters for structure and target data of the
+# Mandatory section defining the parameters for system and target data of the
 # training set
 training_set:
-  structures: "qm9_reduced_100.xyz" # file where the positions are stored
+  systems: "qm9_reduced_100.xyz" # file where the positions are stored
   targets:
     energy:
       key: "U0" # name of the target value

@@ -1,4 +1,4 @@
-structures: "alchemical_reduced_10.xyz" # file where the positions are stored
+systems: "alchemical_reduced_10.xyz" # file where the positions are stored
 targets:
   energy:
     key: "energy" # name of the target value

@@ -4,10 +4,10 @@ architecture:
   training:
     num_epochs: 10
 
-# Mandatory section defining the parameters for structure and target data of the
+# Mandatory section defining the parameters for system and target data of the
 # training set
 training_set:
-  structures: "alchemical_reduced_10.xyz" # file where the positions are stored
+  systems: "alchemical_reduced_10.xyz" # file where the positions are stored
   targets:
     energy:
       key: "energy" # name of the target value

diff --git a/examples/ase/options.yaml b/examples/ase/options.yaml
@@ -5,9 +5,9 @@ architecture:
     num_epochs: 100
     learning_rate: 0.01
 
-# Section defining the parameters for structure and target data
+# Section defining the parameters for system and target data
 training_set:
-  structures: "ethanol_reduced_100.xyz"
+  systems: "ethanol_reduced_100.xyz"
   targets:
     energy:
       key: "energy"

diff --git a/examples/ase/run_ase.py b/examples/ase/run_ase.py
@@ -4,7 +4,7 @@
 
 This tutorial demonstrates how to use an already trained and exported model to run an
 ASE simulation of a single ethanol molecule in vacuum. We use a model that was trained
-using the :ref:`architecture-soap-bpnn` architecture on 100 ethanol structures
+using the :ref:`architecture-soap-bpnn` architecture on 100 ethanol systems
 containing energies and forces. You can obtain the :download:`dataset file
 <ethanol_reduced_100.xyz>` used in this example from our website. The dataset is a
 subset of the `rMD17 dataset
@@ -148,8 +148,8 @@
 
 # %%
 #
-# Inspect the structures
-# ######################
+# Inspect the systems
+# ###################
 #
 # Even though the total energy is conserved, we also have to verify that the ethanol
 # molecule is stable and the bonds did not break.
@@ -165,7 +165,7 @@
 # As a final analysis we also calculate and plot the carbon-hydrogen radial distribution
 # function (RDF) from the trajectory and compare this to the RDF from the training set.
 #
-# To use the RDF code from ase we first have to define a unit cell for our structures.
+# To use the RDF code from ase we first have to define a unit cell for our systems.
 # We choose a cubic one with a side length of 10 Å.
 
 for atoms in training_frames:

diff --git a/examples/basic_usage/usage.sh b/examples/basic_usage/usage.sh
@@ -13,7 +13,7 @@ metatensor-models train --help
 metatensor-models eval model.pt eval.yaml
 
 # The evaluation command predicts those properties the model was trained against; here
-# "U0". The predictions together with the structures have been written in a file named
+# "U0". The predictions together with the systems have been written in a file named
 # ``output.xyz`` in the current directory. The written file starts with the following
 # lines
 

diff --git a/src/metatensor/models/cli/eval.py b/src/metatensor/models/cli/eval.py
@@ -9,7 +9,7 @@
 from omegaconf import DictConfig, OmegaConf
 
 from ..utils.compute_loss import compute_model_loss
-from ..utils.data import collate_fn, read_structures, read_targets, write_predictions
+from ..utils.data import collate_fn, read_systems, read_targets, write_predictions
 from ..utils.errors import ArchitectureError
 from ..utils.extract_targets import get_outputs_dict
 from ..utils.info import finalize_aggregated_info, update_aggregated_info
@@ -63,18 +63,18 @@ def _add_eval_model_parser(subparser: argparse._SubParsersAction) -> None:
 
 def _eval_targets(model, dataset: Union[_BaseDataset, torch.utils.data.Subset]) -> None:
     """Evaluate an exported model on a dataset and print the RMSEs for each target."""
-    # Attach neighbor lists to the structures:
+    # Attach neighbor lists to the systems:
     requested_neighbor_lists = model.requested_neighbors_lists()
     # working around https://github.com/lab-cosmo/metatensor/issues/521
     # Desired:
-    # for structure, _ in dataset:
-    #     attach_neighbor_lists(structure, requested_neighbors_lists)
+    # for system, _ in dataset:
+    #     attach_neighbor_lists(system, requested_neighbors_lists)
     # Current:
     dataloader = torch.utils.data.DataLoader(
         dataset, batch_size=1, collate_fn=collate_fn
     )
-    for (structure,), _ in dataloader:
-        get_system_with_neighbors_lists(structure, requested_neighbor_lists)
+    for (system,), _ in dataloader:
+        get_system_with_neighbors_lists(system, requested_neighbor_lists)
 
     # Extract all the possible outputs and their gradients from the dataset:
     outputs_dict = get_outputs_dict([dataset])
@@ -103,8 +103,8 @@ def _eval_targets(model, dataset: Union[_BaseDataset, torch.utils.data.Subset])
     # Compute the RMSEs:
     aggregated_info: Dict[str, Tuple[float, int]] = {}
     for batch in dataloader:
-        structures, targets = batch
-        _, info = compute_model_loss(loss_fn, model, structures, targets)
+        systems, targets = batch
+        _, info = compute_model_loss(loss_fn, model, systems, targets)
         aggregated_info = update_aggregated_info(aggregated_info, info)
     finalized_info = finalize_aggregated_info(aggregated_info)
 
@@ -182,45 +182,43 @@ def eval_model(
             file_index_suffix = f"_{i}"
-        logger.info(f"Evaulate dataset{extra_log_message}")
+        logger.info(f"Evaluate dataset{extra_log_message}")
 
-        eval_structures = read_structures(
-            filename=options["structures"]["read_from"],
-            fileformat=options["structures"]["file_format"],
+        eval_systems = read_systems(
+            filename=options["systems"]["read_from"],
+            fileformat=options["systems"]["file_format"],
         )
 
         # Predict targets
         if hasattr(options, "targets"):
             eval_targets = read_targets(options["targets"])
-            eval_dataset = Dataset(
-                structure=eval_structures, energy=eval_targets["energy"]
-            )
+            eval_dataset = Dataset(system=eval_systems, energy=eval_targets["energy"])
             _eval_targets(model, eval_dataset)
         else:
             # TODO: batch this
             # TODO: add forces/stresses/virials if requested
-            # Attach neighbors list to structures. This step is only required if no
+            # Attach neighbors list to systems. This step is only required if no
             # targets are present. Otherwise, the neighbors list have been already
             # attached in `_eval_targets`.
-            eval_structures = [
+            eval_systems = [
                 get_system_with_neighbors_lists(
-                    structure, model.requested_neighbors_lists()
+                    system, model.requested_neighbors_lists()
                 )
-                for structure in eval_structures
+                for system in eval_systems
             ]
 
-        # Predict structures
+        # Predict systems
         try:
             # `length_unit` is only required for unit conversions in MD engines and
-            # superflous here.
+            # superfluous here.
             eval_options = ModelEvaluationOptions(
                 length_unit="", outputs=model.capabilities().outputs
             )
-            predictions = model(eval_structures, eval_options, check_consistency=True)
+            predictions = model(eval_systems, eval_options, check_consistency=True)
         except Exception as e:
             raise ArchitectureError(e)
 
-        # TODO: adjust filename accordinglt
+        # TODO: adjust filename accordingly
         write_predictions(
             filename=f"{output.stem}{file_index_suffix}{output.suffix}",
             predictions=predictions,
-            structures=eval_structures,
+            systems=eval_systems,
         )
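
Because this change renames the ``structures`` key to ``systems`` in every dataset section, option files written before the rename need the same edit. The snippet below is a minimal sketch of how such a migration could look for options already loaded into plain dictionaries. The helper name ``rename_structures_key`` is hypothetical and not part of metatensor-models; it only assumes the two layouts shown in the documentation above (a single dataset section, or a list of sections).

```python
from copy import deepcopy
from typing import Any, Dict, List, Union

DatasetOptions = Union[Dict[str, Any], List[Dict[str, Any]]]


def rename_structures_key(options: DatasetOptions) -> DatasetOptions:
    """Return a copy of ``options`` with ``structures`` renamed to ``systems``.

    Hypothetical helper for illustration only; it is not provided by
    metatensor-models.
    """
    # A dataset section may be a list of sub-datasets: migrate each entry.
    if isinstance(options, list):
        return [rename_structures_key(entry) for entry in options]

    # Work on a copy so the caller's dictionary is left untouched.
    migrated = deepcopy(options)
    if "structures" in migrated:
        if "systems" in migrated:
            raise ValueError("both 'structures' and 'systems' are present")
        migrated["systems"] = migrated.pop("structures")
    return migrated


# Old-style evaluation options, as in the removed line of ``eval.yaml``.
old_options = {
    "structures": "qm9_reduced_100.xyz",
    "targets": {"energy": {"key": "U0"}},
}
new_options = rename_structures_key(old_options)
```

The helper leaves the ``targets`` section untouched, since only the top-level key of each dataset section is renamed in this change.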