chapter 2 of tutorial (HorizonRobotics#945)

* chapter 2 of tutorial
* proper cross referencing
* fix syntax error
* address comments
pd-perry · Dec 11, 2021 · b6e0772
1 parent c1cb3ea
commit b6e0772

Showing 11 changed files with 473 additions and 28 deletions.
diff --git a/docs/index.rst b/docs/index.rst
@@ -3,11 +3,11 @@
    You can adapt this file completely to your liking, but it should at least
    contain the root `toctree` directive.
 
-Welcome to ALF's documentation!
-===============================
+ALF documentation
+=================
 
 .. toctree::
-   :maxdepth: 2
+   :maxdepth: 1
 
    overview
    tutorial

diff --git a/docs/tutorial.rst b/docs/tutorial.rst
@@ -4,7 +4,7 @@ Tutorial
 ALF is designed with **modularization** in mind. Unlike most RL libraries or frameworks
 which implement different algorithms by repeating the almost entire RL pipeline in
 separate source code files with little code reuse, ALF categorizes RL algorithms
-and distills the common structure and logic within each caterogy, so
+and distills the common structure and logic within each category, so
 that each algorithm only needs to implement or override its own exclusive logic.
 
 Usually to create an ALF job, a user is expected to:
@@ -32,7 +32,7 @@ provides at least two benefits:
      ensures the remaining part of the pipeline is unaffected.
 
 2. *Reusing ALF's carefully designed training pipeline which contains a ton
-   of crtical details and tricks that help an algorithm's training.* For example,
+   of critical details and tricks that help an algorithm's training.* For example,
 
    * Careful handling of environment step types and their discounts,
    * Temporally independent training of a rollout trajectory if no episodic memory
@@ -42,23 +42,31 @@ provides at least two benefits:
    * Automatically applying various input data transformers during rollout
      and training,
    * Specifying different optimizers for different sub-algorithms,
-   * Exploiting a variety of tensorboard summary utils,
+   * Exploiting a variety of Tensorboard summary utils,
    * and many more...
 
 Below are a series of examples for writing training files using ALF,
-from simple to advanced usage. Each section is a detailed, step-by-step guide
-walking through key ALF cencepts. All the tutorial code files can
+from simple to advanced usage. Each chapter is a detailed, step-by-step guide
+walking through key ALF concepts. All the tutorial code files can
 be found under ``<ALF_ROOT>/alf/examples/tutorial``.
 
+.. note::
+
+    This tutorial won't cover the technical details of different algorithms and
+    models, as we assume the user learns them from other resources, e.g., the
+    original papers. We only focus on how to use ALF as a tool to write them.
+
 ..
-    The following section schedule might evolve as the tutorial proceeds
+    The following chapter schedule might evolve as the tutorial proceeds
 
 .. toctree::
     :maxdepth: 3
 
     tutorial/a_minimal_working_example
     tutorial/understanding_ALF_via_the_minimal_working_example
-    tutorial/configuring_existing_algorithms
-    tutorial/customize_environment_and_wrappers
+    tutorial/algorithm_interfaces
+    tutorial/summary_metrics_and_tensorboard
+    tutorial/customize_environments_and_wrappers
     tutorial/customize_algorithms
+    tutorial/customize_training_pipeline
     tutorial/advanced_play_and_alf_snapshot
diff --git a/docs/tutorial/a_minimal_working_example.rst b/docs/tutorial/a_minimal_working_example.rst
@@ -2,19 +2,16 @@ A minimal working example
 =========================
 
 We start with a minimal working example of ALF. The example, as a pure ALF
-configuration file, is located at ``<ALF_ROOT>/alf/examples/tutorial/minimal_example_conf.py``,
+configuration file, is :mod:`alf.examples.tutorial.minimal_example_conf`,
 and consists of only 8 lines.
 
-Train and play
---------------
-
-Let's ignore its content for a moment (see the next section
+Let's ignore its content for a moment (see the next chapter
 :doc:`./understanding_ALF_via_the_minimal_working_example` for an explanation of
 the configuration content), and just focus on how to launch the training,
 interpret the output training messages, and evaluate a trained model.
 
 Train from scratch
-^^^^^^^^^^^^^^^^^^
+------------------
 
 We can train from scratch by
 
@@ -31,7 +28,7 @@ assuming ``/tmp/alf_tutorial1`` doesn't exist or is empty.
     output log, etc) are stored.
 
 The training will finish in several seconds, but with some informative messages
-shown in the terminal. First of all, you should see a message from ``checkpoint_utils.py``
+shown in the terminal. First of all, you should see a message from :mod:`.checkpoint_utils`
 like
 
 ::
@@ -40,7 +37,7 @@ like
     from scratch
 
 which basically confirms that the training is from scratch and all algorithm parameters
-and states are randomly initialized. Also ``policy_trainer.py`` will output
+and states are randomly initialized. Also :mod:`.policy_trainer` will output
 message lines like
 
 ::
@@ -63,7 +60,7 @@ as the training finishes. Here we have the checkpoint numbered by the training
 iteration, which is '1' because only one iteration is performed by this example.
 
 Train from a checkpoint
-^^^^^^^^^^^^^^^^^^^^^^^
+-----------------------
 
 By launching the same command again, this time the checkpoint messages are different.
 First it should say
@@ -75,7 +72,7 @@ First it should say
 which means the training is no longer from scratch, but instead reads the saved
 checkpoint from the last run. By default ALF reads the most recent checkpoint in
 a training root dir if multiple checkpoints exist. Also at the end of training,
-``checkpoint_utils.py`` outputs:
+:mod:`.checkpoint_utils` outputs:
 
 ::
 
@@ -93,16 +90,17 @@ While the training is ongoing, we can monitor the real-time progress by
 
     tensorboard --logdir /tmp/alf_tutorial1
 
-We leave the interpretation of various Tensorboard statistics to later sections.
+We leave the interpretation of various Tensorboard statistics to a later chapter
+:doc:`./summary_metrics_and_tensorboard`.
 
 Play from a checkpoint
-^^^^^^^^^^^^^^^^^^^^^^
+----------------------
 
 ALF defines the term *play* as evaluating a model on a task and possibly also visualizing
 the evaluation process, for example, by rendering environment frames or various
 inference statistics.
 
-Here we only introduce three basic usages of the ALF ``play`` module. For advanced
+Here we only introduce three basic usages of the ALF :mod:`.play` module. For advanced
 play (e.g., rendering customized model inference results, play from an ALF snapshot,
 headless rendering, etc), we refer the reader to :doc:`./advanced_play_and_alf_snapshot`.
 
@@ -125,15 +123,15 @@ Or you can save the rendered result to a ``mp4`` video file:
 
     python -m alf.bin.play --root_dir /tmp/alf_tutorial1 --record_file /tmp/alf_tutorial1.mp4
 
-We recommend the reader to read the various commandline flags in ``<ALF_ROOT>/alf/bin/play.py``,
+We recommend the reader to read the various commandline flags in :mod:`.play`,
 for specifying different options such as checkpoint number and number of episodes to
 evaluate.
 
 Summary
 -------
 
 So far, we've talked about how to train a conf file and play the trained model,
-with very basic options of ``train.py`` and ``play.py``. This covers a usual
+with very basic options of :mod:`.train` and :mod:`.play`. This covers a usual
 command-line usage of ALF. We really haven't explained the content of the
-example and the ALF pipeline yet. In the next section, we will try to get a
+example and the ALF pipeline yet. In the next chapter, we will try to get a
 rough picture of ALF through the lens of this minimal working example.
diff --git a/docs/tutorial/algorithm_interfaces.rst b/docs/tutorial/algorithm_interfaces.rst
@@ -0,0 +1,2 @@
+Algorithm interfaces
+====================
diff --git a/docs/tutorial/customize_algorithms.rst b/docs/tutorial/customize_algorithms.rst
@@ -0,0 +1,2 @@
+Customize algorithms
+====================
diff --git a/docs/tutorial/customize_environments_and_wrappers.rst b/docs/tutorial/customize_environments_and_wrappers.rst
@@ -0,0 +1,2 @@
+Customize environments and wrappers
+===================================
diff --git a/docs/tutorial/customize_training_pipeline.rst b/docs/tutorial/customize_training_pipeline.rst
@@ -0,0 +1,2 @@
+Customize a training pipeline
+=============================
diff --git a/docs/tutorial/images/alf_diagram.png b/docs/tutorial/images/alf_diagram.png
diff --git a/docs/tutorial/images/pipeline.png b/docs/tutorial/images/pipeline.png
diff --git a/docs/tutorial/summary_metrics_and_tensorboard.rst b/docs/tutorial/summary_metrics_and_tensorboard.rst
@@ -0,0 +1,2 @@
+Summary, metrics, and Tensorboard
+=================================