chapter 2 of tutorial (HorizonRobotics#945)
* chapter 2 of tutorial

* proper cross referencing

* fix syntax error

* address comments
hnyu authored and pd-perry committed Dec 11, 2021
1 parent c1cb3ea commit b6e0772
Showing 11 changed files with 473 additions and 28 deletions.
6 changes: 3 additions & 3 deletions docs/index.rst
@@ -3,11 +3,11 @@
You can adapt this file completely to your liking, but it should at least
contain the root `toctree` directive.
Welcome to ALF's documentation!
===============================
ALF documentation
=================

.. toctree::
:maxdepth: 2
:maxdepth: 1

overview
tutorial
24 changes: 16 additions & 8 deletions docs/tutorial.rst
@@ -4,7 +4,7 @@ Tutorial
ALF is designed with **modularization** in mind. Unlike most RL libraries or frameworks
which implement different algorithms by repeating the almost entire RL pipeline in
separate source code files with little code reuse, ALF categorizes RL algorithms
and distills the common structure and logic within each caterogy, so
and distills the common structure and logic within each category, so
that each algorithm only needs to implement or override its own exclusive logic.

Usually to create an ALF job, a user is expected to:
@@ -32,7 +32,7 @@ provides at least two benefits:
ensures the remaining part of the pipeline is unaffected.

2. *Reusing ALF's carefully designed training pipeline which contains a ton
of crtical details and tricks that help an algorithm's training.* For example,
of critical details and tricks that help an algorithm's training.* For example,

* Careful handling of environment step types and their discounts,
* Temporally independent training of a rollout trajectory if no episodic memory
@@ -42,23 +42,31 @@ provides at least two benefits:
* Automatically applying various input data transformers during rollout
and training,
* Specifying different optimizers for different sub-algorithms,
* Exploiting a variety of tensorboard summary utils,
* Exploiting a variety of Tensorboard summary utils,
* and many more...

Below are a series of examples for writing training files using ALF,
from simple to advanced usage. Each section is a detailed, step-by-step guide
walking through key ALF cencepts. All the tutorial code files can
from simple to advanced usage. Each chapter is a detailed, step-by-step guide
walking through key ALF concepts. All the tutorial code files can
be found under ``<ALF_ROOT>/alf/examples/tutorial``.

.. note::

This tutorial won't cover the technical details of different algorithms and
models, as we assume the user learns them from other resources, e.g., the
original papers. We only focus on how to use ALF as a tool to write them.

..
The following section schedule might evolve as the tutorial proceeds
The following chapter schedule might evolve as the tutorial proceeds
.. toctree::
:maxdepth: 3

tutorial/a_minimal_working_example
tutorial/understanding_ALF_via_the_minimal_working_example
tutorial/configuring_existing_algorithms
tutorial/customize_environment_and_wrappers
tutorial/algorithm_interfaces
tutorial/summary_metrics_and_tensorboard
tutorial/customize_environments_and_wrappers
tutorial/customize_algorithms
tutorial/customize_training_pipeline
tutorial/advanced_play_and_alf_snapshot
30 changes: 14 additions & 16 deletions docs/tutorial/a_minimal_working_example.rst
@@ -2,19 +2,16 @@ A minimal working example
=========================

We start with a minimal working example of ALF. The example, as a pure ALF
configuration file, is located at ``<ALF_ROOT>/alf/examples/tutorial/minimal_example_conf.py``,
configuration file, is :mod:`alf.examples.tutorial.minimal_example_conf`,
and consists of only 8 lines.

Train and play
--------------

Let's ignore its content for a moment (see the next section
Let's ignore its content for a moment (see the next chapter
:doc:`./understanding_ALF_via_the_minimal_working_example` for an explanation of
the configuration content), and just focus on how to launch the training,
interpret the output training messages, and evaluate a trained model.

Train from scratch
^^^^^^^^^^^^^^^^^^
------------------

We can train from scratch by

@@ -31,7 +28,7 @@ assuming ``/tmp/alf_tutorial1`` doesn't exist or is empty.
output log, etc) are stored.

The training will finish in several seconds, but with some informative messages
shown in the terminal. First of all, you should see a message from ``checkpoint_utils.py``
shown in the terminal. First of all, you should see a message from :mod:`.checkpoint_utils`
like

::
@@ -40,7 +37,7 @@ like
from scratch

which basically confirms that the training is from scratch and all algorithm parameters
and states are randomly initialized. Also ``policy_trainer.py`` will output
and states are randomly initialized. Also :mod:`.policy_trainer` will output
message lines like

::
@@ -63,7 +60,7 @@ as the training finishes. Here we have the checkpoint numbered by the training
iteration, which is '1' because only one iteration is performed by this example.
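
Because checkpoints are numbered by training iteration, the most recent one is the one with the largest number. The sketch below illustrates that selection only. It assumes a hypothetical file-name pattern ``ckpt-<iteration>`` and a hypothetical helper ``latest_checkpoint``; neither is confirmed by this tutorial, and ALF's own logic in ``checkpoint_utils.py`` may differ.

```python
import re
from typing import Iterable, Optional

# Hypothetical naming scheme "ckpt-<iteration>"; ALF's real layout may differ.
_CKPT_PATTERN = re.compile(r"^ckpt-(\d+)$")


def latest_checkpoint(names: Iterable[str]) -> Optional[str]:
    """Return the checkpoint name with the largest iteration number, or None."""
    best_name, best_step = None, -1
    for name in names:
        match = _CKPT_PATTERN.match(name)
        if match is None:
            continue  # ignore files that do not look like checkpoints
        step = int(match.group(1))
        if step > best_step:
            best_name, best_step = name, step
    return best_name


# Numeric comparison matters: "ckpt-10" is newer than "ckpt-2".
print(latest_checkpoint(["ckpt-1", "ckpt-10", "ckpt-2", "events.log"]))
```

Comparing the parsed integers rather than the raw strings avoids ranking ``ckpt-2`` above ``ckpt-10``.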

Train from a checkpoint
^^^^^^^^^^^^^^^^^^^^^^^
-----------------------

By launching the same command again, this time the checkpoint messages are different.
First it should say
@@ -75,7 +72,7 @@ First it should say
which means the training is no longer from scratch, but instead reads the saved
checkpoint from the last run. By default ALF reads the most recent checkpoint in
a training root dir if multiple checkpoints exist. Also at the end of training,
``checkpoint_utils.py`` outputs:
:mod:`.checkpoint_utils` outputs:

::

@@ -93,16 +90,17 @@ While the training is ongoing, we can monitor the real-time progress by
tensorboard --logdir /tmp/alf_tutorial1
We leave the interpretation of various Tensorboard statistics to later sections.
We leave the interpretation of various Tensorboard statistics to a later chapter
:doc:`./summary_metrics_and_tensorboard`.

Play from a checkpoint
^^^^^^^^^^^^^^^^^^^^^^
----------------------

ALF defines the term *play* as evaluating a model on a task and possibly also visualizing
the evaluation process, for example, by rendering environment frames or various
inference statistics.

Here we only introduce three basic usages of the ALF ``play`` module. For advanced
Here we only introduce three basic usages of the ALF :mod:`.play` module. For advanced
play (e.g., rendering customized model inference results, play from an ALF snapshot,
headless rendering, etc), we refer the reader to :doc:`./advanced_play_and_alf_snapshot`.

@@ -125,15 +123,15 @@ Or you can save the rendered result to a ``mp4`` video file:
python -m alf.bin.play --root_dir /tmp/alf_tutorial1 --record_file /tmp/alf_tutorial1.mp4
We recommend the reader to read the various commandline flags in ``<ALF_ROOT>/alf/bin/play.py``,
We recommend the reader to read the various commandline flags in :mod:`.play`,
for specifying different options such as checkpoint number and number of episodes to
evaluate.

Summary
-------

So far, we've talked about how to train a conf file and play the trained model,
with very basic options of ``train.py`` and ``play.py``. This covers a usual
with very basic options of :mod:`.train` and :mod:`.play`. This covers a usual
command-line usage of ALF. We really haven't explained the content of the
example and the ALF pipeline yet. In the next section, we will try to get a
example and the ALF pipeline yet. In the next chapter, we will try to get a
rough picture of ALF through the lens of this minimal working example.
2 changes: 2 additions & 0 deletions docs/tutorial/algorithm_interfaces.rst
@@ -0,0 +1,2 @@
Algorithm interfaces
====================
2 changes: 2 additions & 0 deletions docs/tutorial/customize_algorithms.rst
@@ -0,0 +1,2 @@
Customize algorithms
====================
2 changes: 2 additions & 0 deletions docs/tutorial/customize_environments_and_wrappers.rst
@@ -0,0 +1,2 @@
Customize environments and wrappers
===================================
2 changes: 2 additions & 0 deletions docs/tutorial/customize_training_pipeline.rst
@@ -0,0 +1,2 @@
Customize a training pipeline
=============================
Binary file added docs/tutorial/images/alf_diagram.png
Binary file added docs/tutorial/images/pipeline.png
2 changes: 2 additions & 0 deletions docs/tutorial/summary_metrics_and_tensorboard.rst
@@ -0,0 +1,2 @@
Summary, metrics, and Tensorboard
=================================