Add CleanRL multi-agent Atari example (#1033)
elliottower authored Jul 20, 2023
1 parent 6a04989 commit 6a20a32
Showing 7 changed files with 411 additions and 18 deletions.
3 changes: 2 additions & 1 deletion .github/workflows/linux-tutorials-test.yml
@@ -33,5 +33,6 @@ jobs:
cd tutorials/${{ matrix.tutorial }}
pip install -r requirements.txt
pip uninstall -y pettingzoo
-pip install -e $root_dir
+pip install -e $root_dir[testing]
+AutoROM -v
for f in *.py; do xvfb-run -a -s "-screen 0 1024x768x24" python "$f"; done
26 changes: 26 additions & 0 deletions docs/tutorials/cleanrl/advanced_PPO.md
@@ -0,0 +1,26 @@
---
title: "CleanRL: Advanced PPO"
---

# CleanRL: Advanced PPO

This tutorial shows how to train [PPO](https://docs.cleanrl.dev/rl-algorithms/ppo/) agents on [Atari](https://pettingzoo.farama.org/environments/atari/) environments ([Parallel](https://pettingzoo.farama.org/api/parallel/)).
This is a full training script including CLI, logging and integration with [TensorBoard](https://www.tensorflow.org/tensorboard) and [WandB](https://wandb.ai/) for experiment tracking.

This tutorial is mirrored from [CleanRL](https://github.com/vwxyzjn/cleanrl)'s examples. Full documentation and experiment results can be found at [https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy).

## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
```{eval-rst}
.. literalinclude:: ../../../tutorials/CleanRL/requirements.txt
:language: text
```

Then, install ROMs using [AutoROM](https://github.com/Farama-Foundation/AutoROM), or specify the path to your Atari rom using the `rom_path` argument (see [Common Parameters](/environments/atari/#common-parameters)).
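As a quick sanity check that the ROMs are visible to PettingZoo, here is a minimal sketch (assuming ROMs were installed via AutoROM; `pong_v3` is just an example — any PettingZoo Atari environment is constructed the same way):

```python
# A minimal sketch, assuming ROMs were installed with AutoROM.
# pong_v3 is an illustrative choice of environment.
from pettingzoo.atari import pong_v3

env = pong_v3.parallel_env()
observations, infos = env.reset()
print(env.agents)  # e.g. ['first_0', 'second_0']
env.close()
```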

## Code
The following code should run without any issues. The comments are designed to help you understand how to use PettingZoo with CleanRL. If you have any questions, please feel free to ask in the [Discord server](https://discord.gg/nhvKkYa6qX), or create an issue on [CleanRL's GitHub](https://github.com/vwxyzjn/cleanrl/issues).
```{eval-rst}
.. literalinclude:: ../../../tutorials/CleanRL/cleanrl_advanced.py
:language: python
```
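For context, the core of a multi-agent Atari setup like this one is converting a parallel PettingZoo environment into a vectorized, Gym-style environment with [SuperSuit](https://github.com/Farama-Foundation/SuperSuit), so a single-agent PPO implementation can train all agents at once. A hedged sketch of that preprocessing (wrapper names follow SuperSuit's API; the exact wrappers and arguments in the tutorial script may differ):

```python
import supersuit as ss
from pettingzoo.atari import pong_v3

env = pong_v3.parallel_env()
env = ss.max_observation_v0(env, 2)                # handle Atari frame flickering
env = ss.frame_skip_v0(env, 4)                     # standard Atari frame skipping
env = ss.resize_v1(env, 84, 84)                    # downscale observations
env = ss.frame_stack_v1(env, 4)                    # stack frames so the policy sees motion
env = ss.agent_indicator_v0(env, type_only=False)  # let one shared policy tell agents apart
env = ss.pettingzoo_env_to_vec_env_v1(env)         # expose agents as vector-env slots
envs = ss.concat_vec_envs_v1(env, 8, num_cpus=0, base_class="gymnasium")
```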
2 changes: 1 addition & 1 deletion docs/tutorials/cleanrl/implementing_PPO.md
@@ -4,7 +4,7 @@ title: "CleanRL: Implementing PPO"

# CleanRL: Implementing PPO

-This tutorial shows how to train a [PPO](https://docs.cleanrl.dev/rl-algorithms/ppo/) agennt on the [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment ([Parallel](https://pettingzoo.farama.org/api/parallel/)).
+This tutorial shows how to train [PPO](https://docs.cleanrl.dev/rl-algorithms/ppo/) agents on the [Pistonball](https://pettingzoo.farama.org/environments/butterfly/pistonball/) environment ([Parallel](https://pettingzoo.farama.org/api/parallel/)).

## Environment Setup
To follow this tutorial, you will need to install the dependencies shown below. It is recommended to use a newly-created virtual environment to avoid dependency conflicts.
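For reference, the Pistonball tutorial's environment preprocessing looks roughly like the sketch below; wrapper choices and arguments are illustrative and may differ from the actual script:

```python
import supersuit as ss
from pettingzoo.butterfly import pistonball_v6

env = pistonball_v6.parallel_env(continuous=False)
env = ss.color_reduction_v0(env, mode="B")     # grayscale via the blue channel
env = ss.resize_v1(env, x_size=64, y_size=64)  # downscale observations
env = ss.frame_stack_v1(env, stack_size=3)     # stack frames so the policy sees motion
```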
9 changes: 6 additions & 3 deletions docs/tutorials/cleanrl/index.md
@@ -6,7 +6,9 @@ title: "CleanRL"

This tutorial shows how to use [CleanRL](https://github.com/vwxyzjn/cleanrl) to implement a training algorithm from scratch and train it on the Pistonball environment.

-* [Implementing PPO](/tutorials/cleanrl/implementing_PPO.md): _Implement and train an agent using PPO_
+* [Implementing PPO](/tutorials/cleanrl/implementing_PPO.md): _Train an agent using a simple PPO implementation_
+
+* [Advanced PPO](/tutorials/cleanrl/advanced_PPO.md): _CleanRL's official PPO example, with CLI, TensorBoard and WandB integration_


## CleanRL Overview
@@ -16,14 +18,14 @@ This tutorial shows how to use [CleanRL](https://github.com/vwxyzjn/cleanrl) to

See the [documentation](https://docs.cleanrl.dev/) for more information.

-## Official examples using PettingZoo:
+## Examples using PettingZoo:

* [PPO PettingZoo Atari example](https://docs.cleanrl.dev/rl-algorithms/ppo/#ppo_pettingzoo_ma_ataripy)


## WandB Integration

-A key feature is its tight integration with [Weights & Biases](https://wandb.ai/) (WandB): for experiment tracking, hyperparameter tuning, and benchmarking.
+A key feature is CleanRL's tight integration with [Weights & Biases](https://wandb.ai/) (WandB): for experiment tracking, hyperparameter tuning, and benchmarking.
The [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark) allows users to view public leaderboards for many tasks, including videos of agents' performance across training timesteps.
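As an illustration of what that integration looks like inside a CleanRL-style script (the variable names below are stand-ins, not CleanRL's exact CLI arguments):

```python
import time
import wandb

# Hypothetical stand-ins for values a CleanRL-style script parses from its CLI.
project_name = "pettingzoo-benchmarks"
run_name = f"pong_v3__ppo__{int(time.time())}"

wandb.init(
    project=project_name,
    sync_tensorboard=True,  # mirror TensorBoard scalars into the WandB run
    name=run_name,
    save_code=True,         # snapshot the training script alongside the run
)
```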


@@ -38,4 +40,5 @@ The [Open RL Benchmark](https://github.com/openrlbenchmark/openrlbenchmark) allo
:caption: CleanRL
implementing_PPO
+advanced_PPO
```
12 changes: 0 additions & 12 deletions docs/tutorials/sb3/index.md
@@ -34,25 +34,13 @@ For non-visual environments, we use [MLP](https://stable-baselines3.readthedocs
```




## Stable-Baselines Overview

[Stable-Baselines3](https://stable-baselines3.readthedocs.io/en/master/) (SB3) is a library providing reliable implementations of reinforcement learning algorithms in [PyTorch](https://pytorch.org/). It provides a clean and simple interface, giving you access to off-the-shelf state-of-the-art model-free RL algorithms. It allows training of RL agents with only a few lines of code.

For more information, see the [Stable-Baselines3 v1.0 Blog Post](https://araffin.github.io/post/sb3/).
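For example, a minimal single-agent training run in SB3 is only a few lines; this sketch uses Gymnasium's `CartPole-v1` as a stand-in task:

```python
from stable_baselines3 import PPO

# SB3 accepts a Gymnasium environment id directly; CartPole-v1 is just an
# illustrative single-agent task.
model = PPO("MlpPolicy", "CartPole-v1", verbose=1)
model.learn(total_timesteps=10_000)
```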


-[//]: # (```{eval-rst})
-
-[//]: # (.. warning::)
-
-[//]: # ()
-[//]: # ( Note: SB3 is designed for single-agent RL and does not plan on natively supporting multi-agent PettingZoo environments. These tutorials are only intended for demonstration purposes, to show how SB3 can be adapted to work in multi-agent settings.)
-
-[//]: # (```)
```{figure} https://raw.githubusercontent.com/DLR-RM/stable-baselines3/master/docs/_static/img/logo.png
:alt: SB3 Logo
:width: 80%