
pytorch_drl


Implementation of Deep Reinforcement Learning algorithms in PyTorch, with support for distributed data collection and data-parallel training.

(In progress...)
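As a rough illustration of the data-parallel training pattern mentioned above, here is a minimal sketch using `torch.distributed`: each process computes gradients on its own shard of collected data, then gradients are averaged across processes before the optimizer step. The backend choice, model, and batch below are placeholders for illustration, not this repo's actual training loop.

```python
import os
import torch
import torch.distributed as dist

def train_step(model, optimizer, batch, loss_fn):
    # Compute local gradients, then average them across all processes.
    optimizer.zero_grad()
    loss = loss_fn(model(batch["obs"]), batch["target"])
    loss.backward()
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad.div_(world_size)
    optimizer.step()

if __name__ == "__main__":
    # Placeholder single-machine setup; in practice, launch one process per rank.
    os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
    os.environ.setdefault("MASTER_PORT", "29500")
    dist.init_process_group(
        backend="gloo",
        init_method="env://",
        rank=int(os.environ.get("RANK", "0")),
        world_size=int(os.environ.get("WORLD_SIZE", "1")),
    )
    model = torch.nn.Linear(4, 2)
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    batch = {"obs": torch.randn(8, 4), "target": torch.randint(0, 2, (8,))}
    train_step(model, optimizer, batch, torch.nn.CrossEntropyLoss())
    dist.destroy_process_group()
```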

This repo contains flexible implementations of several deep reinforcement learning algorithms. It supports the algorithms, architectures, and rewards from several papers, including:

Additionally, support is planned for:

Getting Started

Install the following system dependencies:

Ubuntu

sudo apt-get update
sudo apt-get install -y libglu1-mesa-dev libgl1-mesa-dev libosmesa6-dev xvfb ffmpeg curl patchelf libglfw3 libglfw3-dev cmake libjpeg-dev zlib1g zlib1g-dev swig python3-dev

Mac OS X

Installation of the system packages on Mac requires Homebrew. With Homebrew installed, run the following:

brew update
brew install cmake

Everyone

We recommend creating a conda environment for this project. You can download the miniconda package manager from https://docs.conda.io/en/latest/miniconda.html. Then you can set up the new conda environment as follows:

conda create --name pytorch_drl python=3.9.2
conda activate pytorch_drl
git clone https://github.com/lucaslingle/pytorch_drl
cd pytorch_drl
pip install -e .
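To confirm the environment is set up correctly, a quick sanity check (assuming the editable install pulled in PyTorch as a dependency) is:

```python
import torch

# Confirm PyTorch imports and report whether a GPU is visible.
print(torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```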

Usage

Structure

This repo comes in two parts: a Python package and a script.

Package

Support for readthedocs integration will be added in the future.
To generate the documentation as HTML locally, you can follow the instructions here.

Script

For details on how to invoke the script, refer to the script usage docs.

Config

A formal description of how the configuration files are structured can be found in the config usage docs.
Some example config files for different algorithms are also provided in the subdirectories of models_dir.

Reproducing Papers

In this section, we describe the algorithms whose published results we've replicated using our codebase.

Proximal Policy Optimization Algorithms

| Game | OpenAI Baselines | Schulman et al., 2017 | Ours |
| --- | ---: | ---: | ---: |
| Beamrider | 1299.3 | 1590.0 | 3406.5 |
| Breakout | 114.3 | 274.8 | 424.4 |
| Enduro | 350.2 | 758.3 | 749.4 |
| Pong | 13.7 | 20.7 | 19.8 |
| Qbert | 7012.1 | 14293.3 | 16600.8 |
| Seaquest | 1218.9 | 1204.5 | 947.3 |
| Space Invaders | 557.3 | 942.5 | 1151.9 |
  • For computational efficiency, we tested only the seven Atari games first examined by Mnih et al., 2013.
  • For consistency with Schulman et al., 2017, each of our results above is the mean performance over the last 100 real episodes of training, averaged over three random seeds (see the sketch after these notes).
  • The OpenAI baselines results were obtained here.
  • As can be seen above, our implementation closely reproduces the results of the paper.
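The following is a minimal sketch of the evaluation metric described in the notes above, not code from this repo: take the mean return over the last 100 completed episodes of each training run, then average that value over the three random seeds. The function name and the made-up data are illustrative only.

```python
import numpy as np

def final_score(episode_returns_per_seed, window=100):
    """episode_returns_per_seed: list of 1-D arrays, one per random seed,
    holding the return of every real episode completed during training."""
    # Mean over the last `window` episodes of each run, then average across seeds.
    per_seed = [np.mean(returns[-window:]) for returns in episode_returns_per_seed]
    return float(np.mean(per_seed))

# Example with made-up data for three seeds.
rng = np.random.default_rng(0)
runs = [rng.normal(loc=400.0, scale=50.0, size=2000) for _ in range(3)]
print(final_score(runs))  # approximately 400
```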