Awesome Reinforcement Learning

Icon descriptions:
  • 🚀 - state-of-the-art agent/technique at the moment of paper publication.
  • ⭐ - valuable paper.
  • model-based - Model-based RL.
  • multi-agent-rl - Multi-Agent RL.
  • self-play - Self-Play.
  • evolution - Evolutionary & Genetic Algorithms.
  • generalization - Generalization on unseen environments.
  • auto-ml - Auto ML - Architecture search.
  • manipulation - Manipulation tasks.
  • locomotion - Locomotion: MuJoCo, Roboschool, etc.
  • navigation - Navigation tasks.
  • plan - Strategy Planning Problems.
  • transfer - Transfer learning.
  • inverse-rl - Inverse Reinforcement Learning.
  • meta-learning - Meta-Learning.
  • exploration - Curiosity Learning, Advanced Exploration.
  • table - Table games (Table).
  • atari - Atari game (Atari).
  • doom - Doom game (Doom).
  • sc - Starcraft game (Starcraft).
  • go - Go game (Go).

RL Frameworks & Implementations

[Stable Baselines3] PyTorch: MaskablePPO, PPO, A2C, DQN, etc

[Baselines @ OpenAI] TensorFlow: PPO, A2C, DQN, TRPO, ACKTR, DDPG, HER, GAIL, etc

[Baselines @ DLR-RM] PyTorch: Custom envs, custom policies

[RLlib @ Ray] PyTorch / TensorFlow

[Dopamine @ Google] TensorFlow: Rainbow, c51, IQN, DQN, etc

[TensorForce] TensorFlow: A3C, PPO, TRPO, DQN, etc

[pytorch-a2c-ppo-acktr] PyTorch: A2C, ACKTR, PPO, GAIL, etc

RL Benchmarks

[OpenAI Benchmarks for PPO, A2C, ACKTR, ACER]

[OpenAI Benchmarks for DQN, Double DQN, Dueling DQN, Prioritized DQN]

[Google Benchmarks for Rainbow, c51, IQN, DQN]

Policy-Based Generic Agents

🚀 [Soft Actor Critic] [blog] [code] 2018 @ Google Brain, UC Berkeley

🚀 [IMPALA] 2018 @ DeepMind

🚀 [Scalable trust-region method for deep reinforcement learning using Kronecker-factored approximation (ACKTR, A2C)] 2017 @ Univ. of Toronto, New York Univ.

🚀 [Proximal Policy Optimization Algorithms (PPO)] [blog] 2017 @ OpenAI

🚀 📝 Notes [Asynchronous Methods for Deep Reinforcement Learning (A3C)] 2016 @ Google DeepMind

[High-dimensional continuous control using generalized advantage estimation (GAE)] 2015 @ Berkeley

⭐ [Trust Region Policy Optimization (TRPO)] 2015 @ UC Berkeley

⭐ [Actor-Critic Algorithms, pdf] Konda and Tsitsiklis, 2003

⭐ [Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning (REINFORCE), pdf] Ronald J. Williams, 1992 @ Northeastern Univ.

Value-Based Generic Agents

🚀 [Implicit Quantile Networks for Distributional Reinforcement Learning (IQN)] Dabney et al., 2018 @ Google DeepMind

🚀 [A Distributional Perspective on Reinforcement Learning (c51)] Bellemare et al., 2017 @ Google DeepMind

🚀 [Rainbow: Combining Improvements in Deep Reinforcement Learning] Hessel et al., 2017 @ Google DeepMind

🚀 [Dueling Network Architectures for Deep Reinforcement Learning (Dueling DQN)] Wang et al., 2015 @ Google DeepMind

🚀 📝 Notes [Prioritized Experience Replay] Schaul et al., 2015 @ Google DeepMind

🚀 [Deep Reinforcement Learning with Double Q-learning (Double DQN)] van Hasselt et al., 2015 @ Google DeepMind

🚀 📝 Notes [Human-level control through deep reinforcement learning (DQN)] [pdf] Mnih et al., 2015 @ Google DeepMind

🚀 [Playing Atari with Deep Reinforcement Learning (DQN)] Mnih et al., 2013 @ DeepMind Technologies

⭐ [Temporal Difference Learning and TD-Gammon, pdf] Gerald Tesauro, 1995

model-based Model-Based Generic Agents

[Model-Based Reinforcement Learning for Atari] 2019 @ Google Brain, etc

⭐ navigation [World Models] [blog] 2018 @ IDSIA, Google Brain, NNAISENSE

locomotion [Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning] [blog] [code] 2017 @ Berkeley

locomotion [Learning model-based planning from scratch], [blog] 2017 @ Google DeepMind

navigation [The Predictron: End-To-End Learning and Planning] 2016 @ Google DeepMind

evolution Evolutionary Algorithms

[Back to Basics: Benchmarking Canonical Evolution Strategies for Playing Atari] 2018 @ Univ. of Freiburg

⭐ locomotion [Deep Neuroevolution] 2017 @ Uber AI Labs

⭐ [Evolution Strategies as a Scalable Alternative to Reinforcement Learning] 2017 @ OpenAI

[Evolving Large-Scale Neural Networks for Vision-Based Reinforcement Learning, pdf] 2013 @ IDSIA, USI-SUPSI

exploration Exploration

🚀 [Go-Explore] 2019 @ Uber AI Labs

[Exploration by Random Network Distillation (RND)] [blog] [code] 2018 @ OpenAI

navigation [Large-Scale Study of Curiosity-Driven Learning] [blog] 2018 @ OpenAI, Berkeley, Univ. of Edinburgh

⭐ [RUDDER: Return Decomposition for Delayed Rewards] [code] 2018 @ Johannes Kepler Univ. Linz

[Deep Curiosity Search] 2018 @ Univ. of Wyoming

locomotion [Parameter Space Noise for Exploration] 2017 @ OpenAI, Karlsruhe Inst. of Tech.

⭐ transfer [Imagination-Augmented Agents for Deep Reinforcement Learning (I2As)] [blog] 2017 @ DeepMind

self-play Self-Play

⭐ table [Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm] Silver et al., 2017 @ Google DeepMind

⭐ table [Mastering the Game of Go without Human Knowledge (AlphaGo Zero), pdf], [blog] Silver et al., 2017 @ DeepMind

table [Mastering the game of Go with deep neural networks and tree search (AlphaGo Master)], [reddit] Silver et al., 2017 @ DeepMind, Google

meta-learning Meta-Learning

locomotion [Meta Learning Shared Hierarchies] [blog] Frans et al., 2017 @ OpenAI, Berkeley

[Hybrid Reward Architecture for Reinforcement Learning (HRA)] van Seijen et al., 2017 @ Microsoft Maluuba, McGill Univ.

multi-agent-rl Multi-Agent RL

[Learning with Opponent-Learning Awareness (LOLA)] [blog] Foerster et al., 2017 @ OpenAI, Oxford, Berkeley, CMU

inverse-rl Inverse RL

manipulation [SFV: Reinforcement Learning of Physical Skills from Videos] [blog] Peng et al., 2018 @ Berkeley

manipulation [One-Shot Imitation from Observing Humans via Domain-Adaptive Meta-Learning] Finn et al., 2018 @ UC Berkeley

manipulation [One-Shot Visual Imitation Learning via Meta-Learning] Finn et al., 2017 @ UC Berkeley, OpenAI

navigation Navigation

[Learning to Navigate in Cities Without a Map] Mirowski et al., 2019 @ DeepMind

[Human-level performance in first-person multiplayer games with population-based deep reinforcement learning] [blog] Jaderberg et al., 2018 @ DeepMind

generalization [Building Generalizable Agents with a Realistic and Rich 3D Environment] Wu et al., 2018 @ Berkeley, FAIR

🚀 [Learning to Navigate in Complex Environments] Mirowski et al., 2017 @ DeepMind

transfer [Distral: Robust Multitask Reinforcement Learning] Teh et al., 2017 @ DeepMind

meta-learning [RL2: Fast Reinforcement Learning via Slow Reinforcement Learning] Duan et al., 2016 @ Berkeley, OpenAI

⭐ 📝 Notes locomotion [Reinforcement Learning with unsupervised auxiliary tasks (UNREAL)] Jaderberg et al., 2016 @ Google DeepMind

🚀 [Learning to act by predicting the future (VizDoom 2016 Full DM Winner)] Dosovitskiy, Koltun, 2016 @ Intel Labs

[Playing FPS Games with Deep Reinforcement Learning (VizDoom 2016 Limited DM 2nd place)] Lample, Chaplot, 2016 @ CMU

manipulation Manipulation

generalization [Learning Dexterous In-Hand Manipulation] [blog] Andrychowicz et al., 2018 @ OpenAI

generalization [Asymmetric Actor Critic for Image-Based Robot Learning] [blog] Pinto et al., 2017 @ OpenAI, CMU

generalization [Sim-to-Real Transfer of Robotic Control with Dynamics Randomization], [blog] Peng et al., 2017 @ OpenAI, Berkeley

locomotion Locomotion

[Emergence of Locomotion Behaviours in Rich Environments] [blog] Heess et al., 2017 @ DeepMind

[Programmable Agents] Denil et al., 2017 @ Google DeepMind

auto-ml Auto ML

[AutoAugment: Learning Augmentation Policies from Data] Cubuk et al., 2018 @ Google Brain

⭐ evolution [Regularized Evolution for Image Classifier Architecture Search] Real et al., 2018 @ Google Brain

⭐ [Learning Transferable Architectures for Scalable Image Recognition] Zoph et al., 2017 @ Google Brain

[Neural Optimizer Search with Reinforcement Learning, pdf] Bello et al., 2017 @ Google Brain

[Neural Architecture Search with Reinforcement Learning] B. Zoph and Quoc V. Le, 2016 @ Google Brain

Other Domains

[A Deep Reinforcement Learning Chatbot] Serban et al., 2017 @ MILA


⭐ [Reinforcement Learning: An Introduction, pdf] Richard S. Sutton and Andrew G. Barto, 2018

Search for new Papers

[A Brief Survey of Deep Reinforcement Learning] Arulkumaran et al., 2017

Another Awesome Deep RL list:

Awesome Offline RL:

ArXiv Sanity Preserver:



[How to Read a Paper] S. Keshav, 2007 @ Univ. of Waterloo

[Transformers: Attention is all you need] Vaswani et al., 2017 @ Google Brain/Research