A list of RL papers worth reading. This is a brief, curated selection from the existing Spinning Up list, trimmed down to (in my opinion) the ~50 most important papers. Each week, I will read at least one of these papers and share my insights on it. Where the nature of the work allows, I will also attempt to demonstrate the algorithm or research. This should help me break down complex topics into smaller bites, both for my own understanding and for readers of these blog-ish tutorials.
-
Playing Atari with Deep Reinforcement Learning
Mnih et al., 2013.
Algorithm: DQN.
Challenge? The paper tackles the long-standing challenge of enabling agents to learn control policies directly from high-dimensional sensory input (here, vision) using RL. Previous RL methods relied heavily on hand-crafted features, limiting their applicability to complex real-world scenarios.
Proposed Approach? A system using a CNN trained with a variant of Q-learning. The network takes raw pixel data as input and outputs a value function estimating future rewards.
Key results? Outperformed all previous approaches on six of the seven games tested, and surpassed a human expert on three of them. (A minimal sketch of the network and update follows below.)
(BONUS) DeepSeekMath: <TODO-Add content here>
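To make the DQN entry above concrete, here is a minimal sketch of the setup, assuming PyTorch and the Atari-style input of 4 stacked 84x84 grayscale frames. The layer sizes follow the 2013 paper; the function names and replay-batch format are my own illustration, not a faithful reproduction of the original code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class DQN(nn.Module):
    """Q-network from the 2013 paper: conv layers over 4 stacked frames,
    one output per action, so a single forward pass scores every action."""

    def __init__(self, n_actions: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(4, 16, kernel_size=8, stride=4),   # 84x84x4 -> 20x20x16
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),   # -> 9x9x32
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256),
            nn.ReLU(),
            nn.Linear(256, n_actions),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)


def q_learning_loss(q_net, s, a, r, s_next, done, gamma=0.99):
    """TD loss on a replay batch (s, a, r, s', done). The 2013 paper
    bootstraps from the same network; the later Nature DQN would swap in
    a frozen target copy for the max term below."""
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)
    with torch.no_grad():
        target = r + gamma * (1.0 - done) * q_net(s_next).max(dim=1).values
    return F.mse_loss(q_sa, target)
```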
-
Deep Recurrent Q-Learning for Partially Observable MDPs
Hausknecht and Stone, 2015.
Algorithm: Deep Recurrent Q-Learning (DRQN).
Challenge? Standard Deep Q-Networks (DQNs) struggle with Partially Observable Markov Decision Processes (POMDPs) because they rely on complete state information (the game screen) at each decision point. DQNs use a limited history (typically 4 stacked frames) and cannot remember events further in the past, making them unsuitable for games where long-term memory is crucial.
Proposed Approach? Introduces the Deep Recurrent Q-Network (DRQN), which incorporates an LSTM (Long Short-Term Memory) layer to integrate information over time and address partial observability.
Key results? DRQN handles noisy observations in POMDPs better by using an LSTM to integrate information through time, and it can perform well even when receiving only a single frame at each timestep, making it a strong alternative to frame stacking in a regular DQN. While DRQN generalizes better between MDPs and POMDPs, it does not show a systematic performance improvement over regular DQNs. (A minimal sketch follows below.)
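As referenced above, here is a minimal sketch of the DRQN idea, again assuming PyTorch. The only structural change from DQN is that the conv stack encodes a single frame per timestep and an LSTM carries information across steps; the layer sizes are illustrative rather than the exact ones from the paper.

```python
import torch
import torch.nn as nn


class DRQN(nn.Module):
    """DRQN sketch: per-frame conv features fed through an LSTM, so the
    recurrent state (rather than frame stacking) provides memory."""

    def __init__(self, n_actions: int, hidden: int = 256):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=8, stride=4),   # single 84x84 frame in
            nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2),
            nn.ReLU(),
            nn.Flatten(),
        )
        self.lstm = nn.LSTM(32 * 9 * 9, hidden, batch_first=True)
        self.head = nn.Linear(hidden, n_actions)

    def forward(self, frames: torch.Tensor, hidden_state=None):
        # frames: (batch, time, 1, 84, 84); hidden_state carries memory
        # across calls when acting one frame at a time.
        b, t = frames.shape[:2]
        feats = self.conv(frames.reshape(b * t, *frames.shape[2:]))
        out, hidden_state = self.lstm(feats.reshape(b, t, -1), hidden_state)
        return self.head(out), hidden_state  # Q-values for every timestep
```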
-
Dueling Network Architectures for Deep Reinforcement Learning
Wang et al., 2015.
Algorithm: Dueling DQN.
-
Deep Reinforcement Learning with Double Q-learning
Van Hasselt et al., 2015.
Algorithm: Double DQN.
-
Prioritized Experience Replay
Schaul et al., 2015.
Algorithm: Prioritized Experience Replay (PER).
-
Rainbow: Combining Improvements in Deep Reinforcement Learning
Hessel et al., 2017.
Algorithm: Rainbow DQN.
-
Asynchronous Methods for Deep Reinforcement Learning
Mnih et al., 2016.
Algorithm: A3C.
-
Trust Region Policy Optimization
Schulman et al., 2015.
Algorithm: TRPO.
-
High-Dimensional Continuous Control Using Generalized Advantage Estimation
Schulman et al., 2015.
Algorithm: GAE.
-
Proximal Policy Optimization Algorithms
Schulman et al., 2017.
Algorithm: PPO-Clip, PPO-Penalty.
-
Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
Haarnoja et al., 2018.
Algorithm: SAC.
-
Deterministic Policy Gradient Algorithms
Silver et al., 2014.
Algorithm: DPG.
-
Continuous Control with Deep Reinforcement Learning
Lillicrap et al., 2015.
Algorithm: DDPG.
-
Addressing Function Approximation Error in Actor-Critic Methods
Fujimoto et al., 2018.
Algorithm: TD3.
-
A Distributional Perspective on Reinforcement Learning
Bellemare et al., 2017.
Algorithm: C51.
-
Distributional Reinforcement Learning with Quantile Regression
Dabney et al., 2017.
Algorithm: QR-DQN.
-
Implicit Quantile Networks for Distributional Reinforcement Learning
Dabney et al., 2018.
Algorithm: IQN.
-
Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
Gu et al., 2016.
Algorithm: Q-Prop.
-
Bridging the Gap Between Value and Policy Based Reinforcement Learning
Nachum et al., 2017.
Algorithm: PCL.
-
Combining Policy Gradient and Q-learning
O'Donoghue et al., 2016.
Algorithm: PGQL.
-
Evolution Strategies as a Scalable Alternative to Reinforcement Learning
Salimans et al., 2017.
Algorithm: Evolution Strategies (ES).
-
VIME: Variational Information Maximizing Exploration
Houthooft et al., 2016.
Algorithm: VIME. -
Unifying Count-Based Exploration and Intrinsic Motivation
Bellemare et al., 2016.
Algorithm: CTS-based Pseudocounts.
-
Curiosity-driven Exploration by Self-supervised Prediction
Pathak et al., 2017.
Algorithm: Intrinsic Curiosity Module (ICM).
-
Exploration by Random Network Distillation
Burda et al., 2018.
Algorithm: RND.
-
Hindsight Experience Replay
Andrychowicz et al., 2017.
Algorithm: HER.
-
Model-Free Episodic Control
Blundell et al., 2016.
Algorithm: MFEC.
-
Neural Episodic Control
Pritzel et al., 2017.
Algorithm: NEC.
-
Neural Architecture Search with Reinforcement Learning
Zoph and Le, 2016.
Algorithm: NAS-RL.
-
AlphaGo Zero: Mastering the Game of Go without Human Knowledge
Silver et al., 2017.
Algorithm: AlphaGo Zero.
-
Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
Silver et al., 2018.
Algorithm: AlphaZero.
-
Mastering Atari, Go, Chess, and Shogi by Planning with a Learned Model
Schrittwieser et al., 2019.
Algorithm: MuZero.
-
Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
Nagabandi et al., 2018.
Algorithm: Model-Based Dyna.
-
World Models
Ha and Schmidhuber, 2018.
Algorithm: World Models.
-
Imagination-Augmented Agents for Deep Reinforcement Learning
Racanière et al., 2017.
Algorithm: Imagination-Augmented Agents (I2A).
-
Value Prediction Network
Oh et al., 2017.
Algorithm: Value Prediction Networks (VPN).
-
FeUdal Networks for Hierarchical Reinforcement Learning
Vezhnevets et al., 2017.
Algorithm: Feudal Networks.
-
Hierarchical Reinforcement Learning with Timed Subgoals
Gürtler et al., 2021.
Algorithm: HiTS.
-
Data-Efficient Hierarchical Reinforcement Learning
Nachum et al., 2018.
Algorithm: HIRO.
-
Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
Sutton et al., 1999.
Algorithm: Options.
-
The Option-Critic Architecture
Bacon et al., 2017.
Algorithm: Option-Critic.
-
Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
Kulkarni et al., 2016.
Algorithm: h-DQN with Intrinsic Motivation.
-
Intrinsic Motivation for Hierarchical Reinforcement Learning
Nachum et al., 2019.
Algorithm: Multi-level HIRO.
-
Off-Policy Deep Reinforcement Learning without Exploration
Fujimoto et al., 2019.
Algorithm: BCQ.
-
Conservative Q-Learning for Offline Reinforcement Learning
Kumar et al., 2020.
Algorithm: CQL.
-
Behavior Regularized Offline Reinforcement Learning
Wu et al., 2019.
Algorithm: BRAC.
-
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
Kumar et al., 2019.
Algorithm: BEAR.
-
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
Nair et al., 2020.
Algorithm: AWAC.
-
Safe Model-based Reinforcement Learning with Stability Guarantees
Berkenkamp et al., 2017.
Algorithm: SafeRL.
-
Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
Srinivas et al., 2010.
Algorithm: GP-UCB.