Key Deep Reinforcement Learning Research Papers & Algorithms

A list of RL papers worth reading. It is a brief, curated selection from the existing Spinning Up list, trimmed to what are (in my opinion) the most important ~50 papers. Each week I will read at least one of these papers and share my insights on it, and, where the nature of the work allows, I will also attempt to demonstrate the algorithm or research. This helps me break complex topics down into smaller bites, both for my own understanding and for readers of these blog-ish tutorials.

Foundational Reinforcement Learning Algorithms

  1. Playing Atari with Deep Reinforcement Learning
    Mnih et al., 2013.
    Algorithm: DQN.

    Challenge? The paper tackles the long-standing challenge of enabling agents to learn control policies directly from high-dimensional sensory inputs (here, vision) using RL. Previous RL methods relied heavily on hand-crafted features, limiting their applicability to complex real-world scenarios.

    Proposed Approach? A system using a CNN trained with a variant of Q-learning. The network takes raw pixel data as input and outputs an estimate of future reward (a Q-value) for each possible action.

    Key results? Outperformed previous methods on six of the seven Atari games tested, and surpassed human expert performance on three of them. (A minimal sketch of the approach follows below.)
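
To make the idea concrete, here is a minimal sketch (not the authors' code), assuming PyTorch, the 84x84 four-frame Atari preprocessing described in the paper, and a uniformly sampled replay batch of (state, action, reward, next state, done) tensors. The separate target network is the stabilization trick added in the 2015 Nature follow-up rather than part of the original 2013 setup; names like `QNetwork` and `dqn_loss` are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class QNetwork(nn.Module):
    """Maps a stack of 4 preprocessed 84x84 frames to one Q-value per action."""
    def __init__(self, n_actions: int):
        super().__init__()
        # Convolutional trunk roughly follows the 2013 architecture.
        self.conv1 = nn.Conv2d(4, 16, kernel_size=8, stride=4)
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)
        self.fc = nn.Linear(32 * 9 * 9, 256)
        self.head = nn.Linear(256, n_actions)

    def forward(self, x):
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = F.relu(self.fc(x.flatten(start_dim=1)))
        return self.head(x)

def dqn_loss(q_net, target_net, batch, gamma=0.99):
    """One-step Q-learning loss on a minibatch drawn from the replay buffer."""
    s, a, r, s_next, done = batch                            # tensors sampled uniformly from replay
    q_sa = q_net(s).gather(1, a.unsqueeze(1)).squeeze(1)     # Q(s, a) for the actions actually taken
    with torch.no_grad():
        # Bootstrapped target: r + gamma * max_a' Q(s', a'), cut off at terminal states.
        target = r + gamma * (1.0 - done) * target_net(s_next).max(dim=1).values
    return F.mse_loss(q_sa, target)
```

An experience-replay buffer of stored transitions plus an epsilon-greedy behavior policy complete the training loop described in the paper.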

(BONUS) DeepSeekMath: <TODO-Add content here>

  2. Deep Recurrent Q-Learning for Partially Observable MDPs
    Hausknecht and Stone, 2015.
    Algorithm: Deep Recurrent Q-Learning.

    Challenge? Standard Deep Q-Networks (DQNs) struggle with Partially Observable Markov Decision Processes (POMDPs) because they rely on complete state information (the game screen) at each decision point. DQNs use a limited history (typically 4 frames) and cannot remember events further in the past, making them unsuitable for games where long-term memory is crucial.

    Proposed Approach? Introduces the Deep Recurrent Q-Network (DRQN), which incorporates an LSTM (Long Short-Term Memory) layer to integrate information over time and address partial observability.

    Key results? By integrating information through time with the LSTM, DRQN handles noisy and partial observations better and performs well even when it receives only a single frame per timestep, making it a strong alternative to frame stacking in standard DQN. It also generalizes better between MDPs and POMDPs, but it does not show a systematic performance improvement over standard DQN. (A minimal sketch of the recurrent Q-network follows below.)
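
Again a minimal, hypothetical sketch in PyTorch rather than the authors' code: the fully connected layer after the convolutions is replaced by an LSTM, and the network sees one frame per timestep; layer sizes here are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DRQN(nn.Module):
    """Q-network with an LSTM layer, so a single frame per timestep can suffice."""
    def __init__(self, n_actions: int, hidden_size: int = 256):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 16, kernel_size=8, stride=4)   # one frame per step, not a 4-frame stack
        self.conv2 = nn.Conv2d(16, 32, kernel_size=4, stride=2)
        self.lstm = nn.LSTM(32 * 9 * 9, hidden_size, batch_first=True)
        self.head = nn.Linear(hidden_size, n_actions)

    def forward(self, frames, hidden=None):
        # frames: (batch, time, 1, 84, 84); `hidden` carries the LSTM state across calls.
        b, t = frames.shape[:2]
        x = frames.reshape(b * t, *frames.shape[2:])
        x = F.relu(self.conv1(x))
        x = F.relu(self.conv2(x))
        x = x.flatten(start_dim=1).reshape(b, t, -1)
        x, hidden = self.lstm(x, hidden)        # integrate observations over time
        return self.head(x), hidden             # Q-values for every timestep, plus the recurrent state
```

Training otherwise mirrors DQN, with minibatches sampled as short sequences of transitions so the recurrent state can be unrolled through time.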

  3. Dueling Network Architectures for Deep Reinforcement Learning
    Wang et al., 2015.
    Algorithm: Dueling DQN.

  4. Deep Reinforcement Learning with Double Q-learning
    Van Hasselt et al., 2015.
    Algorithm: Double DQN.

  5. Prioritized Experience Replay
    Schaul et al., 2015.
    Algorithm: Prioritized Experience Replay (PER).

  6. Rainbow: Combining Improvements in Deep Reinforcement Learning
    Hessel et al., 2017.
    Algorithm: Rainbow DQN.

  7. Asynchronous Methods for Deep Reinforcement Learning
    Mnih et al., 2016.
    Algorithm: A3C.

Policy Optimization and Actor-Critic Methods

  1. Trust Region Policy Optimization
    Schulman et al., 2015.
    Algorithm: TRPO.

  2. High-Dimensional Continuous Control Using Generalized Advantage Estimation
    Schulman et al., 2015.
    Algorithm: GAE.

  3. Proximal Policy Optimization Algorithms
    Schulman et al., 2017.
    Algorithm: PPO-Clip, PPO-Penalty.

  4. Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor
    Haarnoja et al., 2018.
    Algorithm: SAC.

  5. Deterministic Policy Gradient Algorithms
    Silver et al., 2014.
    Algorithm: DPG.

  6. Continuous Control with Deep Reinforcement Learning
    Lillicrap et al., 2015.
    Algorithm: DDPG.

  7. Addressing Function Approximation Error in Actor-Critic Methods
    Fujimoto et al., 2018.
    Algorithm: TD3.

Distributional Reinforcement Learning

  1. A Distributional Perspective on Reinforcement Learning
    Bellemare et al., 2017.
    Algorithm: C51.

  2. Distributional Reinforcement Learning with Quantile Regression
    Dabney et al., 2017.
    Algorithm: QR-DQN.

  3. Implicit Quantile Networks for Distributional Reinforcement Learning
    Dabney et al., 2018.
    Algorithm: IQN.

Hybrid and Alternative Approaches

  1. Q-Prop: Sample-Efficient Policy Gradient with An Off-Policy Critic
    Gu et al., 2016.
    Algorithm: Q-Prop.

  2. Bridging the Gap Between Value and Policy Based Reinforcement Learning
    Nachum et al., 2017.
    Algorithm: PCL.

  3. Combining Policy Gradient and Q-learning
    O'Donoghue et al., 2016.
    Algorithm: PGQL.

  4. Evolution Strategies as a Scalable Alternative to Reinforcement Learning
    Salimans et al., 2017.
    Algorithm: Evolution Strategies (ES).

Exploration and Intrinsic Motivation

  1. VIME: Variational Information Maximizing Exploration
    Houthooft et al., 2016.
    Algorithm: VIME.

  2. Unifying Count-Based Exploration and Intrinsic Motivation
    Bellemare et al., 2016.
    Algorithm: CTS-based Pseudocounts.

  3. Curiosity-driven Exploration by Self-supervised Prediction
    Pathak et al., 2017.
    Algorithm: Intrinsic Curiosity Module (ICM).

  4. Exploration by Random Network Distillation
    Burda et al., 2018.
    Algorithm: RND.

  5. Hindsight Experience Replay
    Andrychowicz et al., 2017.
    Algorithm: HER.

Episodic Control and Memory-Based Methods

  1. Model-Free Episodic Control
    Blundell et al., 2016.
    Algorithm: MFEC.

  2. Neural Episodic Control
    Pritzel et al., 2017.
    Algorithm: NEC.

Game Applications and Breakthrough Systems

  1. Neural Architecture Search with Reinforcement Learning
    Zoph and Le, 2016.
    Algorithm: NAS-RL.

  2. AlphaGo Zero: Mastering the Game of Go without Human Knowledge
    Silver et al., 2017.
    Algorithm: AlphaGo Zero.

  3. Mastering Chess and Shogi by Self-Play with a General Reinforcement Learning Algorithm
    Silver et al., 2018.
    Algorithm: AlphaZero.

  4. Mastering Atari, Go, Chess, and Shogi by Planning with a Learned Model
    Schrittwieser et al., 2019.
    Algorithm: MuZero.

Model-Based Reinforcement Learning

  1. Neural Network Dynamics for Model-Based Deep Reinforcement Learning with Model-Free Fine-Tuning
    Nagabandi et al., 2018.
    Algorithm: MBMF (Model-Based RL with Model-Free Fine-Tuning).

  2. World Models
    Ha and Schmidhuber, 2018.
    Algorithm: World Models.

  3. Imagination-Augmented Agents for Deep Reinforcement Learning
    Racanière et al., 2017.
    Algorithm: Imagination-Augmented Agents (I2A).

  4. Value Prediction Network
    Oh et al., 2017.
    Algorithm: Value Prediction Networks (VPN).

Hierarchical Reinforcement Learning and Temporal Abstractions

  1. FeUdal Networks for Hierarchical Reinforcement Learning
    Vezhnevets et al., 2017.
    Algorithm: Feudal Networks.

  2. Hierarchical Reinforcement Learning with Timed Subgoals
    Gürtler et al., 2021.
    Algorithm: HiTS.

  3. Data-Efficient Hierarchical Reinforcement Learning
    Nachum et al., 2018.
    Algorithm: HIRO.

  4. Between MDPs and Semi-MDPs: A Framework for Temporal Abstraction in Reinforcement Learning
    Sutton et al., 1999.
    Algorithm: Options.

  5. The Option-Critic Architecture
    Bacon et al., 2017.
    Algorithm: Option-Critic.

  6. Hierarchical Deep Reinforcement Learning: Integrating Temporal Abstraction and Intrinsic Motivation
    Kulkarni et al., 2016.
    Algorithm: h-DQN with Intrinsic Motivation.

  7. Intrinsic Motivation for Hierarchical Reinforcement Learning
    Nachum et al., 2019.
    Algorithm: Multi-level HIRO.

Offline Reinforcement Learning

  1. Off-Policy Deep Reinforcement Learning without Exploration
    Fujimoto et al., 2019.
    Algorithm: Batch-Constrained Q-learning (BCQ).

  2. Conservative Q-Learning for Offline Reinforcement Learning
    Kumar et al., 2020.
    Algorithm: CQL.

  3. Behavior Regularized Offline Reinforcement Learning
    Wu et al., 2019.
    Algorithm: BRAC.

  4. Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction
    Kumar et al., 2019.
    Algorithm: BEAR.

  5. AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
    Nair et al., 2020.
    Algorithm: AWAC.

Safe and Constrained Reinforcement Learning

  1. Safe Model-based Reinforcement Learning with Stability Guarantees
    Berkenkamp et al., 2017.
    Algorithm: Safe model-based RL with Lyapunov stability guarantees.

  2. Gaussian Process Optimization in the Bandit Setting: No Regret and Experimental Design
    Srinivas et al., 2010.
    Algorithm: GP-UCB.
