Source codes for the book "Reinforcement Learning: Theory and Python Implementation"

1024wangxiao/rl-book

 
 


Reinforcement Learning: Theory and Python Implementation

The first reinforcement learning tutorial book with one-to-one mapped TensorFlow 2 and PyTorch 1 implementations

Editions: English Edition, Chinese Edition (中文版)

Please email me if you are interested in publishing this book in other languages.

Features

This is a tutorial book on reinforcement learning that covers both the theory and its Python implementation.

  • Theory: Starting from a unified mathematical framework, this book derives the theory and algorithms of reinforcement learning, covering all major algorithms, from eligibility traces to soft actor-critic.
  • Practice: Every chapter is accompanied by a high-quality implementation based on Python 3.10, Gym 0.26, and TensorFlow 2 / PyTorch 1. All code is compatible with Windows, Linux, and macOS, and can run on a laptop.
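As a rough illustration of the Gym 0.26 calling convention that the implementations target, the sketch below runs one episode against a toy environment. It is not taken from the book: `CoinFlipEnv` and `play_episode` are hypothetical names, and the environment is a stand-in that only mimics the Gym 0.26 signatures (`reset()` returning `(observation, info)` and `step()` returning `(observation, reward, terminated, truncated, info)`), so the example runs without Gym installed.

```python
import random


class CoinFlipEnv:
    """Hypothetical stand-in environment that mimics the Gym 0.26 call signatures."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.steps = 0
        self.rng = random.Random()

    def reset(self, seed=None):
        # Gym 0.26 style: reset() returns (observation, info).
        if seed is not None:
            self.rng.seed(seed)
        self.steps = 0
        return 0, {}

    def step(self, action):
        # Gym 0.26 style: step() returns
        # (observation, reward, terminated, truncated, info).
        self.steps += 1
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        terminated = False  # this toy task has no terminal state
        truncated = self.steps >= self.max_steps  # time limit reached
        return coin, reward, terminated, truncated, {}


def play_episode(env, policy, seed=None):
    """Run one episode and return the total (undiscounted) reward."""
    observation, info = env.reset(seed=seed)
    episode_reward = 0.0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, info = env.step(action)
        episode_reward += reward
        if terminated or truncated:
            break
    return episode_reward


total = play_episode(CoinFlipEnv(max_steps=10), policy=lambda obs: 0, seed=0)
print(total)
```

With a real Gym 0.26 environment, the same `play_episode` loop should work unchanged as long as the policy returns actions from that environment's action space.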

Supporting content for the English edition

See here for code, exercise answers, etc.

Table of Codes

Each code example is saved as both a .ipynb file and a .html file in the same directory.

| Chapter | Environment & Closed-Form Policy | Agent |
| --- | --- | --- |
| 2 | CliffWalking-v0 | Bellman |
| 3 | FrozenLake-v1 | DP |
| 4 | Blackjack-v1 | MC |
| 5 | Taxi-v3 | SARSA, ExpectedSARSA, QL, DoubleQL, SARSA(λ) |
| 6 | MountainCar-v0 | SARSA, SARSA(λ), DQN (tf, torch), DoubleDQN (tf, torch), DuelDQN (tf, torch) |
| 7 | CartPole-v0 | VPG (tf, torch), VPGwBaseline (tf, torch), OffPolicyVPG (tf, torch), OffPolicyVPGwBaseline (tf, torch) |
| 8 | Acrobot-v1 | QAC (tf, torch), AdvantageAC (tf, torch), EligibilityTraceAC (tf, torch), PPO (tf, torch), NPG (tf, torch), TRPO (tf, torch), OffPAC (tf, torch) |
| 9 | Pendulum-v1 | DDPG (tf, torch), TD3 (tf, torch) |
| 10 | LunarLander-v2 | SQL (tf, torch), SAC (tf, torch), SACwA (tf, torch) |
| 10 | LunarLanderContinuous-v2 | SACwA (tf, torch) |
| 11 | BipedalWalker-v3 | ES, ARS |
| 12 | PongNoFrameskip-v4 | CategoricalDQN (tf, torch), QR-DQN (tf, torch), IQN (tf, torch) |
| 13 | BernoulliMAB-v0 | UCB |
| 13 | GaussianMAB-v0 | UCB |
| 14 | TicTacToe-v0 | AlphaZero (tf, torch) |
| 15 | HumanoidBulletEnv-v0 (see note 1) | BehaviorClone (tf, torch), GAIL (tf, torch) |
| 16 | Tiger-v0 | VI |
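To give a flavor of one of the simpler agents listed above, here is a minimal sketch of a UCB agent on a Bernoulli multi-armed bandit. It is not the book's notebook code: the function name `run_ucb`, the UCB1 bonus `sqrt(2 ln t / n)`, and the arm probabilities are assumptions chosen for illustration, and the bandit is simulated inline rather than through the `BernoulliMAB-v0` environment.

```python
import math
import random


def run_ucb(success_probs, steps, seed=0):
    """Play a simulated Bernoulli bandit with the UCB1 rule.

    Returns the pull counts and the empirical mean reward of each arm.
    """
    rng = random.Random(seed)
    n_arms = len(success_probs)
    counts = [0] * n_arms
    means = [0.0] * n_arms
    for t in range(1, steps + 1):
        if t <= n_arms:
            # Pull every arm once so that each count is nonzero.
            arm = t - 1
        else:
            # Choose the arm with the largest upper confidence bound.
            arm = max(
                range(n_arms),
                key=lambda a: means[a] + math.sqrt(2.0 * math.log(t) / counts[a]),
            )
        reward = 1.0 if rng.random() < success_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the empirical mean.
        means[arm] += (reward - means[arm]) / counts[arm]
    return counts, means


counts, means = run_ucb([0.1, 0.9], steps=2000)
print(counts)
```

In this setup the exploration bonus shrinks as an arm is pulled more often, so most pulls should end up on the arm with the higher success probability.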

Note:

  1. The Chapter 15 environment HumanoidBulletEnv-v0 does not work with Gym >= 0.25 and PyBullet 3.2.4. Gym 0.25 changed metadata["render.modes"] to metadata["render_modes"], and PyBullet releases have not yet been updated accordingly.
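The key rename described in the note can be illustrated with a small helper that reads the render modes from an environment's `metadata` dictionary under either spelling. This is only a sketch of the naming difference, not a fix for PyBullet: `get_render_modes` is a hypothetical helper, and the two dictionaries are made-up examples of the old and new layouts.

```python
def get_render_modes(metadata):
    """Return the render modes from a metadata dict, whichever key it uses."""
    if "render_modes" in metadata:
        # Newer key name, as described in the note above.
        return list(metadata["render_modes"])
    # Older key name; fall back to an empty list if neither key is present.
    return list(metadata.get("render.modes", []))


old_style = {"render.modes": ["human", "rgb_array"]}
new_style = {"render_modes": ["human", "rgb_array"]}

print(get_render_modes(old_style))
print(get_render_modes(new_style))
```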

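Chinese Edition: 强化学习:原理与Python实现 (Reinforcement Learning: Theory and Python Implementation)

The world's first reinforcement learning tutorial book with accompanying TensorFlow 2 code

China's first printed algorithm book with accompanying TensorFlow 2 code

Side-by-side TensorFlow 2 and PyTorch 1 code is now available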

Supporting content for the Chinese edition

  • For code, errata updates, etc., see here.

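Features of the Chinese edition

This book introduces reinforcement learning theory and its Python implementation.

  • Complete theory: The book uses a single, complete mathematical framework to teach the theoretical foundations of reinforcement learning rigorously, with proofs given for the main theorems. The chapters progress step by step and cover all mainstream reinforcement learning algorithms, including non-deep algorithms such as eligibility traces and deep algorithms such as soft actor-critic.
  • Rich examples: The algorithms are implemented on your favorite operating system (Windows, macOS, or Linux) with Python 3.10, Gym 0.25.2, and TensorFlow 2 / PyTorch 1. The implementations follow one consistent convention throughout and are small and lightweight. Chapters 1-9 provide accompanying implementations of the algorithms; their environments depend only on the minimal installation of Gym and run on computers without a GPU. Chapters 10-12 present several popular comprehensive case studies that cover the full installation of Gym and custom extensions, and run on computers with an ordinary GPU.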

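Side-by-side TensorFlow 2 and PyTorch 1 code

  • The deep reinforcement learning part of the book adds side-by-side implementations based on TensorFlow 2 and PyTorch 1. Both versions correspond strictly to the pseudocode in the main text. They differ only in the implementation of the agent; the program structure and the agent parameters are identical. The ipynb files are in the notebooks folder and the HTML pages are in the html folder; the two formats have the same content.

  • The code has been verified with Python 3.10, Gym 0.26, TensorFlow 2, and PyTorch 1. Please report any errors you find.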

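QQ groups

  • QQ group: 722846914 (errata and bug reports may be posted in this group; for other questions, please search Google before asking; the group owner and administrators do not provide free consulting)
  • Multi-task group: 696984257 (not a beginners' group; covers multi-task reinforcement learning, meta reinforcement learning, lifelong reinforcement learning, and transfer reinforcement learning; do not post errata or bug reports here; please search Google before asking)
  • About the group entry verification question: because of a bug in QQ, verification may fail even when the answer is entered correctly. Retrying on a different device, with a different input method, or on another day may solve the problem. If the answer contains English letters, pay attention to capitalization.
  • The QQ groups given in the preface of the Chinese edition (935702193, 243613392, and 948110103) are full and no longer accept new members. Thank you for your understanding.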
