Reinforcement Learning: Theory and Python Implementation

The first reinforcement learning tutorial book with one-to-one TensorFlow 2 and PyTorch 1 implementations

English Edition	Chinese Edition (中文版)

Please email me if you are interested in publishing this book in other languages.

Features

This is a tutorial book on reinforcement learning that explains the theory and provides Python implementations.

Theory: Starting from a uniform mathematical framework, this book derives the theory and algorithms of reinforcement learning, covering all major algorithms, from eligibility traces to soft actor-critic.
Practice: Every chapter is accompanied by a high-quality implementation based on Python 3.10, Gym 0.26, and TensorFlow 2 / PyTorch 1. All code is compatible with Windows, Linux, and macOS, and can run on a laptop.
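
For readers new to the Gym 0.26 interface mentioned above, here is a minimal sketch of an agent-environment interaction loop. It is not taken from the book's notebooks: `CoinFlipEnv` and `play_episode` are hypothetical names, and the toy environment only mimics the Gym 0.26 method signatures (`reset` returns `(observation, info)`, `step` returns `(observation, reward, terminated, truncated, info)`) so that the example runs without installing Gym.

```python
import random


class CoinFlipEnv:
    """Hypothetical toy environment that mimics the Gym 0.26 method signatures."""

    def __init__(self, max_steps=10):
        self.max_steps = max_steps
        self.rng = random.Random()
        self.steps = 0

    def reset(self, seed=None):
        # Gym 0.26 style: reset returns (observation, info).
        if seed is not None:
            self.rng.seed(seed)
        self.steps = 0
        return 0, {}

    def step(self, action):
        # Gym 0.26 style: step returns
        # (observation, reward, terminated, truncated, info).
        self.steps += 1
        coin = self.rng.randint(0, 1)
        reward = 1.0 if action == coin else 0.0
        terminated = False                        # no natural terminal state here
        truncated = self.steps >= self.max_steps  # time limit reached
        return coin, reward, terminated, truncated, {}


def play_episode(env, policy, seed=None):
    """Run one episode and return the total (undiscounted) reward."""
    observation, _ = env.reset(seed=seed)
    episode_reward = 0.0
    while True:
        action = policy(observation)
        observation, reward, terminated, truncated, _ = env.step(action)
        episode_reward += reward
        if terminated or truncated:
            break
    return episode_reward


if __name__ == "__main__":
    # A constant policy that always guesses 0.
    print(play_episode(CoinFlipEnv(), policy=lambda obs: 0, seed=0))
```

With a real Gym 0.26 environment, the same `play_episode` loop should apply unchanged, since it relies only on the `reset` and `step` return shapes.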

Supporting contents for English version

Check here for code, exercise answers, etc.

Table of Codes

Each code example is saved as both a .ipynb file and a .html file in the same directory.

Chapter	Environment & Closed-Form Policy	Agent
2	CliffWalking-v0	Bellman
3	FrozenLake-v1	DP
4	Blackjack-v1	MC
5	Taxi-v3	SARSA, ExpectedSARSA, QL, DoubleQL, SARSA(λ)
6	MountainCar-v0	SARSA, SARSA(λ), DQN tf torch, DoubleDQN tf torch, DuelDQN tf torch
7	CartPole-v0	VPG tf torch, VPGwBaseline tf torch, OffPolicyVPG tf torch, OffPolicyVPGwBaseline tf torch
8	Acrobot-v1	QAC tf torch, AdvantageAC tf torch, EligibilityTraceAC tf torch, PPO tf torch, NPG tf torch, TRPO tf torch, OffPAC tf torch
9	Pendulum-v1	DDPG tf torch, TD3 tf torch
10	LunarLander-v2	SQL tf torch, SAC tf torch, SACwA tf torch
10	LunarLanderContinuous-v2	SACwA tf torch
11	BipedalWalker-v3	ES, ARS
12	PongNoFrameskip-v4	CategoricalDQN tf torch, QR-DQN tf torch, IQN tf torch
13	BernoulliMAB-v0	UCB
13	GaussianMAB-v0	UCB
14	TicTacToe-v0	AlphaZero tf torch
15 (see Note)	HumanoidBulletEnv-v0	BehaviorClone tf torch, GAIL tf torch
16	Tiger-v0	VI

Note:

The Chapter 15 code does not work with Gym >=0.25 and PyBullet 3.2.4. Gym 0.25 renamed metadata["render.modes"] to metadata["render_modes"], and PyBullet releases have not yet been updated accordingly.
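
To make the key rename in the note concrete, here is a minimal sketch that works on a plain dictionary. The helper name `add_render_modes_key` is hypothetical and is not part of Gym or PyBullet; the sketch only illustrates the renamed metadata key, and the source does not confirm that adding the key alone is enough to make the PyBullet environments run.

```python
def add_render_modes_key(metadata):
    """Return a copy of `metadata` that has the Gym >= 0.25 key "render_modes".

    If only the legacy key "render.modes" is present, its value is copied
    under the new key. The input dict is not modified.
    (Hypothetical helper, for illustration only.)
    """
    patched = dict(metadata)
    if "render_modes" not in patched and "render.modes" in patched:
        patched["render_modes"] = list(patched["render.modes"])
    return patched


# Metadata written in the pre-0.25 style.
legacy = {"render.modes": ["human", "rgb_array"]}
patched = add_render_modes_key(legacy)
print(patched["render_modes"])
```

The helper returns a copy rather than mutating its argument, so the legacy metadata stays available to code that still reads the old key.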

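强化学习：原理与Python实现 (Reinforcement Learning: Theory and Python Implementation, Chinese Edition)

The world's first reinforcement learning tutorial book with accompanying TensorFlow 2 code

China's first printed algorithm book with accompanying TensorFlow 2 code

Side-by-side TensorFlow 2 and PyTorch 1 code is now available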

Supporting contents for Chinese version

See here for code, errata updates, etc.

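Features

This book covers reinforcement learning theory and its Python implementation.

Complete theory: The book uses one consistent mathematical framework to present the theoretical foundations of reinforcement learning rigorously, with proofs for the main theorems. The chapters progress step by step and cover all mainstream reinforcement learning algorithms, including non-deep algorithms such as eligibility traces and deep algorithms such as soft actor-critic.
Rich examples: Implement reinforcement learning algorithms on your favorite operating system (Windows, macOS, or Linux) with Python 3.10, Gym 0.25.2, and TensorFlow 2 / PyTorch 1. The implementations follow one uniform style and are small and lightweight. Chapters 1 to 9 provide implementations that accompany the algorithms; the environment part depends only on the minimal installation of Gym and runs on computers without a GPU. Chapters 10 to 12 present several popular comprehensive case studies that cover the full installation of Gym and custom extensions, and run on computers with an ordinary GPU.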

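Side-by-side TensorFlow 2 and PyTorch 1 code

The deep reinforcement learning part of the book now includes side-by-side implementations based on TensorFlow 2 and PyTorch 1. Both versions correspond strictly to the pseudocode in the text; they differ only in the agent implementation, while the program structure and agent parameters are identical. The .ipynb files are in the notebooks folder and the HTML pages are in the html folder; the two formats have the same content.
The code has been verified with Python 3.10, Gym 0.26, TensorFlow 2, and PyTorch 1. Please report any errors.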

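QQ groups

QQ group: 722846914 (errata and bug reports may be posted here; for other questions, please Google before asking. The group owner and admins do not provide free consulting.)
Multi-task group: 696984257 (not for beginners; covers multi-task reinforcement learning, meta reinforcement learning, lifelong reinforcement learning, and transfer reinforcement learning. Do not post errata or bug reports here; please Google before asking.)
About the join-verification question: because of a QQ bug, verification may fail even when the answer is entered correctly. Retrying on another device, with another input method, or on another day may solve the problem. If the answer contains English letters, please mind the letter case.
The QQ groups listed in the preface of the Chinese edition (935702193, 243613392, and 948110103) are full and no longer accept new members. Thank you for your understanding.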

1024wangxiao/rl-book

Folders and files

Latest commit

History

Repository files navigation

Reinforcement Learning: Theory and Python Implementation

Supporting contents for English version

Table of Codes

强化学习：原理与Python实现

About

Resources

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages