REILLY

Clone the repository, including its submodules:

`git clone --recurse-submodules -j8 https://github.com/CavenaghiEmanuele/REILLY.git`

Build the package with the C++ backend and install it:

`cd REILLY && sudo python3 setup.py install`

Legend for the tables below:

- (empty) - Not implemented
- ✔️ - Implemented
- ❌ - Non-existent (the algorithm has no such variant)

Name | On-Policy | Off-Policy | Python | C/C++ |
---|---|---|---|---|
MonteCarlo (First Visit) | ✔️ | ✔️ | ✔️ | |
MonteCarlo (Every Visit) | ✔️ | ✔️ | ✔️ | |

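
The two MonteCarlo rows differ only in which visits to a state contribute a sampled return: first-visit averages the return from the first occurrence of a state in each episode, every-visit averages the returns from all occurrences. A minimal prediction sketch in plain Python, not REILLY's API; the episode format (a list of `(state, reward)` pairs) and the function name `mc_prediction` are hypothetical:

```python
from collections import defaultdict
from typing import Dict, Hashable, List, Tuple

# Hypothetical episode format: (state, reward received after leaving that state).
Episode = List[Tuple[Hashable, float]]


def mc_prediction(episodes: List[Episode], gamma: float = 1.0,
                  first_visit: bool = True) -> Dict[Hashable, float]:
    """Estimate V(s) as the average of sampled returns (first- or every-visit)."""
    returns_sum: Dict[Hashable, float] = defaultdict(float)
    returns_count: Dict[Hashable, int] = defaultdict(int)
    for episode in episodes:
        # Compute the return G_t of every time step by walking backwards.
        g = 0.0
        returns = [0.0] * len(episode)
        for t in range(len(episode) - 1, -1, -1):
            g = episode[t][1] + gamma * g
            returns[t] = g
        seen = set()
        for t, (state, _) in enumerate(episode):
            if first_visit and state in seen:
                continue  # first-visit: later occurrences of the state are ignored
            seen.add(state)
            returns_sum[state] += returns[t]
            returns_count[state] += 1
    return {s: returns_sum[s] / returns_count[s] for s in returns_sum}
```

With the single episode `[("A", 1.0), ("A", 1.0), ("B", 1.0)]` and γ = 1, first-visit gives V(A) = 3.0 while every-visit gives V(A) = 2.5, since the second visit to A only sees a return of 2.
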
Name | On-Policy | Off-Policy | Python | C/C++ |
---|---|---|---|---|
Sarsa | ✔️ | ✔️ | ✔️ | |
Q-learning | ❌ | ✔️ | ✔️ | ✔️ |
Expected Sarsa | ✔️ | ✔️ | ✔️ | |

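
Sarsa, Q-learning and Expected Sarsa share the same one-step update and differ only in the bootstrap term of the target. Q-learning always bootstraps on the greedy action, whatever the behaviour policy, which is why it has no on-policy entry. A minimal sketch in plain Python, not REILLY's API; the dict-of-lists Q-table and the ε-greedy target policy used for Expected Sarsa are assumptions:

```python
from typing import Dict, List

QTable = Dict[int, List[float]]  # assumed layout: state -> list of action values


def epsilon_greedy_probs(q_values: List[float], epsilon: float) -> List[float]:
    """Action probabilities of an epsilon-greedy policy (ties go to the first max)."""
    n = len(q_values)
    best = max(range(n), key=lambda a: q_values[a])
    probs = [epsilon / n] * n
    probs[best] += 1.0 - epsilon
    return probs


def td_target(kind: str, q: QTable, reward: float, next_state: int,
              next_action: int, gamma: float, epsilon: float) -> float:
    next_q = q[next_state]
    if kind == "sarsa":              # value of the action actually taken next
        bootstrap = next_q[next_action]
    elif kind == "q_learning":       # value of the greedy action
        bootstrap = max(next_q)
    elif kind == "expected_sarsa":   # expectation under the epsilon-greedy policy
        probs = epsilon_greedy_probs(next_q, epsilon)
        bootstrap = sum(p * v for p, v in zip(probs, next_q))
    else:
        raise ValueError(f"unknown kind: {kind}")
    return reward + gamma * bootstrap


def td_update(q: QTable, state: int, action: int, target: float, alpha: float) -> None:
    """Move Q(s, a) a step of size alpha towards the target."""
    q[state][action] += alpha * (target - q[state][action])
```

For example, with `q = {0: [0.0, 0.0], 1: [1.0, 3.0]}`, a reward of 0, next action 0, γ = 1 and ε = 0.5, the Sarsa, Q-learning and Expected Sarsa targets are 1.0, 3.0 and 2.5.
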
Name | On-Policy | Off-Policy | Python | C/C++ |
---|---|---|---|---|
Double Sarsa | ✔️ | ✔️ | ✔️ | |
Double Q-learning | ❌ | ✔️ | ✔️ | ✔️ |
Double Expected Sarsa | ✔️ | ✔️ | ✔️ | |

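
The double variants keep two action-value tables: on each step one table, chosen by a coin flip, is updated, and its next-state estimate is replaced by the other table's value, which reduces maximization bias. A minimal Double Q-learning sketch in plain Python, not REILLY's API; the table layout and the function name `double_q_update` are hypothetical:

```python
import random
from typing import Dict, List

QTable = Dict[int, List[float]]  # assumed layout: state -> list of action values


def argmax(values: List[float]) -> int:
    """Index of the first maximal value."""
    return max(range(len(values)), key=lambda a: values[a])


def double_q_update(q1: QTable, q2: QTable, state: int, action: int, reward: float,
                    next_state: int, alpha: float, gamma: float,
                    rng: random.Random) -> None:
    """One Double Q-learning step: one table selects the action, the other evaluates it."""
    if rng.random() < 0.5:
        learner, evaluator = q1, q2
    else:
        learner, evaluator = q2, q1
    best = argmax(learner[next_state])                     # selection by the updated table
    target = reward + gamma * evaluator[next_state][best]  # evaluation by the other table
    learner[state][action] += alpha * (target - learner[state][action])
```

Double Sarsa and Double Expected Sarsa follow the same pattern, replacing the greedy selection with the sampled next action or with an expectation over the target policy.
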
Name | On-Policy | Off-Policy | Python | C/C++ |
---|---|---|---|---|
n-step Sarsa | ✔️ | ✔️ | ✔️ | |
n-step Expected Sarsa | ✔️ | ✔️ | ✔️ | |
n-step Tree Backup | ❌ | ✔️ | ✔️ | |
n-step Q(σ) | | | | |

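
n-step Sarsa and n-step Expected Sarsa update towards the truncated return $G_{t:t+n} = \sum_{i=t+1}^{\min(t+n,T)} \gamma^{i-t-1} R_i + \gamma^n \hat{v}$, where the bootstrap term $\hat{v}$ is added only if $t+n < T$ and is $Q(S_{t+n}, A_{t+n})$ for Sarsa or the expected action value under the target policy for Expected Sarsa (Tree Backup uses a different, recursive return). A minimal sketch in plain Python, not REILLY's API; the indexing convention `rewards[k]` = $R_{k+1}$ is an assumption:

```python
from typing import Sequence


def n_step_return(rewards: Sequence[float], bootstrap_value: float, gamma: float,
                  n: int, t: int, terminal_step: float) -> float:
    """Truncated n-step return G_{t:t+n}.

    rewards[k] holds R_{k+1}; terminal_step is T, or float("inf") while the
    episode is still running.
    """
    end = min(t + n, terminal_step)
    g = 0.0
    for k in range(t, int(end)):
        g += gamma ** (k - t) * rewards[k]
    if t + n < terminal_step:
        g += gamma ** n * bootstrap_value  # bootstrap only if S_{t+n} is not terminal
    return g
```

For rewards of 1 at every step, γ = 0.5, n = 2 and a bootstrap value of 8, the return from t = 0 is 1 + 0.5 + 0.25 · 8 = 3.5.
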
Name | Python | C/C++ |
---|---|---|
Random-sample one-step tabular Q-planning | ✔️ | |
Tabular Dyna-Q | ✔️ | |
Tabular Dyna-Q+ | ✔️ | |
Prioritized sweeping | ✔️ | |

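
Tabular Dyna-Q combines three steps per real transition: a direct Q-learning update, a model update that records the observed outcome, and several planning updates on transitions replayed from the model. A minimal sketch in plain Python, not REILLY's API; it assumes a deterministic environment (the model stores one `(reward, next_state)` per state-action pair) and zero-valued entries for terminal states:

```python
import random
from typing import Dict, List, Tuple

QTable = Dict[int, List[float]]
# Assumed deterministic model: (state, action) -> (reward, next_state).
Model = Dict[Tuple[int, int], Tuple[float, int]]


def q_update(q: QTable, s: int, a: int, r: float, s2: int,
             alpha: float, gamma: float) -> None:
    """One-step Q-learning update."""
    q[s][a] += alpha * (r + gamma * max(q[s2]) - q[s][a])


def dyna_q_step(q: QTable, model: Model, s: int, a: int, r: float, s2: int,
                alpha: float, gamma: float, planning_steps: int,
                rng: random.Random) -> None:
    q_update(q, s, a, r, s2, alpha, gamma)   # direct RL from the real transition
    model[(s, a)] = (r, s2)                  # model learning
    keys = list(model)
    for _ in range(planning_steps):          # planning on previously observed pairs
        ps, pa = rng.choice(keys)
        pr, ps2 = model[(ps, pa)]
        q_update(q, ps, pa, pr, ps2, alpha, gamma)
```

Dyna-Q+ adds an exploration bonus for pairs not tried for a long time, and prioritized sweeping replaces the uniform `rng.choice` with a priority queue ordered by the size of the expected update.
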
Name | Python | C/C++ |
---|---|---|
1-D Tiling | ✔️ | ✔️ |
n-D Tiling | ✔️ | ✔️ |
Tiling offset | ✔️ | ✔️ |
Different tiling dimensions | ✔️ | ✔️ |

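
Tile coding maps a continuous n-D point to one active binary feature per tiling; the tilings are shifted copies of the same grid, and each dimension may use its own tile width. A minimal sketch in plain Python, not REILLY's API; the uniform offset of `k * width / n_tilings` for tiling `k`, the extra tile per dimension that absorbs the offset, and the function name `active_tiles` are assumptions:

```python
import math
from typing import List, Sequence


def active_tiles(point: Sequence[float], lows: Sequence[float],
                 tile_widths: Sequence[float], tiles_per_dim: Sequence[int],
                 n_tilings: int) -> List[int]:
    """Return one active feature index per tiling for an n-D point."""
    per_tiling = 1
    for n in tiles_per_dim:
        per_tiling *= n + 1  # one extra tile per dimension covers the shifted edge
    indices = []
    for k in range(n_tilings):
        flat = 0
        for d in range(len(point)):
            offset = k * tile_widths[d] / n_tilings  # assumed uniform offsets
            coord = int(math.floor((point[d] - lows[d] + offset) / tile_widths[d]))
            coord = min(max(coord, 0), tiles_per_dim[d])  # clamp to the grid
            flat = flat * (tiles_per_dim[d] + 1) + coord  # row-major flattening
        indices.append(k * per_tiling + flat)
    return indices
```

The feature vector has `n_tilings * per_tiling` entries, of which exactly `n_tilings` are 1; nearby points share most of their active tiles, which is what gives tile coding its generalization.
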
Name | Python | C/C++ |
---|---|---|
Base implementation | ✔️ | ✔️ |
With trace | ✔️ | |

Name | On-Policy | Off-Policy | Python | C/C++ |
---|---|---|---|---|
Semi-gradient MonteCarlo | ✔️ | ✔️ | | |

Name | On-Policy | Off-Policy | Differential | Python | C/C++ |
---|---|---|---|---|---|
Semi-gradient Sarsa | ✔️ | ✔️ | ✔️ | | |
Semi-gradient Expected Sarsa | ✔️ | ✔️ | ✔️ | | |

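
With a linear approximator over binary features (for example, the tile-coding features above), the gradient of $\hat{q}$ is 1 on the active features, so semi-gradient Sarsa only adjusts the active weights. The differential variant, for continuing tasks, drops discounting and measures rewards relative to a running estimate of the average reward. A minimal sketch in plain Python, not REILLY's API; the function names and the per state-action feature indices are assumptions:

```python
from typing import List, Sequence


def q_hat(w: Sequence[float], active: Sequence[int]) -> float:
    """Linear action value with binary features: sum of the active weights."""
    return sum(w[i] for i in active)


def semi_gradient_sarsa_update(w: List[float], active: Sequence[int], reward: float,
                               next_active: Sequence[int], alpha: float,
                               gamma: float, terminal: bool = False) -> None:
    """Episodic semi-gradient Sarsa: the target is treated as a constant."""
    target = reward if terminal else reward + gamma * q_hat(w, next_active)
    delta = target - q_hat(w, active)
    for i in active:  # gradient is 1 on active features, 0 elsewhere
        w[i] += alpha * delta


def differential_sarsa_update(w: List[float], avg_reward: float, active: Sequence[int],
                              reward: float, next_active: Sequence[int],
                              alpha: float, beta: float) -> float:
    """Differential semi-gradient Sarsa step; returns the updated average reward."""
    delta = reward - avg_reward + q_hat(w, next_active) - q_hat(w, active)
    avg_reward += beta * delta
    for i in active:
        w[i] += alpha * delta
    return avg_reward
```

The n-step versions in the next table use the same weight update with the n-step return as the target.
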
Name | On-Policy | Off-Policy | Differential | Python | C/C++ |
---|---|---|---|---|---|
Semi-gradient n-step Sarsa | ✔️ | ✔️ | ✔️ | | |
Semi-gradient n-step Expected Sarsa | ✔️ | ✔️ | ✔️ | | |

Name | On-Policy | Off-Policy | Python | C/C++ |
---|---|---|---|---|
Accumulating Trace | ✔️ | ✔️ | | |
Replacing Trace | ✔️ | ✔️ | | |
Dutch Trace | | | | |

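
Accumulating and replacing traces differ only in how the eligibility trace of an active feature is bumped after the whole trace vector decays by γλ: accumulating traces add 1, replacing traces reset it to 1. A minimal sketch for binary features in plain Python, not REILLY's API; the function name `update_traces` and the flat list of traces are assumptions, and Dutch traces are not shown:

```python
from typing import List, Sequence


def update_traces(z: List[float], active: Sequence[int], gamma: float, lam: float,
                  kind: str) -> None:
    """Decay every trace by gamma * lambda, then bump the active features."""
    for i in range(len(z)):
        z[i] *= gamma * lam
    for i in active:
        if kind == "accumulating":
            z[i] += 1.0   # repeated visits can push the trace above 1
        elif kind == "replacing":
            z[i] = 1.0    # the trace is capped at 1
        else:
            raise ValueError(f"unknown trace kind: {kind}")
```

With γ = 1 and λ = 0.5, visiting the same feature twice in a row leaves an accumulating trace at 1.5 and a replacing trace at 1.0.
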
Name | On-Policy | Off-Policy | Python | C/C++ |
---|---|---|---|---|
Temporal difference (λ) | | | | |
True Online TD(λ) | | | | |
Sarsa(λ) | ✔️ | ✔️ | | |
True Online Sarsa(λ) | | | | |
Forward Sarsa(λ) | | | | |
Watkins’s Q(λ) | | | | |
Tree-Backup Q(λ) | | | | |

Name | Discrete State? | Discrete Action? | Linear State? | Multi-Agent? |
---|---|---|---|---|
FrozenLake4x4 | Yes | Yes | Yes | No |
FrozenLake8x8 | Yes | Yes | Yes | No |
Taxi | Yes | Yes | Yes | No |
MountainCar | No | Yes | No | No |

Name | Discrete State? | Discrete Action? | Linear State? | Multi-Agent? |
---|---|---|---|---|
Text | Yes | Yes | No | Yes |

Name | Multi-Agent? | Joint Train? | Joint Test? |
---|---|---|---|
Session | No | No | No |
JointSession | Yes | Optional | Yes |