Overview | Installation | Agents | Examples
This repository is part of my master's thesis project at UCL. It builds upon the acme framework and implements two new offline RL algorithms.
The experiments here are run on the MiniGrid environment, but the code is modular: a new environment can be tested simply by implementing a new `_build_environment()` function that returns an environment wrapped in the appropriate wrappers. An example of a working environment is set up in each of the example Colaboratory notebooks provided.
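For reference, here is a minimal sketch of what such a builder might look like, assuming the standard gym-minigrid and acme wrapper stacks (the exact wrappers used in the notebooks may differ):

```python
import gym
from acme import wrappers
from gym_minigrid.wrappers import FullyObsWrapper, ImgObsWrapper

def _build_environment(env_name: str = "MiniGrid-Empty-6x6-v0"):
    """Returns a MiniGrid environment wrapped for use with acme agents."""
    env = gym.make(env_name)
    env = FullyObsWrapper(env)   # expose the full grid instead of the agent's partial view
    env = ImgObsWrapper(env)     # keep only the image observation, drop the mission string
    env = wrappers.GymWrapper(env)              # adapt the gym API to dm_env
    env = wrappers.SinglePrecisionWrapper(env)  # cast observations/rewards to float32
    return env
```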
This repo implements three algorithms:
- Conservative Q-learning (CQL); a sketch of its loss is shown after this list
- Critic Regularized Regression (CRR)
- Behavioural Cloning (BC), adapted from acme
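To give a flavour of the first algorithm, below is a hedged sketch of the conservative penalty that CQL adds on top of a standard TD loss for discrete actions; the actual implementation in this repo may differ in details such as batching and the penalty weight:

```python
import tensorflow as tf

def cql_penalty(q_values: tf.Tensor, behaviour_actions: tf.Tensor) -> tf.Tensor:
    """CQL regulariser for discrete actions.

    q_values: [batch, num_actions] critic outputs Q(s, .).
    behaviour_actions: [batch] integer actions taken in the offline dataset.
    """
    # Soft maximum over all actions: pushes Q-values down everywhere.
    push_down = tf.reduce_logsumexp(q_values, axis=-1)
    # Q-values of the dataset actions: pushed back up.
    push_up = tf.gather(q_values, behaviour_actions, batch_dims=1)
    return tf.reduce_mean(push_down - push_up)
```

In the full objective this penalty is scaled by a coefficient and added to the usual TD error.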
After setting up a wandb account, all the results of our experiments, along with the versioned datasets, can be accessed here.
New datasets can be easily collected using the `dataset_collection_pipeline` Colab notebook, and experiments can be run from the `run_experiment_pipeline` notebook. Both notebooks are well documented. Each new experiment is tracked and checkpointed to WandB.
If you'd like to resume an existing run, it is sufficient to pass the specific run ID via the `--wandb_id` flag to any of the algorithm run scripts.
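For example (the script name below is hypothetical; substitute whichever algorithm run script you are using):

```
python run_cql.py --wandb_id=<your_run_id>
```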