Official Repository for "A Joint Imitation-Reinforcement Learning (JIRL) Framework for Reduced Baseline Regret"
The accompanying report contains a detailed description of the experimental settings and hyperparameters used to obtain the results reported in our paper.
The JIRL framework has two objectives:
- Leveraging a baseline's online demonstrations to minimize regret w.r.t. the baseline policy during training
- Eventually surpassing the baseline's performance
JIRL assumes:
- Access to a baseline policy at every time step
- The use of an off-policy RL algorithm
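As a rough illustration of the idea, the control-sharing mechanism can be sketched as follows. This is a minimal, hypothetical toy example (not the repository's actual implementation): the learner proposes an action at each time step, and when it deviates too far from the baseline's action, the baseline takes over and its action is logged as a demonstration for off-policy updates. The policies, the deviation `threshold`, and the `ReplayBuffer` are all illustrative assumptions.

```python
import random

class ReplayBuffer:
    """Stores baseline demonstrations for later off-policy updates."""
    def __init__(self):
        self.data = []

    def add(self, transition):
        self.data.append(transition)

    def sample(self, k):
        return random.sample(self.data, min(k, len(self.data)))

def baseline_policy(state):
    # Hypothetical baseline: drive the state toward zero.
    return -0.5 * state

def learner_policy(state, w):
    # Hypothetical linear learner with a single weight w.
    return w * state

def jirl_step(state, w, buffer, threshold=0.3):
    """One JIRL-style step: if the learner's action deviates from the
    baseline's by more than `threshold`, the baseline takes control and
    its action is stored as a demonstration (bounding regret w.r.t. the
    baseline); otherwise the learner's action is executed."""
    a_learner = learner_policy(state, w)
    a_baseline = baseline_policy(state)
    if abs(a_learner - a_baseline) > threshold:
        buffer.add((state, a_baseline))  # log baseline demonstration
        return a_baseline, True          # baseline in control
    return a_learner, False              # learner in control

buffer = ReplayBuffer()
w = 0.2        # learner initially far from the baseline's -0.5 gain
state = 1.0
action, intervened = jirl_step(state, w, buffer)
print(intervened, action)  # True -0.5 (deviation 0.7 exceeds 0.3)
```

The demonstrations accumulated in the buffer would then feed an off-policy RL update, which is what lets the learner eventually match and surpass the baseline without incurring large regret during training.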