This repository contains code written for a reinforcement learning course.
It implements a set of algorithms for solving the multi-armed bandit problem:
- Epsilon Greedy (epsilon_greedy.py)
- Optimistic Initial Value (optimistic_initial_value.py)
- Upper Confidence Bound (ucb.py)
- Thompson Sampling (thompson.py)
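
As a rough illustration of the first algorithm in the list, here is a minimal epsilon-greedy sketch. The class name `EpsilonGreedyAgent` and the methods `select_arm` and `update` are assumptions made for this example and may not match what `epsilon_greedy.py` actually defines.

```python
import random


class EpsilonGreedyAgent:
    """Hypothetical agent; names are illustrative, not taken from epsilon_greedy.py."""

    def __init__(self, n_arms, epsilon=0.1, seed=None):
        self.epsilon = epsilon
        self.counts = [0] * n_arms    # number of pulls per arm
        self.values = [0.0] * n_arms  # running mean reward per arm
        self.rng = random.Random(seed)

    def select_arm(self):
        # Explore: with probability epsilon, pick a uniformly random arm.
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(len(self.values))
        # Exploit: otherwise pick the arm with the highest estimate
        # (ties resolve to the lowest index).
        return max(range(len(self.values)), key=lambda a: self.values[a])

    def update(self, arm, reward):
        # Incremental mean: Q <- Q + (r - Q) / n
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]
```

With `epsilon=0` the agent is purely greedy. Larger values trade some immediate reward for more exploration.
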
In addition, we implemented two sample bandit interfaces as examples of how the algorithms (the agent) can interact with bandits (the environment).
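
The agent/environment split could look something like the sketch below. Everything here is hypothetical: the `Bandit` protocol, `BernoulliBandit`, `ThompsonAgent`, and `run` are names invented for illustration and are not the interfaces shipped in this repository. The agent uses Thompson sampling with a Beta posterior, which assumes rewards are 0 or 1.

```python
import random
from typing import Protocol


class Bandit(Protocol):
    """Hypothetical environment interface: an arm count and a pull method."""
    n_arms: int

    def pull(self, arm: int) -> float: ...


class BernoulliBandit:
    """Each arm pays 1.0 with a fixed probability, otherwise 0.0."""

    def __init__(self, probs, seed=None):
        self.probs = list(probs)
        self.n_arms = len(self.probs)
        self.rng = random.Random(seed)

    def pull(self, arm):
        return 1.0 if self.rng.random() < self.probs[arm] else 0.0


class ThompsonAgent:
    """Thompson sampling with a Beta(1, 1) prior per arm (0/1 rewards assumed)."""

    def __init__(self, n_arms, seed=None):
        self.alpha = [1.0] * n_arms  # 1 + observed successes
        self.beta = [1.0] * n_arms   # 1 + observed failures
        self.rng = random.Random(seed)

    def select_arm(self):
        # Draw one sample from each arm's posterior and play the best draw.
        samples = [self.rng.betavariate(a, b)
                   for a, b in zip(self.alpha, self.beta)]
        return max(range(len(samples)), key=samples.__getitem__)

    def update(self, arm, reward):
        self.alpha[arm] += reward
        self.beta[arm] += 1.0 - reward


def run(agent, bandit: Bandit, steps: int) -> float:
    """Interaction loop: the agent picks an arm, the bandit returns a reward."""
    total = 0.0
    for _ in range(steps):
        arm = agent.select_arm()
        reward = bandit.pull(arm)
        agent.update(arm, reward)
        total += reward
    return total
```

Because `run` only relies on `select_arm`, `update`, and `pull`, any agent or bandit exposing those methods could be swapped in without changing the loop.
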