Date: January 2019
In this repository, I benchmark different Deep Reinforcement Learning (Deep RL) algorithms for the problem of contextual bandits.
Contextual bandits is an RL problem without state transitions: at each step, the agent receives a context/feature vector. In deep contextual bandits, a neural network estimates the reward of each action given the context. At each RL iteration, the action with the highest estimated reward (as predicted by the neural network) is chosen. Once the action has been performed, the actual reward is received by the agent, and the deep learning model is subsequently retrained on this real (ground-truth) reward.
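To make the loop concrete, here is a minimal, self-contained sketch of a neural-greedy contextual bandit (written with PyTorch and toy data; get_context, get_reward, and the dimensions are placeholders for illustration, not the code used in this repository):

```python
import numpy as np
import torch
import torch.nn as nn

CONTEXT_DIM, NUM_ACTIONS = 8, 4

# Small reward model: maps a context to one estimated reward per action.
reward_net = nn.Sequential(
    nn.Linear(CONTEXT_DIM, 64), nn.ReLU(), nn.Linear(64, NUM_ACTIONS)
)
optimizer = torch.optim.Adam(reward_net.parameters(), lr=1e-3)
history = []  # (context, action, reward) triples observed so far

def get_context():
    # Placeholder for a real dataset (linear / wheel / covertype / mushroom).
    return np.random.randn(CONTEXT_DIM).astype(np.float32)

def get_reward(context, action):
    # Placeholder ground-truth reward returned by the environment.
    return float(context[action % CONTEXT_DIM])

for step in range(200):
    context = torch.from_numpy(get_context())
    with torch.no_grad():
        action = int(reward_net(context).argmax())   # greedy action choice
    reward = get_reward(context.numpy(), action)     # real feedback
    history.append((context, action, reward))

    # Retrain the network on all (context, action, reward) triples so far.
    contexts = torch.stack([c for c, _, _ in history])
    actions = torch.tensor([a for _, a, _ in history])
    rewards = torch.tensor([r for _, _, r in history])
    preds = reward_net(contexts)[torch.arange(len(history)), actions]
    loss = nn.functional.mse_loss(preds, rewards)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```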
To run these scripts, first create a results folder at the same level as the README.md:
mkdir results
Two scripts can now be used to reproduce the benchmark results.
The first one compares a variety of models on the linear, wheel, covertype, and mushroom datasets (note: change the name attribute in the script to select the dataset you want):
python run_full_analysis.py
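As a purely hypothetical illustration of that edit (the actual attribute name and accepted values inside run_full_analysis.py may differ):

```python
# Hypothetical excerpt from run_full_analysis.py: pick the dataset to
# benchmark by editing this attribute before launching the script.
name = 'wheel'   # e.g. one of 'linear', 'wheel', 'covertype', 'mushroom'
```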
The second one compares the performance of a neural greedy model across different hyperparameters:
python run_nn_analysis.py
Some of the files in this repo come from a fork of a GitHub implementation of the paper Deep Bayesian Bandits Showdown: An Empirical Comparison of Bayesian Deep Networks for Thompson Sampling, published at ICLR 2018. The fork is available at https://github.com/pedevineau/models. The other files are mine.
Additions to the original project are:
- LinUCB, neuralLinUCB and Lin Epsilon algorithms
- CovertypeGAN and MushroomGAN for general context vectors and categorical context vectors, respectively
- Use of an artificial data generator in neural-network-based algorithms
- Custom mushroom and covertype dataset readers
- Benchmarker class: runs experiments several times, displays results as a PNG, stores results in pickle format, and stores algorithm performance in CSV format (CSV for later TeX export)
- DataReader class: the associated class that re-reads the pickle file and processes the data again
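As a rough sketch of the kind of result persistence described above (standard-library and matplotlib code with made-up names, not the actual Benchmarker/DataReader API):

```python
import csv
import pickle
import matplotlib.pyplot as plt

def save_run(results, stem):
    """Persist one benchmark run: pickle for the full results, CSV for
    per-algorithm performance (for later TeX export), and a PNG plot.
    `results` maps an algorithm name to a list of cumulative regrets,
    one value per experiment repetition (hypothetical layout)."""
    with open(stem + '.pickle', 'wb') as f:
        pickle.dump(results, f)

    with open(stem + '.csv', 'w', newline='') as f:
        writer = csv.writer(f)
        writer.writerow(['algorithm', 'mean_cumulative_regret'])
        for algo, regrets in results.items():
            writer.writerow([algo, sum(regrets) / len(regrets)])

    plt.figure()
    for algo, regrets in results.items():
        plt.plot(regrets, label=algo)
    plt.xlabel('experiment repetition')
    plt.ylabel('cumulative regret')
    plt.legend()
    plt.savefig(stem + '.png')

def load_run(stem):
    """DataReader-style counterpart: reload the pickled results so the
    data can be processed again."""
    with open(stem + '.pickle', 'rb') as f:
        return pickle.load(f)
```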
Some references: