This is the piece of code I used for my thesis, so I could get to know TensorFlow and OpenAI gym. In this thesis I explore the tradeoff between exploration of new actions and exploitation of known working strategies. I compare several different strategies such as argmax (only exploitation), e-greedy, decaying e-greedy and softmax. The agent is a multilayer perceptron.
Requirements for the code to run:
- TensorFlow
- openAI gym