https://arxiv.org/abs/1602.01783, "Asynchronous Methods for Deep Reinforcement Learning", aka the A3C paper, by Google DeepMind
This paper introduces asynchronous variants of 1-step Q-learning, n-step Q-learning, and 1-step Sarsa, as well as A3C; of the four, A3C performs best.
(image originally from openresearch.ai)
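To illustrate the "asynchronous" part of these methods, here is a minimal Hogwild-style sketch in PyTorch: several worker processes repeatedly pull the latest parameters from a shared global model, compute gradients on local rollouts, and apply them to the shared parameters without locking. The tiny linear model and random data are placeholders for the actor-critic network and real rollouts, and plain SGD stands in for the shared RMSProp the paper actually uses.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(global_model, optimizer, steps=100):
    # Each worker keeps a local copy of the model and pushes gradients to the global one.
    local_model = nn.Linear(4, 2)  # placeholder for the actor-critic network
    for _ in range(steps):
        local_model.load_state_dict(global_model.state_dict())  # pull latest global params
        x = torch.randn(8, 4)                # placeholder for a rollout
        loss = local_model(x).pow(2).mean()  # placeholder for the A3C loss
        optimizer.zero_grad()
        loss.backward()
        # Hogwild-style update: point the shared parameters at the local gradients.
        for lp, gp in zip(local_model.parameters(), global_model.parameters()):
            gp._grad = lp.grad
        optimizer.step()

if __name__ == "__main__":
    global_model = nn.Linear(4, 2)
    global_model.share_memory()  # put parameters in shared memory, visible to all workers
    optimizer = torch.optim.SGD(global_model.parameters(), lr=0.01)
    workers = [mp.Process(target=worker, args=(global_model, optimizer)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```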
A3C is an on-policy method (whereas Q-learning is off-policy)
Loss = Policy Loss + 0.5 * Value Loss (the paper additionally subtracts an entropy bonus on the policy to encourage exploration)
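A minimal sketch of that objective in PyTorch (function and argument names are my own; it assumes per-step log-probabilities, value estimates, n-step returns, and policy entropies have already been collected from a rollout):

```python
import torch

def a3c_loss(log_probs, values, returns, entropies, entropy_coef=0.01):
    """Combined objective: policy loss + 0.5 * value loss - entropy bonus.

    log_probs: log pi(a_t | s_t) for the actions taken, shape [T]
    values:    V(s_t) from the critic head, shape [T]
    returns:   n-step bootstrapped returns R_t, shape [T]
    entropies: per-step policy entropies, shape [T]
    """
    advantages = returns - values
    # Policy-gradient term; the advantage is treated as a constant (no critic gradient).
    policy_loss = -(log_probs * advantages.detach()).mean()
    # Regress the value head toward the n-step returns.
    value_loss = advantages.pow(2).mean()
    # Entropy bonus discourages premature collapse to a deterministic policy;
    # 0.01 matches the weight the paper uses for Atari.
    return policy_loss + 0.5 * value_loss - entropy_coef * entropies.mean()
```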
The network (typically convolutional) has one softmax output for the policy \pi(a|s) and one linear output for the value function V(s), with all non-output layers shared.
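A sketch of that two-headed architecture in PyTorch (the trunk sizes follow the Atari network described in the paper, assuming 84x84 inputs with 4 stacked frames; class and attribute names are my own):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared convolutional trunk with a softmax policy head and a linear value head."""

    def __init__(self, in_channels=4, num_actions=6):
        super().__init__()
        # Shared non-output layers: conv trunk plus one hidden linear layer.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),  # 9x9 feature maps from 84x84 input
        )
        self.policy_head = nn.Linear(256, num_actions)  # softmax output: pi(a|s)
        self.value_head = nn.Linear(256, 1)             # linear output: V(s)

    def forward(self, x):
        h = self.trunk(x)
        # Actions are sampled from the policy; V(s) is used to compute advantages.
        return torch.softmax(self.policy_head(h), dim=-1), self.value_head(h).squeeze(-1)
```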