Asynchronous Methods for Deep Reinforcement Learning #14

Open
flrngel opened this issue Mar 19, 2018 · 0 comments


flrngel commented Mar 19, 2018

https://arxiv.org/abs/1602.01783
a.k.a. the A3C paper, from Google DeepMind

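This paper introduces asynchronous variants of four algorithms: one-step Q-learning, one-step Sarsa, n-step Q-learning, and advantage actor-critic (A3C)
A3C performs best of the four in the paper's experiments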

image
(image originally from openresearch.ai)

A3C is an on-policy method (in contrast to Q-learning, which is off-policy)
image
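
To make the on-policy / off-policy distinction concrete, here is a minimal sketch that contrasts the one-step Sarsa target (on-policy) with the one-step Q-learning target (off-policy). The function names and the toy numbers are hypothetical and not taken from the paper. A3C is on-policy in the same sense as Sarsa: its updates are computed from actions that the current policy actually sampled.

```python
def sarsa_target(reward, gamma, q_next, next_action):
    """On-policy target: bootstrap from the action actually taken in s'."""
    return reward + gamma * q_next[next_action]


def q_learning_target(reward, gamma, q_next):
    """Off-policy target: bootstrap from the greedy action in s',
    regardless of which action the behaviour policy really took."""
    return reward + gamma * max(q_next)


# Hypothetical Q(s', a) estimates for three actions.
q_next = [1.0, 3.0, 2.0]
# Action the (exploring) behaviour policy actually picked in s'.
next_action = 2

on_policy = sarsa_target(1.0, 0.9, q_next, next_action)
off_policy = q_learning_target(1.0, 0.9, q_next)
print(on_policy, off_policy)
```

The two targets differ whenever the sampled action is not the greedy one, which is exactly when exploration happens.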

Loss = Policy Loss + 0.5 * Value Loss
image
image
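
A rough sketch of how the combined loss above could be computed for one rollout, assuming n-step discounted returns, an advantage of `R - V(s)`, and sums over time steps. All names (`n_step_returns`, `a3c_loss`, `value_coef`) are hypothetical, and the entropy bonus the paper adds to the policy term is left out because the formula in this note does not include it.

```python
import math


def n_step_returns(rewards, bootstrap_value, gamma):
    """Discounted returns R_t = r_t + gamma * R_{t+1}, computed backwards
    from a bootstrap value (0 if the rollout ended in a terminal state)."""
    returns = []
    running = bootstrap_value
    for r in reversed(rewards):
        running = r + gamma * running
        returns.append(running)
    returns.reverse()
    return returns


def a3c_loss(log_probs, values, rewards, bootstrap_value,
             gamma=0.99, value_coef=0.5):
    """Loss = policy loss + value_coef * value loss (assumed: summed over steps).

    log_probs[t] is log pi(a_t | s_t) for the action actually taken,
    values[t] is the critic's estimate V(s_t).
    """
    returns = n_step_returns(rewards, bootstrap_value, gamma)
    policy_loss = 0.0
    value_loss = 0.0
    for logp, v, ret in zip(log_probs, values, returns):
        advantage = ret - v              # treated as a constant in the policy term
        policy_loss += -logp * advantage
        value_loss += (ret - v) ** 2
    total = policy_loss + value_coef * value_loss
    return total, policy_loss, value_loss


# Toy rollout with made-up numbers.
total, p_loss, v_loss = a3c_loss(
    log_probs=[math.log(0.5)] * 3,
    values=[1.0, 0.5, 0.0],
    rewards=[1.0, 0.0, 1.0],
    bootstrap_value=0.0,
    gamma=0.5,
)
print(total, p_loss, v_loss)
```

In an autodiff framework the advantage would be detached from the graph in the policy term, so that only the value loss trains the critic.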

\pi(a | s) is (typically) a single softmax output on top of a convolutional network

V(s) is a single linear output on the same network; all non-output layers are shared between the two heads
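
The two-headed layout can be sketched in plain Python. This is only an illustration under stated assumptions: a single fully connected ReLU layer stands in for the shared convolutional stack, and the class name `TwoHeadedNet`, the layer sizes, and the initialisation range are all hypothetical.

```python
import math
import random


def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def linear(weights, bias, x):
    """Dense layer: one output per weight row."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


class TwoHeadedNet:
    """Shared trunk with a softmax policy head and a linear value head."""

    def __init__(self, n_features, n_hidden, n_actions, seed=0):
        rng = random.Random(seed)

        def init(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)]
                    for _ in range(rows)]

        # Shared (non-output) layer; a conv stack in the real setting.
        self.w_shared, self.b_shared = init(n_hidden, n_features), [0.0] * n_hidden
        # Policy head: one logit per action, passed through softmax.
        self.w_pi, self.b_pi = init(n_actions, n_hidden), [0.0] * n_actions
        # Value head: a single linear output.
        self.w_v, self.b_v = init(1, n_hidden), [0.0]

    def forward(self, x):
        hidden = [max(0.0, h) for h in linear(self.w_shared, self.b_shared, x)]
        policy = softmax(linear(self.w_pi, self.b_pi, hidden))
        value = linear(self.w_v, self.b_v, hidden)[0]
        return policy, value


net = TwoHeadedNet(n_features=4, n_hidden=8, n_actions=3)
policy, value = net.forward([0.1, 0.2, 0.3, 0.4])
print(policy, value)
```

Because both heads read the same hidden features, gradients from the policy loss and the value loss both update the shared layer.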
