https://arxiv.org/abs/1602.01783, "Asynchronous Methods for Deep Reinforcement Learning", aka the A3C paper, by Google DeepMind
This paper introduces asynchronous variants of 1-step Q-learning, n-step Q-learning, and 1-step Sarsa, as well as A3C; of the four, A3C performs best.
(image originally from openresearch.ai)
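To illustrate the "asynchronous" part of these methods, here is a minimal Hogwild-style sketch in PyTorch: several worker processes repeatedly pull the latest parameters from a shared global model, compute gradients on local rollouts, and apply them to the shared parameters without locking. The tiny linear model and random data are placeholders for the actor-critic network and real rollouts, and plain SGD stands in for the shared RMSProp the paper actually uses.

```python
import torch
import torch.nn as nn
import torch.multiprocessing as mp

def worker(global_model, optimizer, steps=100):
    # Each worker keeps a local copy of the model and pushes gradients to the global one.
    local_model = nn.Linear(4, 2)  # placeholder for the actor-critic network
    for _ in range(steps):
        local_model.load_state_dict(global_model.state_dict())  # pull latest global params
        x = torch.randn(8, 4)                # placeholder for a rollout
        loss = local_model(x).pow(2).mean()  # placeholder for the A3C loss
        optimizer.zero_grad()
        loss.backward()
        # Hogwild-style update: point the shared parameters at the local gradients.
        for lp, gp in zip(local_model.parameters(), global_model.parameters()):
            gp._grad = lp.grad
        optimizer.step()

if __name__ == "__main__":
    global_model = nn.Linear(4, 2)
    global_model.share_memory()  # put parameters in shared memory, visible to all workers
    optimizer = torch.optim.SGD(global_model.parameters(), lr=0.01)
    workers = [mp.Process(target=worker, args=(global_model, optimizer)) for _ in range(4)]
    for p in workers:
        p.start()
    for p in workers:
        p.join()
```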
A3C is an on-policy method (whereas Q-learning is off-policy)
Loss = Policy Loss + 0.5 * Value Loss (the paper additionally subtracts an entropy bonus on the policy to encourage exploration)
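A minimal sketch of that objective in PyTorch (function and argument names are my own; it assumes per-step log-probabilities, value estimates, n-step returns, and policy entropies have already been collected from a rollout):

```python
import torch

def a3c_loss(log_probs, values, returns, entropies, entropy_coef=0.01):
    """Combined objective: policy loss + 0.5 * value loss - entropy bonus.

    log_probs: log pi(a_t | s_t) for the actions taken, shape [T]
    values:    V(s_t) from the critic head, shape [T]
    returns:   n-step bootstrapped returns R_t, shape [T]
    entropies: per-step policy entropies, shape [T]
    """
    advantages = returns - values
    # Policy-gradient term; the advantage is treated as a constant (no critic gradient).
    policy_loss = -(log_probs * advantages.detach()).mean()
    # Regress the value head toward the n-step returns.
    value_loss = advantages.pow(2).mean()
    # Entropy bonus discourages premature collapse to a deterministic policy;
    # 0.01 matches the weight the paper uses for Atari.
    return policy_loss + 0.5 * value_loss - entropy_coef * entropies.mean()
```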
The network (typically convolutional) has one softmax output for the policy \pi(a|s) and one linear output for the value function V(s), with all non-output layers shared.
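A sketch of that two-headed architecture in PyTorch (the trunk sizes follow the Atari network described in the paper, assuming 84x84 inputs with 4 stacked frames; class and attribute names are my own):

```python
import torch
import torch.nn as nn

class ActorCritic(nn.Module):
    """Shared convolutional trunk with a softmax policy head and a linear value head."""

    def __init__(self, in_channels=4, num_actions=6):
        super().__init__()
        # Shared non-output layers: conv trunk plus one hidden linear layer.
        self.trunk = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.Flatten(),
            nn.Linear(32 * 9 * 9, 256), nn.ReLU(),  # 9x9 feature maps from 84x84 input
        )
        self.policy_head = nn.Linear(256, num_actions)  # softmax output: pi(a|s)
        self.value_head = nn.Linear(256, 1)             # linear output: V(s)

    def forward(self, x):
        h = self.trunk(x)
        # Actions are sampled from the policy; V(s) is used to compute advantages.
        return torch.softmax(self.policy_head(h), dim=-1), self.value_head(h).squeeze(-1)
```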