The loss in this model can be a negative number. The reason is that the REINFORCE loss is often negative, since a larger reward is better. But I am confused about how negative loss values affect gradient descent.
I also notice that the hybrid loss tends toward zero eventually. How can the loss increase under gradient descent?
Using negative loss values is a standard approach in reinforcement learning to turn gradient descent into gradient ascent: minimizing the negated objective is the same as maximizing the objective itself. To my knowledge, PyTorch has no issue with this.
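As a minimal sketch of that point (hypothetical tensors and a toy policy, not this project's code), the scalar handed to the optimizer can be negative; the optimizer only follows the gradient, so minimizing the negated objective drives the expected reward up:

```python
import torch

# Toy "policy parameters" (hypothetical, just for illustration)
logits = torch.zeros(3, requires_grad=True)
optimizer = torch.optim.SGD([logits], lr=0.1)

log_probs = torch.log_softmax(logits, dim=0)   # log pi(a) for each action
advantages = torch.tensor([1.0, -0.5, 2.0])    # reward minus baseline; any sign

# REINFORCE-style loss: minimizing -E[log pi(a) * A] maximizes expected reward.
# The resulting scalar may well be negative; its sign does not matter for SGD.
loss = -(log_probs * advantages).mean()

optimizer.zero_grad()
loss.backward()
optimizer.step()
```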
In the A3C algorithm (used in this project) the loss can increase during training. The reason is that the reinforcement loss is measured as the advantage over a baseline prediction. The baseline is a network that is learned during training; at the start of training its predictions are poor, so it is very easy to have an advantage over it. At least that is how I see what is going on here.
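A rough sketch of that advantage-over-baseline idea (again with made-up tensors, not the project's actual code):

```python
import torch
import torch.nn.functional as F

# Stand-ins for the policy log-probabilities and the learned baseline (value) head
log_probs = torch.tensor([-0.5, -1.2, -0.3], requires_grad=True)
baseline  = torch.tensor([0.1, 0.1, 0.1], requires_grad=True)
reward    = torch.tensor([1.0, 0.0, 1.0])

advantage = reward - baseline.detach()           # baseline gets its gradient from its own loss
policy_loss   = -(log_probs * advantage).mean()  # REINFORCE term weighted by the advantage
baseline_loss = F.mse_loss(baseline, reward)     # baseline learns to predict the reward

# Early on the baseline predicts poorly, so the advantage is large; as it improves,
# the advantage shrinks and the reinforcement part of the hybrid loss can drift
# toward zero or even increase between epochs, even though each update is a descent step.
(policy_loss + baseline_loss).backward()
```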
@malashinroman Hi, may I ask why this is an A3C algorithm?
To me, all the images in a batch share the same agent, and the update is not asynchronous, whereas in A3C the agents live in different processes and update the central network asynchronously. Please let me know if I'm wrong; I'm new to RL. Thanks!
I think you're right. I thought about different environments, but there are no asynchronous agents