This repository has been archived by the owner on Sep 24, 2023. It is now read-only.

how negative numbers affect gradient descent. #31

Open
yxiao54 opened this issue Mar 25, 2020 · 3 comments

Comments

@yxiao54

yxiao54 commented Mar 25, 2020

The loss may be a negative number in the model. The reason is that the REINFORCE loss is often negative, since the reward is larger-is-better. But I am very confused about how negative numbers affect gradient descent.

I also notice that the hybrid loss tends toward zero eventually. How can the loss increase under gradient descent?

@malashinroman
Contributor

malashinroman commented May 19, 2020

It is a standard approach in reinforcement learning to use negative loss values to turn gradient descent into gradient ascent. Minimizing the negative of an objective is the same as maximizing the objective itself. To my knowledge, there are no issues with this in PyTorch.
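A minimal sketch of this idea, assuming a one-dimensional toy objective `f` (not the project's actual loss): running ordinary gradient descent on `loss = -f(theta)` drives `theta` toward the maximizer of `f`.

```python
# Toy example: gradient descent on the NEGATIVE of an objective is
# gradient ascent on the objective itself.

def f(theta):
    # A "reward" we want to maximize; it peaks at theta = 3.
    return -(theta - 3.0) ** 2

def grad_loss(theta):
    # Derivative of loss = -f(theta), i.e. d/dtheta of (theta - 3)^2.
    return 2.0 * (theta - 3.0)

theta, lr = 0.0, 0.1
for _ in range(100):
    theta -= lr * grad_loss(theta)  # standard descent step on the negative loss

print(theta)  # converges toward 3, the maximizer of f
```

The same sign trick is what REINFORCE-style code does when it descends on `-reward * log_prob`.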

In the A3C algorithm (used in this project), the loss can increase during training. The reason is that the reinforcement loss is measured as the advantage over a baseline prediction. The baseline is a network learned during training; at the start of training its predictions are poor, so it is very easy to have an advantage over it. At least that is how I see what is going on here.
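A toy sketch of why the loss can rise toward zero, assuming a constant scalar reward and a simple running-mean baseline (the project uses a learned baseline network, but the mechanism is the same): early on the baseline predicts poorly, the advantage is large, and the loss is very negative; as the baseline catches up, the advantage shrinks and the loss climbs toward zero.

```python
# Toy example: the REINFORCE-style loss -advantage increases as the
# baseline learns, even though we are "descending" on it.

rewards = [1.0, 1.0, 1.0, 1.0]  # the policy keeps earning the same reward
baseline = 0.0                  # baseline starts out poor (predicts 0)
losses = []

for r in rewards:
    advantage = r - baseline          # large early, shrinks over time
    losses.append(-advantage)         # loss term used for the policy update
    baseline += 0.5 * (r - baseline)  # baseline fits the reward over time

print(losses)  # [-1.0, -0.5, -0.25, -0.125]: the loss rises toward zero
```

This also matches the observation above that the hybrid loss eventually hovers near zero: once the baseline predicts the reward well, the advantage (and hence the loss) is close to zero.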

@litingfeng

@malashinroman Hi, may I ask why this is an A3C algorithm?

To me, all the images in a batch share the same agent, and the update is not asynchronous. In A3C, the agents in different processes are different, and they update a central network asynchronously. Please let me know if I'm wrong; I'm new to RL. Thanks!

@malashinroman
Contributor

malashinroman commented Mar 17, 2021 via email
