Why are Chat's Critic model parameters updated with Actor's parameters? #3400
Replies: 1 comment
-
For your information, I'm familiar with TRL's PPO implementation. The original PPO algorithm has only the Critic model (a.k.a. the reward model). I can understand separating the Critic model from the Reward model to better control the outputs; what I don't understand is why they are identical at the beginning of training, after which the Critic model is updated while the Reward model remains unchanged. Especially since the final exported model is still the language model. Why are these two models merged into one, as in TRL's PPOTrainer?
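To make the pattern under discussion concrete, here is a minimal sketch assuming a PyTorch-style setup (the tiny `reward_model` and all variable names are placeholders for illustration, not the actual trainer's identifiers): the critic starts as an exact copy of the reward model, only the critic gets an optimizer, and the reward model is frozen.

```python
import copy
import torch

# Placeholder scalar-head network standing in for a trained reward model.
reward_model = torch.nn.Sequential(
    torch.nn.Linear(16, 16),
    torch.nn.ReLU(),
    torch.nn.Linear(16, 1),
)

# The critic begins as an exact copy, so at step 0 its value estimates
# coincide with the reward model's scores.
critic_model = copy.deepcopy(reward_model)

# The reward model is frozen: it defines a fixed training objective.
for p in reward_model.parameters():
    p.requires_grad_(False)

# Only the critic (and the actor, not shown) is optimized, so its
# parameters drift away from the reward model's as PPO training proceeds.
critic_optimizer = torch.optim.Adam(critic_model.parameters(), lr=1e-5)
```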
-
Hi all developers,
In the current PPO trainer implementation, the Critic model is almost the same as the reward model. During training, the critic model assigns values to the response sequences, while the reward model assigns reward scores.
However, the critic and reward models are almost identical at the beginning of training, so the difference between value and reward starts out the same everywhere. After several training updates, the parameters of the critic model have changed while the reward model's parameters are kept frozen.
How does that mechanism work?
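For reference, here is a minimal sketch of the mechanism in question, under the usual RLHF-PPO assumptions (`gae_advantages` and all variable names are made up for illustration, not the trainer's actual code): the frozen reward model contributes only the sequence score placed on the final token, while the trainable critic predicts per-token values; regressing those values toward the empirical returns is what pulls the critic's parameters away from the reward model's, even though the two start identical.

```python
import torch

def gae_advantages(rewards, values, gamma=1.0, lam=0.95):
    """Generalized Advantage Estimation over one response sequence."""
    advantages = torch.zeros_like(rewards)
    last_adv = 0.0
    for t in reversed(range(rewards.size(0))):
        next_value = values[t + 1] if t + 1 < rewards.size(0) else 0.0
        delta = rewards[t] + gamma * next_value - values[t]
        last_adv = delta + gamma * lam * last_adv
        advantages[t] = last_adv
    returns = advantages + values  # regression targets for the critic
    return advantages, returns

# Toy rollout: 5 response tokens; the frozen reward model's sequence
# score lands on the last token (per-token KL penalties omitted).
values = torch.randn(5, requires_grad=True)  # critic's per-token values
rewards = torch.zeros(5)
rewards[-1] = 0.7                            # reward model's score

with torch.no_grad():
    advantages, returns = gae_advantages(rewards, values)

# The value loss pushes the critic toward the empirical returns; the
# reward model receives no gradient at all, so only the critic changes.
value_loss = 0.5 * (values - returns).pow(2).mean()
value_loss.backward()
```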