Question about kl_penalty #211

Open
StarDewXXX opened this issue Feb 6, 2025 · 1 comment
Labels
question Further information is requested

Comments

@StarDewXXX

The KL penalty is folded into batch['token_level_reward'] by apply_kl_penalty() (trainer/ppo/ray_trainer.py). But update_policy() (workers/actor/dp_actor.py) adds kl_loss to the final loss as well. Does that mean the KL penalty is applied twice?

@PeterSH6
Collaborator

PeterSH6 commented Feb 6, 2025

Hi @StarDewXXX, if kl_loss is used, the kl_penalty is not applied to the reward, so the KL term is only counted once. You can see the code here: https://github.com/volcengine/verl/blob/main/verl/trainer/ppo/ray_trainer.py#L717-L723
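
To make the gating concrete, here is a minimal sketch of the idea in plain Python. It is not the verl code: the flag name `use_kl_loss`, the helper names, and the coefficients are all assumptions chosen for illustration. The point is only that the KL term enters through exactly one path, either the reward or the loss.

```python
# Illustrative sketch only; every name here is hypothetical, not verl's API.

def token_level_rewards(scores, kl, beta, use_kl_loss):
    """Per-token rewards; the KL penalty is subtracted only when no KL loss is used."""
    if use_kl_loss:
        # KL is handled later as a loss term, so rewards stay equal to the raw scores.
        return list(scores)
    return [s - beta * k for s, k in zip(scores, kl)]


def actor_loss(pg_loss, kl, kl_coef, use_kl_loss):
    """Actor loss; a mean-KL term is added only when the KL loss is enabled."""
    if not use_kl_loss:
        return pg_loss
    return pg_loss + kl_coef * sum(kl) / len(kl)


scores = [0.0, 0.0, 1.0]   # raw per-token scores (made-up values)
kl = [0.5, 0.25, 0.25]     # per-token KL estimates (made-up values)

for use_kl_loss in (False, True):
    rewards = token_level_rewards(scores, kl, beta=0.5, use_kl_loss=use_kl_loss)
    loss = actor_loss(pg_loss=1.0, kl=kl, kl_coef=0.5, use_kl_loss=use_kl_loss)
    reward_has_kl = rewards != scores
    loss_has_kl = loss != 1.0
    # Exactly one of the two paths carries the KL term.
    assert reward_has_kl != loss_has_kl
```

With the flag off, the penalty shows up in the rewards and the loss is untouched; with the flag on, the rewards equal the raw scores and the KL term appears only in the loss.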

PeterSH6 added the question (Further information is requested) label Feb 9, 2025
PeterSH6 self-assigned this Feb 9, 2025
2 participants