The KL penalty is folded into batch['token_level_reward'] by apply_kl_penalty() in trainer/ppo/ray_trainer.py. However, update_policy() in workers/actor/dp_actor.py also adds kl_loss to the final actor loss. Does this mean the KL penalty is applied twice?
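To make the question concrete, here is a minimal sketch of the two places a KL term can enter PPO training. It is not verl's code. The function names, the flag use_kl_loss, and the simple "k1" per-token estimator (log pi - log pi_ref) are all hypothetical stand-ins chosen for illustration. Path 1 subtracts beta * KL from each token's reward, so the KL affects the policy only indirectly through returns and advantages. Path 2 adds kl_coef * mean(KL) straight to the actor loss. If both are enabled, the policy is pulled toward the reference model by two separate terms.

```python
from typing import List


def kl_k1(logprob: List[float], ref_logprob: List[float]) -> List[float]:
    """Per-token KL estimate: log pi(a|s) - log pi_ref(a|s) (the simple 'k1' estimator)."""
    return [lp - rlp for lp, rlp in zip(logprob, ref_logprob)]


def apply_kl_in_reward(token_level_scores: List[float], logprob: List[float],
                       ref_logprob: List[float], beta: float) -> List[float]:
    """Path 1 (hypothetical): subtract beta * KL from every token's reward.

    The penalty reaches the policy only indirectly, via the advantages
    computed from these penalized rewards.
    """
    kl = kl_k1(logprob, ref_logprob)
    return [s - beta * k for s, k in zip(token_level_scores, kl)]


def actor_loss(pg_loss: float, logprob: List[float], ref_logprob: List[float],
               kl_coef: float, use_kl_loss: bool) -> float:
    """Path 2 (hypothetical): optionally add kl_coef * mean KL directly to the loss."""
    if not use_kl_loss:
        return pg_loss
    kl = kl_k1(logprob, ref_logprob)
    return pg_loss + kl_coef * sum(kl) / len(kl)


if __name__ == "__main__":
    logprob = [-1.0, -0.5, -0.25, -2.0]
    ref_logprob = [-1.5, -0.5, -0.75, -2.0]
    # Reward path: only tokens whose KL is non-zero are penalized.
    print(apply_kl_in_reward([0.0, 0.0, 0.0, 1.0], logprob, ref_logprob, beta=0.5))
    # Loss path: the same KL shows up again as an explicit loss term.
    print(actor_loss(1.0, logprob, ref_logprob, kl_coef=0.5, use_kl_loss=True))
```

In this sketch the two mechanisms are independent. Whether they stack in practice depends on whether the trainer enables the reward-side penalty and the actor-side KL loss at the same time, which is what the question above is asking.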