
Can we use VeRL to train the reward models #197

Open
YSLIU627 opened this issue Feb 4, 2025 · 2 comments

YSLIU627 (Contributor) commented on Feb 4, 2025:

Hi Authors,
Your repo is really good, and I wonder whether we could use VeRL to train reward models, the way OpenRLHF does.

vermouth1992 added the enhancement (New feature or request) label on Feb 5, 2025

vermouth1992 (Collaborator) commented:
Actually, we already have a DPO implementation on a branch. Training an ORM should be pretty similar to training with DPO. Do you also need PRM training?
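
For reference, the pairwise objective the DPO branch is built around looks roughly like the sketch below (illustrative function and argument names, not the actual code on the branch); an ORM trainer would swap this paired margin loss for a pointwise loss on outcome labels.

```python
# Sketch of the standard DPO loss: every chosen (positive) response is
# paired with a rejected (negative) one, which is why DPO needs matched pairs.
import torch.nn.functional as F

def dpo_loss(policy_chosen_logps, policy_rejected_logps,
             ref_chosen_logps, ref_rejected_logps, beta=0.1):
    chosen_rewards = beta * (policy_chosen_logps - ref_chosen_logps)
    rejected_rewards = beta * (policy_rejected_logps - ref_rejected_logps)
    # maximize the margin between each paired chosen/rejected response
    return -F.logsigmoid(chosen_rewards - rejected_rewards).mean()
```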

YSLIU627 (Contributor, Author) commented on Feb 5, 2025:

No, I only need to train an ORM. However, the DPO implementation requires the same number of positive and negative samples, whereas ORM training on math data can tolerate an imbalance between them. It would help a lot if you could provide a script for ORM training!
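
For concreteness, something like the pointwise loss sketched below would cover my case: it scores each response independently, so a batch with far more incorrect than correct answers is fine. This is only a sketch with placeholder names, not a proposal for VeRL's actual API.

```python
# Sketch of pointwise ORM training on outcome labels; it tolerates an
# imbalanced mix of positive and negative samples in each batch.
# Names (orm_loss, the pos_weight heuristic) are illustrative assumptions.
import torch
import torch.nn.functional as F

def orm_loss(scores, outcome_labels, pos_weight=None):
    # scores: (batch,) scalar logits from a reward head on the final token
    # outcome_labels: (batch,) 1.0 for correct answers, 0.0 for incorrect
    # pos_weight: optional scalar to upweight the minority class
    return F.binary_cross_entropy_with_logits(scores, outcome_labels,
                                              pos_weight=pos_weight)

# Example batch: 3 correct vs. 9 incorrect responses; upweight positives 3x.
scores = torch.randn(12)
labels = torch.tensor([1.0] * 3 + [0.0] * 9)
loss = orm_loss(scores, labels, pos_weight=torch.tensor(3.0))
```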
