The response in the SFT data is not fully utilized. #290

Open
HAOChuzhan opened this issue Feb 12, 2025 · 0 comments
@HAOChuzhan

Recent experiments have found that, in the training data used by the GRPOTrainer, only the query from the SFT data is passed in, while the response (or solution) is discarded. I would like to know whether, under these circumstances, RL training can outperform SFT. Additionally, why not add the response from the SFT data to the list of multiple responses generated during training?
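To make the second question concrete, here is a minimal sketch in plain Python of what adding the SFT response to the group could look like. It assumes a GRPO-style group-relative advantage, where each reward is normalized by the mean and standard deviation of its group, and a caller-supplied reward function. `group_relative_advantages`, `build_group_with_reference`, and `toy_reward` are hypothetical names, not part of the GRPOTrainer API. The use of the population standard deviation plus a small `eps` is also an assumption.

```python
import statistics
from typing import Callable, List, Tuple


def group_relative_advantages(rewards: List[float], eps: float = 1e-4) -> List[float]:
    """GRPO-style advantage (assumed form): (r - group mean) / (group std + eps)."""
    mean = statistics.fmean(rewards)
    std = statistics.pstdev(rewards)  # population std; eps avoids division by zero
    return [(r - mean) / (std + eps) for r in rewards]


def build_group_with_reference(
    prompt: str,
    sampled: List[str],
    reference: str,
    reward_fn: Callable[[str, str], float],
) -> List[Tuple[str, float]]:
    """Hypothetical helper: append the SFT reference response to the sampled
    group, score every member, and return (response, advantage) pairs."""
    group = sampled + [reference]
    rewards = [reward_fn(prompt, completion) for completion in group]
    return list(zip(group, group_relative_advantages(rewards)))


def toy_reward(prompt: str, completion: str) -> float:
    # Hypothetical binary reward: 1.0 if the completion ends with the answer "4".
    return 1.0 if completion.strip().endswith("4") else 0.0


if __name__ == "__main__":
    pairs = build_group_with_reference(
        "What is 2+2?", ["It is 5", "It is 3", "It is 4"], "2+2 = 4", toy_reward
    )
    for response, advantage in pairs:
        print(f"{response!r}: {advantage:+.3f}")
```

The toy case hints at one possible motivation. With a binary reward, a group in which every sampled response fails has zero variance, so every member gets zero advantage. Appending a correct reference would restore a learning signal. One caveat: the reference was not sampled from the current policy. Treating it as on-policy in the GRPO objective would ignore the mismatch that an importance ratio normally accounts for, and this sketch does not handle that.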
