RL: Add GRPO as Policy to the Pool of Objects #1130

Open
2 tasks
steveyuwono opened this issue Feb 11, 2025 · 0 comments
Assignees: steveyuwono
Labels: enhancement (New feature or request), RL (Reinforcement Learning)


steveyuwono commented Feb 11, 2025

Description/Motivation
GRPO (Group Relative Policy Optimization) is a recent RL algorithm developed by the DeepSeek team (https://arxiv.org/abs/2402.03300). Because it is so new, it is not available in Stable-Baselines3, so it would be an advantage to have GRPO in our library.

The following files are dummy implementations generated by ChatGPT and DeepSeek. They will not work as-is, but they give a hint of how the algorithm works:
ChatGPT: grpo.txt
ChatGPT+DeepSeek: grpo_2.txt
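
For orientation, here is a minimal sketch of the two pieces that set GRPO apart from PPO, as I understand the paper: advantages are computed relative to a group of sampled outputs instead of from a learned value function, and they are then used in a PPO-style clipped objective. This is not taken from grpo.txt or grpo_2.txt and is not a Stable-Baselines3 API. The function names, `clip_range`, `eps`, and the use of the population standard deviation are assumptions for illustration. The KL penalty against a reference policy that the paper adds to the objective is omitted here.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    `rewards` holds one scalar reward per sampled output in the group.
    Using the population std and `eps` is an assumption of this sketch.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(logp_new, logp_old, advantages, clip_range=0.2):
    """PPO-style clipped objective averaged over the group (to be maximized).

    The KL term against a reference policy is intentionally left out.
    """
    total = 0.0
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_range, min(1.0 + clip_range, ratio))
        total += min(ratio * adv, clipped * adv)
    return total / len(advantages)


# Usage: four sampled outputs for one prompt, with binary rewards.
advantages = group_relative_advantages([1.0, 0.0, 0.0, 1.0])
objective = clipped_surrogate([0.0] * 4, [0.0] * 4, advantages)
```

Because the baseline is the group mean, no critic network is needed, which is probably the main design point to preserve when wiring this into the policy pool.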

Task list

  • 1. Do this
  • 2. Do that

Related issues
#...

Cross references
...

steveyuwono added the enhancement and RL labels on Feb 11, 2025
steveyuwono self-assigned this on Feb 11, 2025