RL: Add GRPO as Policy to the Pool of Objects #1130

steveyuwono · 2025-02-11T09:53:44Z

Description/Motivation
GRPO (Group Relative Policy Optimization) is a recent RL algorithm developed by the DeepSeek team (https://arxiv.org/abs/2402.03300). Because it is so recent, it is not available in Stable-Baselines3, so having GRPO in our library would be an advantage.

Consider the following files, which are dummy files generated by ChatGPT and DeepSeek. They will not work as-is, but they give a hint of how the algorithm works:
ChatGPT: grpo.txt
ChatGPT+DeepSeek: grpo_2.txt
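
For orientation, here is a minimal sketch of the core idea in the linked paper, independent of the attached files and not a proposal for the library's API. GRPO samples a group of outputs for the same prompt, scores each one, and uses the reward normalized by the group's mean and standard deviation as the advantage, so no value network is needed. That advantage then enters a PPO-style clipped objective. The function names, the `eps` stabilizer, the use of the population standard deviation, and the example rewards are all assumptions made for illustration. The KL penalty against a reference policy is omitted.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against its own group: (r - mean) / (std + eps).

    `eps` is an assumed stabilizer for groups where all rewards are equal.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; an assumption of this sketch
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_surrogate(ratio, advantage, clip_eps=0.2):
    """PPO-style clipped term for one sample.

    `ratio` is pi_new(a|s) / pi_old(a|s); the result is the value to maximize.
    """
    clipped_ratio = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
    return min(ratio * advantage, clipped_ratio * advantage)


# Hypothetical rewards for four sampled outputs of the same prompt.
rewards = [1.0, 0.0, 0.0, 1.0]
advantages = group_relative_advantages(rewards)

# Surrogate terms for hypothetical probability ratios, averaged over the group.
ratios = [1.5, 1.5, 0.5, 1.0]
objective = mean(clipped_surrogate(r, a) for r, a in zip(ratios, advantages))
```

Outputs that score above the group average get a positive advantage and those below get a negative one, which is what replaces the learned critic used in PPO.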

Task list

1. Do this
2. Do that

Related issues
#...

Cross references
...

steveyuwono added the enhancement (New feature or request) and RL (Reinforcement Learning) labels on Feb 11, 2025

steveyuwono self-assigned this Feb 11, 2025
