Description/Motivation
GRPO (Group Relative Policy Optimization) is a recent RL algorithm developed by the DeepSeek team (https://arxiv.org/abs/2402.03300). Because it is so new, it is not yet available in Stable-Baselines3, so it would be an advantage to have GRPO in our library.
Consider the following files, which are dummy files generated by ChatGPT and DeepSeek. They will not work as-is, but they give a hint of how the algorithm would work:
ChatGPT: grpo.txt
ChatGPT+DeepSeek: grpo_2.txt
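For readers who have not opened the attachments, the core idea of GRPO as described in the linked paper is to replace a learned value baseline with a baseline computed from a group of sampled outputs: each output's reward is normalized by the mean and standard deviation of its group. The snippet below is only a minimal sketch of that normalization step, not a Stable-Baselines3 implementation. The function name `group_relative_advantages`, the `eps` term, and the use of the population standard deviation are assumptions made for illustration.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize rewards within one group of sampled outputs.

    Hypothetical helper: `eps` guards against division by zero when all
    rewards in the group are equal (an assumption, not from the paper).
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std; the choice is an assumption
    return [(r - mu) / (sigma + eps) for r in rewards]


# Made-up rewards for four outputs sampled for the same prompt/state.
advantages = group_relative_advantages([1.0, 0.0, 0.5, 0.5])
```

Outputs that score above the group mean get a positive advantage and those below get a negative one. In a full algorithm, these advantages would presumably feed a PPO-style clipped objective with a KL penalty, which is not shown here.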
Task list
1. Do this
2. Do that
Related issues
#...
Cross references
...