Add stronger reward verification sandbox #207
metadata_list.append({})


def check_correctness(in_outs: Optional[dict], generation, timeout=10, debug=True):
Could you add a test case to show the usage of this function?
done in 7f031b9
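For readers without the commit at hand, the kind of usage test requested above might look like the sketch below. It is not the test added in the PR. `check_correctness_sketch` is a hypothetical stand-in that only borrows the signature shown in the diff; the `inputs`/`outputs` keys of `in_outs`, the stdin/stdout comparison, and the `(results, metadata_list)` return shape are all assumptions made for illustration.

```python
import subprocess
import sys
from typing import Optional


def check_correctness_sketch(in_outs: Optional[dict], generation: str, timeout: float = 10, debug: bool = True):
    """Hypothetical stand-in, NOT verl's implementation.

    Runs `generation` as a Python program once per test input and compares
    its stdout with the expected output.
    """
    results, metadata_list = [], []
    if not in_outs:
        return results, metadata_list
    # Assumed layout: parallel lists under "inputs" and "outputs".
    for stdin_text, expected in zip(in_outs["inputs"], in_outs["outputs"]):
        try:
            proc = subprocess.run(
                [sys.executable, "-c", generation],
                input=stdin_text,
                capture_output=True,
                text=True,
                timeout=timeout,
            )
            passed = proc.returncode == 0 and proc.stdout.strip() == expected.strip()
            metadata = {"returncode": proc.returncode}
        except subprocess.TimeoutExpired:
            # A timed-out program counts as a failed test case.
            passed, metadata = False, {"error": "timeout"}
        if debug:
            print(f"input={stdin_text!r} passed={passed}")
        results.append(passed)
        metadata_list.append(metadata)
    return results, metadata_list


def test_check_correctness_sketch():
    in_outs = {"inputs": ["2\n", "5\n"], "outputs": ["4\n", "10\n"]}
    good = "print(int(input()) * 2)"
    bad = "print(int(input()) + 2)"  # correct only for the first input

    results, _ = check_correctness_sketch(in_outs, good, timeout=5, debug=False)
    assert results == [True, True]

    results, metadata = check_correctness_sketch(in_outs, bad, timeout=5, debug=False)
    assert results == [True, False]
    assert len(metadata) == 2
```

Running each candidate in a separate interpreter process keeps a crashing or hanging generation from taking down the caller, which is the main reason a sandboxed checker takes a `timeout` argument at all.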
verl/trainer/main_ppo.py
import asyncio
from functools import partial

from tqdm.asyncio import tqdm
The ProcessPoolExecutor / asyncio imports are no longer needed in this file.
@@ -142,6 +142,7 @@ reward_model:
  ulysses_sequence_parallel_size: 1 # sp size
  use_dynamic_bsz: ${critic.use_dynamic_bsz}
  forward_max_token_len_per_gpu: ${critic.forward_max_token_len_per_gpu}
  reward_manager: naive
Could you add this option to https://github.com/volcengine/verl/blob/main/docs/examples/config.rst as well? Thanks!
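To make the role of the new `reward_manager` key concrete, here is a minimal sketch of how a string option like this could select a scoring strategy. Everything in it is hypothetical: the class names, the `"prime"` key, and `build_reward_manager` are illustrative assumptions and not verl's actual code. Only the option name and its `naive` value come from the diff above.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List

ScoreFn = Callable[[str], float]


class NaiveManagerSketch:
    """Hypothetical: scores responses one at a time."""

    def __init__(self, score_fn: ScoreFn):
        self.score_fn = score_fn

    def __call__(self, responses: List[str]) -> List[float]:
        return [self.score_fn(r) for r in responses]


class BatchedManagerSketch:
    """Hypothetical: scores a batch of responses concurrently."""

    def __init__(self, score_fn: ScoreFn, max_workers: int = 4):
        self.score_fn = score_fn
        self.max_workers = max_workers

    def __call__(self, responses: List[str]) -> List[float]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # pool.map preserves the input order of responses.
            return list(pool.map(self.score_fn, responses))


# Assumed registry; the "prime" key is an illustrative guess.
MANAGERS: Dict[str, type] = {
    "naive": NaiveManagerSketch,
    "prime": BatchedManagerSketch,
}


def build_reward_manager(reward_model_cfg: dict, score_fn: ScoreFn):
    # Fall back to "naive" when the option is absent.
    name = reward_model_cfg.get("reward_manager", "naive")
    if name not in MANAGERS:
        raise ValueError(f"unknown reward_manager: {name!r}")
    return MANAGERS[name](score_fn)
```

With this shape, `build_reward_manager({"reward_manager": "naive"}, score_fn)` and a config that omits the key would both yield the sequential manager, so older configs keep working unchanged.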
Left some minor comments. No major concern from my side.
Add stronger verification support as is used in https://github.com/PRIME-RL/PRIME
- [x] Batched verification
- [x] Python interpreter
- [x] Stronger math verifier
- [x] Continuous score for code test

Re-opening #207 to trigger automatic workflows
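As an illustration of the "continuous score for code test" item, one plausible reading is that a generation earns the fraction of test cases it passes rather than an all-or-nothing reward. The sketch below assumes that reading. `continuous_score` is a hypothetical helper, and treating any non-`True` entry (for example an error code) as a failure is also an assumption, not something the PR text states.

```python
from typing import Sequence


def continuous_score(results: Sequence[object]) -> float:
    """Hypothetical helper: fraction of test cases that passed.

    Only entries that are exactly True count as passes, so error
    markers such as -1 or False are treated as failures (assumption).
    """
    if not results:
        # No test cases: give no reward rather than divide by zero.
        return 0.0
    passed = sum(1 for r in results if r is True)
    return passed / len(results)


def binary_score(results: Sequence[object]) -> float:
    """All-or-nothing baseline for comparison."""
    return 1.0 if results and all(r is True for r in results) else 0.0
```

For a generation that passes three of four tests, the continuous variant yields 0.75 where the binary baseline yields 0.0, which gives partially correct programs a non-zero reward signal.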