Add stronger reward verification sandbox #207

ZefanW · 2025-02-05T10:33:10Z

Add stronger verification support, as used in https://github.com/PRIME-RL/PRIME:

- [x] Batched verification
- [x] Python interpreter
- [x] Stronger math verifier
- [x] Continuous score for code test
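
To make "continuous score for code test" concrete, here is a minimal sketch. It assumes the score is the fraction of test cases a generated program passes, which this PR description does not spell out. The names `run_case` and `continuous_score` are hypothetical and are not the functions added in this PR, and a plain subprocess with a timeout is only a stand-in for a real sandbox.

```python
import subprocess
import sys
from typing import List, Optional


def run_case(program: str, stdin_text: str, timeout: float = 10.0) -> Optional[str]:
    """Run `program` in a fresh Python interpreter and return its stdout.

    Returns None if the program times out or exits with a non-zero code.
    Hypothetical helper: a subprocess is NOT a security sandbox.
    """
    try:
        proc = subprocess.run(
            [sys.executable, "-c", program],
            input=stdin_text,
            capture_output=True,
            text=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return proc.stdout if proc.returncode == 0 else None


def continuous_score(program: str, inputs: List[str], outputs: List[str],
                     timeout: float = 10.0) -> float:
    """Return the fraction of (input, expected output) pairs the program passes.

    Assumption: "continuous" means partial credit per test case, instead of
    an all-or-nothing 0/1 reward.
    """
    if not inputs:
        return 0.0
    passed = 0
    for stdin_text, expected in zip(inputs, outputs):
        got = run_case(program, stdin_text, timeout=timeout)
        # Compare after stripping surrounding whitespace.
        if got is not None and got.strip() == expected.strip():
            passed += 1
    return passed / len(inputs)
```

With such a score, a program that passes one of two test cases gets 0.5 rather than 0, which gives the policy a denser learning signal than a binary pass/fail reward.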

verl/trainer/main_ppo.py

vermouth1992 · 2025-02-06T14:25:49Z

verl/utils/reward_score/prime_code/utils.py

+            metadata_list.append({})
+
+
+def check_correctness(in_outs: Optional[dict], generation, timeout=10, debug=True):


Could you add a test case to show the usage of this function?

done in 7f031b9

verl/trainer/main_ppo.py

eric-haibin-lin · 2025-02-09T09:15:52Z

verl/trainer/main_ppo.py

+import asyncio
+from functools import partial
+
+from tqdm.asyncio import tqdm



the ProcessPoolExecutor / asyncio libs are no longer needed in this file

eric-haibin-lin · 2025-02-09T09:17:33Z

verl/trainer/config/ppo_trainer.yaml

@@ -142,6 +142,7 @@ reward_model:
  ulysses_sequence_parallel_size: 1 # sp size
  use_dynamic_bsz: ${critic.use_dynamic_bsz}
  forward_max_token_len_per_gpu: ${critic.forward_max_token_len_per_gpu}
+  reward_manager: naive


could you add this option to https://github.com/volcengine/verl/blob/main/docs/examples/config.rst as well? thanks!
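
For readers of the docs, a rough sketch of how a `reward_manager` option like the one above could select an implementation by name. Only the option name and the `naive` default come from the diff. The class names, the `prime` key, and `build_reward_manager` are hypothetical and do not describe the actual code in `verl/trainer/main_ppo.py`.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, Dict, List


class NaiveRewardManager:
    """Hypothetical stand-in: scores samples one at a time."""

    def __init__(self, score_fn: Callable[[Any], float]):
        self.score_fn = score_fn

    def __call__(self, samples: List[Any]) -> List[float]:
        return [self.score_fn(s) for s in samples]


class BatchedRewardManager:
    """Hypothetical stand-in: scores a batch concurrently, keeping input order."""

    def __init__(self, score_fn: Callable[[Any], float], max_workers: int = 4):
        self.score_fn = score_fn
        self.max_workers = max_workers

    def __call__(self, samples: List[Any]) -> List[float]:
        with ThreadPoolExecutor(max_workers=self.max_workers) as pool:
            # Executor.map returns results in the order of the inputs.
            return list(pool.map(self.score_fn, samples))


# Assumed registry; the key "prime" is an illustrative name, not confirmed here.
REWARD_MANAGERS: Dict[str, type] = {
    "naive": NaiveRewardManager,
    "prime": BatchedRewardManager,
}


def build_reward_manager(reward_model_cfg: Dict[str, Any],
                         score_fn: Callable[[Any], float]):
    """Pick a manager from `reward_model.reward_manager`, defaulting to "naive"."""
    name = reward_model_cfg.get("reward_manager", "naive")
    if name not in REWARD_MANAGERS:
        raise ValueError(f"unknown reward_manager: {name!r}")
    return REWARD_MANAGERS[name](score_fn)
```

Falling back to `naive` when the key is absent would keep existing configs working unchanged, which matches the default added in the YAML diff.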

eric-haibin-lin

Left some minor comments. No major concern from my side.

Add stronger verification support as is used in https://github.com/PRIME-RL/PRIME

- [x] Batched verification
- [x] Python interpreter
- [x] Stronger math verifier
- [x] Continuous score for code test

Re-opening #207 to trigger automatic workflows

ZefanW added 2 commits February 5, 2025 17:07

prime sandbox

7638e21

minor fixes

e5fd604

eric-haibin-lin reviewed Feb 5, 2025

verl/trainer/main_ppo.py (outdated, resolved)

ZefanW added 4 commits February 6, 2025 10:07

formating

64012ab

gsm/math compatible

9244c8a

continuous score

35101b5

minor fix

6c76fc3

vermouth1992 reviewed Feb 6, 2025

verl/trainer/main_ppo.py (outdated, resolved)

ZefanW added 7 commits February 7, 2025 19:39

validation func argument; sandbox test

7f031b9

formatting

c4b435a

asyncio starvation handling

ad8bc46

add CI test

3f8eefc

requirements

ea05184

split reward managers

6362a04

add package init

afb34eb

eric-haibin-lin reviewed Feb 9, 2025

eric-haibin-lin approved these changes Feb 9, 2025

vermouth1992 changed the title ~~[WIP] Add stronger reward verification sandbox~~ Add stronger reward verification sandbox Feb 9, 2025

reformat

cd49c13

ZefanW closed this Feb 9, 2025

ZefanW mentioned this pull request Feb 9, 2025

Add stronger reward verification sandbox #233

Merged

4 tasks
