Replies: 2 comments
-
Seems like this could be done on a Modal Sandbox.
-
Here's a PoC for what's described above! There are still some details to iron out, namely the prompting format, but the foundations are there: #376
-
Thinking of verifiable reward models: we have a program verifier that accepts Python program strings along with pre- and post-condition predicates, and returns execution traces as well as counterexamples.
We are exploring opening up this functionality as a REST API, which would be a simpler integration path than reimplementing it in Python for the open-r1 repo.
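To make the integration path a bit more concrete, here is a rough sketch of how a reward function might call such an API. Everything in it is an assumption for illustration: the endpoint URL, the JSON field names (`program`, `precondition`, `postcondition`, `verified`, `counterexamples`), the helper names, and the binary reward mapping are all hypothetical, not the actual verifier interface or anything in open-r1.

```python
import json
from typing import Callable
from urllib import request

# Hypothetical endpoint; the real service URL is not specified in this thread.
VERIFIER_URL = "https://verifier.example.com/verify"


def build_payload(program: str, precondition: str, postcondition: str) -> dict:
    """Bundle a Python program string with its pre- and post-condition predicates.

    The field names are assumed, not taken from a published schema.
    """
    return {
        "program": program,
        "precondition": precondition,
        "postcondition": postcondition,
    }


def post_json(url: str, payload: dict, timeout: float = 30.0) -> dict:
    """POST a JSON payload and decode the JSON response (standard library only)."""
    req = request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


def verifier_reward(
    program: str,
    precondition: str,
    postcondition: str,
    transport: Callable[[str, dict], dict] = post_json,
) -> float:
    """Return 1.0 if the (assumed) response reports success with no counterexamples, else 0.0."""
    result = transport(VERIFIER_URL, build_payload(program, precondition, postcondition))
    verified = bool(result.get("verified"))
    counterexamples = result.get("counterexamples") or []
    return 1.0 if verified and not counterexamples else 0.0
```

The `transport` argument is injectable so the reward logic can be exercised offline with a stub that returns a canned response, without standing up the service.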
Questions:
Thanks!