issue of format reward #15

hqhQAQ · 2025-02-10T15:14:14Z

The format_reward function in grpo.py fails for me.

For example, when the content is "<think>Thinking process\n</think>\n\n<answer>\nFinal answer.\n</answer>", the current format_reward function treats it as not matching, although it does follow the expected format. As a result, the format reward is 0 for almost all of my samples throughout training.

I solved this by changing matches = [re.match(pattern, content) for content in completion_contents] to matches = [re.match(pattern, content, re.DOTALL) for content in completion_contents]. Without re.DOTALL, "." does not match newline characters, so any multi-line text inside the tags prevents a match.
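The difference can be reproduced in isolation with a minimal sketch. The pattern below is an assumption made for illustration (the actual pattern in grpo.py is not quoted in this issue and may differ), and the flags parameter is a hypothetical addition so both behaviours can be compared side by side.

```python
import re

# Assumed pattern for illustration only; the real one in grpo.py may differ.
PATTERN = r"<think>.*?</think>\s*<answer>.*?</answer>"


def format_reward(completion_contents, flags=0):
    """Return 1.0 for each content string that matches PATTERN, else 0.0.

    `flags` is a hypothetical parameter added here to compare the two calls.
    """
    matches = [re.match(PATTERN, content, flags) for content in completion_contents]
    return [1.0 if match else 0.0 for match in matches]


sample = "<think>Thinking process\n</think>\n\n<answer>\nFinal answer.\n</answer>"

# Without re.DOTALL, "." stops at the newline inside the tags, so nothing matches.
without_dotall = format_reward([sample])

# With re.DOTALL, "." also matches newlines, so the multi-line sample matches.
with_dotall = format_reward([sample], re.DOTALL)

print(without_dotall, with_dotall)
```

Under this assumed pattern, the sample only earns a reward when re.DOTALL is passed, which is consistent with the near-zero rewards described above.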

My re module reports version 2.2.1. Is this a problem on my side?

hqhQAQ · 2025-02-10T15:50:37Z

I found that this issue has already been fixed in R1-V.

hqhQAQ closed this as completed Feb 10, 2025
