The `format_reward` function in `grpo.py` fails for me.
For example, when the content is `"<think>Thinking process\n</think>\n\n<answer>\nFinal answer.\n</answer>"`, the current `format_reward` function reports no match, even though the content does follow the expected format. As a result, the format reward is 0 for almost all samples throughout training.
I solved this by changing `matches = [re.match(pattern, content) for content in completion_contents]` to `matches = [re.match(pattern, content, re.DOTALL) for content in completion_contents]`.
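A minimal sketch that reproduces the behavior, for anyone who wants to check it locally. The pattern below is an assumption and may not be the exact one in `grpo.py`. The `format_reward` here is a simplified stand-in that takes plain strings rather than the completion objects used in training. By default `.` does not match a newline, so `.*?` cannot span the line break inside `<think>`. With `re.DOTALL` it can.

```python
import re

# Assumed pattern for illustration; the one in grpo.py may differ.
PATTERN = r"^<think>.*?</think>\s*<answer>.*?</answer>$"


def format_reward(contents, flags=0):
    """Simplified stand-in: 1.0 if a string matches PATTERN, else 0.0."""
    return [1.0 if re.match(PATTERN, c, flags) else 0.0 for c in contents]


sample = "<think>Thinking process\n</think>\n\n<answer>\nFinal answer.\n</answer>"

# Without re.DOTALL, "." stops at the first newline, so nothing matches.
print(format_reward([sample]))             # [0.0]

# With re.DOTALL, "." also matches "\n" and the sample is accepted.
print(format_reward([sample], re.DOTALL))  # [1.0]
```

If the real pattern has no `\s*` between `</think>` and `<answer>`, the blank lines between the two tags would still prevent a match even with `re.DOTALL`.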
My `re` package version is 2.2.1. Is this a problem on my side?