DPO training produces repetition and random English gibberish #6679

liuanping · 2025-01-17T03:26:52Z

Reminder

I have read the above rules and searched the existing issues.

System Info

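I built DPO training data by sampling the model's output several times and using a large model to judge and pick the better and worse responses for each pair. After training, the model produces repetitive output (repetition is especially frequent on the training data). Looking at the loss, I don't think it guarantees that the model keeps generating according to the target. Should the current losses be improved? For losses such as SimPO, ORPO and sigmoid, I think each of them should add an SFT (cross-entropy) term to keep the output from drifting.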
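To make the concern concrete, here is a minimal sketch of the pairwise losses mentioned above and of the proposed extra SFT term. It assumes each response is summarized by its summed token log-probability under the policy (and, for sigmoid DPO, under the frozen reference model), plus its token count. All function names, the `alpha` weight, and the `beta`/`gamma` defaults are hypothetical illustration values, not any library's API. ORPO is left out because its published objective already includes an NLL term on the chosen response. The example at the bottom shows the point being raised: the sigmoid DPO loss depends only on the margin, so lowering the likelihood of both responses by the same amount leaves it unchanged, while the added cross-entropy term does not.

```python
import math


def softplus(x: float) -> float:
    """Numerically stable log(1 + exp(x))."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def dpo_sigmoid_loss(pi_chosen: float, pi_rejected: float,
                     ref_chosen: float, ref_rejected: float,
                     beta: float = 0.1) -> float:
    """Sigmoid DPO loss for one pair: -log sigmoid(beta * margin).

    Inputs are summed token log-probs of whole responses under the
    policy (pi_*) and the frozen reference model (ref_*).
    """
    margin = (pi_chosen - ref_chosen) - (pi_rejected - ref_rejected)
    return softplus(-beta * margin)


def simpo_loss(pi_chosen: float, pi_rejected: float,
               len_chosen: int, len_rejected: int,
               beta: float = 2.0, gamma: float = 0.5) -> float:
    """SimPO loss: reference-free, length-normalized, with target margin gamma."""
    margin = (beta * pi_chosen / len_chosen
              - beta * pi_rejected / len_rejected
              - gamma)
    return softplus(-margin)


def sft_nll(pi_chosen: float, len_chosen: int) -> float:
    """Per-token cross-entropy (negative log-likelihood) of the chosen response."""
    return -pi_chosen / len_chosen


def preference_loss_with_sft(pref_loss: float, pi_chosen: float,
                             len_chosen: int, alpha: float = 1.0) -> float:
    """Hypothetical combined objective: pairwise loss + alpha * SFT (CE) term."""
    return pref_loss + alpha * sft_nll(pi_chosen, len_chosen)


if __name__ == "__main__":
    # Both pairs have the same DPO margin (10.0), but in the second one the
    # chosen response has become much less likely under the policy.
    a = dpo_sigmoid_loss(-10.0, -20.0, -10.0, -10.0)
    b = dpo_sigmoid_loss(-30.0, -40.0, -10.0, -10.0)
    print(a == b)  # True: DPO alone cannot tell these apart
    # The SFT term penalizes the drop in chosen likelihood.
    print(preference_loss_with_sft(a, -10.0, 5) < preference_loss_with_sft(b, -30.0, 5))  # True
```

The design choice here is to normalize the SFT term per token, so that `alpha` does not silently scale with response length; a summed NLL would also be a reasonable variant.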

Reproduction

Put your message here.

Others

No response


liuanping added the bug (Something isn't working) and pending (This problem is yet to be addressed) labels on Jan 17, 2025

Repository owner locked and limited conversation to collaborators Jan 17, 2025

hiyouga converted this issue into discussion #6682 Jan 17, 2025

hiyouga added the wontfix (This will not be worked on) label and removed the bug (Something isn't working) and pending (This problem is yet to be addressed) labels on Jan 17, 2025


This issue was moved to a discussion.
