This issue was moved to a discussion.

DPO training produces repetitive output and random garbled English text #6679

Closed
1 task done
liuanping opened this issue Jan 17, 2025 · 0 comments
Labels
wontfix This will not be worked on

Comments

@liuanping

Reminder

  • I have read the above rules and searched the existing issues.

System Info

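I built DPO training data by sampling multiple outputs from the model and using an LLM to compare them and pick the better and worse responses. After training, the model produces repetitive output (the repetition is more frequent on the training data). Looking at the loss, I don't think it can guarantee that the model generates output according to the target. Should the current losses be improved? For example, I think the simpo, orpo, and sigmoid losses should each have an SFT (CE loss) term added to keep the output from drifting.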
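To make the proposal above concrete, here is a minimal sketch of a sigmoid DPO loss with an added SFT (cross-entropy) term on the chosen response. It assumes the summed sequence log-probabilities under the policy and reference models are already available. The function name `dpo_with_sft_loss` and the `sft_weight` parameter are hypothetical and used only for illustration; they are not taken from the project's code.

```python
import math


def log_sigmoid(x: float) -> float:
    """Numerically stable log(sigmoid(x))."""
    if x >= 0:
        return -math.log1p(math.exp(-x))
    return x - math.log1p(math.exp(x))


def dpo_with_sft_loss(
    policy_chosen_logp: float,
    policy_rejected_logp: float,
    ref_chosen_logp: float,
    ref_rejected_logp: float,
    chosen_num_tokens: int,
    beta: float = 0.1,
    sft_weight: float = 1.0,  # hypothetical weight of the CE term
):
    """Return (total, dpo, sft) losses for one preference pair.

    The log-prob arguments are summed token log-probabilities of the
    full response under the policy / reference model.
    """
    # Sigmoid DPO: only the *margin* between chosen and rejected matters.
    margin = (policy_chosen_logp - ref_chosen_logp) - (
        policy_rejected_logp - ref_rejected_logp
    )
    dpo_loss = -log_sigmoid(beta * margin)

    # SFT term: length-normalized negative log-likelihood of the chosen
    # response, which penalizes the policy for drifting away from it.
    sft_loss = -policy_chosen_logp / chosen_num_tokens

    return dpo_loss + sft_weight * sft_loss, dpo_loss, sft_loss


# Case A: chosen likelihood unchanged, rejected pushed down.
total_a, dpo_a, sft_a = dpo_with_sft_loss(-20.0, -30.0, -20.0, -20.0, 10)
# Case B: both likelihoods collapse, but the margin still grows.
total_b, dpo_b, sft_b = dpo_with_sft_loss(-40.0, -60.0, -20.0, -20.0, 10)

print(f"A: dpo={dpo_a:.4f} sft={sft_a:.4f} total={total_a:.4f}")
print(f"B: dpo={dpo_b:.4f} sft={sft_b:.4f} total={total_b:.4f}")
```

In case B the plain DPO loss is lower than in case A even though the chosen response has become less likely, which is one way the output can drift from the target. With the CE term added, the combined loss ranks case B as worse. Whether this removes the repetition described in this issue is not established here; it only illustrates the effect of the extra term.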

Reproduction

Put your message here.

Others

No response

liuanping added the bug (Something isn't working) and pending (This problem is yet to be addressed) labels on Jan 17, 2025
Repository owner locked and limited conversation to collaborators on Jan 17, 2025
hiyouga converted this issue into discussion #6682 on Jan 17, 2025
hiyouga added the wontfix (This will not be worked on) label and removed the bug and pending labels on Jan 17, 2025

2 participants