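Hi,
I'd like to continue training from the Qwen2.5-72B model and would appreciate advice on training parameters:
1. What data volume is suited to SFT only, and what data volume is suited to continued pretraining?
2. To reach a good MFU, what parallelism settings and data-related parameters would you recommend (assuming compute resources are sufficient)?
My manager recently assigned me the task of training on vertical-domain data. This is my first time doing this, so I'm unsure whether to do continued pretraining or go straight to SFT, and how to get the best MFU out of training. Any guidance would be much appreciated. Thank you!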