Have the dataset, the incremental pre-training code, and the SFT code not been open-sourced? #20

Open
Maekfei opened this issue Sep 8, 2024 · 2 comments

Comments


Maekfei commented Sep 8, 2024

No description provided.

Member

Furyton commented Sep 18, 2024

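Hello,
We have organized the public datasets used for training, together with the datasets we built ourselves, and uploaded them to HuggingFace/SDUIRLab and the ModelScope community. You can download the datasets from those two links. For the training code and how to use the data, you can refer to LLaMA-Factory; our dataset format can be used directly with that framework.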
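For readers following this answer, here is a minimal sketch of how a downloaded dataset file might be registered with LLaMA-Factory. It is not the maintainers' code. It assumes the file is in Alpaca style (`instruction` / `input` / `output` fields), which the thread does not confirm, and the repository id, dataset name, and file name are hypothetical placeholders. `download_dataset` relies on `snapshot_download` from the `huggingface_hub` package; `register_dataset` uses only the standard library.

```python
import json
from pathlib import Path


def download_dataset(repo_id: str, local_dir: str) -> str:
    """Download a dataset repository from the Hugging Face Hub.

    Requires `pip install huggingface_hub`; imported lazily so the rest
    of this sketch runs without it. `repo_id` is supplied by the caller;
    the thread does not name a specific dataset repository.
    """
    from huggingface_hub import snapshot_download

    return snapshot_download(repo_id=repo_id, repo_type="dataset", local_dir=local_dir)


def register_dataset(data_dir: str, name: str, file_name: str) -> dict:
    """Add an entry for `file_name` to LLaMA-Factory's dataset_info.json.

    Assumption: the file uses Alpaca-style columns. Adjust the column
    mapping if the actual dataset uses different field names.
    """
    info_path = Path(data_dir) / "dataset_info.json"
    # Keep any entries that are already registered.
    info = json.loads(info_path.read_text(encoding="utf-8")) if info_path.exists() else {}
    info[name] = {
        "file_name": file_name,
        "columns": {"prompt": "instruction", "query": "input", "response": "output"},
    }
    info_path.parent.mkdir(parents=True, exist_ok=True)
    info_path.write_text(json.dumps(info, ensure_ascii=False, indent=2), encoding="utf-8")
    return info
```

After a call such as `register_dataset("data", "legal_sft_example", "legal_sft_example.json")` (both names hypothetical), the registered name can be referenced in the `dataset` field of a LLaMA-Factory training config.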

@milktean

Has the Chinese unsupervised judicial corpus been open-sourced yet? I did not see it mentioned in the dataset description.
