
[not bug] How to distill Qwen2.5 7B to Qwen2.5 3B #2416

Open

whk6688 opened this issue Feb 20, 2025 · 1 comment
whk6688 commented Feb 20, 2025

I want to distill Qwen2.5 7B into Qwen2.5 3B, but the two models have different vocab sizes. For now I am cropping the teacher tensor (teacher_logits[:, :151936]). Is there a better way to solve this?

thanks
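For reference, a minimal sketch of the cropping approach described in the question, assuming PyTorch and the usual [batch, seq, vocab] logit shapes; the function name and temperature handling are illustrative and not taken from any particular recipe:

```python
import torch
import torch.nn.functional as F

def kd_loss_with_cropped_teacher(
    teacher_logits: torch.Tensor,  # e.g. [batch, seq, 152064] for Qwen2.5 7B
    student_logits: torch.Tensor,  # e.g. [batch, seq, 151936] for Qwen2.5 3B
    temperature: float = 1.0,
) -> torch.Tensor:
    """Forward-KL distillation loss after cropping the teacher's padded vocab dim."""
    # Crop the teacher's vocab dimension to match the student's. The extra
    # columns are embedding padding that no real token id ever maps to.
    student_vocab = student_logits.size(-1)
    teacher_logits = teacher_logits[..., :student_vocab]

    teacher_probs = F.softmax(teacher_logits / temperature, dim=-1)
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # Cross-entropy of the student under the teacher distribution
    # (equals forward KL up to the teacher's entropy, which is constant w.r.t. the student).
    return -(teacher_probs * student_log_probs).sum(dim=-1).mean()
```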

@ebsmothers (Contributor)
Hi @whk6688, thanks for creating the issue. In general it's recommended to distill only between models with the same vocab size; however, in this case I believe you can get away with the approach you're suggesting. See the following excerpt from this comment:

You can always resize the embedding, as the index over 151646 is meaningless.

So based on that I think your proposed approach should work.
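As a sanity check (not from the thread), one could confirm that the tokenizer's highest token id sits below the smaller padded embedding size, so the cropped columns really are unused padding. This sketch assumes both models share the Qwen2.5 tokenizer and uses the Hugging Face model id only as an example:

```python
from transformers import AutoTokenizer

# Assumption: 7B and 3B share the same Qwen2.5 tokenizer; only the embedding /
# output matrices are padded to different sizes (152064 vs 151936).
tok = AutoTokenizer.from_pretrained("Qwen/Qwen2.5-7B-Instruct")
max_id = max(tok.get_vocab().values())
print(f"highest token id: {max_id}")

# If this holds, teacher_logits[..., :151936] drops only padding columns
# that no token can ever be assigned to.
assert max_id < 151936
```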

joecummings added the triaged label Feb 25, 2025