I want to distill Qwen2.5 7B into Qwen2.5 3B, but they have different vocab sizes. Right now I'm choosing to crop the teacher's logits tensor (`teacher_logits[:, :151936]`). Is there a better way to solve this?
Thanks
Hi @whk6688, thanks for creating the issue. In general it's recommended to distill between models that share the same vocab size; however, in this case I believe you may be able to get away with the approach you're suggesting. See the following excerpt from this comment:
> You can always resize the embedding, as the index over 151646 is meaningless.
So based on that I think your proposed approach should work.
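To make the idea concrete, here is a minimal sketch of a cropped KD loss, assuming PyTorch. The function name `distill_loss`, the temperature default, and the shapes are illustrative assumptions, not code from this repo; the only repo-specific detail is the crop itself, which mirrors your `teacher_logits[:, :151936]`.

```python
import torch
import torch.nn.functional as F

def distill_loss(teacher_logits: torch.Tensor,
                 student_logits: torch.Tensor,
                 temperature: float = 2.0) -> torch.Tensor:
    """KL-divergence distillation loss with the teacher's logits cropped
    to the student's vocab size (hypothetical helper, for illustration)."""
    # Crop the teacher (e.g. 152064 for 7B) down to the student's
    # vocab size (e.g. 151936 for 3B) before taking the softmax.
    student_vocab = student_logits.size(-1)
    teacher_logits = teacher_logits[..., :student_vocab]

    t_probs = F.softmax(teacher_logits / temperature, dim=-1)
    s_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    # batchmean reduction + T^2 scaling is the standard KD convention.
    return F.kl_div(s_log_probs, t_probs, reduction="batchmean") * temperature ** 2
```

Note that cropping *before* the softmax renormalizes the teacher distribution over the shared vocab; since (per the excerpt above) the indices beyond the tokenizer's real vocabulary are padding entries that never receive meaningful probability mass, this renormalization should be essentially lossless.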