Support for Qwen models #21
Comments
Hi @gsarti, I had the same thought and had a go at it today. I mostly just adapted the existing Llama implementation. I am not sure if it is working correctly, though, as the heatmap from the example doesn't seem to be highlighting very relevant tokens (see attached PDF). @rachtibat, if you get some time, it would be great if you could check whether this looks like how you might approach adding the Qwen 2 models :). R1-Distill-Qwen-1.5B_heatmap.pdf My fork:
Hey guys, thank you a lot @samlewislim for providing the implementation! The heatmap seems off? It is not looking at the correct token '320', right? Good news! For this, I'd like to add your contribution @samlewislim first, of course! (:
Great news @rachtibat, looking forward to the new implementation!
Hey both, that is great news, looking forward to the new implementation too! Yeah, I thought it seemed off, as it wasn't looking at the '320' token, while other models of similar size (e.g. Llama 3.2 1B or TinyLlama) did seem to look at it. But this may just be down to model differences? @rachtibat, would you like me to create a PR for Qwen?
@samlewislim I think there might be an error in the Qwen implementation.
Once it works, you can create a pull request. Best regards
@rachtibat thank you so much for taking a look! I have made those fixes and just opened a PR. The heatmap is also looking correct now.
Hi @rachtibat,
I was wondering if you'd consider adding support for Qwen 2 models in LXT!
Judging by the modular implementation in HF, the differences should be minimal (no bias in MLP, sliding window attention). It would be very cool to have this, as it would enable attribution on the capable Qwen 2.5 family and the R1 reasoning models derived from it!
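One of the differences mentioned above is sliding window attention, which Qwen 2 uses in some layers: each query position only attends to the most recent `window` key positions instead of the full causal prefix. As a minimal, self-contained sketch of that masking pattern (illustrative only — this is not LXT or Hugging Face code, and the function name is hypothetical):

```python
def sliding_window_causal_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal mask restricted to a local window: position i may attend
    to position j only if 0 <= i - j < window (itself and the
    window - 1 most recent tokens)."""
    return [[0 <= i - j < window for j in range(seq_len)]
            for i in range(seq_len)]
```

For example, with `seq_len=5` and `window=3`, the last row allows attention only to positions 2, 3, and 4; with `window >= seq_len` the mask reduces to the ordinary causal mask, which is why the Llama-based implementation mostly carries over.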