Hi, opening an issue since I can't PM you on Twitter.
First: I think the project is really cool.
Second: I just LARP as a red teamer and only recently started working on an LLM project (tldw), so by all means, I am far from being knowledgeable.
That being said, why ollama instead of plain llama.cpp or llamafile? https://github.com/Mozilla-Ocho/llamafile
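To illustrate what I mean, here's a minimal sketch (not your actual setup; the model path and parameters are placeholders) showing that the same GGUF can be loaded directly through llama-cpp-python, a thin binding over llama.cpp, without running the ollama daemon at all:

```python
# Hedged sketch: load a GGUF directly with llama-cpp-python instead of
# going through the ollama server. Path and parameters are hypothetical.
from llama_cpp import Llama

llm = Llama(
    model_path="./models/mistral-7b-instruct.Q4_K_M.gguf",  # placeholder path
    n_ctx=4096,        # context window
    n_gpu_layers=-1,   # offload all layers to GPU if one is available
)

out = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Summarize this transcript."}],
    max_tokens=256,
)
print(out["choices"][0]["message"]["content"])
```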
Also, why a Q4 quant instead of a Q8 quant? Since you're using a 7B model, quantization has a much larger effect on the model's reasoning capability than it would if it were, say, a 70B.
Paper discussing the impact of quantization on Llama3: https://arxiv.org/abs/2404.14047
Granted, you're using Mistral, but if I remember your blog post correctly (don't have the URL handy), it said you also fine-tuned Mistral at Q4 quantization, and (again, talking out of my ass) I would assume Mistral suffers similar issues at that quantization level as Llama3 does.
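To make the suggestion concrete, here's a rough sketch of the comparison I have in mind (file names are hypothetical; any Q4_K_M/Q8_0 pair of the same model would do): run an identical prompt through both quants and eyeball the difference in output quality.

```python
# Hedged sketch: compare a Q4_K_M and a Q8_0 quant of the same 7B model
# on one prompt. The GGUF file names below are placeholders.
from llama_cpp import Llama

PROMPT = "A farmer has 17 sheep. All but 9 run away. How many are left?"

for path in (
    "./models/mistral-7b-instruct.Q4_K_M.gguf",  # ~4-bit quant
    "./models/mistral-7b-instruct.Q8_0.gguf",    # ~8-bit quant
):
    llm = Llama(model_path=path, n_ctx=2048, verbose=False)
    out = llm.create_completion(PROMPT, max_tokens=64, temperature=0.0)
    print(path, "->", out["choices"][0]["text"].strip())
```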
Pointing this out in case it wasn't known at the time; if it was, maybe you can point me to some newer info.
Thanks again for the project, I think it's pretty neat.