Question about the inference hypotheses #92
-
Hi,

Quick question about the assumptions you made for the inference parameters, especially quantization. If I am not mistaken, quantization is set to 4 bits (BnB 4-bit, I guess), and this value is then applied to OpenAI, Google, Anthropic... I am wondering why you chose 4 bits? This information is likely not disclosed by the AI vendors, and I have read that 8 bits may be more commonly used than 4. What do you think?

Thanks in advance,
Regards,
Damien
-
Hello @djuillard,

Thanks a lot for the feedback, we really appreciate it! 💚

The 4-bit hypothesis comes from the data extraction we did in early 2024 from the LLM-Perf Leaderboard. To cover the full range of model sizes, from ~1B to 70B parameters, we had to use the data with 4-bit quantization. So it is not really a choice, but rather a constraint imposed by the data source.

We plan to integrate more data sources in the near future, including our own experiments, to better estimate the linear regression model of energy per token as a function of model size. Hopefully, we will be able to run more experiments with different sets of optimizations and hardware configurations, progressively gaining in precision and robustness.

Happy to answer any further questions on the topic! 😉
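For readers curious what that regression looks like in practice, here is a minimal sketch of fitting energy per token as a linear function of model size. The data points, units, and function name below are purely illustrative assumptions, not values from the LLM-Perf Leaderboard or the project's actual code.

```python
# Sketch only: a least-squares linear fit of energy per token vs. model size.
# The data points below are made up for illustration.
import numpy as np

# Hypothetical measurements: model size (billions of parameters)
# and energy per generated token (Wh), at 4-bit quantization.
model_sizes_b = np.array([1.0, 7.0, 13.0, 34.0, 70.0])
energy_wh_per_token = np.array([0.0002, 0.0011, 0.0019, 0.0048, 0.0101])

# Fit: energy ≈ slope * size + intercept
slope, intercept = np.polyfit(model_sizes_b, energy_wh_per_token, deg=1)

def estimate_energy_wh_per_token(size_b: float) -> float:
    """Estimate energy per token (Wh) for a model of `size_b` billion parameters."""
    return slope * size_b + intercept
```

Adding more data sources would simply mean extending the arrays above (or moving to a more robust regression) before refitting.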