Refactor/tinyblas #10343
base: master
I am not convinced that this is a good idea. The maintenance cost of keeping tinyblas in the CPU backend is effectively negligible; however, moving it to a separate backend has a significant cost:
I believe we should go the other way: remove the AMX backend and move its code into the CPU backend. When the AMX backend was created this was not feasible because of the weight repacking it does, but that is no longer a problem and can be implemented in a similar way to the aarch64 online repacking.
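To make "weight repacking" concrete, here is a minimal sketch of the general idea, assuming plain `float` weights and a hypothetical 4-row interleaved layout (the real aarch64 and AMX repacking work on quantized blocks and use different layouts). Rows are grouped into panels, and within a panel the values of each column are stored contiguously, so a kernel can load one weight from each of several rows with a single sequential read. `ROWS`, `repack_rows`, and `matvec_packed` are illustrative names, not ggml APIs.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical panel height: number of rows interleaved together.
constexpr std::size_t ROWS = 4;

// Repack an n x k row-major matrix so that, inside each panel of ROWS rows,
// the ROWS values of column j are adjacent in memory.
std::vector<float> repack_rows(const std::vector<float>& w, std::size_t n, std::size_t k) {
    assert(n % ROWS == 0 && w.size() == n * k);  // sketch assumes full panels
    std::vector<float> out(n * k);
    std::size_t o = 0;
    for (std::size_t p = 0; p < n; p += ROWS)       // each panel
        for (std::size_t j = 0; j < k; ++j)         // each column
            for (std::size_t r = 0; r < ROWS; ++r)  // interleave the panel's rows
                out[o++] = w[(p + r) * k + j];
    return out;
}

// y = W * x computed from the packed layout: the inner loop walks the
// packed buffer strictly sequentially and updates ROWS outputs at once.
std::vector<float> matvec_packed(const std::vector<float>& packed, const std::vector<float>& x,
                                 std::size_t n, std::size_t k) {
    assert(packed.size() == n * k && x.size() == k);
    std::vector<float> y(n, 0.0f);
    const float* p = packed.data();
    for (std::size_t base = 0; base < n; base += ROWS)
        for (std::size_t j = 0; j < k; ++j)
            for (std::size_t r = 0; r < ROWS; ++r)
                y[base + r] += *p++ * x[j];
    return y;
}
```

Whether this transform runs once when the model is loaded ("online") or ahead of time with the result stored in the model file ("static") does not change the kernel; it only changes when the cost is paid.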
Branch updated from fd08ab8 to a3822fb.
That's the main point...
Having a mega CPU backend that can use integrated accelerators will make the CPU backend a lot more complicated. So, in my view, if we do not allow a backend to share ops with the CPU backend, we need to add something new, such as a way to register ops in the CPU backend.
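One possible shape for "registering ops in the CPU backend" is sketched below. This is a hypothetical design, not an existing ggml interface: `DType`, `OpImpl`, and `OpRegistry` are invented for illustration. Extra implementations (tinyblas, AMX, ...) register themselves per op with a predicate saying which tensor types they handle; the CPU backend asks the registry first and falls back to its own kernel when nothing matches.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical tensor types, standing in for the real ggml type enum.
enum class DType { F32, F16, Q4_0, Q8_0 };

// One optional implementation of an op, e.g. "tinyblas" for mul_mat.
struct OpImpl {
    std::string name;                     // label, useful for logging
    std::function<bool(DType)> supports;  // can this impl handle this type?
    std::function<void()> compute;        // placeholder for the real kernel call
};

class OpRegistry {
public:
    // Registration order defines priority: earlier entries are tried first.
    void add(const std::string& op, OpImpl impl) {
        assert(impl.supports && impl.compute);
        impls_[op].push_back(std::move(impl));
    }

    // First registered impl that supports `type`, or nullptr so the caller
    // can fall back to the built-in CPU kernel.
    const OpImpl* find(const std::string& op, DType type) const {
        auto it = impls_.find(op);
        if (it == impls_.end()) return nullptr;
        for (const auto& impl : it->second)
            if (impl.supports(type)) return &impl;
        return nullptr;
    }

private:
    std::map<std::string, std::vector<OpImpl>> impls_;
};
```

Keeping the fallback decision in the caller means the CPU backend stays the single owner of every op, which is the property both sides of the discussion seem to want.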
If we add the ability to register ops, this separate backend is not needed: we can build all possible versions and use the best one at init time. llamafile does this in its own way; we could use GCC's `target` attribute, or create a more full-featured registration service.
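A minimal sketch of "build all versions, pick the best at init time" using GCC/Clang's `target` function attribute and `__builtin_cpu_supports` on x86. The kernel (a float dot product) and the names `dot_fn`, `dot_generic`, `dot_avx2`, and `select_dot` are hypothetical; a real backend would dispatch whole matmul kernels. Both variants share the same source, but the compiler may emit AVX2/FMA instructions only inside the one marked with the attribute, so the binary still runs on older CPUs.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel signature used for dispatch.
using dot_fn = float (*)(const float*, const float*, std::size_t);

// Portable fallback, always compiled with the baseline instruction set.
static float dot_generic(const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
// Same loop, but the compiler may use AVX2/FMA instructions in this function only.
__attribute__((target("avx2,fma")))
static float dot_avx2(const float* a, const float* b, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr));
    float s = 0.0f;
    for (std::size_t i = 0; i < n; ++i) s += a[i] * b[i];
    return s;
}
#endif

// Called once at init: query the running CPU and return the best variant.
static dot_fn select_dot() {
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    if (__builtin_cpu_supports("avx2") && __builtin_cpu_supports("fma")) return dot_avx2;
#endif
    return dot_generic;
}
```

Storing the selected pointer once (for example in the op registry) avoids repeating the CPU feature check on every call.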
Online repacking is nice, but it would be better to have static repacking of the weights. 😎 Some benchmarks with this backend on an AMD Ryzen 9 7940HS with Mistral-7B-Instruct-v0.3:
This makes a lot of sense. I will close issue #10183 and create a new one to track the integration of the AMX backend into the CPU backend.
This is a sample of how to create a full backend for "LLAMAFILE". #10183
TODO:
Note: it is possible to split sgemm.cpp (float / QN_0 / x86 ...)
I did not see any slowdown on my AMD Ryzen 9 7940HS (Zen 4).