v2.2.0 - x2 speed for Qx_K and Qx_0 quantization
A BIG release has dropped! The biggest changes include:
- x2 speed for Qx_K and Qx_0 quantization 🚀, see this PR: ggml-org/llama.cpp#11453 (it is not merged upstream yet, so I included it in wllama as a patch). IQx quants will still be slow, but follow-up work is already planned.
- Switched to a binary protocol for the JS <==> WASM connection. The `json.hpp` dependency is now gone, and calling `wllama.tokenize()` on a long text is now faster than ever! 🎉 (See the usage sketch after this list.)
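To give a feel for the user-facing side of this change, here is a minimal sketch of tokenizing a long text with wllama. The `CONFIG_PATHS` keys and the model URL are assumptions for illustration only; check the wllama README for the exact asset paths your setup needs.

```ts
import { Wllama } from '@wllama/wllama';

// Assumed paths to the WASM binaries; the exact keys and locations
// depend on how you serve the wllama assets (see the wllama README).
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

async function demo(): Promise<void> {
  const wllama = new Wllama(CONFIG_PATHS);
  // Hypothetical model URL, for illustration only.
  await wllama.loadModelFromUrl('https://example.com/model-Q4_K_M.gguf');

  // As of v2.2.0, this call crosses the JS <==> WASM boundary via the
  // new binary protocol instead of JSON, so long inputs tokenize faster.
  const tokens = await wllama.tokenize('some very long text ...');
  console.log(`got ${tokens.length} tokens`);
}

demo();
```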
Debut at FOSDEM 2025
Last week, I gave a 15-minute talk at FOSDEM 2025 which, for the first time, introduced wllama to the real world!
Watch the talk here: https://fosdem.org/2025/schedule/event/fosdem-2025-5154-wllama-bringing-llama-cpp-to-the-web/
What's Changed
- add benchmark function, used internally by @ngxson in #151
- switch to binary protocol between JS and WASM world (glue.cpp) by @ngxson in #154
- Remove json.hpp dependency by @ngxson in #155
- temporarily apply that viral x2 speedup PR by @ngxson in #156
- Fix a bug with kv_remove, release v2.2.0 by @ngxson in #157
Full Changelog: 2.1.3...2.2.0