v2.2.0 - x2 speed for Qx_K and Qx_0 quantization
A BIG release has dropped! The biggest changes include:
- x2 speed for Qx_K and Qx_0 quantization 🚀, see this PR: ggml-org/llama.cpp#11453 (it is not merged upstream yet, so I included it in wllama as a patch). IQx quants will still be slow, but follow-up work is already planned.
- Switched to a binary protocol for the JS <==> WASM connection. The `json.hpp` dependency is now gone, and calling `wllama.tokenize()` on a long text is now faster than ever! 🎉 (See the usage sketch after this list.)
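To give a feel for the user-facing side of this change, here is a minimal sketch of tokenizing a long text with wllama. The `CONFIG_PATHS` keys and the model URL are assumptions for illustration only; check the wllama README for the exact asset paths your setup needs.

```ts
import { Wllama } from '@wllama/wllama';

// Assumed paths to the WASM binaries; the exact keys and locations
// depend on how you serve the wllama assets (see the wllama README).
const CONFIG_PATHS = {
  'single-thread/wllama.wasm': '/wllama/single-thread/wllama.wasm',
  'multi-thread/wllama.wasm': '/wllama/multi-thread/wllama.wasm',
};

async function demo(): Promise<void> {
  const wllama = new Wllama(CONFIG_PATHS);
  // Hypothetical model URL, for illustration only.
  await wllama.loadModelFromUrl('https://example.com/model-Q4_K_M.gguf');

  // As of v2.2.0, this call crosses the JS <==> WASM boundary via the
  // new binary protocol instead of JSON, so long inputs tokenize faster.
  const tokens = await wllama.tokenize('some very long text ...');
  console.log(`got ${tokens.length} tokens`);
}

demo();
```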
Debut at FOSDEM 2025
Last week, I gave a 15-minute talk at FOSDEM 2025 which, for the first time, introduced wllama to the real world!
Watch the talk here: https://fosdem.org/2025/schedule/event/fosdem-2025-5154-wllama-bringing-llama-cpp-to-the-web/
What's Changed
- add benchmark function, used internally by @ngxson in #151
- switch to binary protocol between JS and WASM world (glue.cpp) by @ngxson in #154
- Remove json.hpp dependency by @ngxson in #155
- temporarily apply that viral x2 speedup PR by @ngxson in #156
- Fix a bug with kv_remove, release v2.2.0 by @ngxson in #157
Full Changelog: 2.1.3...2.2.0