
2.2.0

@ngxson released this on 08 Feb 23:21
d72123c

v2.2.0 - x2 speed for Qx_K and Qx_0 quantization

A BIG release has dropped! The biggest changes include:

  • x2 speed for Qx_K and Qx_0 quantization 🚀 ref this PR: ggml-org/llama.cpp#11453 (it's not merged upstream yet, so I've included it in wllama as a patch). IQx quants will still be slow, but work on them is already planned.
  • Switched to a binary protocol for the connection between the JS <==> WASM worlds. The json.hpp dependency is now gone! Calling wllama.tokenize() on a long text is now faster than ever! 🎉
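To give an intuition for why a binary protocol beats JSON here, below is a minimal sketch of the length-prefixed encoding idea: a string crossing the JS <==> WASM boundary is written as raw UTF-8 bytes behind a 4-byte length header instead of being serialized to a JSON document and parsed on the other side. This is an illustration only, not wllama's actual wire format; the function names `encodeString`/`decodeString` are made up for this example.

```typescript
// Illustrative sketch of a length-prefixed binary encoding, as used in
// many JS <==> WASM bridges. NOT wllama's real protocol (see glue.cpp there).

function encodeString(s: string): Uint8Array {
  const payload = new TextEncoder().encode(s);
  const buf = new Uint8Array(4 + payload.length);
  // 4-byte little-endian length prefix, followed by the raw UTF-8 bytes.
  new DataView(buf.buffer).setUint32(0, payload.length, true);
  buf.set(payload, 4);
  return buf;
}

function decodeString(buf: Uint8Array): string {
  const len = new DataView(buf.buffer, buf.byteOffset).getUint32(0, true);
  return new TextDecoder().decode(buf.subarray(4, 4 + len));
}
```

The receiver can slice the payload directly out of shared memory with no intermediate text parsing, which is why long inputs to wllama.tokenize() benefit the most.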

Debut at FOSDEM 2025

Last week, I gave a 15-minute talk at FOSDEM 2025 which, for the first time, introduced wllama to the real world!

Watch the talk here: https://fosdem.org/2025/schedule/event/fosdem-2025-5154-wllama-bringing-llama-cpp-to-the-web/


What's Changed

  • add benchmark function, used internally by @ngxson in #151
  • switch to binary protocol between JS and WASM world (glue.cpp) by @ngxson in #154
  • Remove json.hpp dependency by @ngxson in #155
  • temporary apply that viral x2 speedup PR by @ngxson in #156
  • Fix a bug with kv_remove, release v2.2.0 by @ngxson in #157

Full Changelog: 2.1.3...2.2.0