Skip to content

Releases: OpenBMB/llama.cpp

b3848

30 Sep 08:26
c919d5d
Compare
Choose a tag to compare
ggml : define missing HWCAP flags (#9684)

ggml-ci

Co-authored-by: Willy Tarreau <[email protected]>

b3669

05 Sep 15:16
4db0478
Compare
Choose a tag to compare
cuda : fix defrag with quantized KV (#9319)

b3662

04 Sep 04:31
7605ae7
Compare
Choose a tag to compare
flake.lock: Update (#9261)

Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/8471fe90ad337a8074e957b69ca4d0089218391d?narHash=sha256-XOQkdLafnb/p9ij77byFQjDf5m5QYl9b2REiVClC%2Bx4%3D' (2024-08-01)
  → 'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/c374d94f1536013ca8e92341b540eba4c22f9c62?narHash=sha256-Z/ELQhrSd7bMzTO8r7NZgi9g5emh%2BaRKoCdaAv5fiO0%3D' (2024-08-21)
  → 'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

b3660

03 Sep 08:16
b69a480
Compare
Choose a tag to compare
readme : refactor API section + remove old hot topics

b3645

30 Aug 08:08
7ea8d80
Compare
Choose a tag to compare
llava : the function "clip" should be int (#9237)

b3615

22 Aug 10:22
1731d42
Compare
Choose a tag to compare
[SYCL] Add oneDNN primitive support (#9091)

* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc

b3621

10 Aug 09:43
fc1c860
Compare
Choose a tag to compare
Merge branch 'prepare-PR-of-minicpm-v2.6' into master

b3209

24 Jun 04:04
95f57bb
Compare
Choose a tag to compare
ggml : remove ggml_task_type and GGML_PERF (#8017)

* ggml : remove ggml_task_type and GGML_PERF

* check abort_callback on main thread only

* vulkan : remove usage of ggml_compute_params

* remove LLAMA_PERF

b3078

04 Jun 07:37
bde7cd3
Compare
Choose a tag to compare
llama : offload to RPC in addition to other backends (#7640)

* llama : offload to RPC in addition to other backends

* - fix copy_tensor being called on the src buffer instead of the dst buffer

- always initialize views in the view_src buffer

- add RPC backend to Makefile build

- add endpoint to all RPC object names

* add rpc-server to Makefile

* Update llama.cpp

Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: slaren <[email protected]>

b3026

28 May 20:05
5442939
Compare
Choose a tag to compare
llama : support small Granite models (#7481)

* Add optional MLP bias for Granite models

Add optional MLP bias for ARCH_LLAMA to support Granite models.
Partially addresses ggerganov/llama.cpp/issues/7116
Still needs some more changes to properly support Granite.

* llama: honor add_space_prefix from the model configuration

propagate the add_space_prefix configuration from the HF model
configuration to the gguf file and honor it with the gpt2 tokenizer.

Signed-off-by: Giuseppe Scrivano <[email protected]>

* llama: add support for small granite models

it works only for the small models 3b and 8b.

The convert-hf-to-gguf.py script uses the vocabulary size of the
granite models to detect granite and set the correct configuration.

Signed-off-by: Giuseppe Scrivano <[email protected]>

---------

Signed-off-by: Giuseppe Scrivano <[email protected]>
Co-authored-by: Steffen Roecker <[email protected]>