Releases · OpenBMB/llama.cpp

30 Sep 08:26

c919d5d

b3848 Latest

ggml : define missing HWCAP flags (#9684)

ggml-ci

Co-authored-by: Willy Tarreau <[email protected]>

Assets 22

cudart-llama-bin-win-cu11.7.1-x64.zip    293 MB     2024-09-30T08:26:51Z
cudart-llama-bin-win-cu12.2.0-x64.zip    413 MB     2024-09-30T08:26:57Z
llama-b1-bin-win-hip-x64-gfx1030.zip     236 MB     2024-09-30T08:27:06Z
llama-b1-bin-win-hip-x64-gfx1100.zip     238 MB     2024-09-30T08:27:10Z
llama-b1-bin-win-hip-x64-gfx1101.zip     238 MB     2024-09-30T08:27:16Z
llama-b3848-bin-macos-arm64.zip          55.8 MB    2024-09-30T08:27:21Z
llama-b3848-bin-macos-x64.zip            55.8 MB    2024-09-30T08:27:23Z
llama-b3848-bin-ubuntu-x64.zip           60.7 MB    2024-09-30T08:27:24Z
llama-b3848-bin-win-avx-x64.zip          8.04 MB    2024-09-30T08:27:26Z
llama-b3848-bin-win-avx2-x64.zip         8.04 MB    2024-09-30T08:27:26Z
Source code (zip)                                   2024-09-29T18:18:23Z
Source code (tar.gz)                                2024-09-29T18:18:23Z

05 Sep 15:16

github-actions

b3669

4db0478

cuda : fix defrag with quantized KV (#9319)

Assets 19

04 Sep 04:31

github-actions

b3662

7605ae7

flake.lock: Update (#9261)

Flake lock file updates:

• Updated input 'flake-parts':
    'github:hercules-ci/flake-parts/8471fe90ad337a8074e957b69ca4d0089218391d?narHash=sha256-XOQkdLafnb/p9ij77byFQjDf5m5QYl9b2REiVClC%2Bx4%3D' (2024-08-01)
  → 'github:hercules-ci/flake-parts/af510d4a62d071ea13925ce41c95e3dec816c01d?narHash=sha256-ODYRm8zHfLTH3soTFWE452ydPYz2iTvr9T8ftDMUQ3E%3D' (2024-08-30)
• Updated input 'nixpkgs':
    'github:NixOS/nixpkgs/c374d94f1536013ca8e92341b540eba4c22f9c62?narHash=sha256-Z/ELQhrSd7bMzTO8r7NZgi9g5emh%2BaRKoCdaAv5fiO0%3D' (2024-08-21)
  → 'github:NixOS/nixpkgs/71e91c409d1e654808b2621f28a327acfdad8dc2?narHash=sha256-GnR7/ibgIH1vhoy8cYdmXE6iyZqKqFxQSVkFgosBh6w%3D' (2024-08-28)

Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

Assets 19

03 Sep 08:16

github-actions

b3660

b69a480

readme : refactor API section + remove old hot topics

Assets 19

30 Aug 08:08

github-actions

b3645

7ea8d80

llava : the function "clip" should be int (#9237)

Assets 19

22 Aug 10:22

github-actions

b3615

1731d42

[SYCL] Add oneDNN primitive support (#9091)

* add onednn

* add sycl_f16

* add dnnl stream

* add engine map

* use dnnl for intel only

* use fp16fp16fp16

* update doc

Assets 19

10 Aug 09:43

github-actions

b3621

fc1c860

Merge branch 'prepare-PR-of-minicpm-v2.6' into master

Assets 20

24 Jun 04:04

github-actions

b3209

95f57bb

ggml : remove ggml_task_type and GGML_PERF (#8017)

* ggml : remove ggml_task_type and GGML_PERF

* check abort_callback on main thread only

* vulkan : remove usage of ggml_compute_params

* remove LLAMA_PERF

Assets 20

04 Jun 07:37

github-actions

b3078

bde7cd3

llama : offload to RPC in addition to other backends (#7640)

* llama : offload to RPC in addition to other backends

* - fix copy_tensor being called on the src buffer instead of the dst buffer

- always initialize views in the view_src buffer

- add RPC backend to Makefile build

- add endpoint to all RPC object names

* add rpc-server to Makefile

* Update llama.cpp

Co-authored-by: slaren <[email protected]>

---------

Co-authored-by: slaren <[email protected]>

Assets 21

28 May 20:05

github-actions

b3026

5442939

llama : support small Granite models (#7481)

* Add optional MLP bias for Granite models

Add optional MLP bias for ARCH_LLAMA to support Granite models.
Partially addresses ggerganov/llama.cpp/issues/7116
Still needs some more changes to properly support Granite.

* llama: honor add_space_prefix from the model configuration

propagate the add_space_prefix configuration from the HF model
configuration to the gguf file and honor it with the gpt2 tokenizer.

Signed-off-by: Giuseppe Scrivano <[email protected]>

* llama: add support for small granite models

It works only for the small 3b and 8b models.

The convert-hf-to-gguf.py script uses the vocabulary size of the
granite models to detect granite and set the correct configuration.

Signed-off-by: Giuseppe Scrivano <[email protected]>

---------

Signed-off-by: Giuseppe Scrivano <[email protected]>
Co-authored-by: Steffen Roecker <[email protected]>
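
The "optional MLP bias" mentioned in the first bullet above can be illustrated with a minimal sketch. It assumes the usual gated (SiLU) feed-forward layout used by LLaMA-style models and is not the code from the commit; the names `linear`, `mlp`, `w_gate`, `w_up`, `w_down`, `b_gate`, `b_up`, and `b_down` are hypothetical and chosen only for illustration.

```python
import math


def silu(v):
    # SiLU activation: v * sigmoid(v)
    return v / (1.0 + math.exp(-v))


def linear(x, weight, bias=None):
    # y = W x, plus b when a bias vector is supplied.
    # `weight` is a list of rows; `bias` may be None (no bias tensor present).
    y = [sum(w * xi for w, xi in zip(row, x)) for row in weight]
    if bias is not None:
        y = [yi + bi for yi, bi in zip(y, bias)]
    return y


def mlp(x, w_gate, w_up, w_down, b_gate=None, b_up=None, b_down=None):
    # Gated feed-forward block: down(silu(gate(x)) * up(x)).
    # Each projection adds its bias only if one was provided (assumed layout).
    gate = [silu(g) for g in linear(x, w_gate, b_gate)]
    up = linear(x, w_up, b_up)
    hidden = [g * u for g, u in zip(gate, up)]
    return linear(hidden, w_down, b_down)


if __name__ == "__main__":
    identity = [[1.0, 0.0], [0.0, 1.0]]
    x = [1.0, 2.0]
    print(mlp(x, identity, identity, [[1.0, 1.0]]))                # no biases
    print(mlp(x, identity, identity, [[1.0, 1.0]], b_down=[0.5]))  # with a down-projection bias
```

Treating each bias as optional means a model file without bias tensors takes the same code path as before, which is one plausible reason to make the bias optional rather than required.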

Assets 21
