Activity
Alternative implementation using ggml_flash_attn (~16% slower than fa…
Removed unnecessary reshapes when retrieving kv from cache
Merge remote-tracking branch 'origin/pr/224' into falcon40b
Merge branch 'master' into falcon40b
Fixed regression because of incorrect ctx_size calculation
Added Kerfuffle's magic context size fix
Added rearrange of qkv weight memory layout to convert-hf-to-ggml.py …
Fixed offset calculation bug during extraction of query vectors
Fixed quantized version not working due to wrong data type
Updated falcon-quantize to match 7B/40B format produced by convert-hf…
Version which exactly reproduces outputs of the Python implementation…
Experimental support for Falcon-40B (and Falcon-7B); breaks 7B GGML c…
Added mention of missing ALiBi to README.md
Fixed obvious typo in layer mapping
Fixed syntax error, added comment about ALiBi