Releases: turboderp/exllamav2
0.2.4
- Support Pixtral
- Refactoring for more multimodal support
- Faster filter evaluation
- Various optimizations and bugfixes
- Various quality of life improvements
Full Changelog: v0.2.3...v0.2.4
0.2.3
- No longer uses the safetensors library for loading weights (fixes virtual memory issues, especially on Windows)
- Disabled the now-redundant fasttensors option
- Prioritize the HF Tokenizers model when both HF and SPM models are available
- Add XTC sampler
- Add YaRN support
- Various fixes and QoL improvements
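XTC ("Exclude Top Choices") removes the most probable tokens whenever more than one clears a probability threshold, keeping only the weakest qualifying choice so the model is nudged away from its most predictable continuations. A minimal sketch of the idea in plain Python; the parameter names (`threshold`, `xtc_probability`) are illustrative and not exllamav2's actual API:

```python
import random

def xtc_filter(probs, threshold=0.1, xtc_probability=1.0, rng=random.random):
    """With chance xtc_probability, zero out every token whose probability
    is >= threshold EXCEPT the weakest such token, then renormalize."""
    if rng() >= xtc_probability:
        return probs[:]                      # sampler not triggered this step
    above = [i for i, p in enumerate(probs) if p >= threshold]
    if len(above) < 2:
        return probs[:]                      # need two candidates to exclude any
    weakest = min(above, key=lambda i: probs[i])
    filtered = [0.0 if (i in above and i != weakest) else p
                for i, p in enumerate(probs)]
    total = sum(filtered)
    return [p / total for p in filtered]
```

With `[0.5, 0.3, 0.15, 0.05]` and a 0.1 threshold, the two strongest tokens are dropped and the remainder renormalizes to roughly `[0, 0, 0.75, 0.25]`.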
Full Changelog: v0.2.2...v0.2.3
0.2.2
- Small fixes related to LMFE
- Allow SDPA during normal inference with a custom bias
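SDPA here is scaled dot-product attention (as in `torch.nn.functional.scaled_dot_product_attention`, whose `attn_mask` argument is an additive bias on the pre-softmax scores). A pure-NumPy reference sketch of the math, not the fused kernel itself:

```python
import numpy as np

def sdpa(q, k, v, bias=None):
    """Scaled dot-product attention with an optional additive bias
    (the role attn_mask plays in torch's scaled_dot_product_attention)."""
    scores = q @ k.T / np.sqrt(q.shape[-1])
    if bias is not None:
        scores = scores + bias               # e.g. -inf above the diagonal
    scores = scores - scores.max(axis=-1, keepdims=True)  # stable softmax
    weights = np.exp(scores)
    weights = weights / weights.sum(axis=-1, keepdims=True)
    return weights @ v

# A causal bias: disallow attending to future positions.
causal = np.triu(np.full((3, 3), -np.inf), k=1)
```

With the causal bias, the first query position can only attend to the first key, so its output equals `v[0]` exactly.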
Full Changelog: v0.2.1...v0.2.2
0.2.1
- TP: fallback SDPA mode when flash-attn is unavailable
- Faster filter/grammar path
- Add DRY
- Fix issues introduced in 0.1.9 (streams/graphs) when loading certain models via Tabby
- Banish Râul
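DRY ("Don't Repeat Yourself") penalizes any token that would extend a sequence already seen earlier in the context, with a penalty that grows exponentially in the length of the repetition. A simplified sketch of the scheme, not exllamav2's implementation; the defaults (`multiplier`, `base`, `allowed_length`) are illustrative:

```python
def dry_penalty(context, token, multiplier=0.8, base=1.75, allowed_length=2):
    """Find the longest context suffix whose earlier occurrence was followed
    by `token`; once that match length reaches allowed_length, the penalty
    grows as multiplier * base ** (length - allowed_length)."""
    best = 0
    for i in range(len(context) - 1):        # earlier occurrences of `token`
        if context[i] != token:
            continue
        n = 0                                # suffix tokens matching before i
        while n < i and context[i - 1 - n] == context[-1 - n]:
            n += 1
        best = max(best, n)
    if best < allowed_length:
        return 0.0
    return multiplier * base ** (best - allowed_length)
```

For a context `A B C A B`, the candidate `C` would complete a second `A B C`, a match of length 2, and receives a nonzero penalty, while an unseen token is unaffected.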
Full Changelog: v0.2.0...v0.2.1
0.2.0
Small release to fix various issues in 0.1.9
Full Changelog: v0.1.9...v0.2.0
0.1.9
- Add experimental tensor-parallel mode; currently supports Llama (1, 2 and 3), Qwen2 and Mistral models
- CUDA Graphs to reduce overhead and CPU bottlenecking
- Various other optimizations
- Some bugfixes
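In tensor-parallel mode each GPU holds a slice of every weight matrix, computes its share of each matmul independently, and the partial results are then gathered. A toy NumPy illustration of column-parallel splitting across two hypothetical devices (the real implementation shards across CUDA devices, not array slices):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8))              # activations, replicated on each device
w = rng.standard_normal((8, 16))             # full weight matrix

# Column-parallel: each "GPU" holds a slice of the output columns.
shards = np.split(w, 2, axis=1)              # hypothetical device 0 and device 1
partials = [x @ shard for shard in shards]   # computed independently per device
gathered = np.concatenate(partials, axis=1)  # the all-gather step

assert np.allclose(gathered, x @ w)          # identical to the unsplit matmul
```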
Full Changelog: v0.1.8...v0.1.9
0.1.8
- Support Llama 3.1 (correct RoPE scaling etc.)
- Support IndexTeam architecture
- Some bugfixes and QoL improvements
Full Changelog: v0.1.7...v0.1.8
0.1.7
- Support Gemma2
- Support InternLM2
- Various bugfixes and optimizations
Full Changelog: v0.1.6...v0.1.7
0.1.6
- Fix dynamic generator fallback mode (was broken for prompts longer than max_input_len)
- Fix inference on ROCm wave64 devices
- Made the model conversion script part of the exllamav2 package
- CPU optimizations
Full Changelog: v0.1.5...v0.1.6
0.1.5
- Added Q6 and Q8 cache modes
- Defragment cache in dynamic generator
- Use SDPA with Torch 2.3.0+
- Updated wheels to Torch 2.3.1
- Added Python 3.12 wheels, plus Python 3.9 for ROCm
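Quantized cache modes store the K/V cache at reduced precision to cut VRAM use. A rough sketch of the Q8 idea, per-row symmetric 8-bit quantization with one float scale per row; this is illustrative only, not the actual CUDA kernel or exllamav2's exact scheme:

```python
import numpy as np

def quantize_q8(x):
    """Per-row symmetric 8-bit quantization: int8 values plus one
    float32 scale per row."""
    scale = np.abs(x).max(axis=-1, keepdims=True) / 127.0
    scale = np.where(scale == 0.0, 1.0, scale)        # avoid divide-by-zero
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale.astype(np.float32)

def dequantize_q8(q, scale):
    return q.astype(np.float32) * scale

rng = np.random.default_rng(1)
kv = rng.standard_normal((8, 64)).astype(np.float32)  # stand-in for a K/V page
q, s = quantize_q8(kv)
max_err = np.abs(dequantize_q8(q, s) - kv).max()      # bounded by scale / 2
```

Each row's round-trip error is bounded by half its scale, which is why 8-bit (and with a little more machinery, 6-bit) caches lose very little quality in practice.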
Full Changelog: v0.1.4...v0.1.5