Releases: oobabooga/text-generation-webui
v1.15
Backend updates
- Transformers: bump to 4.45.
- ExLlamaV2: bump to 0.2.3.
- ExLlamaV2: add tensor parallelism to increase multi-GPU inference speeds (#6356). Thanks @RandomInternetPreson.
- flash-attention: bump to 2.6.3.
- llama-cpp-python: bump to 0.3.1.
- bitsandbytes: bump to 0.44.
- PyTorch: bump to 2.4.1.
- ROCm: bump wheels to 6.1.2.
- Remove AutoAWQ, AutoGPTQ, HQQ, and AQLM from requirements.txt:
  - AutoAWQ and AutoGPTQ were removed due to lack of support for PyTorch 2.4.1 and CUDA 12.1.
  - HQQ and AQLM were removed to make the project leaner, since they're experimental with limited use.
  - You can still install those libraries manually if you are interested.
Changes
- Exclude Top Choices (XTC): A sampler that boosts creativity, breaks writing clichés, and inhibits non-verbatim repetition (#6335). Thanks @p-e-w. (A rough sketch of the idea follows this list.)
- Make it possible to sort repetition penalties with "Sampler priority". The new keywords are:
  - `repetition_penalty`
  - `presence_penalty`
  - `frequency_penalty`
  - `dry`
  - `encoder_repetition_penalty`
  - `no_repeat_ngram`
  - `xtc` (not a repetition penalty, but also added in this update)
- Don't import PEFT unless necessary. This makes the web UI launch faster.
- Add a beforeunload event to show a confirmation dialog when leaving the page (#6279). Thanks @leszekhanusz.
- Update the API documentation with examples to list/load models (#5902). Thanks @joachimchauvet. (A hedged usage example follows this list.)
- Training PRO: update script.py (#6359). Thanks @FartyPants.
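
The XTC item above describes the sampler only at a high level. As a rough illustration of how such an "Exclude Top Choices" step can work over a token probability distribution, here is a minimal sketch based on the description in #6335, not the project's actual code; the parameter names `xtc_threshold` and `xtc_probability` and the exact exclusion rule are assumptions.

```python
import numpy as np

def xtc_filter(probs, xtc_threshold=0.1, xtc_probability=0.5, rng=None):
    """Illustrative 'Exclude Top Choices' step over a probability
    distribution (hypothetical, not the project's implementation)."""
    rng = rng or np.random.default_rng()
    # Only apply the exclusion part of the time.
    if rng.random() >= xtc_probability:
        return probs
    above = np.flatnonzero(probs >= xtc_threshold)
    # Need at least two viable "top choices"; otherwise leave the
    # distribution untouched.
    if above.size < 2:
        return probs
    # Drop every top choice except the least probable one, then renormalize.
    keep = above[np.argmin(probs[above])]
    filtered = probs.copy()
    filtered[above] = 0.0
    filtered[keep] = probs[keep]
    return filtered / filtered.sum()

# Example: with three candidates above the threshold, two get excluded.
p = np.array([0.5, 0.3, 0.15, 0.05])
print(xtc_filter(p, xtc_probability=1.0))  # -> [0.   0.   0.75 0.25]
```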
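For the API documentation item above, a minimal sketch of listing and loading models through the built-in API is shown below. The endpoint paths (`/v1/internal/model/list`, `/v1/internal/model/load`), the default port, and the payload shape are taken from my reading of the API docs and PR #5902 and may differ; treat them as assumptions.

```python
import requests

BASE_URL = "http://127.0.0.1:5000"  # assumed default API port; adjust as needed

# List the models available to the server (assumed endpoint).
models = requests.get(f"{BASE_URL}/v1/internal/model/list", timeout=30).json()
print(models)

# Load one of them (assumed endpoint and payload shape).
payload = {"model_name": models["model_names"][0]}
resp = requests.post(f"{BASE_URL}/v1/internal/model/load", json=payload, timeout=600)
resp.raise_for_status()
print("model loaded")
```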
Bug fixes
- Fix UnicodeDecodeError for BPE-based Models (especially GLM-4) (#6357). Thanks @GralchemOz.
- API: Relax multimodal format, fixes HuggingFace Chat UI (#6353). Thanks @Papierkorb.
- Force /bin/bash shell for conda (#6386). Thanks @Thireus.
- Do not set value for histories in chat when --multi-user is used (#6317). Thanks @mashb1t.
- Fix a typo in the OpenAI response format (#6365). Thanks @jsboige.
v1.14
v1.13
Backend updates
- llama-cpp-python: bump to 0.2.85 (adds Llama 3.1 support).
UI updates
- Make `compress_pos_emb` a float (#6276). Thanks @hocjordan.
- Make `n_ctx`, `max_seq_len`, and `truncation_length` numbers rather than sliders, to make it possible to type the context length manually.
- Improve the style of headings in chat messages.
- LaTeX rendering:
  - Add back single `$` for inline equations.
  - Fix rendering for equations enclosed between `\[` and `\]`.
  - Fix rendering for multiline equations.
Bug fixes
- Fix saving characters through the UI.
- Fix instruct mode displaying "quotes" as ""double quotes"".
- Fix chat sometimes not scrolling down after sending a message.
- Fix the chat "stop" event.
- Make `--idle-timeout` work for API requests.
Other changes
- Model downloader: improve the progress bar by adding the filename, size, and download speed for each downloaded file.
- Better handle the Llama 3.1 Jinja2 template by not including its optional "tools" headers.
v1.12
Backend updates
- Transformers: bump to 4.43 (adds Llama 3.1 support).
- ExLlamaV2: bump to 0.1.8 (adds Llama 3.1 support).
- AutoAWQ: bump to 0.2.6 (adds Llama 3.1 support).
- Remove AutoAWQ as a standalone loader. I found that hugging-quants/Meta-Llama-3.1-70B-Instruct-AWQ-INT4 works better when loaded directly through Transformers, and that's what the README recommends. AutoAWQ is still used in the background.
UI updates
- Make text between quote characters colored in chat and chat-instruct modes.
- Prevent LaTeX from being rendered for inline "$", as that caused problems for phrases like "apples cost $1, oranges cost $2".
- Make the markdown cache infinite and clear it when switching to another chat. This cache exists because the markdown conversion is CPU-intensive. By making it infinite, messages in a full 128k context will be cached, making the UI more responsive for long conversations.
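
As a loose illustration of that caching strategy (a sketch, not the web UI's code): an unbounded `functools.lru_cache` keeps every converted message, and the cache is simply cleared when the user switches chats. The function names and the `markdown` converter used here are assumptions for the example.

```python
import functools
import markdown  # assumed: any CPU-heavy markdown -> HTML converter

@functools.lru_cache(maxsize=None)  # "infinite" cache: never evicts
def convert_message(text: str) -> str:
    """Convert one chat message to HTML; cached so long histories stay fast."""
    return markdown.markdown(text)

def on_chat_switch() -> None:
    """Drop all cached conversions when another chat is opened."""
    convert_message.cache_clear()
```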
Bug fixes
- Fix a race condition that caused the default character to not be loaded correctly on startup.
- Fix Linux shebangs (#6110). Thanks @LuNeder.
Other changes
- Make the Google Colab notebook use the one-click installer instead of its own Python environment for better stability.
- Disable flash-attention on Google Colab by default, as its GPU models do not support it.
v1.11
UI updates
- Optimize the UI: events triggered by clicking on buttons, selecting values from dropdown menus, etc. have been refactored to minimize the number of connections made between the UI and the server. As a result, the UI is now significantly faster and more responsive.
- Use chat-instruct mode by default: most models nowadays are instruction-following models, and this mode automatically uses the model's Jinja2 template to generate the prompt, leading to higher-quality outputs.
- Improve the style of code blocks in light mode.
- Increase the font weight of chat messages (for chat and chat-instruct modes).
- Use gr.Number for RoPE scaling parameters (#6233). Thanks @Vhallo.
- Don't export the instruction template to settings.yaml on "Save UI defaults to settings.yaml" (it gets ignored and replaced with the model template).
Backend updates
- llama-cpp-python: bump to 0.2.83 (adds Mistral-Nemo support).
Other changes
- Training: added a ChatML-format.json format example (#5899). Thanks @FartyPants.
- Add the ability to customize the subpath for Gradio, for use with a reverse proxy (#5106). Thanks @canoalberto.
Bug fixes
- Fix an issue where the chat contents sometimes disappear for a split second during streaming (#6247). Thanks @Patronics.
- Fix the chat UI losing its vertical scrolling position when the input area grows to more than 1 line.
v1.10.1
v1.10
Library updates
- llama-cpp-python: bump to 0.2.82.
- ExLlamaV2: bump to 0.1.7 (adds Gemma-2 support).
Changes
- Add new `--no_xformers` and `--no_sdpa` flags for ExLlamaV2.
  - Note: to use Gemma-2 with ExLlamaV2, you currently must use the `--no_flash_attn --no_xformers --no_sdpa` flags, or check the corresponding checkboxes in the UI before loading the model, otherwise it will perform very badly.
- Minor UI updates.
v1.9.1
v1.9
Backend updates
- 4-bit and 8-bit KV cache options have been added to llama.cpp and llamacpp_HF. They reuse the existing `--cache_8bit` and `--cache_4bit` flags. Thanks @GodEmperor785 for figuring out what values to pass to llama-cpp-python.
- Transformers:
  - Add an eager attention option to make Gemma-2 work correctly (#6188). Thanks @GralchemOz.
  - Automatically detect bfloat16/float16 precision when loading models in 16-bit precision.
  - Automatically apply eager attention to models with the `Gemma2ForCausalLM` architecture.
  - Gemma-2 support: automatically detect and apply the optimal settings for this model with the two changes above. No need to set `--bf16 --use_eager_attention` manually.
- Automatically obtain the EOT token from Jinja2 templates and add it to the stopping strings, fixing Llama-3-Instruct not stopping. No need to add `<eot>` to the custom stopping strings anymore.
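
As a rough illustration of the EOT-detection idea in the last item (not the project's actual implementation), the sketch below renders a chat template with a sentinel assistant message and takes whatever immediately follows it as the end-of-turn string. The helper name, the sentinel trick, and the example template fragment are all hypothetical.

```python
from jinja2 import Template

def guess_eot_string(chat_template: str) -> str:
    """Render the template with a sentinel assistant message and return
    whatever immediately follows it, which is typically the EOT marker."""
    sentinel = "XSENTINELX"
    messages = [
        {"role": "user", "content": "hi"},
        {"role": "assistant", "content": sentinel},
    ]
    rendered = Template(chat_template).render(
        messages=messages, add_generation_prompt=False
    )
    tail = rendered.split(sentinel, 1)[1]
    # Keep only the first token-like chunk, e.g. "<|eot_id|>".
    return tail.strip().split("\n")[0]

# Example with a Llama-3-style template fragment (illustrative only):
template = (
    "{% for m in messages %}<|start_header_id|>{{ m['role'] }}"
    "<|end_header_id|>\n\n{{ m['content'] }}<|eot_id|>{% endfor %}"
)
print(guess_eot_string(template))  # -> <|eot_id|>
```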
UI updates
- Whisper STT overhaul: this extension has been rewritten, replacing the Gradio microphone component with a custom microphone element that is much more reliable (#6194). Thanks @RandomInternetPreson, @TimStrauven, and @mamei16.
- Make the character dropdown menu coexist in the "Chat" tab and the "Parameters > Character" tab, after some people pointed out that moving it entirely to the Chat tab makes it harder to edit characters.
- Colors in the light theme have been improved, making it a bit more aesthetic.
- Increase the chat area on mobile devices.
Bug fixes
- Fix the API request to AUTOMATIC1111 in the sd-api-pictures extension.
- Fix a glitch when switching tabs with "Show controls" unchecked in the chat tab and extensions loaded.
Library updates
- llama-cpp-python: bump to 0.2.81 (adds Gemma-2 support).
- Transformers: bump to 4.42 (adds Gemma-2 support).
Support
- GitHub Sponsors: https://github.com/sponsors/oobabooga
- ko-fi: https://ko-fi.com/oobabooga
v1.8
Releases with version numbers are back! The last one was v1.7 on October 8th, 2023, so I am calling this one v1.8.
From this release on, it will be possible to install past releases by downloading the `.zip` source and running the `start_` script in it. The installation script no longer updates to the latest version automatically. This doesn't apply to snapshots/releases before this one.
New backend
- Add TensorRT-LLM support.
- That's now the fastest backend in the project.
- It currently has to be installed in a separate Python 3.10 environment.
- A Dockerfile is provided.
- For instructions on how to convert models, consult #5715 and https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md.
UI updates
- Improved "past chats" menu: this menu is now a vertical list of text items instead of a dropdown menu, making it a lot easier to switch between past conversations. Only one click is required instead of two.
- Store the chat history in the browser: if you restart the server and do not refresh the browser, your conversation will not be accidentally erased anymore.
- Avoid some unnecessary calls to the backend, making the UI faster and more responsive.
- Move the "Character" droprown menu to the main Chat tab, to make it faster to switch between different characters.
- Change limits of RoPE scaling sliders in UI (#6142). Thanks @GodEmperor785.
- Do not expose "alpha_value" for llama.cpp and "rope_freq_base" for transformers to keep things simple and avoid conversions.
- Remove an obsolete info message intended for GPTQ-for-LLaMa.
- Remove the "Tab" shortcut to switch between the generation tabs and the "Parameter" tabs, as it was awkward.
- Improved streaming of lists, which would sometimes flicker and temporarily display horizontal lines.
Bug fixes
- Revert the reentrant generation lock to a simple lock, fixing an issue caused by the change.
- Fix GGUFs with no BOS token present, mainly Qwen2 models (#6119). Thanks @Ph0rk0z.
- Fix "500 error" issue caused by
block_requests.py
(#5976). Thanks @nero-dv. - Setting default alpha_value and fixing loading some newer DeepSeekCoder GGUFs (#6111). Thanks @mefich.
Library updates
- llama-cpp-python: bump to 0.2.79 (after a month of wrestling with GitHub Actions).
- ExLlamaV2: bump to 0.1.6.
- flash-attention: bump to 2.5.9.post1.
- PyTorch: bump to 2.2.2. That's the last 2.2 patch version.
- HQQ: bump to 0.1.7.post3. Makes HQQ functional again.
Other updates
- Do not "git pull" during installation, allowing previous releases (from this one on) to be installed.
- Make logs more readable, no more \u7f16\u7801 (#6127). Thanks @Touch-Night.
Support this project
- Become a GitHub Sponsor ❤️
- Buy me a ko-fi ☕