LLamaTuner is an efficient, flexible, and full-featured toolkit for fine-tuning LLMs (Llama3, Phi3, Qwen, Mistral, ...).
Efficient
- Support LLM and VLM pre-training and fine-tuning on almost all GPUs. LLamaTuner can fine-tune a 7B LLM on a single 8GB GPU and supports multi-node fine-tuning of models exceeding 70B.
- Automatically dispatch high-performance operators such as FlashAttention and Triton kernels to increase training throughput.
- Compatible with DeepSpeed 🚀, easily utilizing a variety of ZeRO optimization techniques.
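
As an illustration of the ZeRO options mentioned above, the sketch below builds a minimal DeepSpeed ZeRO stage 2 configuration and serializes it to JSON. The keys are standard DeepSpeed configuration fields. The values and the helper name `to_json` are assumptions chosen for illustration, not LLamaTuner defaults, and how the training scripts consume such a file is not specified in this README.

```python
import json

# Minimal ZeRO stage 2 configuration (illustrative values only).
ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 8,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 2,                    # partition optimizer states and gradients
        "overlap_comm": True,          # overlap gradient reduction with backward
        "contiguous_gradients": True,  # reduce memory fragmentation
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}


def to_json(config=ds_config):
    """Serialize the config so it can be saved as a DeepSpeed JSON file."""
    return json.dumps(config, indent=2)


if __name__ == "__main__":
    print(to_json())
```

Raising `stage` to 3 additionally partitions the model parameters, which trades more communication for lower per-GPU memory.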
Flexible
- Support various LLMs (Llama 3, Mixtral, Llama 2, ChatGLM, Qwen, Baichuan, ...).
- Support VLM (LLaVA).
- Well-designed data pipeline, accommodating datasets in any format, including but not limited to open-source and custom formats.
- Support various training algorithms (QLoRA, LoRA, full-parameter fine-tuning), allowing users to choose the most suitable solution for their requirements.
Full-featured
- Support continuous pre-training, instruction fine-tuning, and agent fine-tuning.
- Support chatting with large models with pre-defined templates.
Model | Model size | Default module | Template |
---|---|---|---|
Baichuan | 7B/13B | W_pack | baichuan |
Baichuan2 | 7B/13B | W_pack | baichuan2 |
BLOOM | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
BLOOMZ | 560M/1.1B/1.7B/3B/7.1B/176B | query_key_value | - |
ChatGLM3 | 6B | query_key_value | chatglm3 |
Command-R | 35B/104B | q_proj,v_proj | cohere |
DeepSeek (MoE) | 7B/16B/67B/236B | q_proj,v_proj | deepseek |
Falcon | 7B/11B/40B/180B | query_key_value | falcon |
Gemma/CodeGemma | 2B/7B | q_proj,v_proj | gemma |
InternLM2 | 7B/20B | wqkv | intern2 |
LLaMA | 7B/13B/33B/65B | q_proj,v_proj | - |
LLaMA-2 | 7B/13B/70B | q_proj,v_proj | llama2 |
LLaMA-3 | 8B/70B | q_proj,v_proj | llama3 |
LLaVA-1.5 | 7B/13B | q_proj,v_proj | vicuna |
Mistral/Mixtral | 7B/8x7B/8x22B | q_proj,v_proj | mistral |
OLMo | 1B/7B | q_proj,v_proj | - |
PaliGemma | 3B | q_proj,v_proj | gemma |
Phi-1.5/2 | 1.3B/2.7B | q_proj,v_proj | - |
Phi-3 | 3.8B | qkv_proj | phi |
Qwen | 1.8B/7B/14B/72B | c_attn | qwen |
Qwen1.5 (Code/MoE) | 0.5B/1.8B/4B/7B/14B/32B/72B/110B | q_proj,v_proj | qwen |
StarCoder2 | 3B/7B/15B | q_proj,v_proj | - |
XVERSE | 7B/13B/65B | q_proj,v_proj | xverse |
Yi (1/1.5) | 6B/9B/34B | q_proj,v_proj | yi |
Yi-VL | 6B/34B | q_proj,v_proj | yi_vl |
Yuan | 2B/51B/102B | q_proj,v_proj | yuan |
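
To give a feel for what the "Default module" column implies, the sketch below counts the trainable parameters LoRA adds when adapters are attached to `q_proj` and `v_proj`. The function name is hypothetical. The hidden size (4096) and layer count (32) are the published LLaMA-7B values, and the rank of 8 is an arbitrary choice for illustration. The formula assumes each target module is a square `hidden_size x hidden_size` projection, so it does not apply to fused modules such as `W_pack` or `query_key_value`, or to models that use grouped-query attention.

```python
def lora_trainable_params(hidden_size, num_layers, target_modules, rank):
    """Count LoRA parameters for square hidden_size x hidden_size projections.

    Each adapted module gains two matrices: A (rank x d_in) and B (d_out x rank).
    """
    per_module = rank * (hidden_size + hidden_size)
    return num_layers * len(target_modules) * per_module


default_module = "q_proj,v_proj"  # "Default module" cell from the table above
targets = default_module.split(",")

n = lora_trainable_params(hidden_size=4096, num_layers=32,
                          target_modules=targets, rank=8)
print(f"{n:,} trainable LoRA parameters")  # 4,194,304 trainable LoRA parameters
```

Under these assumptions the adapters hold roughly 4.2M parameters, a small fraction of the 7B parameters in the base model.
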
Approach | Full-tuning | Freeze-tuning | LoRA | QLoRA |
---|---|---|---|---|
Pre-Training | ✅ | ✅ | ✅ | ✅ |
Supervised Fine-Tuning | ✅ | ✅ | ✅ | ✅ |
Reward Modeling | ✅ | ✅ | ✅ | ✅ |
PPO Training | ✅ | ✅ | ✅ | ✅ |
DPO Training | ✅ | ✅ | ✅ | ✅ |
KTO Training | ✅ | ✅ | ✅ | ✅ |
ORPO Training | ✅ | ✅ | ✅ | ✅ |
We currently support the following datasets, most of which are available in the Hugging Face datasets library.
Supervised fine-tuning dataset
- Stanford Alpaca
- Stanford Alpaca (Chinese)
- Hello-SimpleAI/HC3
- BELLE 2M (zh)
- BELLE 1M (zh)
- BELLE 0.5M (zh)
- BELLE Dialogue 0.4M (zh)
- BELLE School Math 0.25M (zh)
- BELLE Multiturn Chat 0.8M (zh)
- databricks-dolly-15k
- mosaicml/dolly_hhrlhf
- GPT-4 Generated Data
- Alpaca CoT
- UltraChat
- OpenAssistant/oasst1
- ShareGPT_Vicuna_unfiltered
- BIAI/OL-CC
- timdettmers/openassistant-guanaco
- Evol-Instruct
- OpenOrca
- Platypus
- OpenHermes
Preference datasets
Please refer to data/README.md to learn how to use these datasets. To explore more datasets, see awesome-instruction-datasets. Some datasets require confirmation before use, so we recommend logging in with your Hugging Face account using these commands:
pip install --upgrade huggingface_hub
huggingface-cli login
We provide a number of data preprocessing tools in the data folder. These tools are intended to be a starting point for further research and development.
- data_utils.py : Data preprocessing and formatting
- sft_dataset.py : Supervised fine-tuning dataset class and collator
- conv_dataset.py : Conversation dataset class and collator
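
To make the roles of a dataset class and a collator concrete, the following sketch formats an Alpaca-style record into a prompt and pads a batch of token ID lists. It is not the code in `sft_dataset.py`: the function names, the prompt wording, and the default pad ID are hypothetical. The label value -100 is the index that PyTorch's cross-entropy loss ignores by default.

```python
IGNORE_INDEX = -100  # ignored by PyTorch's cross-entropy loss by default


def format_alpaca(example):
    """Turn an Alpaca-style record into (prompt, response) strings."""
    if example.get("input"):
        prompt = (f"### Instruction:\n{example['instruction']}\n\n"
                  f"### Input:\n{example['input']}\n\n### Response:\n")
    else:
        prompt = f"### Instruction:\n{example['instruction']}\n\n### Response:\n"
    return prompt, example["output"]


def collate(batch, pad_token_id=0):
    """Right-pad (prompt_ids, response_ids) pairs to a common length.

    Prompt and padding positions are masked out of the loss via IGNORE_INDEX.
    """
    max_len = max(len(p) + len(r) for p, r in batch)
    input_ids, labels, attention_mask = [], [], []
    for prompt_ids, response_ids in batch:
        ids = prompt_ids + response_ids
        pad = max_len - len(ids)
        input_ids.append(ids + [pad_token_id] * pad)
        labels.append([IGNORE_INDEX] * len(prompt_ids) + response_ids
                      + [IGNORE_INDEX] * pad)
        attention_mask.append([1] * len(ids) + [0] * pad)
    return {"input_ids": input_ids, "labels": labels,
            "attention_mask": attention_mask}
```

Masking the prompt tokens means the loss is computed only on the response, which is a common choice for supervised fine-tuning.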
We provide the following models on the Hugging Face model hub. They are trained with QLoRA and can be used for inference and further fine-tuning.
Base Model | Adapter | Instruct Datasets | Train Script | Log | Model on Huggingface |
---|---|---|---|---|---|
llama-7b | FullFinetune | - | - | - | |
llama-7b | QLoRA | openassistant-guanaco | finetune_lamma7b | wandb log | GaussianTech/llama-7b-sft |
llama-7b | QLoRA | OL-CC | finetune_lamma7b | | |
baichuan7b | QLoRA | openassistant-guanaco | finetune_baichuan7b | wandb log | GaussianTech/baichuan-7b-sft |
baichuan7b | QLoRA | OL-CC | finetune_baichuan7b | wandb log | - |
Mandatory | Minimum | Recommended |
---|---|---|
python | 3.8 | 3.10 |
torch | 1.13.1 | 2.2.0 |
transformers | 4.37.2 | 4.41.0 |
datasets | 2.14.3 | 2.19.1 |
accelerate | 0.27.2 | 0.30.1 |
peft | 0.9.0 | 0.11.1 |
trl | 0.8.2 | 0.8.6 |
Optional | Minimum | Recommended |
---|---|---|
CUDA | 11.6 | 12.2 |
deepspeed | 0.10.0 | 0.14.0 |
bitsandbytes | 0.39.0 | 0.43.1 |
vllm | 0.4.0 | 0.4.2 |
flash-attn | 2.3.0 | 2.5.8 |
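
The minimum versions above can be checked against the current environment with a short script. The sketch below uses only the standard library (`importlib.metadata`, available since Python 3.8). The helper names are hypothetical, and the version parser is deliberately simple: it compares only the leading numeric components and ignores pre-release ordering.

```python
import re
from importlib import metadata

# Minimum versions copied from the "Mandatory" table above.
MINIMUMS = {"torch": "1.13.1", "transformers": "4.37.2", "datasets": "2.14.3",
            "accelerate": "0.27.2", "peft": "0.9.0", "trl": "0.8.2"}


def version_tuple(version):
    """Parse leading numeric components, e.g. '2.2.0+cu121' -> (2, 2, 0)."""
    parts = []
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    # Pad so that '2.2' compares equal to '2.2.0'.
    return tuple(parts + [0] * max(0, 3 - len(parts)))


def check(minimums=MINIMUMS):
    """Return {package: status}; status is 'ok', 'too old (...)' or 'missing'."""
    report = {}
    for name, minimum in minimums.items():
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = version_tuple(installed) >= version_tuple(minimum)
        report[name] = "ok" if ok else f"too old ({installed} < {minimum})"
    return report


if __name__ == "__main__":
    for name, status in check().items():
        print(f"{name:<14}{status}")
```
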
\* Estimated GPU memory requirements
Method | Bits | 7B | 13B | 30B | 70B | 110B | 8x7B | 8x22B |
---|---|---|---|---|---|---|---|---|
Full | AMP | 120GB | 240GB | 600GB | 1200GB | 2000GB | 900GB | 2400GB |
Full | 16 | 60GB | 120GB | 300GB | 600GB | 900GB | 400GB | 1200GB |
Freeze | 16 | 20GB | 40GB | 80GB | 200GB | 360GB | 160GB | 400GB |
LoRA/GaLore/BAdam | 16 | 16GB | 32GB | 64GB | 160GB | 240GB | 120GB | 320GB |
QLoRA | 8 | 10GB | 20GB | 40GB | 80GB | 140GB | 60GB | 160GB |
QLoRA | 4 | 6GB | 12GB | 24GB | 48GB | 72GB | 30GB | 96GB |
QLoRA | 2 | 4GB | 8GB | 16GB | 24GB | 48GB | 18GB | 48GB |
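
The table can be used as a simple lookup to see which methods fit a given memory budget. The sketch below transcribes the estimates above; the function name `methods_that_fit` is hypothetical, and the comparison treats the budget as one pool of memory, which is only a rough guide for multi-GPU setups.

```python
# Columns of the estimated memory table above, in GB.
SIZES = ["7B", "13B", "30B", "70B", "110B", "8x7B", "8x22B"]
MEMORY_GB = {
    ("Full", "AMP"): [120, 240, 600, 1200, 2000, 900, 2400],
    ("Full", "16"): [60, 120, 300, 600, 900, 400, 1200],
    ("Freeze", "16"): [20, 40, 80, 200, 360, 160, 400],
    ("LoRA/GaLore/BAdam", "16"): [16, 32, 64, 160, 240, 120, 320],
    ("QLoRA", "8"): [10, 20, 40, 80, 140, 60, 160],
    ("QLoRA", "4"): [6, 12, 24, 48, 72, 30, 96],
    ("QLoRA", "2"): [4, 8, 16, 24, 48, 18, 48],
}


def methods_that_fit(model_size, available_gb):
    """Return (method, bits) pairs whose estimate is within available_gb."""
    col = SIZES.index(model_size)
    return [key for key, row in MEMORY_GB.items() if row[col] <= available_gb]


print(methods_that_fit("7B", 8))
# [('QLoRA', '4'), ('QLoRA', '2')]
```

For a 7B model on an 8GB GPU, only 4-bit and 2-bit QLoRA fall within the estimate, which matches the single-GPU claim made earlier in this README.
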
Clone this repository and navigate to the LLamaTuner folder:
git clone https://github.com/jianzhnie/LLamaTuner.git
cd LLamaTuner
Main function | Usage | Scripts |
---|---|---|
train_full.py | Full-parameter fine-tuning of LLMs on SFT datasets | full_finetune |
train_lora.py | Fine-tune LLMs using LoRA (Low-Rank Adaptation of Large Language Models) | lora_finetune |
train_qlora.py | Fine-tune LLMs using QLoRA (QLoRA: Efficient Finetuning of Quantized LLMs) | qlora_finetune |
LLamaTuner is released under the Apache 2.0 license.
We thank the Hugging Face team, in particular Younes Belkada, for their support in integrating QLoRA with the PEFT and transformers libraries.
We appreciate the work by many open-source contributors, especially:
- LLaMa
- Vicuna
- xTuring
- Alpaca-LoRA
- Stanford Alpaca
- LLaMA-Factory
- Hugging Face
- Peft
- axolotl
- deepspeed
- Unsloth
- qlora
- bitsandbytes
Please cite the repo if you use the data or code in this repo.
@misc{Chinese-Guanaco,
author = {jianzhnie},
title = {LLamaTuner: Easy and Efficient Fine-tuning LLMs},
year = {2023},
publisher = {GitHub},
journal = {GitHub repository},
howpublished = {\url{https://github.com/jianzhnie/LLamaTuner}},
}