diff --git a/docs/source/sft_trainer.mdx b/docs/source/sft_trainer.mdx
index f2ace18b8c..0f9e145019 100644
--- a/docs/source/sft_trainer.mdx
+++ b/docs/source/sft_trainer.mdx
@@ -415,12 +415,12 @@ Note however, that the amount of performance gain is _dataset dependent_ and in
 You can further accelerate QLoRA / LoRA (2x faster, 60% less memory) using the [`unsloth`](https://github.com/unslothai/unsloth) library that is fully compatible with `SFTTrainer`. Currently `unsloth` supports only Llama (Yi, TinyLlama, Qwen, Deepseek etc) and Mistral architectures. Some benchmarks on 1x A100 listed below:
 
-| 1 A100 40GB     | Dataset   | 🤗  | 🤗 + Flash Attention | 🦥 Unsloth      | 🦥 VRAM saved  |
-|-----------------|-----------|-----|----------------------|-----------------|----------------|
-| Code Llama 34b  | Slim Orca | 1x  | 1.01x                | **1.94x**       | -22.7%         |
-| Llama-2 7b      | Slim Orca | 1x  | 0.96x                | **1.87x**       | -39.3%         |
-| Mistral 7b      | Slim Orca | 1x  | 1.17x                | **1.88x**       | -65.9%         |
-| Tiny Llama 1.1b | Alpaca    | 1x  | 1.55x                | **2.74x**       | -57.8%         |
+| 1 A100 40GB     | Dataset   | 🤗  | 🤗 + Flash Attention 2 | 🦥 Unsloth      | 🦥 VRAM saved  |
+|-----------------|-----------|-----|-------------------------|-----------------|----------------|
+| Code Llama 34b  | Slim Orca | 1x  | 1.01x                   | **1.94x**       | -22.7%         |
+| Llama-2 7b      | Slim Orca | 1x  | 0.96x                   | **1.87x**       | -39.3%         |
+| Mistral 7b      | Slim Orca | 1x  | 1.17x                   | **1.88x**       | -65.9%         |
+| Tiny Llama 1.1b | Alpaca    | 1x  | 1.55x                   | **2.74x**       | -57.8%         |
 
 First install `unsloth` according to the [official documentation](https://github.com/unslothai/unsloth). Once installed, you can incorporate unsloth into your workflow in a very simple manner; instead of loading `AutoModelForCausalLM`, you just need to load a `FastLanguageModel` as follows:
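
The hunk's last context line introduces a code block that sits just outside this diff. For reference while reviewing, a minimal sketch of that `FastLanguageModel` pattern, assuming the public `unsloth` API (`FastLanguageModel.from_pretrained` / `FastLanguageModel.get_peft_model`) and the `SFTTrainer` signature of this TRL version; the checkpoint name, dataset, and hyperparameter values below are illustrative, not part of the patch:

```python
from datasets import load_dataset
from trl import SFTTrainer
from unsloth import FastLanguageModel

max_seq_length = 2048  # unsloth handles RoPE scaling internally, so any length works

# Load model and tokenizer in one call, in place of AutoModelForCausalLM
# (checkpoint name is illustrative)
model, tokenizer = FastLanguageModel.from_pretrained(
    model_name="unsloth/mistral-7b",
    max_seq_length=max_seq_length,
    dtype=None,          # None auto-detects (bf16 on Ampere GPUs such as the A100)
    load_in_4bit=True,   # QLoRA-style 4-bit quantization
)

# Patch the model and attach fast LoRA weights
model = FastLanguageModel.get_peft_model(
    model,
    r=16,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],
    lora_alpha=16,
    lora_dropout=0,      # dropout of 0 is the optimized path in unsloth
    bias="none",         # bias of "none" is the optimized path in unsloth
    use_gradient_checkpointing=True,
    random_state=3407,
)

# The patched model drops into SFTTrainer as usual (dataset is illustrative)
dataset = load_dataset("imdb", split="train")
trainer = SFTTrainer(
    model=model,
    tokenizer=tokenizer,
    train_dataset=dataset,
    dataset_text_field="text",
    max_seq_length=max_seq_length,
)
trainer.train()
```

The only change relative to a stock `SFTTrainer` setup is the loading and patching step; training itself proceeds unchanged, which is why the doc paragraph above frames unsloth as a drop-in accelerator.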