
v1.9: Llama2-70B, Falcon-180B, Mistral, fp8, SynapseAI v1.13

@regisss released this 04 Dec 14:36

SynapseAI v1.13

The codebase is fully validated for the latest version of the Habana SDK, SynapseAI v1.13.

Fine-tuning Llama2-70B, Falcon-180B and BLOOM-7B

Added examples for fine-tuning Llama2-70B and Falcon-180B on Gaudi2 and BLOOM-7B on first-gen Gaudi.

  • Enable llama2-70b LoRA finetuning #527 @mandy-li
  • Add Deepspeed zero3 configuration to run bloom-7b on Gaudi1 #487
  • Enable Falcon 180B #537 @hlahkar
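Running BLOOM-7B on first-gen Gaudi relies on DeepSpeed ZeRO stage 3, which shards parameters, gradients, and optimizer states across devices so the model fits in memory. As a rough illustration only (the exact file added in #487 may differ), a minimal ZeRO-3 DeepSpeed config looks like:

```json
{
  "bf16": { "enabled": true },
  "zero_optimization": {
    "stage": 3,
    "overlap_comm": false,
    "contiguous_gradients": true,
    "stage3_gather_16bit_weights_on_model_save": true
  },
  "gradient_accumulation_steps": "auto",
  "train_batch_size": "auto",
  "train_micro_batch_size_per_gpu": "auto"
}
```

The `"auto"` values are resolved from the training arguments at launch time, which keeps the JSON reusable across runs.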

Llama2 fp8 inference

Mistral

Optimizations

  • Remove GPTJ dma before mha #468 @BaihuiJin
  • Enable llama attention softmax in bf16 #521 @schoi-habana
  • Add load_meta_device option to reduce host RAM #529 @jiminha
  • Improve llama performance and reduce memory consumption by updating sin/cos cache when inferring more than max position embeddings (4096) #532 @puneeshkhanna
  • Add hash_with_views arg for Falcon inference perf #534 @schoi-habana
  • Automate skip_hash_with_views for text generation with Falcon #544 @regisss
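The sin/cos optimization in #532 avoids rebuilding the rotary-embedding tables on every step: the cache is only extended when a sequence grows past the cached length (e.g. beyond the 4096 max position embeddings). A hypothetical minimal sketch of that idea, not optimum-habana's actual code:

```python
import math

class RotaryCache:
    """Lazily extended sin/cos cache for rotary position embeddings.

    Illustrative sketch only: tables are computed once up to
    ``max_positions`` and recomputed only for the *new* positions when a
    longer sequence is requested.
    """

    def __init__(self, dim, max_positions=4096, base=10000.0):
        self.dim = dim
        self.base = base
        self.max_positions = 0
        self.sin, self.cos = [], []
        self._extend(max_positions)

    def _extend(self, seq_len):
        # Standard rotary inverse frequencies for each pair of dims.
        inv_freq = [self.base ** (-2 * i / self.dim) for i in range(self.dim // 2)]
        # Only fill in positions we have not cached yet.
        for pos in range(self.max_positions, seq_len):
            angles = [pos * f for f in inv_freq]
            self.sin.append([math.sin(a) for a in angles])
            self.cos.append([math.cos(a) for a in angles])
        self.max_positions = seq_len

    def get(self, seq_len):
        # Cache hit for any request within the cached range;
        # extend (rather than rebuild) otherwise.
        if seq_len > self.max_positions:
            self._extend(seq_len)
        return self.sin[:seq_len], self.cos[:seq_len]

cache = RotaryCache(dim=8, max_positions=16)
sin16, _ = cache.get(16)   # served from cache
sin32, _ = cache.get(32)   # extends the cache by 16 positions only
```

The benefit is that inference past the pre-allocated maximum pays only the incremental cost of the new positions instead of a full recompute.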

Improved text generation

Support for Transformers v4.34 and Diffusers v0.23

This version has been validated for Transformers v4.34 and Diffusers v0.23.

TGI

Dynamic shape support

  • Add infra to enable/disable dynamic shapes feature through gaudi_config #513 @vivekgoe
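With this infrastructure, dynamic shapes can be toggled declaratively from the Gaudi configuration instead of in user code. As an illustration only (the flag name below is a guess based on the PR description and may not match the actual attribute), a `gaudi_config.json` could carry:

```json
{
  "use_dynamic_shapes": true
}
```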

Habana Mixed Precision was removed in favor of Torch Autocast
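With HMP gone, mixed precision goes through PyTorch's native `torch.autocast` context manager. A minimal sketch, shown with `device_type="cpu"` so it runs anywhere; on Gaudi you would pass `device_type="hpu"` with `habana_frameworks.torch` loaded (this snippet assumes only that `torch` is installed):

```python
import torch

# Ops inside the autocast context that benefit from lower precision
# (matmul-heavy layers such as nn.Linear) run in bfloat16 automatically.
model = torch.nn.Linear(16, 4)
x = torch.randn(2, 16)

with torch.autocast(device_type="cpu", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # the linear layer's output is bfloat16 inside the context
```

Unlike HMP's global op lists, autocast scopes the precision policy to a `with` block, which makes it easy to keep numerically sensitive code paths in full precision.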

Various fixes

Others

The regression tests associated with this release are available here: https://github.com/huggingface/optimum-habana/actions/runs/7085551714