v1.9: Llama2-70B, Falcon-180B, Mistral, fp8, SynapseAI v1.13
SynapseAI v1.13
The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.13.
Fine-tuning Llama2-70B, Falcon-180B and BLOOM-7B
Added examples for fine-tuning Llama2-70B and Falcon-180B on Gaudi2, and BLOOM-7B on first-gen Gaudi; a minimal LoRA sketch follows the list below.
- Enable llama2-70b LoRA finetuning #527 @mandy-li
- Add Deepspeed zero3 configuration to run bloom-7b on Gaudi1 #487
- Enable Falcon 180B #537 @hlahkar
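For orientation, here is a minimal sketch of what LoRA fine-tuning through `GaudiTrainer` can look like. It is not the exact recipe of the new examples; the model name, dataset and hyperparameters below are illustrative placeholders.

```python
# Minimal LoRA fine-tuning sketch with optimum-habana and PEFT.
# Model name, dataset and hyperparameters are placeholders, not the exact
# settings of the Llama2-70B / Falcon-180B examples.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling

from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; the example targets Llama2-70B
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Only the small LoRA adapter weights are trained, not the full model.
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

# Placeholder dataset; any corpus with a "text" column works the same way.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

training_args = GaudiTrainingArguments(
    output_dir="./lora-out",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    bf16=True,
    use_habana=True,     # run on HPU
    use_lazy_mode=True,  # lazy-mode graph execution
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=GaudiConfig(use_fused_adam=True, use_fused_clip_norm=True),
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The 70B/180B examples additionally rely on DeepSpeed to shard the model across devices; the single-device sketch above only illustrates the LoRA + `GaudiTrainer` wiring.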
Llama2 fp8 inference
- Add llamav2 fp8 inference #542 @bgoldberg-habana
Mistral
Optimizations
- Remove GPTJ dma before mha #468 @BaihuiJin
- Enable llama attention softmax in bf16 #521 @schoi-habana
- Add load_meta_device option to reduce host RAM #529 @jiminha
- Improve llama performance and reduce memory consumption by updating sin/cos cache when inferring more than max position embeddings (4096) #532 @puneeshkhanna
- Add hash_with_views arg for Falcon inference perf #534 @schoi-habana
- Automate skip_hash_with_views for text generation with Falcon #544 @regisss
Improved text generation
- Allow multi prompts #479 @ssarkar2
- Growing bucket for beam #450 @ssarkar2
- Some models have extra inputs, pad them too #488 @ssarkar2
- Refactor run generation #523 @bgoldberg-habana
- Fix setting of reuse cache #553 @puneeshkhanna
- No need to unsqueeze input_id in prepare_inputs_for_generation #559 @sywangyi
- Adding lm eval script #541 @bgoldberg-habana
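To illustrate the batched-prompt path that several of the items above touch, here is a minimal sketch of generation on HPU with multiple prompts padded together; the model name and generation settings are placeholders.

```python
# Sketch of batched text generation on Gaudi; model and settings are placeholders.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # swap in the Gaudi-optimized model code paths

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to("hpu")

# Several prompts are padded to a common length and generated as one batch.
prompts = ["What is deep learning?", "Write a haiku about accelerators."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("hpu")

outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

For the full set of options (bucketing, HPU graphs, cache reuse, the new lm-eval script), the text-generation example in the repository remains the reference.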
Support for Transformers v4.34 and Diffusers v0.23
This version has been validated for Transformers v4.34 and Diffusers v0.23; a quick version check is sketched after the list below.
- Upgrade to Transformers 4.34 #475 @regisss
- Upgrade to Diffusers 0.23 #516 @regisss
- Pin Diffusers #565 @regisss
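To confirm that a local environment matches the validated versions, a small check along these lines can be used:

```python
# Quick sanity check that the installed versions match the validated ones.
from importlib.metadata import version

print("transformers:", version("transformers"))  # expected: 4.34.x
print("diffusers:", version("diffusers"))        # expected: 0.23.x
```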
TGI
Dynamic shape support
Habana Mixed Precision was removed in favor of Torch Autocast
- Remove HMP from optimum-habana #349 @jwieczorekhabana
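With HMP removed, mixed precision on HPU goes through the standard PyTorch autocast API. A minimal sketch, using a toy model as a placeholder:

```python
# Sketch: bf16 autocast on HPU, replacing the removed Habana Mixed Precision (HMP).
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

model = torch.nn.Linear(16, 16).to("hpu")
x = torch.randn(4, 16).to("hpu")

# Ops inside this context run in bf16 where autocast deems it safe.
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    y = model(x)

htcore.mark_step()  # flush the lazy-mode graph
print(y.dtype)
```

When training with `GaudiTrainer`, the same behavior is driven by the `bf16` training argument and the Gaudi configuration rather than by wrapping code manually.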
Various fixes
- Fix for SegFault during FT #483 @MohitIntel
- Enable/disable gradient_checkpointing as per training_args.gradient_checkpointing value #484 @vivekgoe
- Fix split validation dataset problem #489 @mandy-li
- Fix validate dataset problem for openassistant-guanaco #498 @mandy-li
- Fix for Accelerate #500 @regisss
- Fix deepspeed init issue when using external launcher #497 @yuanwu2017
- Update Transformers dependency in setup.py #504 @regisss
- Fix token transmission in text-generation example #509 @regisss
- Merge LoRA model before initializing DS inference in text-generation example #515 @regisss
- Fix for Falcon-40b inference with deepspeed #502 @schoi-habana
- Fixing FusedSDPA recompute bug #512 @skaulintel
- Fix update method to avoid copying idx to CPU, which was splitting the graph #524 @bgoldberg-habana
- Fix missing max_position_embeddings in model config in run_clm.py #530 @regisss
- Fix for attn_softmax_bf16 when generation_config is None #531 @schoi-habana
- Fix loading on meta device for PEFT models with DS-inference #528 @regisss
- Fix splitting by whitespace rather than by a single space #540 @oelayan7
- Fix stable diffusion pipelines #548 @regisss
- Update trainer.py #549 @skaulintel
- Add fallback for PEFT when the base model doesn't exist #557 @regisss
Others
- Update GaudiNIC multi-node-training dockerfile and setup #477 @yeonsily
- Adding ignore_eos flag to use in generation #469 @bhargaveede
- Add maximum hpugraphs and disable_tensor_cache arguments to GaudiTrainer #493 @skaulintel
- Update BridgeTower example #561 @regisss
- Remove mention of eager mode in the README; set use_lazy_mode to True by default #486 @skaulintel
- Add another tokenizer to multilingual list #550 @ssarkar2
- Specify problem type for classification #551 @ssarkar2
The regression tests associated with this release are available here: https://github.com/huggingface/optimum-habana/actions/runs/7085551714