v1.9: Llama2-70B, Falcon-180B, Mistral, fp8, SynapseAI v1.13
SynapseAI v1.13
The codebase is fully validated for the latest version of Habana SDK, SynapseAI v1.13.
Fine-tuning Llama2-70B, Falcon-180B and BLOOM-7B
Added examples for fine-tuning Llama2-70B and Falcon-180B on Gaudi2, and BLOOM-7B on first-gen Gaudi; a minimal LoRA sketch follows the list below.
- Enable llama2-70b LoRA finetuning #527 @mandy-li
- Add Deepspeed zero3 configuration to run bloom-7b on Gaudi1 #487
- Enable Falcon 180B #537 @hlahkar
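For orientation, here is a minimal sketch of what LoRA fine-tuning through `GaudiTrainer` can look like. It is not the exact recipe of the new examples; the model name, dataset and hyperparameters below are illustrative placeholders.

```python
# Minimal LoRA fine-tuning sketch with optimum-habana and PEFT.
# Model name, dataset and hyperparameters are placeholders, not the exact
# settings of the Llama2-70B / Falcon-180B examples.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM, AutoTokenizer, DataCollatorForLanguageModeling

from optimum.habana import GaudiConfig, GaudiTrainer, GaudiTrainingArguments

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder; the example targets Llama2-70B
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(model_name)

# Only the small LoRA adapter weights are trained, not the full model.
model = get_peft_model(
    model,
    LoraConfig(r=8, lora_alpha=16, target_modules=["q_proj", "v_proj"], task_type="CAUSAL_LM"),
)

# Placeholder dataset; any corpus with a "text" column works the same way.
dataset = load_dataset("timdettmers/openassistant-guanaco", split="train")
dataset = dataset.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=512),
    batched=True,
    remove_columns=dataset.column_names,
)

training_args = GaudiTrainingArguments(
    output_dir="./lora-out",
    per_device_train_batch_size=1,
    num_train_epochs=1,
    bf16=True,
    use_habana=True,     # run on HPU
    use_lazy_mode=True,  # lazy-mode graph execution
)

trainer = GaudiTrainer(
    model=model,
    gaudi_config=GaudiConfig(use_fused_adam=True, use_fused_clip_norm=True),
    args=training_args,
    train_dataset=dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```

The 70B/180B examples additionally rely on DeepSpeed to shard the model across devices; the single-device sketch above only illustrates the LoRA + `GaudiTrainer` wiring.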
Llama2 fp8 inference
- Add llamav2 fp8 inference #542 @bgoldberg-habana
Mistral
Optimizations
- Remove GPTJ dma before mha #468 @BaihuiJin
- Enable llama attention softmax in bf16 #521 @schoi-habana
- Add load_meta_device option to reduce host RAM #529 @jiminha
- Improve llama performance and reduce memory consumption by updating sin/cos cache when inferring more than max position embeddings (4096) #532 @puneeshkhanna
- Add hash_with_views arg for Falcon inference perf #534 @schoi-habana
- Automate skip_hash_with_views for text generation with Falcon #544 @regisss
Improved text generation
- Allow multi prompts #479 @ssarkar2
- Growing bucket for beam #450 @ssarkar2
- Some models have extra inputs, pad them too #488 @ssarkar2
- Refactor run generation #523 @bgoldberg-habana
- Fix setting of reuse cache #553 @puneeshkhanna
- No need to unsqueeze input_id in prepare_inputs_for_generation #559 @sywangyi
- Adding lm eval script #541 @bgoldberg-habana
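To illustrate the batched-prompt path that several of the items above touch, here is a minimal sketch of generation on HPU with multiple prompts padded together; the model name and generation settings are placeholders.

```python
# Sketch of batched text generation on Gaudi; model and settings are placeholders.
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

from transformers import AutoModelForCausalLM, AutoTokenizer
from optimum.habana.transformers.modeling_utils import adapt_transformers_to_gaudi

adapt_transformers_to_gaudi()  # swap in the Gaudi-optimized model code paths

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder
tokenizer = AutoTokenizer.from_pretrained(model_name, padding_side="left")
tokenizer.pad_token = tokenizer.eos_token

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.bfloat16).to("hpu")

# Several prompts are padded to a common length and generated as one batch.
prompts = ["What is deep learning?", "Write a haiku about accelerators."]
inputs = tokenizer(prompts, return_tensors="pt", padding=True).to("hpu")

outputs = model.generate(**inputs, max_new_tokens=32)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```

For the full set of options (bucketing, HPU graphs, cache reuse, the new lm-eval script), the text-generation example in the repository remains the reference.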
Support for Transformers v4.34 and Diffusers v0.23
This version has been validated for Transformers v4.34 and Diffusers v0.23; a quick version check is sketched after the list below.
- Upgrade to Transformers 4.34 #475 @regisss
- Upgrade to Diffusers 0.23 #516 @regisss
- Pin Diffusers #565 @regisss
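To confirm that a local environment matches the validated versions, a small check along these lines can be used:

```python
# Quick sanity check that the installed versions match the validated ones.
from importlib.metadata import version

print("transformers:", version("transformers"))  # expected: 4.34.x
print("diffusers:", version("diffusers"))        # expected: 0.23.x
```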
TGI
Dynamic shape support
Habana Mixed Precision was removed in favor of Torch Autocast
- Remove HMP from optimum-habana #349 @jwieczorekhabana
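With HMP removed, mixed precision on HPU goes through the standard PyTorch autocast API. A minimal sketch, using a toy model as a placeholder:

```python
# Sketch: bf16 autocast on HPU, replacing the removed Habana Mixed Precision (HMP).
import torch
import habana_frameworks.torch.core as htcore  # registers the "hpu" device

model = torch.nn.Linear(16, 16).to("hpu")
x = torch.randn(4, 16).to("hpu")

# Ops inside this context run in bf16 where autocast deems it safe.
with torch.autocast(device_type="hpu", dtype=torch.bfloat16):
    y = model(x)

htcore.mark_step()  # flush the lazy-mode graph
print(y.dtype)
```

When training with `GaudiTrainer`, the same behavior is driven by the `bf16` training argument and the Gaudi configuration rather than by wrapping code manually.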
Various fixes
- Fix for SegFault during FT #483 @MohitIntel
- Enable/disable gradient_checkpointing as per training_args.gradient_checkpointing value #484 @vivekgoe
- Fix split validation dataset problem #489 @mandy-li
- Fix validate dataset problem for openassistant-guanaco #498 @mandy-li
- Fix for Accelerate #500 @regisss
- Fix deepspeed init issue when using external launcher #497 @yuanwu2017
- Update Transformers dependency in setup.py #504 @regisss
- Fix token transmission in text-generation example #509 @regisss
- Merge LoRA model before initializing DS inference in text-generation example #515 @regisss
- Fix for Falcon-40b inference with deepspeed #502 @schoi-habana
- Fixing FusedSDPA recompute bug #512 @skaulintel
- Fix update method to avoid copying idx to CPU, which was splitting the graph #524 @bgoldberg-habana
- Fix missing max_position_embeddings in model config in run_clm.py #530 @regisss
- Fix for attn_softmax_bf16 when generation_config is None #531 @schoi-habana
- Fix loading on meta device for PEFT models with DS-inference #528 @regisss
- Fix splitting by whitespace rather than by a single space #540 @oelayan7
- Fix stable diffusion pipelines #548 @regisss
- Update trainer.py #549 @skaulintel
- Add fallback for PEFT when the base model doesn't exist #557 @regisss
Others
- Update GaudiNIC multi-node-training dockerfile and setup #477 @yeonsily
- Adding ignore_eos flag to use in generation #469 @bhargaveede
- Add maximum hpugraphs and disable_tensor_cache arguments to GaudiTrainer #493 @skaulintel
- Update BridgeTower example #561 @regisss
- Remove mention of eager mode in the README; set use_lazy_mode to True by default #486 @skaulintel
- Add another tokenizer to multilingual list #550 @ssarkar2
- Specify problem type for classification #551 @ssarkar2
The regression tests associated with this release are available here: https://github.com/huggingface/optimum-habana/actions/runs/7085551714