Accuracy fix for llama3.1-70B in eager/torch.compile mode #1746

ckvermaAI · 2025-02-05T15:39:14Z

Issue: Low accuracy in Llama3.1-70B with eager/torch.compile mode

(The following details are taken from huggingface/transformers#28685.)
A number of transformers models use arange for integer enumerations when computing position embeddings. When such a model is used with DeepSpeed zero.Init() and a low-precision dtype (float16, bfloat16), the generated embeddings differ significantly from the intended values.

Using Llama as an example:
t = torch.arange(self.max_seq_len_cached, device=device, dtype=self.inv_freq.dtype)
Here inv_freq.dtype is float32. Single-precision float represents every integer up to 2^24 exactly, which covers the required range for the enumeration (I believe it's in the 2k-8k range for Llama?).
However, when DeepSpeed zero.Init is used, the patched init functions override the float dtype that was passed in with a low-precision float dtype, so float32 becomes bfloat16 or float16. The range of integers that can be represented exactly then drops to 256 for bfloat16 or 2048 for float16. DeepSpeed's patching makes an exception for integer dtypes: it does not cast arange to the low-precision float dtype if the arange dtype is an integer type.
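The 256 and 2048 limits quoted above can be checked without PyTorch. The sketch below is illustrative only and is not part of this PR: it emulates bfloat16 by rounding the float32 bit pattern to its top 16 bits, and uses the standard library's half-precision struct format for float16. The helper names are hypothetical.

```python
import struct


def round_to_bfloat16(x: float) -> float:
    """Round x to the nearest bfloat16 value (round-to-nearest-even).

    bfloat16 is the top 16 bits of the float32 encoding, so we round the
    float32 bit pattern and clear the low 16 bits.
    """
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def round_to_float16(x: float) -> float:
    """Round x to the nearest IEEE half-precision value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]


def first_inexact_integer(round_fn, limit=10000):
    """Return the smallest non-negative integer that round_fn changes."""
    for i in range(limit):
        if round_fn(float(i)) != i:
            return i
    return None


print(first_inexact_integer(round_to_bfloat16))  # 257: integers up to 256 are exact
print(first_inexact_integer(round_to_float16))   # 2049: integers up to 2048 are exact

# Many of the 8192 positions collapse onto the same bfloat16 value.
distinct = len({round_to_bfloat16(float(i)) for i in range(8192)})
print(distinct < 8192)
```

Beyond these limits, neighbouring positions round to the same value, so distinct tokens receive identical position embeddings.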

Fix: Set the dtype to torch.int32 for torch.arange.

torch.int64 is not used because it generates incorrect values (and the corresponding JIT_IR graph is not as expected).
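The effect of the fix can be illustrated with a toy model. This is a sketch under stated assumptions, not DeepSpeed's or this PR's actual code: patched_arange is a hypothetical stand-in for an arange whose float dtype is overridden to bfloat16 while integer dtypes pass through untouched, and the later conversion to float32 is assumed not to be affected by the patching.

```python
import struct


def to_bfloat16(x: float) -> float:
    """Emulate bfloat16 rounding (round-to-nearest-even) of a float32 value."""
    bits = struct.unpack("<I", struct.pack("<f", x))[0]
    rounding_bias = 0x7FFF + ((bits >> 16) & 1)
    bits = (bits + rounding_bias) & 0xFFFF0000
    return struct.unpack("<f", struct.pack("<I", bits))[0]


def patched_arange(n: int, dtype: str) -> list:
    """Hypothetical model of the patched arange described above.

    Integer dtypes are left alone; any float dtype is silently replaced
    by bfloat16 (the assumed low-precision dtype).
    """
    if dtype.startswith("int"):
        return list(range(n))
    return [to_bfloat16(float(i)) for i in range(n)]


seq_len = 4096

# Before the fix: a float32 request is created as bfloat16 and loses precision.
before = patched_arange(seq_len, "float32")

# After the fix: create integers first, then convert to float afterwards.
after = [float(i) for i in patched_arange(seq_len, "int32")]

mismatches = sum(1 for b, a in zip(before, after) if b != a)
print(before[257], after[257])  # 256.0 257.0
print(mismatches > 0)           # True: the float path corrupts positions
```

Because the integer values are exact when created, every position keeps its own value after the conversion, which is the property the position embeddings rely on.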

Co-authored-by: Vivek Goel <[email protected]>

yafshar

Very nice fix!

LGTM!

Hi @regisss, this PR is ready for your final review. Could you please take a look?

optimum/habana/transformers/models/llama/modeling_llama.py

HuggingFaceDocBuilderDev · 2025-02-07T03:38:58Z

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

regisss

LGTM!

accuracy fix for llama3.1-70B (#136)

41866a1

Co-authored-by: Vivek Goel <[email protected]>

ckvermaAI requested review from mandy-li and libinta as code owners February 5, 2025 15:39

yafshar approved these changes Feb 5, 2025

libinta reviewed Feb 5, 2025

optimum/habana/transformers/models/llama/modeling_llama.py (resolved)

libinta added the run-test Run CI for PRs from external contributors label Feb 7, 2025

regisss approved these changes Feb 7, 2025

regisss merged commit a0d14d2 into huggingface:main Feb 7, 2025
4 checks passed
