
Add batch splitting in attention layer to hide NIC latency (#14) #1640

Merged (6 commits, Feb 7, 2025)

Conversation

kalyank007 (Contributor)

- Introduced the `--attn_batch_split` parameter to enable batch splitting in the attention and MLP layers.
- This approach aims to overlap communication and computation, effectively hiding NIC latency during distributed attention operations (a rough sketch of the idea follows below).

- Perform the add at the beginning of the next layer for better pipelining.

- Updated README.

- [SW-212702] Fix the `attn_batch_split` argument specific to the Llama config (#74)

Co-authored-by: Kalyan <[email protected]>
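
For context, the overlap idea can be sketched roughly as follows. This is a minimal, hypothetical illustration rather than the PR's actual code: it assumes a tensor-parallel setup where each attention output needs an all-reduce over the NIC, and `attn_forward` is a placeholder for the per-chunk attention computation.

```python
import torch
import torch.distributed as dist


def split_attention_forward(hidden_states, attn_forward, attn_batch_split=2):
    """Compute attention on batch chunks so communication overlaps compute.

    Assumes torch.distributed is already initialized; `attn_forward` is a
    stand-in for the real attention call.
    """
    chunks = hidden_states.chunk(attn_batch_split, dim=0)
    outputs, handles = [], []
    for chunk in chunks:
        out = attn_forward(chunk)                             # compute this chunk
        handles.append(dist.all_reduce(out, async_op=True))   # start NIC transfer without waiting
        outputs.append(out)                                   # next chunk's compute overlaps the transfer
    for handle in handles:
        handle.wait()                                         # sync once all chunks are in flight
    return torch.cat(outputs, dim=0)
```

Deferring the residual add to the beginning of the next layer, as noted above, extends this overlap window across the layer boundary.
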
@yeonsily (Collaborator) left a comment:


Is this PR an improvement over the current example run? If so, can you please post some numbers to show how much it improves?

And can you please run CI to confirm this doesn't break anything?

python -m pytest tests/test_text_generation_example.py -s -v -k llama
python -m pytest tests/test_examples.py -s -v -k llama

@@ -1274,7 +1362,13 @@ def forward(
valid_sequence_lengths=valid_sequence_lengths,
cache_idx=cache_idx,
num_virtual_tokens=num_virtual_tokens,
attn_batch_split=attn_batch_split,
prev_layer_residual=prev_layer_residual,
Collaborator:

From the previous review, it seems this needs to be changed to:

prev_layer_residual=prev_layer_residual if use_prev_layer_residual else None,

And where is the prev_layer_residual value set after line 1298?

Contributor Author:

prev_layer_residual is set on line 1370 (prev_layer_residual = layer_outputs[index]); for the first layer it will be None.
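
For readers following along, the pattern under discussion is roughly the one below. This is a self-contained toy sketch, not the PR's code; `ToyLayer` and its internals are purely illustrative of deferring the residual add to the start of the next layer and threading the residual through the decoder loop.

```python
import torch
import torch.nn as nn


class ToyLayer(nn.Module):
    """Illustrative decoder layer that defers the residual add to the next layer."""

    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(dim, dim)

    def forward(self, hidden_states, prev_layer_residual=None):
        if prev_layer_residual is not None:
            # perform the previous layer's residual add at the start of this layer
            hidden_states = hidden_states + prev_layer_residual
        residual = hidden_states
        hidden_states = self.proj(hidden_states)
        # hand the residual back so the next layer can add it
        return hidden_states, residual


layers = nn.ModuleList(ToyLayer(16) for _ in range(4))
hidden_states = torch.randn(2, 8, 16)
prev_layer_residual = None  # first layer receives no residual, as noted above
for layer in layers:
    layer_outputs = layer(hidden_states, prev_layer_residual=prev_layer_residual)
    hidden_states = layer_outputs[0]
    prev_layer_residual = layer_outputs[1]  # analogous to layer_outputs[index]
hidden_states = hidden_states + prev_layer_residual  # final deferred add
```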

@kalyank007 (Contributor Author) commented Jan 9, 2025:

Perf improvement on Gaudi3: [screenshot of performance numbers]

Collaborator:

Can you please post the CI results as well?

Contributor Author:

[screenshots of CI results, posted across several comments]

Contributor Author:

@yeonsily Can we mark this conversation as resolved?

Collaborator:

Can you please re-run test_examples.py with RUN_SLOW=true GAUDI2_CI=1? The Llama tests were all skipped and didn't run.
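
For reference, that re-run would look something along these lines, reusing the pytest invocation posted earlier (the exact selection flags may differ):

RUN_SLOW=true GAUDI2_CI=1 python -m pytest tests/test_examples.py -s -v -k llama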

@kalyank007 requested a review from yeonsily on January 16, 2025 at 08:17.
@libinta changed the title from "[SW-207965] Add batch splitting in attention layer to hide NIC latency (#14)" to "Add batch splitting in attention layer to hide NIC latency (#14)" on Jan 30, 2025.
@yeonsily (Collaborator):

LGTM.

@libinta added the run-test label (Run CI for PRs from external contributors) on Feb 3, 2025.
@regisss (Collaborator) commented Feb 5, 2025:

> Can you please re-run test_examples.py with RUN_SLOW=true GAUDI2_CI=1? The Llama tests were all skipped and didn't run.

@kalyank007 Can you make sure the Llama training examples still pass please?

@kalyank007 (Contributor Author) commented Feb 6, 2025:

> Can you please re-run test_examples.py with RUN_SLOW=true GAUDI2_CI=1? The Llama tests were all skipped and didn't run.
>
> @kalyank007 Can you make sure the Llama training examples still pass please?

[screenshot of test results]

Some tests are failing because of access issues to the dataset.

Log:
Build: 1.19.0
log_ci_split_05_02_19_47_39.log

@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@regisss (Collaborator) left a comment:

LGTM!

@regisss merged commit 3d7b2fa into huggingface:main on Feb 7, 2025. 4 checks passed.