Update stream llm to get correct outputs and re-enable rerotated-attention test. #656
During the update of pytorch/HF, there seems to have been a change in how the causal mask is handled. Previously, the `attention.forward` function received the causal mask through the `attention_mask` argument when `is_causal` was on; now it seems we need to construct our own mask when `is_causal` is true. This was causing numerical issues in this test, as well as qualitative regressions on Llama2.

This PR introduces construction of the causal mask and removes unnecessary tensor parallel config checks, which simplifies the code quite a bit.
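For reference, a minimal sketch of what constructing an additive causal mask typically looks like in PyTorch (the helper name and shapes here are illustrative, not taken from this PR):

```python
import torch

def build_causal_mask(seq_len: int, dtype: torch.dtype, device: torch.device) -> torch.Tensor:
    """Build an additive causal mask of shape (1, 1, seq_len, seq_len).

    Positions above the diagonal (future tokens) are set to the most
    negative representable value so they vanish after softmax; all
    other positions are 0.
    """
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype, device=device)
    mask = torch.triu(mask, diagonal=1)  # keep only the strictly upper triangle
    return mask[None, None, :, :]        # broadcast over batch and head dims
```

With `is_causal` no longer implying a framework-provided mask, a tensor like this would be passed (or added to the attention scores before softmax) instead of relying on `attention_mask` being pre-filled.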