Update stream llm to get correct outputs and re-enable rerotated-attention test. (#656)

During the PyTorch/HF update, there appears to have been a change in how the causal mask is handled. Previously, the `attention.forward` function received a `causal_mask` through the `attention_mask` argument when `is_causal` was on; now we need to construct our own mask when `is_causal` is true. This was causing numerical issues in this test, as well as qualitative degradation on Llama2. This PR introduces construction of the causal mask and removes unnecessary tensor parallel config checks, which simplifies the code quite a bit.
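For context, a causal mask of the kind described can be constructed along these lines. This is an illustrative sketch only; the helper name `build_causal_mask` and the shape conventions are assumptions, not the actual code in this change:

```python
import torch

def build_causal_mask(seq_len: int, dtype: torch.dtype, device: torch.device) -> torch.Tensor:
    # Hypothetical helper, not the code in this diff.
    # Additive causal mask: 0 where attention is allowed, dtype-min (effectively
    # -inf) strictly above the diagonal so future positions are zeroed by softmax.
    mask = torch.full((seq_len, seq_len), torch.finfo(dtype).min, dtype=dtype, device=device)
    mask = torch.triu(mask, diagonal=1)
    # Broadcast to (1, 1, seq_len, seq_len) to apply across batch and heads.
    return mask[None, None, :, :]
```

A mask like this would be added to the attention scores before the softmax, replacing the `causal_mask` that `attention.forward` previously received via `attention_mask`.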
1 parent 7877444 · commit 4a01c40
Showing 2 changed files with 25 additions and 58 deletions.