🚀 Feature Request
TransformerEngine has advanced Attention kernels, including support for FlashAttention-3 and low-precision kernels.
Motivation
Having TransformerEngine's Attention as an `attn_impl` option would be super nice due to the additional features for H100 users.

[Optional] Implementation
Would require some changes in MPT configuration and adding that new Attention layer.
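As a rough illustration only, a new attention layer could wrap `transformer_engine.pytorch.DotProductAttention` behind an MPT-style module and be exposed as an extra `attn_impl` value. The class name, constructor arguments, and integration points below are assumptions for the sketch, not the actual llm-foundry interface:

```python
# Hypothetical sketch: wiring TransformerEngine's attention into an MPT-style
# attention block. Names/arguments here are assumptions, not llm-foundry API.
import torch
import torch.nn as nn
import transformer_engine.pytorch as te


class TEAttention(nn.Module):
    """Attention block backed by TransformerEngine's fused attention kernels."""

    def __init__(self, d_model: int, n_heads: int, attn_pdrop: float = 0.0):
        super().__init__()
        self.n_heads = n_heads
        self.head_dim = d_model // n_heads
        self.Wqkv = nn.Linear(d_model, 3 * d_model)
        # DotProductAttention dispatches to the fastest backend TransformerEngine
        # supports on the current GPU (e.g. FlashAttention or cuDNN fused attention).
        self.core_attn = te.DotProductAttention(
            num_attention_heads=n_heads,
            kv_channels=self.head_dim,
            attention_dropout=attn_pdrop,
            attn_mask_type="causal",
            qkv_format="bshd",
        )
        self.out_proj = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, s, _ = x.shape
        q, k, v = self.Wqkv(x).chunk(3, dim=-1)
        # Reshape to (batch, seq, heads, head_dim) to match qkv_format="bshd".
        q = q.view(b, s, self.n_heads, self.head_dim)
        k = k.view(b, s, self.n_heads, self.head_dim)
        v = v.view(b, s, self.n_heads, self.head_dim)
        ctx = self.core_attn(q, k, v)  # (batch, seq, d_model)
        return self.out_proj(ctx)
```

Backend selection (e.g. which fused/flash kernel TransformerEngine uses) would then largely be handled by TransformerEngine itself rather than by llm-foundry, which is part of the appeal.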
Additional context
Not yet sure if I am available for the implementation, but wanted to get the request and discussion out there for now. :)
There was a previous PR with a similar proposal here: #803