CUTLASS based FMHA kernels #1003
hwu36 started this conversation in Show and tell
-
I wonder whether CUTLASS will develop an "official" FMHA kernel?
-
First, @danthe3rd synced the xFormers FMHA kernels to the CUTLASS repository last week. The new version is up to 20% faster in the forward pass and up to 10x faster in the backward pass. See #992 for details.
Second, @tridao just released FlashAttention v2, which is 2x faster than v1. See their repository for details.
Both implementations run on A100 and H100 GPUs.