replace kernel implementation using CK tile-programming performant kernels #33

Open
4 tasks
carlushuang opened this issue Jan 10, 2024 · 1 comment

@carlushuang
Collaborator

carlushuang commented Jan 10, 2024

We plan to replace the underlying kernel implementation with the newly developed CK tile-programming FMHA kernel. Its performance is much better on MI200 and MI300, especially on MI300. Once this is done, the current implementation on the main branch will be deprecated.

  • fwd integration with hdim=64/128, with support for masking and varlen, and separate kernels for the padding case
  • extend fwd to other hdims
  • dropout support
  • bwd integration (to be planned)
@sabreshao
Collaborator

@carlushuang Our top-priority ask for FA is to enable E2E training for GPT3/LLAMA2/Qwen/MPT first.
Are you proposing to start the integration from the current ROCm FA or from the latest upstream FA?
