[feat] support ffpa-l1 registers double buffers (#70)
* Update README.md

* Update README.md

* Update env.py

* Update prefill.cuh

* Update ffpa_attn_templates_L1.cuh

* Update launch_templates.cuh

* Update README.md
DefTruth authored Feb 4, 2025
1 parent 8aade41 commit 6a85c42
Showing 5 changed files with 214 additions and 56 deletions.
4 changes: 2 additions & 2 deletions README.md
@@ -68,9 +68,9 @@ By leveraging this approach, we can achieve better performance for large headdim

|📚Feature |📚Feature |📚Feature |📚Feature|
|:---:|:---:|:---:|:---:|
-|✔️Tensor Cores|✔️Loop over N/D |✔️Tile Block(Br, Bc) |**MMA(m16n8k16)**|
+|✔️Tensor Cores |✔️**MMA(m16n8k16)** |✔️Tile Block(Br, Bc) |️Tile MMA/Warp |
|✔️**Split Q**(FA-2)|✔️Pack LDST(128 bits)|✔️SMEM **Swizzle/Pad** |✔️Copy Async |
-|️Tile MMA/Warp |✔️QKV Multi-Stages(1~4) |✔️Collective Store(**Shfl**)|✔️**Prefetch QKV** g2s |
+|**Reg Double Buffers** |✔️QKV **Multi-Stages(1~4)** |✔️Collective Store(**Shfl**)|✔️**Prefetch QKV** g2s |
|✔️**QKV Fine-grained Tiling**|✔️**Shared QKV** SMEM|✔️Mixed MMA Acc|✔️**Persist Q** s2r/g2s|

- 📚 case: FFPA `L1` kernel template signature: [ffpa_attn_templates_L1.cuh](csrc/cuffpa/ffpa_attn_templates_L1.cuh)