[AMD] [FA] Hoist convert_layout to dotOp for Q out of the loop #6017

zhanglx13 · 2025-02-25T16:42:14Z

This PR adds a new AMD pass that hoists the convert_layout to dotOperand layout for the Q tensor out of the loop. As a result, the Q tensor is kept in registers instead of being loaded at every iteration of the loop.

This PR achieves the same thing as #4901. However, #4901 does not hoist the local_load for Q in the epilogue, which keeps the Q tensor live in shared memory the whole time.
This PR, on the other hand, does the hoisting before the stream-pipeline pass. Therefore, the liveness of the Q tensor in shared memory is limited to the prologue.
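
The hoisting idea can be illustrated with a minimal Python sketch on a toy IR. This is not the code in HoistLayoutConversions.cpp: the `Op` and `Loop` classes and the `hoist_invariant_converts` helper are hypothetical names invented for this illustration, and the real pass works on MLIR operations. The sketch only shows the loop-invariance condition such a hoist relies on: a `convert_layout` can move in front of the loop when its operand is defined outside the loop body.

```python
from dataclasses import dataclass, field


@dataclass
class Op:
    """A toy operation of the form `result = name(operands...)` (hypothetical)."""
    name: str
    result: str
    operands: list[str]


@dataclass
class Loop:
    """A toy loop: ops that run once before it, and ops in its body (hypothetical)."""
    preheader: list[Op] = field(default_factory=list)
    body: list[Op] = field(default_factory=list)


def hoist_invariant_converts(loop: Loop) -> int:
    """Move convert_layout ops whose operands are all defined outside the
    loop body into the preheader. Returns the number of hoisted ops."""
    defined_in_body = {op.result for op in loop.body}
    hoisted, kept = [], []
    for op in loop.body:
        invariant = all(v not in defined_in_body for v in op.operands)
        if op.name == "convert_layout" and invariant:
            hoisted.append(op)
        else:
            kept.append(op)
    loop.preheader.extend(hoisted)
    loop.body = kept
    return len(hoisted)


def build_example() -> Loop:
    """Toy attention-like loop: Q is loaded once, K is loaded per iteration."""
    return Loop(
        preheader=[Op("load", "q", ["q_ptr"])],
        body=[
            Op("convert_layout", "q_dot", ["q"]),  # invariant: q is defined outside
            Op("load", "k", ["k_ptr"]),
            Op("convert_layout", "k_dot", ["k"]),  # not invariant: k is loaded in the body
            Op("dot", "qk", ["q_dot", "k_dot"]),
        ],
    )


if __name__ == "__main__":
    loop = build_example()
    print("hoisted:", hoist_invariant_converts(loop))
    print("preheader:", [op.result for op in loop.preheader])
    print("body:", [op.result for op in loop.body])
```

In this toy model only the conversion of Q leaves the loop body; the conversion of K stays because K is reloaded on every iteration.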

sjw36

Looks good and much more simple. Thanks!

third_party/amd/include/TritonAMDGPUTransforms/Passes.td

third_party/amd/lib/TritonAMDGPUTransforms/HoistLayoutConversions.cpp

sjw36 approved these changes Feb 25, 2025

zhanglx13 force-pushed the hoist_cvt branch from 0e6b790 to efd0fff Compare February 25, 2025 23:50

antiagainst requested changes Feb 26, 2025

Move the pass into a FuncOp scope

9de2cbb

antiagainst reviewed Feb 26, 2025

third_party/amd/lib/TritonAMDGPUTransforms/HoistLayoutConversions.cpp (outdated)

Addressed review comments and added lit tests

2e3af7d

antiagainst approved these changes Feb 26, 2025

antiagainst marked this pull request as ready for review February 26, 2025 16:00

antiagainst requested a review from ptillet as a code owner February 26, 2025 16:00

zhanglx13 merged commit e24d693 into triton-lang:main Feb 26, 2025
7 checks passed

antiagainst deleted the hoist_cvt branch February 28, 2025 19:06
