Triton SIMD reduction investigation #3310

Open
victor-eds opened this issue Jan 30, 2025 · 1 comment
@victor-eds
Contributor

A regular sub-group reduction that does not take layouts into account may lead to subpar performance on PVC. This kind of workflow occurs when a reduction follows a matrix multiplication, or is applied to a tensor with the same layout as the output of a matrix multiplication (DPAS layout). #2907 was the final PR trying to fix this at the Triton level. In parallel, an approach to fix it in IGC was pursued. The IGC approach was still subpar, however, as it required moving data around after the reduction while also using far more operations in the reduction itself.
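
For readers unfamiliar with the term, a sub-group reduction combines one value per lane across the lanes of a sub-group. The sketch below models one common way to do this, an XOR-butterfly exchange, in plain Python. It is not Triton or IGC code: the 16-lane sub-group size and the name `butterfly_reduce` are assumptions made for illustration, and the actual lowering on PVC may differ.

```python
import operator


def butterfly_reduce(lanes, op=operator.add):
    """Model an XOR-butterfly reduction over the lanes of one sub-group.

    `lanes` holds one value per lane; its length must be a power of two.
    Returns the per-lane values after the reduction and the number of
    exchange steps that were needed.
    """
    n = len(lanes)
    if n == 0 or n & (n - 1):
        raise ValueError("lane count must be a power of two")
    values = list(lanes)
    steps = 0
    offset = n // 2
    while offset >= 1:
        # Every lane combines its value with the lane `offset` away from it.
        values = [op(values[i], values[i ^ offset]) for i in range(n)]
        steps += 1
        offset //= 2
    return values, steps


SUB_GROUP_SIZE = 16  # assumed sub-group size, for illustration only
row = list(range(SUB_GROUP_SIZE))  # one value per lane
reduced, steps = butterfly_reduce(row)
print(reduced[0], steps)
```

In this model, every value reduced across the sub-group costs log2(16) = 4 exchange steps, and the result ends up replicated in all lanes.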

Running FlashAttention with the SIMD reduction does not currently give good performance because, per my investigation, spilling is much higher in that case. This should not happen, as the algorithm should not increase register pressure, so it may be related to suboptimal instruction scheduling.

Reducing the DModel dimension to just 16, so that no spilling takes place, leads to better performance and overall better codegen with the SIMD reduction compared to the baseline reduction and approach. This suggests the SIMD reduction will give better performance (as well as being more general), since it operates at a higher level.

Now, to take full advantage of the optimization, there are two possible paths:

  1. Improve instruction scheduling in the backend
  2. Explore splitting tensors across warps in the DModel dimension (reduction dimension). This may also alleviate register pressure and avoid spilling while exploiting the SIMD reduction
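
The second path can be sketched in plain Python as a two-stage reduction: each warp reduces its own slice of the reduction dimension, and the per-warp partial results are then combined. This is only a model of the idea, not Triton code. The name `split_reduce`, the warp count, and the row length are hypothetical, and a real implementation would also need some mechanism to exchange the partial results between warps.

```python
from functools import reduce
import operator


def split_reduce(row, num_warps, op=operator.add):
    """Reduce `row` in two stages, modelling a split across warps.

    Stage 1: each (hypothetical) warp reduces one contiguous slice.
    Stage 2: the per-warp partial results are combined.
    Returns the final result and the list of per-warp partials.
    """
    if num_warps <= 0 or len(row) % num_warps:
        raise ValueError("row length must be a positive multiple of num_warps")
    chunk = len(row) // num_warps
    # Each warp only ever holds `chunk` elements of the row.
    partials = [
        reduce(op, row[w * chunk:(w + 1) * chunk]) for w in range(num_warps)
    ]
    return reduce(op, partials), partials


row = list(range(64))  # stand-in for one row along the reduction dimension
total, partials = split_reduce(row, num_warps=4)
print(total, partials)
```

In this model each warp holds only a quarter of the row, which is the property that might relieve register pressure. The split is valid for associative operations such as sum or max.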
@sommerlukas
Contributor

Once this issue is resolved and the performance investigation shows good results, we can enable the pass by default in the pipeline. This can be done either by reopening #2748 or by filing a new issue.


3 participants