The regular sub-group reduction does not take layouts into account, which can lead to subpar performance on PVC. This kind of workload appears when a reduction follows a matrix multiplication, or more generally operates on a tensor with the same layout as the output of a matrix multiplication (the DPAS layout). #2907 was the last PR trying to fix this at the Triton level; in parallel, an attempt was made to fix it in IGC. The IGC approach was still subpar, however, as it required moving data around after the reduction while also using considerably more operations for the reduction itself.
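For reference, here is a minimal sketch (hypothetical shapes and names, not a kernel from the repo) of the pattern in question: a row-wise reduction applied directly to the result of `tl.dot`, so the reduced tensor carries the DPAS layout on PVC.

```python
import triton
import triton.language as tl

@triton.jit
def dot_then_reduce(a_ptr, b_ptr, out_ptr,
                    BLOCK_M: tl.constexpr, BLOCK_N: tl.constexpr,
                    BLOCK_K: tl.constexpr):
    offs_m = tl.arange(0, BLOCK_M)
    offs_n = tl.arange(0, BLOCK_N)
    offs_k = tl.arange(0, BLOCK_K)
    a = tl.load(a_ptr + offs_m[:, None] * BLOCK_K + offs_k[None, :])
    b = tl.load(b_ptr + offs_k[:, None] * BLOCK_N + offs_n[None, :])
    acc = tl.dot(a, b)             # accumulator lands in the DPAS layout on PVC
    row_max = tl.max(acc, axis=1)  # sub-group reduction over a DPAS-layout tensor
    tl.store(out_ptr + offs_m, row_max)
```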
Running FlashAttention with the SIMD reduction does not currently give good performance: per my investigation, spilling is much higher in that case. This should not happen, as the algorithm should not increase register pressure, so it may be related to suboptimal instruction scheduling.
Reducing the DModel dimension to just 16, so that no spilling takes place, leads to better performance and overall better codegen with the SIMD reduction than with the baseline reduction approach. This suggests the SIMD reduction will give better performance (as well as being more general), since it acts at a higher level.
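A back-of-envelope check (my numbers, assuming BLOCK_M = 128, fp32 accumulators, and a 16-lane sub-group; the real benchmark config may differ) of why shrinking DModel removes the spilling:

```python
# Rough per-lane footprint of the output accumulator alone, ignoring Q/K/V
# tiles and temporaries (assumed shapes, not measured data).
def acc_bytes_per_lane(block_m=128, d_model=64, lanes=16, elem_bytes=4):
    return block_m * d_model * elem_bytes // lanes

print(acc_bytes_per_lane(d_model=64))  # 2048 B per lane
print(acc_bytes_per_lane(d_model=16))  # 512 B per lane
# PVC provides 8 KB of GRF per thread by default (16 KB in large-GRF mode),
# so the DModel=16 case leaves far more headroom before spilling kicks in.
```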
Now, to take full advantage of the optimization, we have two possible paths:
Improve instruction scheduling in the backend
Explore splitting tensors across warps in the DModel (reduction) dimension. This may also alleviate register pressure and avoid spilling while still exploiting the SIMD reduction (see the sketch after this list).
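For the second path, the user-visible analogue is a split along the reduction axis: each worker reduces a narrow slice and the partials are combined afterwards. The actual warp split would live in the compiler's layout assignment, so the program instances below only stand in for warps. A rough sketch under those assumptions:

```python
import triton
import triton.language as tl

@triton.jit
def rowmax_split_d(x_ptr, partial_ptr, D_MODEL: tl.constexpr,
                   BLOCK_M: tl.constexpr, BLOCK_D: tl.constexpr):
    # Each program instance reduces a BLOCK_D-wide slice of the DModel axis,
    # keeping its live accumulator small; partials are combined in a second pass.
    pid_d = tl.program_id(0)
    offs_m = tl.arange(0, BLOCK_M)
    offs_d = pid_d * BLOCK_D + tl.arange(0, BLOCK_D)
    x = tl.load(x_ptr + offs_m[:, None] * D_MODEL + offs_d[None, :])
    partial = tl.max(x, axis=1)  # narrow slice -> low register pressure
    tl.store(partial_ptr + pid_d * BLOCK_M + offs_m, partial)
```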
When this issue is resolved and the performance investigation shows good results, we can enable the pass by default in the pipeline. This can be done either by reopening #2748 or by filing a new issue.