[PIPELINER] Refactor pipeliner lowering. #5989

pawelszczerbuk · 2025-02-22T00:08:07Z

This change reworks the pipeliner flow in Triton. It systematizes the pipeliner transformations by making all of them part of the same SoftwarePipeliner pass, while keeping them modular and defining clear IR interfaces between them.
It also introduces a new LowerLoop transformation that aims to be a more generic lowering of async operations, written with minimal assumptions about the shape of the IR coming out of the pipeline scheduling sub-pass.
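The layering described above can be illustrated with a toy model. This is only a sketch under stated assumptions: `ToyOp`, `makeToyOp`, `assignLatenciesToy`, `scheduleLoopToy`, `lowerLoopToy` and `runToyPipeliner` are hypothetical names invented for illustration, and the rules inside each step are placeholders, not Triton's actual heuristics. The point is only the shape: each step reads the annotations written by the previous one and nothing else.

```cpp
#include <algorithm>
#include <cassert>
#include <string>
#include <vector>

// Toy stand-in for an op in a loop body. Hypothetical: not a Triton type.
struct ToyOp {
  std::string name;
  bool isLoad = false;
  int latency = 0;            // written by the latency step
  int stage = -1;             // written by the scheduling step (-1 = unscheduled)
  bool loweredAsync = false;  // written by the lowering step
};
using ToyLoop = std::vector<ToyOp>;

ToyOp makeToyOp(const std::string &name, bool isLoad) {
  ToyOp op;
  op.name = name;
  op.isLoad = isLoad;
  return op;
}

// Step 1: record a latency on every load.
// Placeholder rule: a load may span all remaining stages.
void assignLatenciesToy(ToyLoop &loop, int numStages) {
  for (ToyOp &op : loop)
    if (op.isLoad)
      op.latency = numStages - 1;
}

// Step 2: turn latencies into stages. Reads only `latency`.
void scheduleLoopToy(ToyLoop &loop) {
  int lastStage = 0;
  for (const ToyOp &op : loop)
    lastStage = std::max(lastStage, op.latency);
  for (ToyOp &op : loop)
    op.stage = op.isLoad ? 0 : lastStage;
}

// Step 3: lower loads that run ahead of their users. Reads only `stage`.
// The assert is the "interface": lowering accepts any fully scheduled loop.
void lowerLoopToy(ToyLoop &loop) {
  int lastStage = 0;
  for (const ToyOp &op : loop) {
    assert(op.stage >= 0 && "lowering expects a fully scheduled loop");
    lastStage = std::max(lastStage, op.stage);
  }
  for (ToyOp &op : loop)
    op.loweredAsync = op.isLoad && op.stage < lastStage;
}

// The single "pass" that chains the modular steps.
ToyLoop runToyPipeliner(ToyLoop loop, int numStages) {
  assignLatenciesToy(loop, numStages);
  scheduleLoopToy(loop);
  lowerLoopToy(loop);
  return loop;
}
```

Because `lowerLoopToy` depends only on the `stage` annotation, it can also be called on a hand-written schedule without running the earlier steps.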


Mogball

So TLDR is this seems like a fairly coarse grained refactor of loop-scheduling+software pipelining into clear separable steps?

Mogball · 2025-02-24T22:02:22Z

lib/Dialect/TritonGPU/Transforms/Pipeliner/SoftwarePipeliner.cpp

+    // numStages) to the them, trying to populate the allowed stages. This
+    // step will be at some point extracted to separate pass that will be run
+    // only for loops missing the latency information.
+    assignLatencies(moduleOp, numStages);


OOoooooooohhhh. This looks super nice!

Mogball · 2025-02-24T22:03:12Z

lib/Dialect/TritonGPU/Transforms/Pipeliner/MatmulLoopPipeline.cpp

-  }
-  // Wait until there are 0 outstanding async dot ops.
-  builder.setInsertionPointAfter(forOp);
-  auto WarpGroupDotWaitAfterLoop = builder.create<ttng::WarpGroupDotWaitOp>(


the long awaited death of this file!

pawelszczerbuk · 2025-02-24T22:08:57Z

> So TLDR is this seems like a fairly coarse grained refactor of loop-scheduling+software pipelining into clear separable steps?

Yeah, with some improvements to be able to lower more or less anything you can throw at it, without making assumptions of what can come out of current scheduling.

ThomasRaoux

LGTM!

include/triton/Dialect/TritonGPU/Transforms/PipeliningUtility.h

ThomasRaoux · 2025-02-25T02:42:35Z

test/TritonGPU/pipeline-lower-loop.mlir

@@ -0,0 +1,889 @@
+// RUN: triton-opt %s -split-input-file -allow-unregistered-dialect -tritongpu-test-pipeline-lower-loop -canonicalize | FileCheck %s


really nice!

Mogball

Impressive work. This is significantly cleaner and better layered!

include/triton/Dialect/TritonGPU/Transforms/Schedule.h

lib/Dialect/TritonGPU/Transforms/Pipeliner/LowerLoops.cpp

Mogball · 2025-02-25T18:02:58Z

> So TLDR is this seems like a fairly coarse grained refactor of loop-scheduling+software pipelining into clear separable steps?

> Yeah, with some improvements to be able to lower more or less anything you can throw at it, without making assumptions of what can come out of current scheduling.

Are there any specific changes you can call out that made the overall pipeliner more robust? I'm curious to know about them and it's not obvious from reading the PR? :P

ThomasRaoux · 2025-02-25T18:11:13Z

> So TLDR is this seems like a fairly coarse grained refactor of loop-scheduling+software pipelining into clear separable steps?

> Yeah, with some improvements to be able to lower more or less anything you can throw at it, without making assumptions of what can come out of current scheduling.

> Are there any specific changes you can call out that made the overall pipeliner more robust? I'm curious to know about them and it's not obvious from reading the PR? :P

One thing I'm very excited about in this PR is that it significantly improves the testability of the different pieces of the pipeliner. For instance, the lowering can be tested independently, and we can independently test all the corner cases.


pawelszczerbuk · 2025-02-25T21:41:33Z

> So TLDR is this seems like a fairly coarse grained refactor of loop-scheduling+software pipelining into clear separable steps?

> Yeah, with some improvements to be able to lower more or less anything you can throw at it, without making assumptions of what can come out of current scheduling.

> Are there any specific changes you can call out that made the overall pipeliner more robust? I'm curious to know about them and it's not obvious from reading the PR? :P

> One thing I'm very excited about in this PR is that it significantly improves the testability of the different pieces of the pipeliner. For instance, the lowering can be tested independently, and we can independently test all the corner cases.

The other gap that this PR closes is the lack of a fallback to pipelining in registers. Previously there was a handshake between scheduling and lowering, where scheduling was not supposed to generate anything that lowering couldn't pipeline in shmem. The new lowering can always fall back to pipelining in registers and should be able to pipeline basically any scheduled IR that comes its way.
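To make "pipelining in registers" concrete, here is a minimal two-stage sketch on a plain scalar loop. It is an illustration only, assuming nothing about how LowerLoops.cpp implements the fallback: the function names are hypothetical, and an ordinary local variable stands in for the register that carries a loaded value from one iteration to the next instead of staging it through a shared-memory buffer.

```cpp
#include <cstddef>
#include <vector>

// Reference loop: the load and its use sit in the same iteration.
long sumSquaresNaive(const std::vector<int> &data) {
  long acc = 0;
  for (std::size_t i = 0; i < data.size(); ++i) {
    int v = data[i];                  // load
    acc += static_cast<long>(v) * v;  // use
  }
  return acc;
}

// Two-stage version: the load for iteration i+1 is issued before the use of
// iteration i. The loaded value lives in a local (a register) across the
// loop back-edge, so no intermediate buffer is needed.
long sumSquaresPipelined(const std::vector<int> &data) {
  if (data.empty())
    return 0;
  long acc = 0;
  int inFlight = data[0];  // prologue: stage 0 of iteration 0
  for (std::size_t i = 0; i + 1 < data.size(); ++i) {
    int next = data[i + 1];                        // stage 0 of iteration i+1
    acc += static_cast<long>(inFlight) * inFlight; // stage 1 of iteration i
    inFlight = next;                               // rotate the carried value
  }
  acc += static_cast<long>(inFlight) * inFlight;   // epilogue: last use
  return acc;
}
```

Both functions compute the same result; only the placement of the load relative to its use changes, which is why this form is always available as a fallback.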


lezcano · 2025-02-25T23:01:51Z

Does this allow us to completely kill

triton/lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp

Line 1118 in ef38bec

void LayoutRematerialization::hoistConvertDotOperand(

then? That'd be nice.

pawelszczerbuk · 2025-02-26T16:17:19Z

> Does this allow us to completely kill

> triton/lib/Dialect/TritonGPU/Transforms/RemoveLayoutConversions.cpp

> Line 1118 in ef38bec

> void LayoutRematerialization::hoistConvertDotOperand(

> then? That'd be nice.

Not yet, here lowering is still picking up shared layout. I'll look into separating layout selection to a separate pass after the pipeliner, which should remove the need for this guy.

pawelszczerbuk added 30 commits January 28, 2025 17:09

Remove outer loop pipelining transformation

b4255bd

Merge branch 'main' into pawel/remove_outer_loop_pipe

77d9e32

Starting to work on lowering loads

a8279ad

.

bc5afdf

Merge branch 'main' into pawel/refactor_pipe_lowering

db1ed00

Working on lowering loads

3528729

Merge branch 'main' into pawel/refactor_pipe_lowering

b74e9f4

Merge branch 'main' into pawel/refactor_pipe_lowering

5f7a660

Working on createAsyncCopy

ad2ece4

.

e92f286

Somewhat working version for simple loads, added tests

0510fbd

.

6a1a106

Some more tests, some more fixes

5b46c26

.

109f1fa

Tests and fixes

bece86e

Merge branch 'main' into pawel/refactor_pipe_lowering

bd7dc10

Putting transformations in separate files, calling them from SoftwarePipeliner

ac3c15b

typo

9d84de0

Removing LoopScheduling pass

6b4e26e

Adding perf remarks, cleaning up the comments

bf953ea

Update comments, remove dead code

5b7dafe

Removing more dead code. AssignLatencies always serializes latencies to IR. Imporving debug dumps

9f5fd47

Merge branch 'main' into pawel/refactor_pipe_lowering

1db956c

Adding tests for assymetric loads and for dependent loads. Fixing bugs

8c6c855

Allocate additional buffer for wgmma pipelining

ae6110c

Properly handling cases with load users in next iteration and across if's yield

73ab770

Fix for crash in tests, perf of LUT loads confirmed to be on par without blocked layout optimization

1603717

Merge branch 'main' into pawel/pawel/refactor_pipeline_lowering2

4a83f72

TMA loads and gather lowering implemented with tests

5bfcbb0

Lowering of TMA descriptors

f9b86b7

pawelszczerbuk added 5 commits February 21, 2025 11:08

Enabling wgmma pipelining, stab at proper lowering of multibuffers for mmav5 scales

0bbb0d3

All the lit tests are passing

0b47299

Tests for proper lowering of mmav5 scaled

bb91139

.

9a3e1d6

Removing MatmulLoopPipeline

29a6f5e

pawelszczerbuk requested review from Mogball and ThomasRaoux February 22, 2025 00:08

pawelszczerbuk requested a review from ptillet as a code owner February 22, 2025 00:08

pawelszczerbuk mentioned this pull request Feb 22, 2025

[PIPELINE] Refactor loop lowering. #5918

Closed

pawelszczerbuk added 2 commits February 21, 2025 16:09

Merge branch 'main' into pawel/pawel/refactor_pipeline_lowering2

bc7978b

Adding missing file

84f4964

pawelszczerbuk mentioned this pull request Feb 22, 2025

[Pipeliner] Fix a bug that triggers an assertion error on Op's stage larger than numStages #5988

Closed


Mogball reviewed Feb 24, 2025


pawelszczerbuk added 2 commits February 24, 2025 16:08

Merge branch 'main' into pawel/pawel/refactor_pipeline_lowering2

6676a0b

Merge branch 'main' into pawel/pawel/refactor_pipeline_lowering2

aba01a0

ThomasRaoux approved these changes Feb 25, 2025


Mogball approved these changes Feb 25, 2025


More aggressive asyncWaitOp combining, removing incorrect assert from async cp lowering

2b61ebe

pawelszczerbuk added 5 commits February 25, 2025 14:09

PR comments

e790c33

Merge branch 'main' into pawel/pawel/refactor_pipeline_lowering2

54d450c

PR comments

2a69250

PR comments

945d8c3

Change the way pipelining test checks number of stages to be more reliable

86331b6

pawelszczerbuk merged commit 852c05f into triton-lang:main Feb 26, 2025
7 checks passed
