[SWP] Print recurring dependencies when reporting scheduling conflicts #5375

sfzhu93 · 2024-12-09T08:59:01Z

This PR enhances error messaging for scheduling conflicts in software pipelining, providing clearer guidance for debugging issues. Users will now receive prompts to reposition the producer earlier in the loop body to facilitate pipelining.

Previously, error messages only indicated that a consumer was scheduled before its producer, which was insufficient for effective debugging. Additionally, the location of the producer's definition was unclear, especially when the actual producer is inside an SCF IfOp.

This PR improves error tracing by backtracking the data flow into nested MLIR blocks of IfOps to identify the root definition of the producer and the root user of the consumer. This is especially useful for persistent kernels, as demonstrated in #5172, which is now included as a unit test. Another example involving a fused persistent matmul is also included as a test. Many users adapt persistent kernels from the official tutorial for their implementations, and this PR is useful for debugging.

This PR currently tracks only IfOps. We leave other cases as future work.
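To illustrate the backtracking described above, here is a minimal sketch on a toy IR rather than on MLIR itself. The `Op` and `Value` classes and the `find_root_defining_ops` helper are hypothetical names made up for this example; they are not the PR's C++ code or the MLIR API. The sketch assumes each if-op yields one value per result from its then and else branches.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass(eq=False)
class Op:
    """Toy stand-in for an MLIR operation (hypothetical, not the MLIR API)."""
    name: str
    # Only used when name == "if": values yielded by each branch,
    # indexed by result number.
    then_yield: List["Value"] = field(default_factory=list)
    else_yield: List["Value"] = field(default_factory=list)


@dataclass(eq=False)
class Value:
    """Toy SSA value: result number `index` of the op `owner`."""
    owner: Op
    index: int = 0


def find_root_defining_ops(value: Value) -> Set[str]:
    """Follow `value` through nested if-ops to the ops that really define it."""
    op = value.owner
    if op.name != "if":
        # Not an if-op: this op is itself a root definition.
        return {op.name}
    roots: Set[str] = set()
    for branch in (op.then_yield, op.else_yield):
        if value.index < len(branch):
            # Recurse into whatever the branch yields for this result.
            roots |= find_root_defining_ops(branch[value.index])
    return roots


# A value produced by an if-op nested inside another if-op.
load = Op("load")
add = Op("add")
inner_if = Op("if", then_yield=[Value(load)], else_yield=[Value(add)])
const = Op("constant")
outer_if = Op("if", then_yield=[Value(inner_if)], else_yield=[Value(const)])

print(sorted(find_root_defining_ops(Value(outer_if))))
# prints ['add', 'constant', 'load']
```

Reporting the root definitions instead of the enclosing if-op is what lets the diagnostic point at the line a user would actually have to move.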

New contributor declaration

I am not making a trivial change, such as fixing a typo in a comment.
I have written a PR description following these rules.
I have run pre-commit run --from-ref origin/main --to-ref HEAD.
Select one of the following.
- I have added tests.
  - /python/test for end-to-end tests
- This PR does not need a test because FILL THIS IN.
Select one of the following.
- I have not added any lit tests.
- The lit tests I have added follow these best practices,
  including the "tests should be minimal" section. (Usually running Python code
  and using the instructions it generates is not minimal.)

ThomasRaoux · 2024-12-11T05:50:27Z

lib/Dialect/TritonGPU/Transforms/Pipeliner/PipelineExpander.cpp

+//   }
+// }
+
+DenseSet<Operation *> findRootDefiningOp(Operation *op,


I thought what we had discussed was to have this logic separate from the transformation code?

I see. Would it be acceptable to encapsulate this logic into a separate class and move it to a new source file? This way, we can maintain a clear separation between the transformation code and this logic.

What part of this file do you need? Is it just the check to know whether the schedule is valid?

I am not sure I understand your question, but this PR does not introduce any new checks. It provides a detailed error message when distance = 1, per discussions with Pawel. findRootDefiningOp exists to support that detailed error message.

I see your point about separating checks from the transformation. For this PR, how about I put the implementation related to printSchedulingError into a separate file, such as ErrorPrinter.cpp?

Trying to understand what exactly is needed here. Do we want a detailed error message for compiler developers to help them fix the issue, or just a message telling the user that the compiler failed to fulfill a schedule?

For the former, we may need a message tied to the transformation and the current implementation looks helpful.

For the latter we might need more simplified messaging.

My point was that it would be better to do this outside of the expander. It doesn't look like it needs to reuse much, so this could be implemented as a separate verification + diagnostic.

I agree that we should make this independent of the expander. The expander is supposed to stay close to the upstream LLVM implementation, and will most likely be replaced by it someday. The new error is more informative and actionable, which is great, but it would be better if it did not depend on the expander. Perhaps a verification pass that we pass the schedule vector to?
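A standalone check of the kind suggested here could look roughly like the sketch below. It is an illustration under stated assumptions, not the expander's actual logic: `verify_schedule`, its parameters, and the message wording are hypothetical, ops are plain strings, and each dependency is a `(consumer, producer, distance)` tuple where distance 1 means the value comes from the previous loop iteration.

```python
from typing import Dict, List, Tuple


def verify_schedule(
    op_order: List[str],
    stages: Dict[str, int],
    deps: List[Tuple[str, str, int]],
) -> List[str]:
    """Return one message per dependency that the schedule violates."""
    cycles_per_iter = len(op_order)
    # Cycle of each op once the stages are laid out back to back.
    unrolled = {
        op: pos + stages[op] * cycles_per_iter
        for pos, op in enumerate(op_order)
    }
    errors = []
    for consumer, producer, distance in deps:
        # A loop-carried value (distance 1) was produced one iteration earlier.
        earliest = unrolled[producer] - distance * cycles_per_iter
        if unrolled[consumer] < earliest:
            msg = f"'{consumer}' is scheduled before its operand '{producer}'"
            if distance == 1:
                msg += "; consider moving the producer earlier in the loop body"
            errors.append(msg)
    return errors


# 'use' reads the value that 'def' produced in the previous iteration.
bad = verify_schedule(["use", "def"], {"use": 0, "def": 1}, [("use", "def", 1)])
good = verify_schedule(["def", "use"], {"def": 1, "use": 0}, [("use", "def", 1)])
print(bad)
print(good)
```

Because the function only takes the op order, the stage map, and the dependencies, it needs nothing from the expander itself, which is the separation being asked for in this thread.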

@pawelszczerbuk I see. Passing the schedule vector to a separate pass makes sense to me. Do you want to refactor the existing verification part in the expander into a new pass as well? Right now I am only adding extra error reporting to the verification. My PR lacks some comments: the new code looks complicated, but it only collects more information for the error message. There's no extra verification.

Thank you all for the discussion! Anyhow, let me first improve my code quality: 1) adding more comments and 2) moving it into a separate file. Then I will try to further separate it into a pass.

Thanks @sfzhu93! Let's not modify the expander for now. If we do, it should be a separate PR that is in sync with the change in the upstream LLVM expander. It can come later.

@pawelszczerbuk Thanks! I have cleaned up the code and kept the changes to the expander to a minimum. The PR is not finished yet: to make the error more informative, the error reporter will likely depend on the schedule of each op as well.

I can further make it into a separate pass if you feel it is necessary.

python/test/unit/test_perf_warning.py

include/triton/Dialect/TritonGPU/Transforms/PipelineErrorReporter.h

lib/Dialect/TritonGPU/Transforms/Pipeliner/PipelineErrorReporter.cpp

add a unit test linter update update update update update clang-format rebase update update update update update update update update update

sfzhu93 · 2025-02-25T22:37:38Z

lib/Dialect/TritonGPU/Transforms/Pipeliner/PipelineExpander.cpp

@@ -264,6 +265,9 @@ bool LoopPipelinerInternal::verifySchedule() {
        diag.attachNote(producer->getLoc())
            .append("operand defined here: ")
            .appendOp(*producer, OpPrintingFlags().printGenericOpForm());
+        PipelineErrorReporter errorReporter(forOp, maxStage + 1, stages);


@pawelszczerbuk @Mogball I noticed your PRs #5726 and #5867. I left them unchanged in case they are related to your internal workloads.

I think I can check if MLIR_ENABLE_DIAGNOSTICS=operations is set here to include the ops in the notes. Maybe do it in a separate PR.
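The check mentioned here could be sketched as follows. This is only an illustration: the helper name `diagnostics_include_operations` is hypothetical, and the sketch assumes the variable holds a comma-separated list of options, which this thread does not confirm.

```python
import os
from typing import Mapping, Optional


def diagnostics_include_operations(env: Optional[Mapping[str, str]] = None) -> bool:
    """Return True if 'operations' is one of the requested diagnostic options."""
    env = os.environ if env is None else env
    raw = env.get("MLIR_ENABLE_DIAGNOSTICS", "")
    # Assumption: options are comma-separated, e.g. "warnings,operations".
    options = {item.strip() for item in raw.split(",") if item.strip()}
    return "operations" in options


print(diagnostics_include_operations({"MLIR_ENABLE_DIAGNOSTICS": "warnings,operations"}))
# prints True
```

Taking the environment as an optional argument keeps the helper easy to test without touching the real process environment.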

manman-ren

I am okay with this patch in general, since it is isolated. We can refine the logic later on as we hit more cases where SWP failed to work due to "operation scheduled before its operands".

sfzhu93 commented Dec 9, 2024

ThomasRaoux reviewed Dec 11, 2024

manman-ren reviewed Dec 13, 2024

sfzhu93 mentioned this pull request Jan 11, 2025

[Frontend][Diagnostics] Improve emitting diagnostic information #5581

Merged

sfzhu93 force-pushed the loop-carry-dep-check branch 2 times, most recently from 23fa565 to 1fb19f7 on January 30, 2025 02:03

manman-ren reviewed Feb 25, 2025

sfzhu93 force-pushed the loop-carry-dep-check branch from 49b9517 to 30ed20c on February 25, 2025 19:31

update

11930a8

sfzhu93 force-pushed the loop-carry-dep-check branch from 30ed20c to 11930a8 on February 25, 2025 19:34

update

60b7a5f

sfzhu93 changed the title from "[WIP][SWP] Print recurring dependencies when reporting scheduling conflicts" to "[SWP] Print recurring dependencies when reporting scheduling conflicts" on Feb 25, 2025

sfzhu93 commented Feb 25, 2025

sfzhu93 added 2 commits February 25, 2025 14:39

update

dba88cd

update

52f8976

sfzhu93 marked this pull request as ready for review February 26, 2025 00:22

sfzhu93 requested a review from ptillet as a code owner February 26, 2025 00:22

manman-ren reviewed Feb 27, 2025

sfzhu93 added 2 commits February 27, 2025 10:44

update

3b93b60

update

d9a829c

ThomasRaoux Dec 11, 2024

sfzhu93 Dec 11, 2024

ThomasRaoux Dec 12, 2024

sfzhu93 Dec 12, 2024

htyu Dec 13, 2024

ThomasRaoux Dec 13, 2024

pawelszczerbuk Dec 13, 2024

sfzhu93 Dec 13, 2024

pawelszczerbuk Dec 16, 2024

sfzhu93 Dec 19, 2024

sfzhu93 Feb 25, 2025

sfzhu93 Feb 25, 2025

manman-ren left a comment