Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[SWP] Print recurring dependencies when reporting scheduling conflicts #5375

Open
wants to merge 6 commits into
base: main
Choose a base branch
from

Conversation

sfzhu93
Copy link
Contributor

@sfzhu93 sfzhu93 commented Dec 9, 2024

This PR enhances error messaging for scheduling conflicts in software pipelining, providing clearer guidance for debugging issues. Users will now receive prompts to reposition the producer earlier in the loop body to facilitate pipelining.

Previously, error messages only indicated that a consumer was scheduled before its producer, which was insufficient for effective debugging. Additionally, the location of the producer's definition was unclear, especially when the actual producer is inside an SCF IfOp.

This PR improves error tracing by backtracking the data flow into nested MLIR blocks of IfOps to identify the root definition of the producer and the root user of the consumer. This is especially useful for persistent kernels, as demonstrated in #5172, which is now included as a unit test. Another example involving a fused persistent matmul is also included as a test. Many users adapt persistent kernels from the official tutorial for their implementations, and this PR is useful for debugging.

This PR currently tracks only IfOps. We leave other cases as future work.

New contributor declaration

  • I am not making a trivial change, such as fixing a typo in a comment.

  • I have written a PR description following these
    rules.

  • I have run pre-commit run --from-ref origin/main --to-ref HEAD.

  • Select one of the following.

    • I have added tests.
      • /python/test for end-to-end tests
    • This PR does not need a test because FILL THIS IN.
  • Select one of the following.

    • I have not added any lit tests.
    • The lit tests I have added follow these best practices,
      including the "tests should be minimal" section. (Usually running Python code
      and using the instructions it generates is not minimal.)

// }
// }

DenseSet<Operation *> findRootDefiningOp(Operation *op,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought what we had discussed was to have this logic separate from the transformation code?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see. Would it be acceptable to encapsulate this logic into a separate class and move it to a new source file? This way, we can maintain a clear separation between the transformation code and this logic.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what part of this file do you need? Is it just the check to know if the schedule is valide?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if I get your question, but this PR does not introduce more checks. This PR provides detailed error message when distance = 1, per discussions with Pawel. findRootDefiningOp is to support the detailed error message.

I can get your point to separate checks from transformation. For this PR, how about I put related impl of printSchedulingError into an separate file such as ErrorPrinter.cpp?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to understand what is exactly needed here. Do we want a detailed error message for compiler developers to help them fix the issue, or just a message to the user that the compiler failed to fulfill a schedule?

For the former, we may need a message tied to the transformation and the current implementation looks helpful.

For the latter we might need more simplified messaging.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My point was that it would be good to do this outside of the expander rather. It doesn't look like it needs to reuse much so this could be implemented as a separate verification + diagnostic

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that we should make this independent of the expander. Expander is supposed to stay close to the LLVM upstream implementation, and most likely replaced by it someday. The new error is more informative and actionable, which is great, but it would be great if we could make it independent from the expander. Perhaps a verification pass that we pass the schedule vector to?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelszczerbuk I see. Passing the schedule vector to a separate pass makes sense to me. Do you want to refactor the existing verification part in the expander to a new pass as well? Right now I am only adding an extra error reporting to the verification. My PR lacks some comments - it looks complicated but they just collect more information for the error message. There's no extra verification.

Thank you all for the discussion! Anyhow, let me first improve my code quality: 1) adding more comments and 2) moving into a separate file. Then I will try to further separate them into a pass.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @sfzhu93 ! Let's not modify the expander for now. If we'll do it, it should be separate PR that is in sync with change in the upstream LLVM expander. It can come later.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelszczerbuk Thanks! I have cleaned up the code and I have minimum change to the expander. The PR is not yet finished - to make the error more informative, the error reporter will likely depend on the schedule of each Op as well.

I can further make it into a separate pass if you feel it is necessary.

@sfzhu93 sfzhu93 force-pushed the loop-carry-dep-check branch 2 times, most recently from 23fa565 to 1fb19f7 Compare January 30, 2025 02:03
@sfzhu93 sfzhu93 force-pushed the loop-carry-dep-check branch from 49b9517 to 30ed20c Compare February 25, 2025 19:31
add a unit test

linter

update

update

update

update

update

clang-format

rebase

update

update

update

update

update

update

update

update

update
@sfzhu93 sfzhu93 force-pushed the loop-carry-dep-check branch from 30ed20c to 11930a8 Compare February 25, 2025 19:34
@sfzhu93 sfzhu93 changed the title [WIP][SWP] Print recurring dependencies when reporting scheduling conflicts [SWP] Print recurring dependencies when reporting scheduling conflicts Feb 25, 2025
@@ -264,6 +265,9 @@ bool LoopPipelinerInternal::verifySchedule() {
diag.attachNote(producer->getLoc())
.append("operand defined here: ")
.appendOp(*producer, OpPrintingFlags().printGenericOpForm());
PipelineErrorReporter errorReporter(forOp, maxStage + 1, stages);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pawelszczerbuk @Mogball I noticed your PRs #5726 and #5867. I left them unchanged in case that are related to your internal workload.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I can check if MLIR_ENABLE_DIAGNOSTICS=operations is set here to include the ops in the notes. Maybe do it in a separate PR.

@sfzhu93 sfzhu93 marked this pull request as ready for review February 26, 2025 00:22
@sfzhu93 sfzhu93 requested a review from ptillet as a code owner February 26, 2025 00:22
Copy link
Collaborator

@manman-ren manman-ren left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am okay with this patch in general, since it is isolated. We can refine the logic later on as we hit more cases where SWP failed to work due to "operation scheduled before its operands".

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants