Merge OpenAI Triton commit `716a521` #3361

whitneywhtsang · 2025-02-05T22:48:07Z

This PR changes the Triton base from 1e0e51c to 716a521 (Feb 4).
Pass rate: 98.19%

Please do not squash and merge this PR.

Hoisting layout conversions into `if`s relies on the assumption that the `if` executes infrequently, but this assumption only makes sense inside a loop. A single top-level `if` in a kernel either executes or it doesn't, and an incorrect hoist can lead to a slowdown.

Mogball and others added 8 commits February 3, 2025 22:31

[BACKEND] Disable ldmatrix.trans for fp8 (#5800)

547fba0

Since we load data in column-major format with `ldmatrix.trans`, the transpose case seems difficult to support on pre-Blackwell hardware.

[PROTON] Fix incorrect tmp_path initialization (#5803)

d85b664

[PIPELINE] Remove outer loop pipelining transformation (#5766)

ebb99b1

Simplify pipelining by removing the outer-loop pipelining transformation. Its performance benefits are smaller than those of pipelining fused persistent loops, and it makes the pipeliner harder to maintain and refactor.

[LAYOUTS] Fix TransOp::fold (#5807)

b3524fa

The folder for TransOp was a bit too aggressive. Sometimes it would replace the representation of a layout with an equivalent one, and that's not allowed in the current state of things. We move the optimisation we had to a different canonicalizer.

[TEST] Use a fresh triton cache dir for warning tests (#5809)

716a521

If we expect a warning, we need to use a fresh cache dir; otherwise, no warning will be emitted when the cache is hit. Also take out interpreter-related code from this file.
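The cache interaction described above can be illustrated with a small, self-contained sketch. This is not Triton's code: `compile_with_cache` and `count_warnings` are hypothetical helpers, and the "compiler" is a toy that only warns while it actually compiles (on a cache miss). It is meant only to show why a test that expects a warning needs a cache directory no earlier run has populated.

```python
import hashlib
import os
import tempfile
import warnings


def compile_with_cache(source: str, cache_dir: str) -> str:
    """Toy stand-in for a caching compiler (hypothetical, not Triton's API)."""
    key = hashlib.sha256(source.encode()).hexdigest()
    path = os.path.join(cache_dir, key)
    if os.path.exists(path):
        # Cache hit: the compile step is skipped, so no warning is emitted.
        with open(path) as f:
            return f.read()
    # Cache miss: the warning is only produced while compiling.
    warnings.warn("suspicious construct in kernel", UserWarning)
    artifact = source.upper()  # placeholder for real compilation output
    with open(path, "w") as f:
        f.write(artifact)
    return artifact


def count_warnings(source: str, cache_dir: str) -> int:
    """Return how many warnings one compile call emits."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        compile_with_cache(source, cache_dir)
    return len(caught)


# Shared cache dir: the second call hits the cache and stays silent.
with tempfile.TemporaryDirectory() as shared:
    first = count_warnings("kernel", shared)
    second = count_warnings("kernel", shared)

# Fresh cache dir: the warning is emitted again.
with tempfile.TemporaryDirectory() as fresh:
    third = count_warnings("kernel", fresh)
```

A test that asserts on the warning behaves like the `second` call whenever an earlier run has already filled the cache, which is why each such test is given its own fresh directory.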

Merge commit '032fa41a45847cdc00119ed3bdd5bc0adab9c938'

cdfecfc

Merge commit '716a5218908ec40a4b09a17ebce7b02d05cd64be'

3df5ce6

whitneywhtsang requested review from pbchekin and anmyachev February 5, 2025 22:48

whitneywhtsang self-assigned this Feb 5, 2025
