Cache Triton compilation artifacts during CI. #3680

jlebar · 2024-04-16T21:33:14Z

The vast majority of CI time is spent compiling Triton kernels. This caches
those, so if the compiler doesn't change, we should be able to reuse the cached
kernels. OTOH if the compiler does change, Triton will not (or, should not)
use stale compilation artifacts, because the Triton source code is effectively
part of the cache key.
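
To make the "source code is effectively part of the cache key" idea concrete, here is a minimal sketch. It is not Triton's actual implementation; `fingerprint`, `cache_key`, the file names, and the option names are all hypothetical and only illustrate how folding a compiler fingerprint into the key keeps stale artifacts from being reused.

```python
import hashlib
import json


def fingerprint(files):
    """Hash a {path: bytes} mapping in sorted order so the digest is stable."""
    digest = hashlib.sha256()
    for path in sorted(files):
        digest.update(path.encode("utf-8") + b"\0")
        digest.update(files[path] + b"\0")
    return digest.hexdigest()


def cache_key(kernel_source, compiler_fingerprint, options):
    """Combine everything that can affect the compiled output into one key."""
    payload = json.dumps(
        {"kernel": kernel_source, "compiler": compiler_fingerprint, "options": options},
        sort_keys=True,
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical inputs: two compiler builds that differ in a single file.
compiler_v1 = fingerprint({"compiler.py": b"v1", "libtriton.so": b"\x01\x02"})
compiler_v2 = fingerprint({"compiler.py": b"v2", "libtriton.so": b"\x01\x02"})
kernel = "def add(x, y): return x + y"
opts = {"num_warps": 4}

# Same compiler, same kernel: the key is stable, so the cached artifact is reused.
same = cache_key(kernel, compiler_v1, opts) == cache_key(kernel, compiler_v1, opts)
# Different compiler: the key changes, so the old artifact is never looked up.
changed = cache_key(kernel, compiler_v1, opts) != cache_key(kernel, compiler_v2, opts)
```

In this sketch a compiler change does not invalidate anything explicitly. The old entries simply stop matching any key.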

Because the Triton C++ compiled code is part of the cache key, this will only
work if the Triton C++ build is deterministic. We should be able to see if
caching is working by comparing the names of the compilation artifacts between
two nop commits to main.
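
One way to do that comparison, sketched under the assumption that we can collect the artifact names from each CI run as plain strings. The helper name and the example artifact names are made up for illustration.

```python
def compare_artifact_names(first_run, second_run):
    """Report which artifact names two CI runs have in common."""
    first, second = set(first_run), set(second_run)
    return {
        "reused": sorted(first & second),
        "only_first": sorted(first - second),
        "only_second": sorted(second - first),
        "fully_reused": first == second,
    }


# Hypothetical artifact names from two runs on no-op commits.
run_a = ["3f9a/add_kernel.cubin", "77c1/matmul.cubin"]
run_b = ["3f9a/add_kernel.cubin", "e0d2/matmul.cubin"]
report = compare_artifact_names(run_a, run_b)
```

If the C++ build is deterministic, `fully_reused` should be true for two no-op commits. Any entries in `only_first` or `only_second` would suggest that something nondeterministic is leaking into the cache key.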

While we're here, we also run pre-commit on changes in the main branch.
Previously we only ran it on pull requests. We need this because if pre-commit
never runs on main, the pip caching won't work on non-main branches (the cache
action only pulls from the current branch or main).
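
A simplified model of that lookup rule, to show why a run on main is needed. This only mirrors the rule as stated above (current branch or main) and is not the cache action's real code. `restorable_caches` and the cache keys are hypothetical.

```python
def restorable_caches(caches, current_branch, default_branch="main"):
    """Return the keys a job on current_branch may restore.

    caches is a list of (key, branch_that_saved_it) tuples.
    """
    allowed = {current_branch, default_branch}
    return [key for key, branch in caches if branch in allowed]


# pre-commit only ever ran on PR branches: another branch sees nothing.
pr_only = [("pip-abc123", "feature-a")]
before = restorable_caches(pr_only, "feature-b")

# pre-commit also runs on main: every branch can restore main's cache.
with_main = pr_only + [("pip-abc123", "main")]
after = restorable_caches(with_main, "feature-b")
```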

Also while we're here, rewrite the AMD workflow definition to reference the
NVIDIA workflow definition where they match -- it was getting hard to keep the
two workflows in sync. Because GitHub Actions doesn't support YAML references
(anchors and aliases), I had to add a preprocessing step to pre-commit. But
it's not too bad.
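
The shape of such a check can be sketched as follows: regenerate the workflow from its template and fail with a diff if the committed file is stale. This is only an illustration. `check_generated` is a hypothetical helper, and `toy_render` is a stand-in for the real preprocessing, which is not reproduced here.

```python
import difflib


def check_generated(template_text, committed_text, render, name="integration-tests.yml"):
    """Return an empty string if the committed file is up to date, else a unified diff."""
    expected = render(template_text)
    if expected == committed_text:
        return ""
    diff = difflib.unified_diff(
        committed_text.splitlines(keepends=True),
        expected.splitlines(keepends=True),
        fromfile=name + " (committed)",
        tofile=name + " (regenerated)",
    )
    return "".join(diff)


def toy_render(text):
    # Stand-in for the real preprocessing step; it only prepends a header.
    return "# AUTOGENERATED, do not edit\n" + text


template = "name: Integration Tests\n"
up_to_date = check_generated(template, toy_render(template), toy_render)
stale = check_generated(template, "name: Old Name\n", toy_render)
```

Returning the diff rather than a boolean makes a failure easier to act on, since the hook can print exactly what would change.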

antiagainst

Nice! LGTM; just a few nits. Sorry, I said I'd look into this but got distracted by other tasks.

jlebar · 2024-04-18T02:59:04Z

Thank you for the reviews! Let's see how this goes on main.

jlebar force-pushed the cache-triton-artifacts branch 4 times, most recently from 294f7d1 to a17e207 Compare April 17, 2024 02:20

jlebar requested a review from antiagainst April 17, 2024 02:29

jlebar marked this pull request as ready for review April 17, 2024 02:29

jlebar requested a review from ptillet as a code owner April 17, 2024 02:29

chicheng reviewed Apr 17, 2024

Comment on .github/workflows/integration-tests.yml.in (outdated, resolved)

antiagainst approved these changes Apr 17, 2024

Comment on .pre-commit-config.yaml (resolved)

Three comments on .github/workflows/integration-tests.yml (outdated, resolved)

jlebar added 4 commits April 17, 2024 22:22

Review comments.

cc56c67

Run every N hours

a6c8aa9

Rebase

bdcfbde

jlebar force-pushed the cache-triton-artifacts branch 4 times, most recently from 335075b to 5ea8de6 Compare April 18, 2024 02:44

Print diff if pre-commit fails.

52b1708

jlebar force-pushed the cache-triton-artifacts branch from 5ea8de6 to 52b1708 Compare April 18, 2024 02:49

jlebar enabled auto-merge (squash) April 18, 2024 02:49

jlebar merged commit 4303eab into main Apr 18, 2024
5 checks passed

jlebar deleted the cache-triton-artifacts branch April 18, 2024 02:58
