Reduce CI usage #14983

straight-shoota · 2024-09-06T13:15:39Z

We're currently running 57 individual workflows in CI on every commit,¹ and counting (#14964).

Some runs are fairly small, like the library compatibility tests for OpenSSL and libpcre, which take ~25 seconds each, most of which is setup. Not too much to worry about those. But most are orders of magnitude bigger and produce quite a noticeable load.

The majority of workflows run std_spec and compiler_spec, build the compiler itself, and run std_spec again, or some part of that. The full `bin/ci build` routine usually takes 30-40 minutes.

Our CI runners are generously sponsored by GitHub, so using more resources doesn't incur an immediate cost for us. But we should still use the resources responsibly. And we suffer from significant congestion when there's a lot of activity, because the number of parallel runners is limited.²

I think we have some potential to reduce the number of runs for some workflows. We don't need to test everything on every commit.

Reduce matrix in `linux.yml`

We currently run a matrix job to test forward compatibility for every single Crystal version since 1.0.0. That's a total of 14 versions, and the number keeps growing.
I think we can safely reduce that number. I do not recall if this has ever brought any valuable insight. If something breaks compatibility with older compilers, it's usually broken from a specific compiler version downwards. So testing the oldest and most recent versions (currently 1.0.0 and 1.13.2) should theoretically be sufficient. You only need the versions in between to pinpoint where exactly the breakage appears, but that's part of the debugging process and doesn't need to be in CI.
We could still keep a couple more versions in between for due diligence, but there's definitely no need to test all versions on every commit. Perhaps we could run the full set on release builds (maintenance and nightlies) just to be sure.
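A minimal sketch of how the matrix could shrink on regular commits while release builds keep the full set. It assumes a job that takes the bootstrap compiler from a `crystal_bootstrap_version` matrix variable and passes it to `bin/ci` via `DOCKER_TEST_PREFIX`, and it uses a `schedule` trigger or a push to an assumed `release/*` branch as a stand-in for "release builds". The job name, the variable wiring and the exact patch versions are assumptions, not the actual contents of `linux.yml`. The list is chosen with `fromJSON`, since `strategy` may read the `github` context.

```yaml
# Hypothetical excerpt of .github/workflows/linux.yml (job name, variables and steps are assumptions)
on:
  push:
  pull_request:
  schedule:
    - cron: '0 3 * * *'  # assumed nightly trigger standing in for "release builds"

jobs:
  bootstrap-compat:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        # Release builds (nightly schedule or assumed release/* branches): all 14 versions,
        # assumed to be the latest patch release of each minor line.
        # Every other commit: only the oldest and the most recent version.
        crystal_bootstrap_version: ${{ fromJSON((github.event_name == 'schedule' || startsWith(github.ref, 'refs/heads/release/')) && '["1.0.0","1.1.1","1.2.2","1.3.2","1.4.1","1.5.1","1.6.2","1.7.3","1.8.2","1.9.2","1.10.1","1.11.2","1.12.2","1.13.2"]' || '["1.0.0","1.13.2"]') }}
    steps:
      - uses: actions/checkout@v4
      - name: Build and test with Crystal ${{ matrix.crystal_bootstrap_version }}
        # Assumed: bin/ci selects the bootstrap compiler image via this variable
        run: bin/ci build
        env:
          DOCKER_TEST_PREFIX: crystallang/crystal:${{ matrix.crystal_bootstrap_version }}
```

Keeping both lists in one job avoids duplicating the steps; the trade-off is a long expression line that has to be updated with every release.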

Limit `llvm.yml` & other library version tests

We're currently testing support for all major LLVM versions from 13 to 18. These jobs also run on every commit. We certainly want to keep testing all these versions as long as they're supported.

But it is not necessary to do that on every commit. I think we could limit this workflow to only run when LLVM-related source code (`src/llvm`) is directly affected.
Similar restrictions could apply to other workflows that test library support across multiple versions.

The problem with this is that changes elsewhere in the stdlib, outside of `src/llvm`, can have an effect as well. If a change in `src/pointer.cr` broke something in the LLVM bindings, it would go unnoticed because the workflow doesn't run. The chances of this are probably quite low, and std_spec also provides some general coverage.
We should make sure to run all workflows on release builds (nightlies and maintenance releases) though.
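As a sketch, GitHub Actions `paths` filters can express the restriction, while a `schedule` trigger (which is not subject to path filters) still runs the whole version matrix regularly. The cron time is an arbitrary assumption, and the path list would need to cover whatever else the LLVM workflow actually depends on.

```yaml
# Hypothetical trigger block for .github/workflows/llvm.yml
on:
  push:
    paths:
      - 'src/llvm/**'                  # LLVM bindings
      - '.github/workflows/llvm.yml'   # changes to the workflow itself
  pull_request:
    paths:
      - 'src/llvm/**'
      - '.github/workflows/llvm.yml'
  schedule:
    - cron: '0 3 * * *'                # assumed nightly run over all LLVM versions, regardless of changed paths
  workflow_dispatch:                   # manual runs, e.g. before a maintenance release
```

One caveat: when a workflow is skipped because of path filtering, its checks stay pending, so a path-filtered workflow should not be configured as a required status check.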

Reduce smoke tests

We run smoke tests for targets that are somewhat supported but for which we don't have any CI runners. Currently, there are 9 such platforms.

A smoke test means we only build the object files for std_spec, compiler_spec and the compiler for the respective target, but do not actually link them or execute any code.

So these tests are naturally quite limited. They can only detect platform-specific compile-time errors. These may happen when working on code related to a specific platform, but otherwise they're very unlikely. And changes to platform-specific code should be tested on the respective platform anyway, so the smoke tests won't add much.

I think we can easily limit smoke tests to run only in release builds.
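Restricting the smoke tests to release builds could be done at the trigger level. A minimal sketch, assuming the smoke tests live in their own workflow file, that maintenance branches are named `release/*`, and that nightlies can be represented by a `schedule` trigger. The target triples are an illustrative subset, not the actual nine platforms, and only std_spec is shown; compiler_spec and the compiler would follow the same pattern.

```yaml
# Hypothetical smoke test workflow that no longer runs on every commit
on:
  push:
    branches:
      - 'release/*'          # assumed naming of maintenance branches
  schedule:
    - cron: '0 4 * * *'      # assumed nightly run
  workflow_dispatch:         # manual run when working on platform-specific code

jobs:
  smoke-test:
    runs-on: ubuntu-latest
    strategy:
      fail-fast: false
      matrix:
        target: [x86_64-unknown-freebsd, x86_64-unknown-openbsd]  # illustrative subset
    steps:
      - uses: actions/checkout@v4
      - uses: crystal-lang/install-crystal@v1
      # --cross-compile only emits the object file; nothing is linked or executed
      - name: Build std_spec object file for ${{ matrix.target }}
        run: bin/crystal build --cross-compile --target ${{ matrix.target }} spec/std_spec.cr
```

`workflow_dispatch` keeps an escape hatch for the case where someone changes platform-specific code and wants the smoke test before a release build picks it up.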

Prerequisites

In Windows CI we have a couple of workflows to build the required libraries. Those are cached, so these workflows usually just download the cached values and do nothing else. Later jobs pull the assets directly from the cache. The lib jobs only exist to ensure the cache is populated. They are quite lightweight at ~20 seconds, but they are basically useless in ~99% of runs.
Perhaps we could find a more efficient way to provide lib assets to the build jobs? On Linux we're using Docker images which contain all necessary dependencies, and on macOS we use Nix.
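An incremental step, short of replacing the mechanism, would be to stop downloading the cache in the lib jobs: `actions/cache/restore` has a `lookup-only` input that only checks whether an entry exists. A sketch; the job name, cache path, cache key and build script are hypothetical placeholders. This saves the download but not the runner startup, so it does not remove the job itself.

```yaml
jobs:
  x86_64-windows-libs:                                    # hypothetical job name
    runs-on: windows-2022
    steps:
      - uses: actions/checkout@v4

      # Only check whether the cache entry exists; do not download it
      - name: Look up cached libraries
        id: cache-libs
        uses: actions/cache/restore@v4
        with:
          path: libs                                      # hypothetical path
          key: win-libs-${{ hashFiles('etc/win-ci/**') }} # hypothetical key
          lookup-only: true

      # Build and save only on a cache miss
      - name: Build libraries
        if: steps.cache-libs.outputs.cache-hit != 'true'
        run: .\etc\win-ci\build-libs.ps1                  # hypothetical script
      - name: Save libraries to cache
        if: steps.cache-libs.outputs.cache-hit != 'true'
        uses: actions/cache/save@v4
        with:
          path: libs
          key: win-libs-${{ hashFiles('etc/win-ci/**') }}
```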

Other measures

I'm sure there are other things we could do to improve the performance of individual workflows. But they may require more research and digging, and it's hard to say upfront what would be fruitful.

Ideas such as #13413 come to mind.

¹ All information is based on the state of the latest completed CI run on master, which at the time of writing is https://github.com/crystal-lang/crystal/commit/a310dee1bbf30839964e798d7cd5653c5149ba3d
² For example, on Monday, September 2, 2024 there were 7 successful runs of the Linux CI workflow with an average duration (time to completion, i.e. wait time + run time) of 64 minutes. On Thursday, September 5 there were 22 successful runs with an average duration of 103 minutes.


HertzDevil · 2024-10-04T06:04:31Z

For PRs like #14969 and #15052 coming from branches of this repository itself rather than from forks, is there any point in running the same set of jobs for both push and pull request triggers?

straight-shoota · 2024-10-04T06:38:29Z

No, there's not really any point. But it's not trivial to deactivate that.
We had a proposal before in #9890 but that wasn't good enough. #10636 is pending review.

oprypin · 2024-10-04T06:42:23Z

No, but there is a point in running tests on pull requests, and there is a point in running tests when you push to a branch to check something.

There are alternatives, such as:

1. The widespread bad one: not running push tests when pushing to a branch, which I hate with a passion. But if everyone else is really fine with always creating pull requests to try something out, then I can concede.
2. Not running pull request tests when the repository is the same, so the push tests must be running already.
   - https://github.com/abseil/abseil-py/blob/296c08b0137fc3576cafbb0b85aea5fc06c391c1/.github/workflows/test.yml#L14
   - This one is actually something I've had on my radar to try out; it's just worth double-checking what edge-case effects it might have.
   - But nobody seems to push to a branch of this repo anyway when creating pull requests, so this condition is not going to kick in, other than for the bot's PRs.
3. Maybe we want to just hardcode somehow (similarly to the above example?): if the repo is this particular one and the branch is not main or a release branch, then don't run tests on push. Then people can still try things out on their fork.
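Option (2) comes down to a job-level condition similar in spirit to the linked abseil-py workflow. A sketch with a placeholder job name and steps: a `pull_request` run is skipped when the PR's head branch lives in this repository, because the `push` event for the same commit already triggers a run.

```yaml
on: [push, pull_request]

jobs:
  test:                                      # placeholder job name
    # Run for every push; run for pull requests only when they come from a fork.
    # For PRs from branches of this repository, the push event already covers the commit.
    if: github.event_name == 'push' || github.event.pull_request.head.repo.full_name != github.repository
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: bin/ci build                    # placeholder for the actual job steps
```

One edge case worth checking: GitHub reports a job skipped via `if` as successful, which matters if these jobs are configured as required checks.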

oprypin · 2024-10-04T06:43:55Z

It seems #10636 ([CI] Don't trigger GHA pull_request on PRs from local branches) is likely a better way to implement the effect of (2) from the above list.

straight-shoota · 2024-10-04T08:17:16Z

Regarding 3.

> if the repo is this particular one and the branch is not main or a release branch, then don't run tests on push. Then people can still try things out on their fork.

That won't work. At least not without exceptions (and that would make it complicated).
We regularly push branches to this repo explicitly to run CI without creating a PR, either because the code is still WIP and not ready for review, or to test some integration that will never lead to a PR.

straight-shoota added status:discussion topic:infrastructure/ci labels Sep 6, 2024
