A few ideas have been discussed to improve CI time. Since no one has had time to look into them, this post tracks the ideas in one place so they're accessible and can be discussed further in the comments section.
Reduce the number of jobs created on pull requests and CI builds.
Pipelines are composed of jobs which are composed of steps. Pipelines are scheduled onto build machines (agents) at job granularity. Today, a matrix of build configurations kicks off many jobs for a single pipeline run.
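To illustrate how a build matrix multiplies into jobs, here is a minimal Python sketch. The axis names and values are hypothetical and are not taken from the edk2 pipeline definitions; the only point is that the job count grows as the product of the matrix axes.

```python
from itertools import product

# Hypothetical matrix axes; the real edk2 pipelines define their own.
MATRIX = {
    "package": ["PkgA", "PkgB", "PkgC"],
    "target": ["DEBUG", "RELEASE", "NOOPT"],
    "arch": ["IA32", "X64"],
}


def expand_matrix(matrix):
    """Return one job description per combination of matrix values."""
    keys = list(matrix)
    combos = product(*(matrix[key] for key in keys))
    return [dict(zip(keys, combo)) for combo in combos]


jobs = expand_matrix(MATRIX)
print(len(jobs))  # 3 * 3 * 2 = 18 jobs from a single pipeline run
```

Each entry in `jobs` has to be scheduled onto an agent separately, which is why adding one more value to any axis is more expensive than it looks.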
Currently, a large number of jobs is created each time a pull request is pushed: roughly 96 per PR.
Given that the total number of parallel jobs available to edk2 is 30 (which is already over the 10 available to most public projects), even a single PR can create a large burden on CI.
Once a PR run has started, PR Eval prevents unnecessary builds within jobs, but every job must still be scheduled onto an agent before it can determine that it has nothing to build, and that scheduling is itself expensive.
The ask here is to analyze and try to move jobs out of the most frequently used PR paths if possible.
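As a rough back-of-the-envelope check of the numbers above, the sketch below estimates how many scheduling "waves" are needed to drain a queue of jobs. It assumes every job holds an agent for about the same time, which is a simplification (jobs short-circuited by PR Eval finish much faster); the function name is made up for this example.

```python
import math


def scheduling_waves(total_jobs, parallel_slots):
    """Minimum number of rounds needed if each slot runs one job per round."""
    return math.ceil(total_jobs / parallel_slots)


JOBS_PER_PR = 96      # approximate figure quoted in this post
PARALLEL_SLOTS = 30   # parallel jobs available to edk2

print(scheduling_waves(JOBS_PER_PR, PARALLEL_SLOTS))      # 4
print(scheduling_waves(3 * JOBS_PER_PR, PARALLEL_SLOTS))  # 10
```

Under this assumption, a single PR already needs four full rounds of all 30 agents, and three PRs pushed close together need ten.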
Collect and Publish CI Metrics
Gather detailed statistics about CI usage and publish the stats where possible to make the information more widely accessible and actionable.
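Which statistics to collect is still open. As one possible starting point, the sketch below summarizes queue wait and run time per pipeline from a list of job records. The record fields (`pipeline`, `queued`, `started`, `finished`) and the sample values are assumptions for illustration only, not the format of any real Azure Pipelines export.

```python
from statistics import median

# Hypothetical job records; times are seconds since the run was triggered.
records = [
    {"pipeline": "A", "queued": 0, "started": 60, "finished": 660},
    {"pipeline": "A", "queued": 0, "started": 300, "finished": 1200},
    {"pipeline": "B", "queued": 0, "started": 30, "finished": 90},
]


def summarize(records):
    """Group job records by pipeline and compute simple usage statistics."""
    by_pipeline = {}
    for record in records:
        by_pipeline.setdefault(record["pipeline"], []).append(record)

    summary = {}
    for name, rows in by_pipeline.items():
        summary[name] = {
            "jobs": len(rows),
            # Time spent waiting for an agent.
            "median_wait_s": median(r["started"] - r["queued"] for r in rows),
            # Total agent time consumed by the pipeline.
            "total_run_s": sum(r["finished"] - r["started"] for r in rows),
        }
    return summary


summary = summarize(records)
for name, stats in summary.items():
    print(name, stats)
```

Separating wait time from run time would show whether CI is slow because agents are busy or because the jobs themselves are long.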
Basic Acceptance Testing (BAT)
Today, all jobs are kicked off at once for a pipeline run. The idea here is to define a very small set of checks that is most representative of the common failures, run them first, and skip the remaining jobs if any of them fail. This can prevent PRs with basic errors from consuming server time.
Suggestion: Include at least one platform build to represent each platform (GCC).
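The gating logic can be sketched in a few lines of Python. This is only a model of the idea, not a pipeline definition: the check names, the job names, and the `plan_jobs` function are hypothetical, and in practice this would be expressed as stage or job dependencies in the pipeline configuration.

```python
def plan_jobs(bat_checks, full_jobs):
    """Run BAT checks in order; schedule the full job set only if all pass."""
    for name, check in bat_checks:
        if not check():
            # Stop at the first failure so no further agents are consumed.
            return {"passed": False, "failed_check": name, "scheduled": []}
    return {"passed": True, "failed_check": None, "scheduled": list(full_jobs)}


# Hypothetical checks: each callable returns True on success.
bat_checks = [
    ("patch_format", lambda: True),
    ("one_gcc_platform_build", lambda: False),  # simulated basic build error
]
full_jobs = ["job_%d" % i for i in range(96)]

result = plan_jobs(bat_checks, full_jobs)
print(result["passed"], result["failed_check"], len(result["scheduled"]))
```

With the simulated failure above, none of the 96 jobs is scheduled; if both checks returned `True`, all of them would be.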
Adjust CI Builds
Today, after a PR is merged, all the jobs get kicked off again in "CI" instead of "PR" pipelines. If something was excluded by PR Eval during the PR run, a failure in it might first be found here. This creates a lot of overhead during times when PRs are frequently being merged.
Mergify is used today to order PRs into the master branch and ensure atomicity. Given that, the idea here is to structure checks in such a way as to eliminate redundant checks between a PR gate and CI builds when a PR is merged.
Next steps: Perhaps move CI runs to "daily" to reduce overall pipeline impact.
More Agents
Add agents, either by paying for additional Azure agents to support more parallel jobs or by introducing a self-hosted pool. In general, self-hosted pools are difficult to manage for open-source repos.
CodeQL: Follow up on whether anything is needed here (likely not, since it is fairly fast today).
Explore moving more work onto GitHub agents to reduce Azure DevOps (ADO) load.
Jobs created per pull request, by pipeline:
Windows VS2019 PR
Ubuntu GCC5 PR
PlatformCI_OvmfPkg_Windows_VS2019_PR
PlatformCI_OvmfPkg_Ubuntu_GCC5_PR
PlatformCI_EmulatorPkg_Windows_VS2019_PR
PlatformCI_EmulatorPkg_Ubuntu_GCC5_PR
PlatformCI_ArmVirtPkg_Ubuntu_GCC5_PR
tianocore.PatchCheck
Total jobs per PR: ~96