-
@'ing platform maintainers/reviewers here to draw attention:
ArmVirtPkg: @samimujawar @leiflindholm @kraxel
microvm: @kraxel
AmdSev: @ruleof2 @jyao1 @mxu9 @tlendacky @mdroth
Xen: @tperard
LoongArchQemu: @kilaterlee @bibo-mao @lixianglai
-
Having maintained a company's CI servers for a few years and worked to reduce the time for CI to run from 20 minutes to 5, I like to think I have some experience around CI.

Personally, I think much of what we're doing at the moment is wrong. It feels like the standard GitHub runners are slow, while we have "Larger GitHub-hosted runners" unprovisioned. We appear to be working around that by also using Azure, but that's pretty confusing for anyone looking at the CI system. Other significant open source projects such as FreeBSD and coreboot use Jenkins and don't require everything to be in "the cloud" (i.e. somebody else's computer). I've already offered use of my 128-core AMD EPYC server, and I also have a 96-core Ampere Altra machine I could put in a colo.

In terms of pre-commit vs post-commit vs nightly, I think:
Pre-commit should check for style and spelling issues, which should be very quick. Then do a set of build tests - how many I'm not sure, but probably not multiple per top-level package?
Post-commit, we can run a full firmware build including ideally posting firmware images for OVMF, ArmPkg etc.
Then nightly we can run more time-consuming checks such as Coverity, CodeQL etc.
We might also have tasks that should be done weekly, such as making sure every single combination of toolchain/OS/architecture builds.

We should also consider how edk2-platforms fits into this, since I don't think it should be a second-class citizen.
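
To make the tiering above concrete, here is a minimal sketch of it as a lookup table. The check names are hypothetical placeholders, not existing edk2 job names; only the four tiers and the kinds of checks in each come from the comment.

```python
from enum import Enum


class Tier(Enum):
    PRE_COMMIT = "pre-commit"
    POST_COMMIT = "post-commit"
    NIGHTLY = "nightly"
    WEEKLY = "weekly"


# Hypothetical check names, grouped by the tier suggested in the comment.
SCHEDULE = {
    Tier.PRE_COMMIT: ["style", "spelling", "build-subset"],
    Tier.POST_COMMIT: ["full-firmware-build", "publish-images"],
    Tier.NIGHTLY: ["coverity", "codeql"],
    Tier.WEEKLY: ["toolchain-os-arch-matrix"],
}


def checks_for(tier):
    """Return a copy of the checks assigned to the given tier."""
    return list(SCHEDULE[tier])


def tier_of(check):
    """Return the tier a check is assigned to, or raise KeyError."""
    for tier, checks in SCHEDULE.items():
        if check in checks:
            return tier
    raise KeyError(check)
```

Keeping the assignment in one table would make it easy to move a check between tiers once maintainers weigh in, without touching the logic that reads it.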
-
@'ing platform maintainers/reviewers here to draw attention again:
ArmVirtPkg: @samimujawar @leiflindholm @kraxel
microvm: @kraxel
AmdSev: @ruleof2 @jyao1 @mxu9 @tlendacky @mdroth
Xen: @tperard
LoongArchQemu: @kilaterlee @bibo-mao @lixianglai

Hi platform maintainers,

We have not gotten feedback from platform maintainers except for Rebecca, and we would like to hear from the rest of you. Our next step is to propose a set of changes to the PR gate CI regarding platform CI builds, in PR form, and to add the platform maintainers/reviewers to that PR. If there is no feedback from maintainers on the PR, we will check it in and await further feedback from maintainers.

I am going to try a few other channels to reach platform maintainers. If you don't have any opinion on the subject, please let us know in this forum as well so that we are aware you saw this and have no opinion.
-
I think for ArmVirtPkg we can have nightly builds. I understand that timezone differences can be an issue. However, having regular nightly builds will help catch issues earlier.
-
Thanks @kilaterlee and @samimujawar for weighing in. Definitely agreed that the term nightly here is not quite appropriate; we could also call it daily :). The central idea is that not all of these checks need to run on every PR; some can instead run periodically to catch issues. We will need to define a process for what happens when a daily/nightly (or other) build fails. The goal would be to have one representative of ArmVirt/Ovmf run on PRs, which should catch the general cases, with the subflavors only run periodically, since they typically align with the main ArmVirt or Ovmf build.

@kilaterlee, the platform maintainer does not need to commit a PR to trigger platform CI checks. My statement there was saying that if we do not hear from platform maintainers, the CI subgroup will put up a PR redefining the PR gates as we see fit and ask all platform maintainers for review. If they do not review that, we will assume they do not wish to weigh in on the subject and move forward with our plan.
-
I want to refocus this particular "platform" thread. Let me clarify a few things about this discussion versus the larger CI discussion.
We have a total of 30 parallel agents. The unique perspective needed from reaching out to platform owners here is:
We are balancing "core" versus "package" utilization of CI resources and we need this information to determine how much to allocate to platforms. Ideally, we need all platforms to schedule a combined maximum of 10 jobs (out of the 30) so we do not oversaturate build agents.

Setting aside the final allocation of resources, generic improvements to CI, and other topics: for your platform, can you please list which checks you need to run on every PR update (for code that affects your platform) and which checks you think can be moved to a less regular schedule such as "Post-Merge" (not run in a PR at all), "Nightly" (aka daily), or "Other" (a custom schedule that you describe)?
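
As a rough illustration of the budget described above, the sketch below sums the per-PR jobs each platform asks for and checks the total against the combined limit of 10. The two constants come from this comment; the request format and the job counts in the usage example are made up for illustration.

```python
TOTAL_AGENTS = 30      # parallel agents available, per the comment above
PLATFORM_BUDGET = 10   # combined per-PR job limit for all platforms


def per_pr_platform_jobs(requests):
    """Sum the jobs each platform wants on every PR.

    `requests` maps a platform name to a dict of schedule -> job count,
    e.g. {"pr": 4, "nightly": 10}. This layout is an assumption.
    """
    return sum(r.get("pr", 0) for r in requests.values())


def fits_budget(requests, budget=PLATFORM_BUDGET):
    """True if the combined per-PR jobs stay within the platform budget."""
    return per_pr_platform_jobs(requests) <= budget


# Hypothetical answers from two platforms, not real numbers.
example = {
    "OvmfPkg": {"pr": 4, "nightly": 10},
    "ArmVirtPkg": {"pr": 3, "nightly": 5},
}
print(per_pr_platform_jobs(example), fits_budget(example))
```

Only the "pr" entries count against the limit; anything a platform moves to Post-Merge or Nightly frees agents for PR gates.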
-
Well, one reason for OvmfPkg and ArmVirtPkg being built is that these are complete firmware images, so you'll get some end-to-end testing, and with the OVMF images being booted to the EFI shell, even a basic smoke test to see whether the image actually works. I think this is a check worth keeping in PR CI runs, because changes outside OvmfPkg can break OVMF, and in case that happens there is a high chance it breaks other platforms too.

There is a large number of OVMF configurations though, and I think we do not need them all to achieve that effect. For example, the xen / microvm / bhyve build configurations could be separated: run them only on PRs which actually change OvmfPkg code.
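
The path-based split suggested above could look roughly like the sketch below: a small set of representative builds always runs, and the xen / microvm / bhyve configurations are added only when a PR touches files under OvmfPkg. The job names are hypothetical, and a real CI system would more likely express this with its own path filters than with a script.

```python
from pathlib import PurePosixPath

# Hypothetical job names; the split itself follows the comment above.
ALWAYS_RUN = ["OvmfPkg-X64", "ArmVirtPkg-Qemu"]
OVMF_ONLY = ["OvmfPkg-Xen", "OvmfPkg-Microvm", "OvmfPkg-Bhyve"]


def touches_package(changed_files, package):
    """True if any changed path has `package` as its top-level directory."""
    return any(
        PurePosixPath(path).parts[:1] == (package,) for path in changed_files
    )


def select_jobs(changed_files):
    """Pick the platform jobs to run for a PR from its changed paths."""
    jobs = list(ALWAYS_RUN)
    if touches_package(changed_files, "OvmfPkg"):
        jobs.extend(OVMF_ONLY)
    return jobs
```

With this split, a PR that only changes a file under MdePkg would still get the end-to-end smoke test from the representative builds, while the extra OVMF configurations would run only for PRs that change OvmfPkg itself.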
-
There is an ongoing effort to improve edk2's CI processes, in particular the length of time spent on PR gates. There are currently ~90 CI jobs that run on every PR and about half of them are from three packages: ArmVirtPkg, EmulatorPkg, and OvmfPkg. The reason these packages contribute so many jobs to CI is that there are many platforms within each package that each run CI in every PR.
The request from the edk2 CI working group is to get feedback from the platform maintainers on how we can reduce the overall number of jobs from these three packages while still running valuable checks and keeping overall run time to a minimum.
The weekly tools and CI meeting, which some of the platform maintainers have attended, has been discussing this topic. We would like to get feedback from the maintainers by the next meeting, on Oct. 28th at 8am PST; maintainers are also welcome to join that meeting to discuss. Please use this discussion to record feedback before then.
The CI working group will take this feedback and make a set of proposals. In conjunction with this, the group is looking at reducing and optimizing other jobs.
At the 10/21 meeting the working group came up with a list of questions to solicit feedback on:
To give more clarity and context to this list, the CI working group has been discussing new methods of CI in edk2. We may choose to move some current PR status checks to a nightly CI run to reduce load on PR gates. Another proposal was reducing the set of platforms in edk2 to a more minimal, core set (with others migrating to edk2-platforms). This would ensure that edk2 can boot in what is considered a core virtual environment but does not check each flavor under ArmVirt and OVMF, as by and large these are similar and if one fails, others are likely to.
Again, there are other optimizations being discussed. This discussion is focused only on the platform side, as that contributes half of the jobs running in each PR gate and we want to ensure each platform maintainer has a chance to weigh in.