Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Optimize CI workflow. #1743

Merged
merged 30 commits into from
Jan 18, 2024
Merged

Optimize CI workflow. #1743

merged 30 commits into from
Jan 18, 2024

Conversation

player-03
Copy link
Contributor

@player-03 player-03 commented Jan 12, 2024

The -eval option means most of the CI tasks no longer have to wait for all the others. If we run more of them in parallel, users will be able to get much quicker feedback on their pull requests.

This way, it doesn't have to wait for the slow macos-ndll step.
We want as few steps as possible to have to wait for macos-ndll. There's no way around package-haxelib, but `-eval` gives us a much better option for ios-ndll.
Not required, but it'll make the workflow summary neater.
I'm almost sure Homebrew is what makes macos-ndll so slow, but this will help confirm it.
This function attempts to skip steps if the corresponding function was not overridden. However, in `-eval` mode, it may incorrectly detect the `@ignore` metadata on subclass functions, causing it to always skip.

Running an empty function is not a problem, but we wouldn't want to log that step if nothing was being done. Therefore I switched the check to only apply to the log call.

For one reason or another, the "watch" command can't be skipped, so I left it alone.
If switching to Linux is what made android-ndll so much faster, we should do the same for android-samples.
This removes the need to wait on macos-ndll.
@player-03 player-03 marked this pull request as draft January 14, 2024 00:52
air-samples and neko-samples run on Haxe 3, which didn't have Eval mode, so they need to wait for the host platform's ndll.

Everything else only needs the ndll that will be included in the app (if any). Now we have some samples that can start immediately!
Maybe this requirement could be removed in the future? Not sure why `lime setup` would even need it.
Some of these make sense, but I have no idea why air-samples is complaining about hxcpp.

I also don't understand why macos-samples needs the submodules (or perhaps why linux-samples doesn't), but `-eval` should get around that.
For air-samples, this means downloading the submodules so that `lime rebuild tools` can work.

For windows-samples, we can use `-eval` to skip `rebuild tools` entirely.
@player-03
Copy link
Contributor Author

At this point, it's possible to complete everything else while waiting for the brew bundle step of macos-ndll.

Tasks

Apparently Homebrew dropped support for macOS 11 sometime last month, which may line up with when it started being extremely slow. Worth a shot!
@player-03
Copy link
Contributor Author

Credit to @Apprentice-Alchemist for figuring out why Homebrew was so slow.

@player-03
Copy link
Contributor Author

Workflow succeeded in 25 minutes, compared to ~1h45m before this PR.

@player-03 player-03 marked this pull request as ready for review January 14, 2024 05:31
@joshtynjala
Copy link
Member

Great to see the CI finishing faster! I'm concerned about one thing that you changed, though. The removal of this dependency from the samples builds:

needs: package-haxelib

I made the samples builds rely on package-haxelib on purpose. The idea was to ensure that the Haxelib .zip bundle that we're generating is valid, and can build projects for all supported targets.

@player-03
Copy link
Contributor Author

I see where you're coming from. On the one hand, with the brew issue fixed (for now), going back to the package-haxelib bottleneck shouldn't slow things down too much. On the other hand, it has been packaging correctly for years, it would throw an error if any of the pieces were missing, and we're testing the pieces individually. I don't see how it could start going wrong now.

How about this as a compromise? We test each target as soon as the required ndlls become available, saving time. Later when package-haxelib finishes, we test its output by downloading the bundle on Windows/Mac/Linux, and building a sample in HashLink. (HashLink because all we're trying to do is prove that the bundled version of Lime works.)

@joshtynjala
Copy link
Member

How about this as a compromise? We test each target as soon as the required ndlls become available, saving time. Later when package-haxelib finishes, we test its output by downloading the bundle on Windows/Mac/Linux, and building a sample in HashLink. (HashLink because all we're trying to do is prove that the bundled version of Lime works.)

Works for me!

For instance, we're merging "android-ndll" with "android-samples" and merging "windows-ndll" with "windows-samples." There's no need to set up the same environment twice.
We need to make sure that lime-haxelib.zip is enough, on its own, to run Lime. And it needs to work on all three of Windows, Mac, and Linux.

Since this code will run at the very end of the workflow, not in parallel with anything, it's important to keep it quick. HashLink, Neko, and Flash are good candidates, but of those three, only HashLink is maintained.
It's quick too, might as well test package-haxelib a bit more thoroughly.
This allows it to run immediately, rather than having to wait for Windows. Plus, it makes the workflow summary clearly legible again.

The downside is that `-eval` wasn't available in Haxe 3, so we can no longer test that. However, I just added more Neko+Haxe 3 tests, so I think it balances out.
@player-03
Copy link
Contributor Author

Ok, the variables gave me a fair bit of trouble, and I think the 25-minute run might have been a fluke, but it's working.

Workflow

@player-03
Copy link
Contributor Author

Hmm, some of the Neko runs take a while to download the artifact. Maybe there are just too many in parallel. Could scale it back, I guess.

Workflow

The post-package-haxelib jobs all need to download its output, and some of them (seemingly at random) spend an extra ~6m waiting for it.
@player-03
Copy link
Contributor Author

Ugh, reducing the number of jobs didn't even help. All of the time saved was because the iOS builds happened to go a bit faster.

The ios job is currently the biggest bottleneck, and this sample is the least informative of the three.
Because apparently Mac can be the bottleneck sometimes.
@player-03
Copy link
Contributor Author

As an experiment, let's try going back to hard-coded runs-on values. It's possible GitHub optimizes based on knowing in advance which machines to use, and that's how we got that 25-minute run.

@player-03 player-03 force-pushed the organize-workflow branch 2 times, most recently from aaebef7 to 053e01f Compare January 17, 2024 16:33
I guess you can use job output variables there, but not environment ones. But the environment variable is nicer for everything else, so we'll just not use a variable here.
@player-03
Copy link
Contributor Author

It's definitely faster with hard-coded machines to run on. Still not 25 minutes, but I think that was only possible because we didn't test lime-haxelib.zip. @joshtynjala, does this look good to merge?

@joshtynjala joshtynjala self-requested a review January 18, 2024 06:49
@player-03 player-03 merged commit a414f64 into openfl:develop Jan 18, 2024
23 checks passed
@player-03 player-03 deleted the organize-workflow branch January 18, 2024 18:22
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants