
62221 Parallelise the performance tests #8190

Closed

Conversation

johnbillion
Member

@johnbillion johnbillion commented Jan 25, 2025

This change introduces a job matrix for the "current", "before", and "base" performance tests to replace the current behaviour of running them sequentially in a single job.

This speeds up the overall performance workflow run by 18-20 minutes 🕐.
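
For illustration, the matrix replaces the three sequential steps with three independent jobs, along these lines (a minimal sketch; the job and script names are hypothetical and don't reflect the exact contents of the workflow files in this PR):

```yaml
# Illustrative sketch only: job and script names are hypothetical and
# don't reflect the exact contents of the workflow files in this PR.
jobs:
  performance:
    strategy:
      fail-fast: false
      matrix:
        test: [ current, before, base ]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Each matrix job gets its own runner and a clean installation,
      # so the three variants run in parallel without sharing state.
      - name: Run the ${{ matrix.test }} performance tests
        run: npm run test:performance
```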

Reasoning

  1. The tests currently run in the order "current", then "before", then "base". This means the database upgrade routine that runs between the tests is actually being asked to perform a downgrade, which is not supported, so in a PR the "before" test is potentially being performed against a database schema from the proposed change. This problem is causing the tests in #7333 (21022: Switch to using bcrypt for hashing passwords) to fail unnecessarily because of a change to the structure of data in the database which is not supported in the "before" or "base" versions of the codebase.
  2. This change removes the potential for the tests to interfere with one another. The current tests rely on running the database upgrade, flushing the cache, and deleting transients between test runs, which doesn't guarantee that the "before" and "base" tests run in an environment as clean as the one the "current" test runs in. By separating the tests into separate jobs, every test run starts with a clean environment and uses the same setup steps.
  3. The original intention of performing the tests sequentially in the same job was to reduce the chance that an environmental factor on GitHub Actions affects the comparison between the "current" and "before" tests that gets reported in a PR. I think in practice this is unlikely, and the benefit of having tests that complete significantly faster outweighs this concern. This has no effect on the test reporting that's sent to codevitals.run.
  4. Bonus: Nicely grouped jobs on the GitHub Actions workflow run screens.

Notes

  • The "v2" workflow files are needed so the tests on old branches continue to work as they do currently.
  • Published "base" test results can be seen at https://webhook.cool/at/yellow-ice-88/. I'm using webhook.cool to avoid spamming codevitals.run with results from this PR.

Future enhancements

  • The "base" run result should be cached so it theoretically only ever needs to run once per release and then all subsequent performance runs pull its cached results. Needs more work that's outside of the scope of this change.
  • The "before" run result should be cached so multiple pushes to a PR don't unnecessarily re-run the same tests. Same as above, this needs more work so I will look into it at a later date.

Todo

  • Update inline docs in the workflow files
  • Reinstate the comparison
  • Reinstate the reporting
  • Add logic to only run the base test on a push to trunk

Trac ticket: https://core.trac.wordpress.org/ticket/62221

@johnbillion
Member Author

@swissspidy @desrosj @sirreal @joemcgill What are your thoughts on this approach? The PR still needs some tweaks but the bulk of it is ready.

Latest results here: https://github.com/WordPress/wordpress-develop/actions/runs/13012122443

@swissspidy
Member

The original intention of performing the tests sequentially in the same job was to reduce the chance that an environmental factor on GitHub Actions affects the comparison between the "current" and "before" tests that gets reported in a PR. I think in practice this is unlikely

IIRC @dmsnell looked into that before, and there can actually be quite a lot of fluctuation, even depending on the time of day.

@dmsnell
Member

dmsnell commented Jan 30, 2025

Thanks for the note, @swissspidy — I did in fact observe consistent variation when attempting to answer the question, “Does this release exhibit the same response times for the home page as the old release does?”

That variation existed on the order of hours, and while I'm sure there was an explanation, it was on an isolated test runner off GitHub and I didn't dive into figuring out what caused it. The impact of this variation was massive, however: significantly higher than any code changes. So my recommendation for performance testing is to run tests over the course of at least six to ten hours and interleave runs of the control and test groups, so that we avoid this long-running bias in case our tests happen to run near the point where those fluctuations appear or disappear.

For CI jobs where the results are more like suggestions or questions about performance than official or reliable reporting, I don’t see why this should be a blocker. The tests can be re-run. In fact, at some point I think I also considered running the tests concurrently since they only maxed out a single CPU core. Running on two or more cores would end up being a more realistic test environment, even though it introduces more uncertainty.

@johnbillion
Member Author

Thanks for the info, Dennis! In that case I'll carry on with this PR (it still needs some work), because it seems that running the current/before/base tests in parallel won't make the comparisons any less accurate than they are now.

There's a chance they'll actually be more accurate, because all three tests will start from a fresh installation, but looking back at the workflow runs on this PR that doesn't seem to make a measurable difference. I'll continue looking at the numbers.

@joemcgill
Member

The original intention of performing the tests sequentially in the same job was to reduce the chance that an environmental factor on GitHub Actions affects the comparison between the "current" and "before" tests that gets reported in a PR.

Originally, we only had two runs, "current" and "base", where the baseline is only run on commits to trunk (e.g. if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/trunk' }}) in order to have a consistent control measurement to send to the Code Vitals dashboard for normalizing these fluctuations over time. Reducing side effects from the test runner was the main reason these are run in the same environment.
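
In workflow terms, that gating looks something like the following (a sketch; the job name and steps are illustrative, while the if: condition is the one quoted above):

```yaml
# Run the "base" job only on pushes to trunk; skip it everywhere else.
# Job name and steps here are illustrative.
base:
  name: Base performance tests
  if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/trunk' }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm run test:performance
```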

However, your observation that these tests are already potentially affecting each other by reusing the same DB is valid, i.e.:

This change removes the potential for the tests to interfere with one another.

In most cases, these tests are just used as a spot check by folks reviewing the code when they suspect there might be a performance concern with a PR before it's committed, and for that use case, speeding up these runs seems much more valuable than isolating environmental side effects. So running the "before" tests in parallel seems like a good idea at a minimum.

For the base tests, I suspect that any commit that consistently leads to a major regression would still be visible in the dashboard if these are run in parallel, so it's probably worth the experiment to parallelize those as well. I'm OK with us giving it a try and observing whether it negatively affects our ability to identify performance regressions during a release.

@johnbillion johnbillion marked this pull request as ready for review February 1, 2025 19:48

github-actions bot commented Feb 1, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props johnbillion, swissspidy, dmsnell, joemcgill.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.


@johnbillion johnbillion closed this Feb 1, 2025