
62221 Parallelise the performance tests #8190

Closed

Conversation

johnbillion
Member

@johnbillion johnbillion commented Jan 25, 2025

This change introduces a job matrix for the "current", "before", and "base" performance tests to replace the current behaviour of running them sequentially in a single job.

This speeds up the overall performance workflow run by 18-20 minutes 🕐.
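
For illustration, the matrix replaces the three sequential steps with three independent jobs, along these lines (a minimal sketch; the job and script names are hypothetical and don't reflect the exact contents of the workflow files in this PR):

```yaml
# Illustrative sketch only: job and script names are hypothetical and
# don't reflect the exact contents of the workflow files in this PR.
jobs:
  performance:
    strategy:
      fail-fast: false
      matrix:
        test: [ current, before, base ]
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      # Each matrix job gets its own runner and a clean installation,
      # so the three variants run in parallel without sharing state.
      - name: Run the ${{ matrix.test }} performance tests
        run: npm run test:performance
```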

Reasoning

  1. The tests currently run in the order "current", then "before", then "base". This means the database upgrade routine that runs between the tests is actually being asked to perform a downgrade, which is not supported, so in a PR the "before" test is potentially being performed against a database schema from the proposed change. This problem is causing the tests in #7333 (21022: Switch to using bcrypt for hashing passwords) to fail unnecessarily because of a change to the structure of data in the database which is not supported in the "before" or "base" versions of the codebase.
  2. This change removes the potential for the tests to interfere with one another. The current tests rely on running the database upgrade, flushing the cache, and deleting transients between test runs, which doesn't guarantee that the "before" and "base" tests run in an environment as clean as the one the "current" test runs in. By separating the tests into separate jobs, every test run starts with a clean environment and uses the same setup steps.
  3. The original intention of performing the tests sequentially in the same job was to reduce the chance that an environmental factor on GitHub Actions affects the comparison between the "current" and "before" tests that gets reported in a PR. I think in practice this is unlikely, and the benefit of having tests that complete significantly faster outweighs this concern. This has no effect on the test reporting that's sent to codevitals.run.
  4. Bonus: Nicely grouped jobs on the GitHub Actions workflow run screens.

Notes

  • The "v2" workflow files are needed so the tests on old branches continue to work as they do currently.
  • Published "base" test results can be seen at https://webhook.cool/at/yellow-ice-88/. I'm using webhook.cool to avoid spamming codevitals.run with results from this PR.

Future enhancements

  • The "base" run result should be cached so it theoretically only ever needs to run once per release and then all subsequent performance runs pull its cached results. Needs more work that's outside of the scope of this change.
  • The "before" run result should be cached so multiple pushes to a PR don't unnecessarily re-run the same tests. Same as above, this needs more work so I will look into it at a later date.

Todo

  • Update inline docs in the workflow files
  • Reinstate the comparison
  • Reinstate the reporting
  • Add logic to only run the base test on a push to trunk

Trac ticket: https://core.trac.wordpress.org/ticket/62221

@johnbillion
Member Author

@swissspidy @desrosj @sirreal @joemcgill What are your thoughts on this approach? The PR still needs some tweaks but the bulk of it is ready.

Latest results here: https://github.com/WordPress/wordpress-develop/actions/runs/13012122443

@swissspidy
Member

The original intention of performing the tests sequentially in the same job was to reduce the chance that an environmental factor on GitHub Actions affects the comparison between the "current" and "before" tests that gets reported in a PR. I think in practice this is unlikely

IIRC @dmsnell looked into that before, and there can actually be quite a lot of fluctuation, even depending on the time of day.

@dmsnell
Member

dmsnell commented Jan 30, 2025

Thanks for the note, @swissspidy — I did in fact observe consistent variation when attempting to answer the question, “Does this release exhibit the same response times for the home page as the old release does?”

That variation existed on the order of hours, and while I'm sure there was an explanation, it was on an isolated test runner off GitHub and I didn't dive into figuring out what caused it. The impact of this variation was massive, however: significantly higher than any code changes. So my recommendation for performance testing is to run tests over the course of at least six to ten hours and interleave runs of the control and test groups, so that we avoid this long-running bias in case our tests happen to run near the point where those fluctuations appear or disappear.

For CI jobs where the results are more like suggestions or questions about performance than official or reliable reporting, I don’t see why this should be a blocker. The tests can be re-run. In fact, at some point I think I also considered running the tests concurrently since they only maxed out a single CPU core. Running on two or more cores would end up being a more realistic test environment, even though it introduces more uncertainty.

@johnbillion
Member Author

Thanks for the info, Dennis! In that case I'll carry on with this PR (it still needs some work), because it seems that running the current/before/base tests in parallel won't make the comparisons any less accurate than they are now.

There's a chance they'll actually be more accurate, because all three tests will start from a fresh installation, but looking back at the workflow runs on this PR that doesn't seem to make a measurable difference. I'll continue looking at the numbers.

@joemcgill
Member

The original intention of performing the tests sequentially in the same job was to reduce the chance that an environmental factor on GitHub Actions affects the comparison between the "current" and "before" tests that gets reported in a PR.

Originally, we only had two runs, "current" and "base", where the baseline is only run on commits to trunk (e.g. if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/trunk' }}) in order to have a consistent control measurement to send to the Code Vitals dashboard for normalizing these fluctuations over time. Reducing side effects from the test runner was the main reason these are run in the same environment.
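
In workflow terms, that gating looks something like the following (a sketch; the job name and steps are illustrative, while the if: condition is the one quoted above):

```yaml
# Run the "base" job only on pushes to trunk; skip it everywhere else.
# Job name and steps here are illustrative.
base:
  name: Base performance tests
  if: ${{ github.event_name == 'push' && github.ref == 'refs/heads/trunk' }}
  runs-on: ubuntu-latest
  steps:
    - uses: actions/checkout@v4
    - run: npm run test:performance
```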

However, your observation that these tests are already potentially affecting each other by reusing the same DB is valid, i.e.:

This change removes the potential for the tests to interfere with one another.

In most cases, these tests are just used as a spot check by folks reviewing the code when they suspect there might be a performance concern with a PR before it's committed, and for that use case, speeding up these runs seems much more valuable than isolating environmental side effects. So running the "before" tests in parallel seems like a good idea at a minimum.

For the base tests, I suspect that any commit that consistently leads to a major regression would still be visible in the dashboard if these are run in parallel, so it's probably worth the experiment to parallelize those as well. I'm OK with us giving it a try and observing whether it negatively affects our ability to identify performance regressions during a release.

@johnbillion johnbillion marked this pull request as ready for review February 1, 2025 19:48

github-actions bot commented Feb 1, 2025

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props johnbillion, swissspidy, dmsnell, joemcgill.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.


@johnbillion johnbillion closed this Feb 1, 2025