enable parallel tests with pytest-xdist and pytest-split #2874

ytausch · 2024-07-28T19:32:51Z

Enabling pytest-xdist (which is already in the list of our dependencies) brings down test time from 12 minutes to around 8:30, which saves us 3:30 min per test run.

For the model tests, test runtime is roughly equal but I still enabled it for consistency.

As you can see in the commit history, I also evaluated using pytest-split for splitting tests across multiple GitHub Runners but ~~could not see a significant performance gain because of the overhead involved with building the Docker container~~. I found a naive approach (building the Docker container 4 times in parallel) that brings us down another minute to 7:20, saving 40 % wait time in total.

Are there any tests on the MongoDB that could break with this? @beckermr

Pydantic model updated or no update needed

codecov · 2024-07-28T19:42:39Z

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 76.97%. Comparing base (7ecb031) to head (18387e6).

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #2874      +/-   ##
==========================================
- Coverage   76.98%   76.97%   -0.02%     
==========================================
  Files         112      112              
  Lines       11975    11977       +2     
==========================================
  Hits         9219     9219              
- Misses       2756     2758       +2

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

ytausch · 2024-07-28T22:52:42Z

@0xbe7a Thoughts about this?

ytausch · 2024-07-28T23:00:36Z

There seems to be something wrong with the coverage. Converting to draft.

beckermr · 2024-07-29T00:00:57Z

The tests used to break in general with this which is why I turned it off. It's a good idea to try again.

beckermr · 2024-07-29T13:22:45Z

Pytest split is really cool!

The mongodb tests happen against a test instance in a docker container in the GitHub runner, so that should work fine.

beckermr · 2024-07-29T13:23:39Z

We'll need to adjust the required tests for this pr. I will do that once we merge it.

ytausch · 2024-07-29T14:29:53Z

Note: I am wildly pushing here to evaluate different approaches.

I just found out that running pytest-xdist (parallelization within runners) disables the MongoDB tests because the sensitive environment variable MONGODB_CONNECTION_STRING is not visible for the worker processes, as they are hidden by the sensitive_env logic. Might be that my original runtime measures are incorrect because of this.

ytausch · 2024-07-29T15:41:22Z

Okay, so here is a summary of what I found out:

enabling pytest-xdist requires us force the MongoDB tests to run serially as they write to the same collections and could interfere with each other - I think @beckermr you might not have considered that in your reply? I implemented such a serialization approach here: ytausch@a41b902
- Edit: pytest.mark.xdist_group might even be better (docs)
enabling pytest-xdist disables the MongoDB tests (see my comment above) - a solution for that would need to be developed

In the diff of this PR (or here, you find my analysis of the runners we already use in this project. From the analysis, I figure out that a splitting the tests across 4 runners is too much. Requesting more runners than available does lead to test times to double, which we should avoid.

For that reason, I tried splitting the tests across 2 runners only - which lead to the tests split unequally across runners, yielding a runtime of 11:10, which is only an improvement of 50 seconds.
This can be improved by measuring test execution times, which is natively supported by pytest-split. Caching the Test Durations as GitHub Cache Artifacts leads to a test time of 9:30.

It is possible to combine pytest-xdist and pytest-split.

Other approaches for speeding up test times could include:

Optimizing the Docker build process (see Docker Image includes .git folder #2876)

squash! split tests with pytest-split

ytausch · 2024-07-29T17:16:56Z

Open issues:

the coverage decreases unexpectedly
deleting from the GitHub actions cache currently does not work -> this might just because I am an external collaborator to this repo and could go away after merging
enabling python-xdist would require a solution for the MongoDB connecting string env var -> I might do this in a follow-up PR

beckermr · 2024-07-29T21:31:04Z

My opinion is that we should not let the perfect be the enemy of the good and we should go ahead.

Are there tests not being run now and that is causing the coverage decrease?

beckermr

I was thinking through this, and I think we should do the test timings files like the coverage ones.

Specifically,

We upload the test durations as artifacts
download all of them at once with a glob like we do the for the coverage files
combine them with jq
update the cache

For the cache, instead of a delete and then save, which might cause a race condition for another job, we can use a key with say the branch name or github ref in it. Then we can restore adaptive using restore keys and save new caches easily when there is a miss.

ytausch force-pushed the parallel-tests branch 6 times, most recently from 5a6f484 to c2831f8 Compare July 28, 2024 22:24

ytausch marked this pull request as ready for review July 28, 2024 22:30

ytausch changed the title ~~implement parallel tests~~ enable parallel tests with pytest-xsplit Jul 28, 2024

ytausch force-pushed the parallel-tests branch from c2831f8 to 55b3836 Compare July 28, 2024 22:33

ytausch changed the title ~~enable parallel tests with pytest-xsplit~~ enable parallel tests with pytest-xsplit and pytest-split Jul 28, 2024

ytausch marked this pull request as draft July 28, 2024 23:00

ytausch force-pushed the parallel-tests branch 2 times, most recently from 112cff5 to 8cc1d01 Compare July 29, 2024 14:19

add note about GitHub runner allocation

4993579

ytausch force-pushed the parallel-tests branch from 8cc1d01 to 4993579 Compare July 29, 2024 14:58

ytausch changed the title ~~enable parallel tests with pytest-xsplit and pytest-split~~ enable parallel tests with pytest-xdist and pytest-split Jul 29, 2024

ytausch force-pushed the parallel-tests branch 2 times, most recently from 98533c2 to f320339 Compare July 29, 2024 16:02

split tests with pytest-split

cd2fb5d

squash! split tests with pytest-split

ytausch force-pushed the parallel-tests branch 2 times, most recently from 4fd5e8a to a4dc76a Compare July 29, 2024 16:41

cache test durations

c99906a

ytausch force-pushed the parallel-tests branch from a4dc76a to c99906a Compare July 29, 2024 16:50

beckermr added 4 commits August 1, 2024 08:15

Merge branch 'master' into parallel-tests

536faad

Merge branch 'master' into parallel-tests

c40d98f

fix: make sure to remove hashmap key in mongodb

bf66bd6

Merge branch 'master' into parallel-tests

ce24575

beckermr requested changes Aug 5, 2024

View reviewed changes

beckermr and others added 7 commits August 5, 2024 09:07

Merge branch 'master' into parallel-tests

06ea45a

Merge branch 'master' into parallel-tests

0b5c036

Merge branch 'master' into parallel-tests

bafb3f9

ci: try using artifact

2b8414d

fix: bug in combine and restore of test durations

33dbdcd

fix: only copy if file exists

fa8dabb

refactor: use glob

18387e6

beckermr marked this pull request as ready for review August 8, 2024 04:29

beckermr approved these changes Aug 8, 2024

View reviewed changes

beckermr merged commit ec3d3ca into regro:master Aug 8, 2024
8 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

enable parallel tests with pytest-xdist and pytest-split #2874

enable parallel tests with pytest-xdist and pytest-split #2874

ytausch commented Jul 28, 2024 •

edited

Loading

codecov bot commented Jul 28, 2024 •

edited

Loading

ytausch commented Jul 28, 2024

ytausch commented Jul 28, 2024

beckermr commented Jul 29, 2024

beckermr commented Jul 29, 2024

beckermr commented Jul 29, 2024

ytausch commented Jul 29, 2024

ytausch commented Jul 29, 2024 •

edited

Loading

ytausch commented Jul 29, 2024

beckermr commented Jul 29, 2024

beckermr left a comment

enable parallel tests with pytest-xdist and pytest-split #2874

enable parallel tests with pytest-xdist and pytest-split #2874

Conversation

ytausch commented Jul 28, 2024 • edited Loading

codecov bot commented Jul 28, 2024 • edited Loading

Codecov Report

ytausch commented Jul 28, 2024

ytausch commented Jul 28, 2024

beckermr commented Jul 29, 2024

beckermr commented Jul 29, 2024

beckermr commented Jul 29, 2024

ytausch commented Jul 29, 2024

ytausch commented Jul 29, 2024 • edited Loading

ytausch commented Jul 29, 2024

beckermr commented Jul 29, 2024

beckermr left a comment

Choose a reason for hiding this comment

ytausch commented Jul 28, 2024 •

edited

Loading

codecov bot commented Jul 28, 2024 •

edited

Loading

ytausch commented Jul 29, 2024 •

edited

Loading