Intermittent MaxRetryError (and others) in CI unit test runs #899

Open
pont-us opened this issue Sep 29, 2023 · 5 comments · Fixed by #918

Comments

@pont-us
Member

pont-us commented Sep 29, 2023

Describe the bug

Recently, CI unit test jobs have been failing intermittently, and with increasing frequency, because of time-outs and exhausted retries. The problem occurs most often on GitHub Actions, but occasionally also on AppVeyor. This issue was prompted by a GitHub Actions run that produced the following error:

FAILED test/webapi/ows/stac/test_routes.py::StacRoutesTest::test_fetch_catalog_collection_single_items - 
urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=34113): Max retries exceeded with url: 
/ogc/collections/demo/items (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f8a94db71d0>: 
Failed to establish a new connection: [Errno 111] Connection refused'))

We can use this issue to collect further instances of the problem.

So far our fix for such errors has been "re-run the job and hope it goes away", which it generally does, but this is turning into something of a time sink.

To Reproduce
Steps to reproduce the behavior: keep re-running the GitHub unittest job until an error occurs.

Expected behavior
All tests should pass reliably on every CI run.

Additional context
With luck, this might be fairly easily fixable by tweaking some back-off / time-out / retry parameters in ServerTestCase or similar.
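
As an illustration of the kind of back-off / time-out tweak meant here, the sketch below polls a TCP port with exponential back-off until something is listening. The "Connection refused" errors suggest that a request is sometimes sent before the test server has bound its port, but that is an assumption rather than a confirmed diagnosis. `wait_for_port` and all its parameters are hypothetical names for this example; they are not part of xcube or of `ServerTestCase`.

```python
import socket
import time


def wait_for_port(host, port, timeout=10.0, initial_delay=0.05,
                  backoff=2.0, max_delay=1.0):
    """Return True once a TCP connection to (host, port) succeeds.

    Return False if no connection could be made within `timeout` seconds.
    Hypothetical helper for illustration; not part of xcube.
    """
    deadline = time.monotonic() + timeout
    delay = initial_delay
    while True:
        try:
            # Open and immediately close a connection as a liveness probe.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            # Covers ConnectionRefusedError and socket time-outs:
            # nothing is accepting connections on the port yet.
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return False
            # Sleep with exponential back-off, capped by max_delay
            # and by the time left before the deadline.
            time.sleep(min(delay, max_delay, remaining))
            delay *= backoff
```

A test fixture could call such a helper after starting the server and before issuing the first request, failing with a clear message if it returns False instead of leaving the HTTP client to exhaust its retries.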

@pont-us
Member Author

pont-us commented Oct 10, 2023

Another one, from https://github.com/dcs4cop/xcube/actions/runs/6467629870/job/17559781773:

FAILED test/webapi/places/test_routes.py::PlacesRoutesTest::test_places - urllib3.exceptions.MaxRetryError: 
HTTPConnectionPool(host='localhost', port=33397): Max retries exceeded with url: /places (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f6942170a10>: Failed to establish a new connection: 
[Errno 111] Connection refused'))

@pont-us
Member Author

pont-us commented Oct 10, 2023

And another from https://github.com/dcs4cop/xcube/actions/runs/6467629870/job/17562750121:

FAILED test/webapi/ows/stac/test_routes.py::StacRoutesTest::test_fetch_catalog_collections - urllib3.exceptions.MaxRetryError: 
HTTPConnectionPool(host='localhost', port=53383): Max retries exceeded with url: /ogc/collections (Caused by 
NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f09cc4ae090>: Failed to establish a new connection: 
[Errno 111] Connection refused'))

@pont-us
Member Author

pont-us commented Oct 11, 2023

https://github.com/dcs4cop/xcube/actions/runs/6467629870/job/17571143347

FAILED test/webapi/datasets/test_routes.py::DatasetsRoutesTest::test_fetch_datasets - urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=39147): Max retries exceeded with url: /datasets (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb970811990>: Failed to establish a new connection: [Errno 111] Connection refused'))
FAILED test/webapi/s3/test_routes.py::S3RoutesNewTest::test_fetch_get_s3_object - urllib3.exceptions.MaxRetryError: HTTPConnectionPool(host='localhost', port=54325): Max retries exceeded with url: /s3/datasets/demo.zarr/.zattrs (Caused by NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7fb98c58f950>: Failed to establish a new connection: [Errno 111] Connection refused'))

@pont-us
Member Author

pont-us commented Dec 15, 2023

Not sure if this one is related, but this test run on AppVeyor produced:

ERROR test/core/zarrstore/test_generic.py::CommonS3ZarrStoreTest::test_it - TimeoutError: timed out

@thomasstorm
Contributor

thomasstorm commented Jan 18, 2024

Reopened as per @TonioF's comment. The PR addresses the current problems, but new related issues will probably appear, so the issue stays open.

#918 (review)
