[Flaky Test] VM orchestration is unstable in integration tests #4356
Comments
Pinging @elastic/elastic-agent (Team:Elastic-Agent)
Not OGC's fault, that is the integration testing framework preparing the instance. OGC doesn't do that.
Not OGC, OGC doesn't create or prepare any stack.
Not OGC, OGC doesn't run the tests or fetch the results.
Just to be clear, OGC only creates the instance with the cloud providers, nothing else. Everything else is done by the integration testing framework and is our code.
@blakerouse would "VM orchestration" be a better term? I will rename this issue then.
When it comes to OGC failures, sometimes we have something like this: https://buildkite.com/elastic/elastic-agent/builds/8091#018ea9bd-ee66-4a10-b1bc-b8f8030d80bc Not sure we can do anything about it.
For this one, yeah not sure we can do anything.
I updated the description to organize known failures by categories and clean up my comments on this issue.
I believe this is a new VM orchestration issue: https://buildkite.com/elastic/elastic-agent/builds/8793#018f58b8-f8c9-4c73-9b40-6f2da4a73974 It is from a backport PR, #4709. I'll try re-running it.
I moved all the failures that we can actually recover from to #4794. Since we have not had new errors for a while now and there is nothing new to report here, I'm closing this issue in favor of the new one.
The failures can be categorized into the following groups:
Firewall resource not found or already exists (quite often; should be fixed by #4740). This has been reported in the OGC repository: adam-stokes/ogc#28
Examples:
I believe it might be some kind of race condition; we should investigate further.
Networking issues
Tracked by #4794
Permission errors (serverless)
Examples:
SQL error
Examples:
GCP just fails with 500 (rare)
Examples:
Job did not complete in 180 seconds
Examples:
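To make the grouping above easier to apply when triaging a failed build, here is a minimal Go sketch that maps a failure message to one of the listed categories. It is only an illustration: the `Category` type, the `Classify` function, and every match substring in `rules` are hypothetical and not taken from the integration testing framework, OGC, or real build logs.

```go
package main

import (
	"fmt"
	"strings"
)

// Category names mirror the failure groups listed in this issue.
type Category string

const (
	Firewall   Category = "firewall"
	Networking Category = "networking"
	Permission Category = "permission"
	SQL        Category = "sql"
	GCP500     Category = "gcp-500"
	JobTimeout Category = "job-timeout"
	Unknown    Category = "unknown"
)

// rules pairs a lowercase substring with a category.
// ASSUMPTION: these substrings are hypothetical placeholders; real log
// lines would have to be checked before relying on any of them.
var rules = []struct {
	needle string
	cat    Category
}{
	{"firewall", Firewall},
	{"did not complete in 180 seconds", JobTimeout},
	{"permission", Permission},
	{"sql", SQL},
	{"error 500", GCP500},
	{"connection reset", Networking},
}

// Classify returns the category of the first rule whose substring
// appears in msg (case-insensitive), or Unknown if none match.
func Classify(msg string) Category {
	lower := strings.ToLower(msg)
	for _, r := range rules {
		if strings.Contains(lower, r.needle) {
			return r.cat
		}
	}
	return Unknown
}

func main() {
	fmt.Println(Classify("Job did not complete in 180 seconds"))
}
```

Rules are checked in order, so a more specific substring should be listed before a more general one. Anything that falls through to `Unknown` would be a candidate for a new category in the description.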