[Flaky Test]: Multiple tests – failed to fetched test output at ... #4279

Closed
rdner opened this issue Feb 19, 2024 · 7 comments
Labels
flaky-test Unstable or unreliable test cases. Team:Elastic-Agent Label for the Agent team

Comments

@rdner
Member

rdner commented Feb 19, 2024

Failing test case

It's not a test case; the error happens after the tests have finished.

Error message

Failed for instance windows-amd64-2022-fleet (@ 35.232.65.142): failed to execute tests on instance ogc-windows-amd64-2022-fleet-2486: error running sudo tests: failed to fetched test output at %home%\agent\build\TEST-go-remote-windows-amd64-2022-fleet-sudo.integration.out

Build

https://buildkite.com/elastic/elastic-agent/builds/7261#018dc1f1-d326-455c-bd41-fcad4fa2e0af

OS

Linux, Mac, Windows

Stacktrace and notes

>> (windows-amd64-2022-fleet) Failed for instance windows-amd64-2022-fleet (@ 35.232.65.142): failed to execute tests on instance ogc-windows-amd64-2022-fleet-2486: error running sudo tests: failed to fetched test output at %home%\agent\build\TEST-go-remote-windows-amd64-2022-fleet-sudo.integration.out

Looks like the SSH session stopped working since this code is failing:

resultPkg.Output, err = sshClient.GetFileContents(ctx, outputPath+".out", WithContentFetchCommand("type"))
if err != nil {
	return OSRunnerPackageResult{}, fmt.Errorf("failed to fetched test output at %s.out", outputPath)
}
resultPkg.JSONOutput, err = sshClient.GetFileContents(ctx, outputPath+".out.json", WithContentFetchCommand("type"))
if err != nil {
	return OSRunnerPackageResult{}, fmt.Errorf("failed to fetched test output at %s.out.json", outputPath)
}
resultPkg.XMLOutput, err = sshClient.GetFileContents(ctx, outputPath+".xml", WithContentFetchCommand("type"))
if err != nil {
	return OSRunnerPackageResult{}, fmt.Errorf("failed to fetched test output at %s.xml", outputPath)
}

rdner added the Team:Elastic-Agent and flaky-test labels Feb 19, 2024
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@pazone
Contributor

pazone commented Feb 19, 2024

I see it passed after 2 retries. Shouldn't we add retries to GetFileContents in ssh.go?

@rdner
Member Author

rdner commented Feb 20, 2024

@pazone we can, but I think we should also identify why the connection to the hosts was unreliable. It happened twice more in https://buildkite.com/elastic/elastic-agent/builds/7264#018dc323-a3fd-423d-bed4-75fcf4027ffc

Looks like the issue occurs periodically and lasts for a few hours.

@cmacknz
Member

cmacknz commented Feb 20, 2024

I think we should add retries here regardless, in case it improves the situation. It also looks like we drop the underlying error returned alongside resultPkg.Output (the err value), so we can't tell exactly what the failure is.

@cmacknz
Member

cmacknz commented Feb 20, 2024

If we can catch this while the VM is still alive we can check what the instance is doing in GCP.

@rdner
Member Author

rdner commented Feb 21, 2024

@rdner
Member Author

rdner commented Mar 13, 2024

Closing in favor of #4356

4 participants