
Fix flaky TestFQDN #3097

Merged
merged 3 commits into from
Jul 20, 2023

Conversation

ycombinator
Contributor

@ycombinator ycombinator commented Jul 18, 2023

What does this PR do?

This PR tries to fix the flaky TestFQDN end-to-end test. It also adds documentation on using test namespaces to avoid flaky tests.

Why is it important?

So we don't have flaky tests.

@mergify
Contributor

mergify bot commented Jul 18, 2023

This pull request does not have a backport label. Could you fix it @ycombinator? 🙏
To fix up this pull request, you need to add the backport labels for the needed branches, such as:

  • backport-v\d.\d.\d is the label to automatically backport to the 8.\d branch. \d is the digit

NOTE: backport-skip has been added to this pull request.
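The bot's naming rule works like a small pattern match. The backport-v8.9.0 label added later in this PR targets the 8.9 branch. Below is a minimal sketch of that mapping. It is not Mergify's implementation. `targetBranch` is a hypothetical helper, and the regular expression takes the bot's `backport-v\d.\d.\d` wording literally.

```go
package main

import (
	"fmt"
	"regexp"
)

// backportLabel matches labels like backport-v8.9.0, following the
// backport-v\d.\d.\d convention described by the bot (single digits only).
var backportLabel = regexp.MustCompile(`^backport-v(\d)\.(\d)\.(\d)$`)

// targetBranch is a hypothetical helper mapping a backport label to its
// target branch, e.g. backport-v8.9.0 -> 8.9. ok is false for other labels.
func targetBranch(label string) (branch string, ok bool) {
	m := backportLabel.FindStringSubmatch(label)
	if m == nil {
		return "", false
	}
	return m[1] + "." + m[2], true
}

func main() {
	for _, label := range []string{"backport-v8.9.0", "backport-skip"} {
		branch, ok := targetBranch(label)
		fmt.Println(label, branch, ok)
	}
}
```

Read literally, `\d` covers single-digit versions only, so a label such as backport-v8.10.0 would not match this sketch.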

@elasticmachine
Contributor

elasticmachine commented Jul 18, 2023

💚 Build Succeeded



Build stats

  • Start Time: 2023-07-20T00:39:19.528+0000

  • Duration: 32 min 34 sec

Test stats 🧪

Test Results
Failed 0
Passed 5939
Skipped 27
Total 5966

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments


To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@elasticmachine
Contributor

elasticmachine commented Jul 18, 2023

🌐 Coverage report

Name           Metrics % (covered/total)    Diff
Packages       98.718% (77/78)              👍
Files          67.416% (180/267)            👍
Classes        66.329% (327/493)            👍
Methods        53.642% (1031/1922)          👍
Lines          39.907% (11779/29516)        👎 -0.061
Conditionals   100.0% (0/0)                 💚

@ycombinator
Contributor Author

ycombinator commented Jul 19, 2023

Buildkite CI (build 1840) failed on TestEnrollAndLog but TestFQDN passed (1 pass out of 1 attempt). Re-running...

@ycombinator
Contributor Author

buildkite test this

@ycombinator
Contributor Author

ycombinator commented Jul 19, 2023

Again, buildkite CI (build 1844) failed on TestEnrollAndLog but TestFQDN passed (2 passes out of 2 attempts). Re-running...

@ycombinator
Contributor Author

buildkite test this

@ycombinator ycombinator added the skip-changelog, backport-v8.9.0, and Testing labels and removed the backport-skip label Jul 19, 2023
@ycombinator
Contributor Author

ycombinator commented Jul 19, 2023

This time, Buildkite CI (build 1845) failed on TestFQDN, but the failure seems ephemeral and related to Agent enrollment:

=== RUN   TestFQDN
    fqdn_test.go:75: Set FQDN on host to luxbjo.baz.io
    fqdn_test.go:79: Enroll agent in Fleet with a test policy
    fqdn_test.go:95: Creating enrollment API key...
    fqdn_test.go:95: Unpacking and installing Elastic Agent
    fetcher.go:90: Using existing artifact elastic-agent-8.10.0-SNAPSHOT-linux-x86_64.tar.gz
    fixture.go:198: Extracting artifact elastic-agent-8.10.0-SNAPSHOT-linux-x86_64.tar.gz to /tmp/TestFQDN1512827491/001
    fixture.go:211: Completed extraction of artifact elastic-agent-8.10.0-SNAPSHOT-linux-x86_64.tar.gz to /tmp/TestFQDN1512827491/001
    fixture.go:425: Components were not modified from the fetched artifact
    fixture.go:342: >> running agent with: [/tmp/TestFQDN1512827491/001/elastic-agent-8.10.0-SNAPSHOT-linux-x86_64/elastic-agent install --force --non-interactive --url https://0d2825bf0abb421cb776abdeaf733296.fleet.us-central1.gcp.qa.cld.elstc.co:443 --enrollment-token MWt1N2E0a0JXVHJUV3pxOFZ4LUg6TGpTRTJWZWZReldzS2NOZlRzcFJQZw==]
    fqdn_test.go:95: Installing in non-interactive mode.{"log.level":"info","@timestamp":"2023-07-19T01:20:23.936Z","log.origin":{"file.name":"cmd/enroll_cmd.go","file.line":478},"message":"Starting enrollment to URL: https://0d2825bf0abb421cb776abdeaf733296.fleet.us-central1.gcp.qa.cld.elstc.co:443/","ecs.version":"1.6.0"}
        Error: fail to enroll: could not save enrollment information: failed to ensure key: could not get agent key: cipher: message authentication failed
        For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.10/fleet-troubleshooting.html
        Error: enroll command failed with exit code: 1
        For help, please see our troubleshooting guide at https://www.elastic.co/guide/en/fleet/8.10/fleet-troubleshooting.html
        
    fqdn_test.go:96: 
        	Error Trace:	/home/ubuntu/agent/testing/integration/fqdn_test.go:96
        	Error:      	Received unexpected error:
        	            	unable to enroll Elastic Agent: error running agent install command: exit status 1
        	Test:       	TestFQDN
    fqdn_test.go:58: Un-enrolling Elastic Agent...
    fqdn_test.go:61: Restoring hostname...
    fqdn_test.go:65: Restoring original /etc/hosts...
--- FAIL: TestFQDN (48.29s)

This is not how this test has failed in the past when it was flaky. Previously it failed while verifying the hostname in the logs-* and metrics-* indices, not during enrollment. So I don't think we should count this failure as the flaky test failure this PR is trying to fix. Re-running...
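The earlier flaky failures happened while verifying the hostname in the logs-* and metrics-* indices. The merged fix passes the namespace to that Elasticsearch query. Here is a minimal sketch of what a namespace-scoped check could look like. It assumes the check is a term match on `host.name` and adds a second term on `data_stream.namespace`, so documents written by other tests sharing the cluster cannot satisfy or confuse it. `buildHostQuery`, `indexPattern`, and the namespace value `testfqdn1` are hypothetical. The host name is the one from the log above.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// indexPattern scopes a search to one namespace, relying on the data stream
// naming scheme {type}-{dataset}-{namespace}.
func indexPattern(dataType, namespace string) string {
	return fmt.Sprintf("%s-*-%s", dataType, namespace)
}

// buildHostQuery is a hypothetical helper returning an Elasticsearch bool
// query that only matches documents from the given host in the given namespace.
func buildHostQuery(hostName, namespace string) ([]byte, error) {
	query := map[string]any{
		"query": map[string]any{
			"bool": map[string]any{
				"filter": []any{
					map[string]any{"term": map[string]any{"host.name": hostName}},
					map[string]any{"term": map[string]any{"data_stream.namespace": namespace}},
				},
			},
		},
	}
	return json.Marshal(query) // map keys are emitted in sorted order
}

func main() {
	body, err := buildHostQuery("luxbjo.baz.io", "testfqdn1")
	if err != nil {
		panic(err)
	}
	fmt.Println(indexPattern("logs", "testfqdn1"))
	fmt.Println(string(body))
}
```

Filtering on the namespace field and narrowing the index pattern are two independent ways to get the same isolation. Using both is belt and braces.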

@ycombinator
Contributor Author

buildkite test this

@ycombinator
Contributor Author

ycombinator commented Jul 19, 2023

Buildkite CI (build 1847) passed, which means TestFQDN passed (3 passes out of 3 attempts). Running again...

@ycombinator
Contributor Author

buildkite test this

@ycombinator
Contributor Author

ycombinator commented Jul 19, 2023

Buildkite CI (build 1848) passed again, which means TestFQDN passed (4 passes out of 4 attempts). Running again...

@ycombinator
Contributor Author

buildkite test this

@ycombinator
Contributor Author

ycombinator commented Jul 19, 2023

Buildkite CI (build 1852) passed again, which means TestFQDN passed (5 passes out of 5 attempts). Running again...

@ycombinator
Contributor Author

buildkite test this

@ycombinator
Contributor Author

ycombinator commented Jul 19, 2023

Buildkite CI (build 1859) passed again, which means TestFQDN passed (6 passes out of 6 attempts). Running again...

@ycombinator
Contributor Author

buildkite test this

@ycombinator
Contributor Author

Buildkite CI (build 1866) passed again, which means TestFQDN passed (7 passes out of 7 attempts). Running again...

@ycombinator
Contributor Author

buildkite test this

@ycombinator
Contributor Author

ycombinator commented Jul 19, 2023

Buildkite CI (build 1873) passed again, which means TestFQDN passed (8 passes out of 8 attempts). Running again...

Aiming for 10 attempts total, in case people are wondering how many episodes are in this saga.

@ycombinator
Contributor Author

/test

@ycombinator ycombinator added the Team:Elastic-Agent label Jul 19, 2023
@ycombinator
Contributor Author

Buildkite CI (build 1882) passed again, which means TestFQDN passed (9 passes out of 9 attempts). Running again... hopefully one final time 🤞.

@ycombinator
Contributor Author

/test

@ycombinator
Contributor Author

Buildkite CI (build 1883) failed on TestEnrollAndLog but TestFQDN passed (10 passes out of 10 attempts).

I think the fix in this PR removes the flakiness in TestFQDN. Putting the PR into review.

@ycombinator ycombinator marked this pull request as ready for review July 20, 2023 03:58
@ycombinator ycombinator requested a review from a team as a code owner July 20, 2023 03:58
@elasticmachine
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@jlind23 jlind23 enabled auto-merge (squash) July 20, 2023 05:33
@pierrehilbert
Contributor

Thanks @ycombinator for the time you spent on this.
Let's merge it once the last integration run succeeds.

@pierrehilbert pierrehilbert changed the title [WIP] Fix flaky TestFQDN Fix flaky TestFQDN Jul 20, 2023
@jlind23 jlind23 merged commit d2162bb into elastic:main Jul 20, 2023
10 checks passed
mergify bot pushed a commit that referenced this pull request Jul 20, 2023
* Pass namespace to FQDN ES query
* Add documentation on namespace usage
* Add warning

(cherry picked from commit d2162bb)

# Conflicts:
#	docs/test-framework-dev-guide.md
pierrehilbert added a commit that referenced this pull request Jul 20, 2023
* [WIP] Fix flaky `TestFQDN` (#3097)

* Pass namespace to FQDN ES query
* Add documentation on namespace usage
* Add warning

(cherry picked from commit d2162bb)

# Conflicts:
#	docs/test-framework-dev-guide.md

* Update test-framework-dev-guide.md

---------

Co-authored-by: Shaunak Kashyap <[email protected]>
Co-authored-by: Pierre HILBERT <[email protected]>
@ycombinator ycombinator deleted the fix-it-test-fqdn branch July 20, 2023 11:34
Labels
backport-v8.9.0 Automated backport with mergify skip-changelog Team:Elastic-Agent Label for the Agent team Testing
4 participants