Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add beat serverless tests & related support #3258

Closed

Conversation

fearful-symmetry
Copy link
Contributor

What does this PR do?

There's a lot going on here. This PR:

  • Makes changes to the integration test framework so it can run beats, and not just elastic-agent
  • Adds a variety of beat tests for test conditions that are effected by serverless
  • Updates the serverless client code to reflect ongoing changes to the serverless APIs
  • Adds new elasticsearch/kibana helpers
  • Adds a few temporary hacks to the magefile so these tests will run in the agent repo

Some important details and caveats also accompany this:

  • As of right now, these tests will not pass, nor are they expected to, as beats currently does not have serverless support. They should all run and pass against a stateful deployment.
  • These tests are in the agent repo, and not beats, as we don't have buildkite set up in beats yet.
  • There's the potential for some false positives with some of the setup tests. The tests check for the presence of things like index templates, dashboards, etc, and thus they can return a false positive if the tests are run against a pre-existing. I'm not sure if it's possible to tag things like dashboards with some kind of metadata so we can check to see if we're testing "our" dashboards.

Why is it important?

We want to get rid of the old python tests, and also we need tests for serverless support.

Checklist

  • My code follows the style guidelines of this project
  • I have commented my code, particularly in hard-to-understand areas
  • I have added tests that prove my fix is effective or that my feature works
  • I have added an integration test or an E2E test

How to test this PR locally

  • pull down
  • run mage -v integration:TestBeatServerless metricbeat
  • To test against a stateful deployment, explicitly set STACK_PROVISIONER to ess

@elasticmachine
Copy link
Contributor

Pinging @elastic/elastic-agent (Team:Elastic-Agent)

@elasticmachine
Copy link
Contributor

elasticmachine commented Aug 16, 2023

💚 Build Succeeded

the below badges are clickable and redirect to their specific view in the CI or DOCS
Pipeline View Test View Changes Artifacts preview preview

Expand to view the summary

Build stats

  • Start Time: 2023-09-28T16:01:26.384+0000

  • Duration: 25 min 25 sec

Test stats 🧪

Test Results
Failed 0
Passed 6313
Skipped 63
Total 6376

💚 Flaky test report

Tests succeeded.

🤖 GitHub comments

Expand to view the GitHub comments

To re-run your PR in the CI, just comment with:

  • /test : Re-trigger the build.

  • /package : Generate the packages.

  • run integration tests : Run the Elastic Agent Integration tests.

  • run end-to-end tests : Generate the packages and run the E2E Tests.

  • run elasticsearch-ci/docs : Re-trigger the docs validation. (use unformatted text in the comment!)

@fearful-symmetry
Copy link
Contributor Author

buildkite failures appear to be unrelated: Error: fetchBinaryFromArtifactsApi failed for auditbeat on linux/amd64: the artifact was not found

@elasticmachine
Copy link
Contributor

elasticmachine commented Aug 16, 2023

🌐 Coverage report

Name Metrics % (covered/total) Diff
Packages 98.78% (81/82) 👍
Files 65.878% (195/296) 👎 -0.223
Classes 65.574% (360/549) 👎 -0.12
Methods 52.282% (1134/2169) 👎 -0.462
Lines 37.868% (12866/33976) 👎 -0.288
Conditionals 100.0% (0/0) 💚

@pierrehilbert
Copy link
Contributor

@pazone didn't you fix the fetchBinaryFromArtifactsApi issue?

@joshdover
Copy link
Contributor

  • I'm not sure if it's possible to tag things like dashboards with some kind of metadata so we can check to see if we're testing "our" dashboards.

Best option I can think of that wouldn't require changing what Beats does (like injecting a tag) is to have the test code to look at the updated_at field on the Saved Objects that are created. It's not ideal since it relies on clocks but we can probably make the check relatively relaxed - updated_at in the last 5 minutes or something like that.

@pazone
Copy link
Contributor

pazone commented Aug 17, 2023

@pierrehilbert It's a bit another issue. Fixing it

@fearful-symmetry
Copy link
Contributor Author

Alright, took @joshdover 's advice, and the dashboard tests will now check the updated time.

pkg/testing/fetcher_local.go Outdated Show resolved Hide resolved
pkg/testing/fixture.go Outdated Show resolved Hide resolved
pkg/testing/fixture.go Outdated Show resolved Hide resolved
@@ -106,6 +106,7 @@ func (DebianRunner) Copy(ctx context.Context, sshClient *ssh.Client, logger Logg
"failed to remove agent directory before unziping new one: %w. stdout: %q, stderr: %q",
err, stdout, stderr)
}
_, _, _ = sshRunCommand(ctx, sshClient, "sudo", []string{"rm", "-rf", "beats"}, nil)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this removing a random beats directory? We set the remote path to $HOME/agent I don't see why the same cannot be used for beats?

Copy link
Contributor Author

@fearful-symmetry fearful-symmetry Aug 25, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So, the working directory is the same here, since tests are still being run out of the agent repo, the only difference is that now we have additional beat artifacts that we upload in addition to the agent working directory.

We could refactor this so we don't have /agent hard-coded, but we'd still need to find a way to delete the working directory and any non-working-directory artifacts. I wasn't too sure what the runner was doing, so I didn't want to blow away the entire home directory or something.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like the best solution here is to probably just to make some of the copy code more generic, and just use some kind of heuristic to delete anything that looks like a working directory or artifact.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Alright, moved things around to make the code a little more generic, at least.

testbeatName string
}

func TestMetricbeatSeverless(t *testing.T) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this being added in the Elastic Agent repository? I get that it might be faster, but probably only a little bit. All the modules you are using for this to work are in pkg/ which I did on purpose so other repositories can use that code. Why not just add this directly to the beats repository?

Copy link
Member

@cmacknz cmacknz Aug 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is being done here because it is much faster, serverless has a rapidly approaching deadline and we need tests to check whether Beats will work.

Beats hasn't been ported to Buildkite yet, it is the most complex build we have and it would have taken too long to do that before writing the tests.

This should be temporary, I'm hoping the work to port Beats to buildkite will start soon. We may be able to leave these tests on a branch depending on the timing to avoid this living for too long.

@pchila pchila removed their request for review August 25, 2023 12:43
@fearful-symmetry
Copy link
Contributor Author

...never seen that error before

[2023-09-13T18:16:37.809Z]         	            	fork/exec C:\Users\jenkins\AppData\Local\Temp\TestRemovePath2272529880\001\subdir\main.exe: Operation did not complete successfully because the file contains a virus or potentially unwanted software.

@pierrehilbert
Copy link
Contributor

pierrehilbert commented Sep 14, 2023

@fearful-symmetry we got this error in multiple other places, it has been introduced in the journey to remove our own folder on Windows. That's something we need to fix. We have an open PR for that: #3411
cc @blakerouse @cmacknz

@lukeelmers
Copy link
Member

We had originally intended to make these public, but we recently decided we would keep them internal for serverless until we had a strong reason not to.

However, I don't think we had considered the use case of beats installing dashboards.

@joshdover Just to follow up on this, the core team decided in elastic/kibana#162418 that for now we will stick with our most recent plan of keeping the saved objects import/export APIs internal while we work out our long term vision for these APIs. This means Beats would indeed need to set x-elastic-internal-origin: Beats in these requests.

Kibana won't care if you set this header unnecessarily, so if it's easier for Beats to just globally always use the header when talking to Kibana, that should be fine too. Sorry for the inconvenience!

@fearful-symmetry
Copy link
Contributor Author

This means Beats would indeed need to set x-elastic-internal-origin: Beats

@lukeelmers does kibana care what the value of that header is? Does it have to be beats ?

@lukeelmers
Copy link
Member

does kibana care what the value of that header is? Does it have to be beats ?

It is convenient to use the application name for tracking/monitoring purposes, however it isn't strictly required -- Kibana doesn't care what value you use.

@fearful-symmetry
Copy link
Contributor Author

So, the actual serverless PR is here: elastic/beats#36649

@cmacknz
Copy link
Member

cmacknz commented Sep 25, 2023

The tests here aren't running in CI per the test report. Otherwise the tests would be failing because the Beats don't actually support serverless yet.

While the tests are in the wrong repository, I think we can just move them to a daily schedule so that we get the test coverage without blocking agent PRs (since no agent code is involved in the standalone Beats tests) while we work on allowing these tests to be triggered from the Beats repository. I will create issues for both of these issues.

We should add a test that ensures that Elastic Agent can be installed and enrolled with a Severless Observability project. Probably we can parameterize the monitoring logs integration test to run against both stateful and serverless. This test shouldn't require the DSL support since Fleet takes care of that for us.

@fearful-symmetry
Copy link
Contributor Author

We should add a test that ensures that Elastic Agent can be installed and enrolled with a Severless Observability project.

As part of this PR? @cmacknz

@fearful-symmetry
Copy link
Contributor Author

Alright, need to pause merging/reviewing this for now, based on this conversation: elastic/beats#36649 (comment)

Turns out under serverless, auth via username:pass and api_key results in different access permissions, which is causing some setup tasks to fail under serverless. Need to change how the tests do auth so they can connect to ES via api_key.

@elastic-sonarqube
Copy link

SonarQube Quality Gate

Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

No Coverage information No Coverage information
0.0% 0.0% Duplication

@fearful-symmetry
Copy link
Contributor Author

@cmacknz I know we talked about it a bit yesterday, but what's the next steps for this? We want to break out all the backing library changes into a different PR? What was the goal with that?

@cmacknz
Copy link
Member

cmacknz commented Sep 28, 2023

@cmacknz I know we talked about it a bit yesterday, but what's the next steps for this? We want to break out all the backing library changes into a different PR? What was the goal with that?

The Beat tests here can't be merged without #3470 or elastic/beats#36675.

The support for provisioning serverless projects can be merged however, and we should have an integration test in the agent repository to test that Elastic Agent can ship data to Elasticsearch as mentioned in #3258 (comment). I'm comfortable with merging that to main and having it gate PRs. Such an agent test will also test that Beats can ship data to ES, but won't test the setup commands.

My goal is to ensure we have appropriate serverless test coverage in the right places, triggered by the correct things. I want to avoid having tests against Beat snapshot images gate agent PRs, because that doesn't make any sense. So let's strip this PR down to just the Beat specific functionality and create a separate PR testing serverless support for agent. We can make a dedicated issue for this if it helps.

@mergify
Copy link
Contributor

mergify bot commented Oct 5, 2023

This pull request is now in conflicts. Could you fix it? 🙏
To fixup this pull request, you can check out it locally. See documentation: https://help.github.com/articles/checking-out-pull-requests-locally/

git fetch upstream
git checkout -b serverless-tests upstream/serverless-tests
git merge upstream/main
git push upstream serverless-tests

@fearful-symmetry
Copy link
Contributor Author

Closed, now at #3658

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants