Support serverless #36649
Excited to see this! I will review more next week, but first I wanted to test it out, and I'm having some trouble. After building metricbeat with this PR, I'm running setup against a Serverless instance on production, and the process hangs on the initial ping. I tried running regular setup and with the
I also tried completely omitting the API key to see if I got an auth error, and adding the port number explicitly to the ES host URL, but it all hangs on this initial ping. Anything I might be missing?
@joshdover So, I've run into this problem a few times. The URL that beats is trying to connect to is
and I did a curl myself, no response. I'm guessing you entered
@fearful-symmetry since you are closest to this you can create this list :) In particular create issues for any parts of this that are unclear to you so that we can discuss them.
This error message should be special cased to either ILM or DSL depending on which type we detect we are connected to. Remember this error is going to be seen by users who have never used Filebeat before; if we don't tell them exactly what to do, they likely won't know the solution themselves.
#setup.dsl.check_exists: true

# Overwrite the lifecycle policy at startup. The default is false.
#setup.dsl.overwrite: false
You need a newline at the end of the file; otherwise there is no blank line between the end of this section and the next one:
# Overwrite the lifecycle policy at startup. The default is false.
#setup.dsl.overwrite: false
# =================================== Kibana ===================================
# Starting with Beats version 6.0.0, the dashboards are loaded via the Kibana API.
# This requires a Kibana endpoint configuration.
Similarly for the error:
We should be able to suggest which one of ILM or DSL should be used because we know which ES type we are connected to.
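One way to special-case the message is sketched below in Python (for brevity; the PR itself is Go). It assumes the cluster type is read from the `version.build_flavor` field of the Elasticsearch root response, with `serverless` marking a Serverless project. The function name and the message wording are hypothetical, not taken from the PR.

```python
def lifecycle_hint(build_flavor: str, ilm_enabled: bool, dsl_enabled: bool) -> str:
    """Return a user-facing hint for a lifecycle configuration error.

    build_flavor is assumed to come from `version.build_flavor` on the
    Elasticsearch root endpoint; "serverless" is treated as a Serverless
    project and anything else as a stateful deployment.
    """
    # Having both lifecycle systems enabled is always a config error.
    if ilm_enabled and dsl_enabled:
        return "Both setup.ilm.enabled and setup.dsl.enabled are true; enable only one."
    if build_flavor == "serverless":
        return ("Connected to a serverless project: use Data Stream Lifecycle "
                "(setup.dsl.*) and set setup.ilm.enabled: false.")
    return ("Connected to a stateful deployment: use Index Lifecycle Management "
            "(setup.ilm.*) and set setup.dsl.enabled: false.")


print(lifecycle_hint("serverless", ilm_enabled=True, dsl_enabled=False))
```

The point of the sketch is only that the hint names the exact settings to change, so a first-time user does not need to know what ILM or DSL are.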
If I set the following DSL configuration I get an error. If I remove the policy_name setting and use the default it works:

setup.dsl.enabled: true
# Set the lifecycle policy name. The default policy name is
# 'filebeat'.
setup.dsl.policy_name: mypolicy
setup.dsl.policy_file: "dsl_policy.json"
setup.dsl.overwrite: true

❯ cat dsl_policy.json
{"data_retention": "71d"}
❯ ./filebeat setup --index-management
Exiting: error loading template: error updating lifecycle policy: error creating policy from config: error submitting policy: error creating lifecycle policy: got 404 from elasticsearch: {"error":{"root_cause":[{"type":"index_not_found_exception","reason":"no such index [mypolicy]","resource.type":"index_or_alias","resource.id":"mypolicy","index_uuid":"_na_","index":"mypolicy"}],"type":"index_not_found_exception","reason":"no such index [mypolicy]","resource.type":"index_or_alias","resource.id":"mypolicy","index_uuid":"_na_","index":"mypolicy"},"status":404}

If I set It seems like the
@cmacknz Yeah, the My initial idea was just to keep the DSL and ILM config the same, but you're probably right,
It seems like it can be a pattern so
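The 404 reported in this thread is consistent with how Elasticsearch addresses data stream lifecycles: the lifecycle is set with `PUT _data_stream/<name>/_lifecycle`, so the configured name ends up in the URL as the target data stream. A minimal Python sketch of that request construction follows. The helper name is hypothetical, and this is not the Beats implementation, which is written in Go.

```python
import json
from typing import Tuple
from urllib.parse import quote


def lifecycle_request(data_stream: str, policy: dict) -> Tuple[str, str, str]:
    """Build (method, path, body) for setting a data stream lifecycle."""
    if not data_stream:
        raise ValueError("a data stream name or pattern is required")
    # The name (or a wildcard pattern such as "filebeat-*") is part of the
    # path, so a literal name with no matching data stream is rejected by
    # Elasticsearch (compare the index_not_found_exception for "mypolicy").
    path = f"/_data_stream/{quote(data_stream, safe='*')}/_lifecycle"
    return "PUT", path, json.dumps(policy)


method, path, body = lifecycle_request("filebeat-*", {"data_retention": "71d"})
print(method, path, body)
```

Seen this way, the setting behaves as a data stream selector rather than a policy name, which is the distinction the comments above are circling.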
#setup.dsl.enabled: true

# Set the lifecycle policy name or pattern. For DSL, this name must match the data stream that the lifecycle is for.
# The default data stream pattern is %{[beat.name]}-%{[beat.version]}"
Did you intend for %{[beat.name]}-%{[beat.version]} to actually be templated, or did you want this literal text in the docs?
I don't know that I've seen this format used elsewhere, but I might be missing it.
Ah, you're right, that's probably not the right way to do that...
So, it turns out there's no template variable for .VersionName or anything similar, so the next best thing is metricbeat-%{[agent.version]} and such.
Kinda baffled by the tar error in the packaging step, let's try that again...
/test
Alright, think we're at the point where we can force-merge?
Yes, one more test and I'll merge it. I still think some of the error messages can be improved but that doesn't need to block this PR, will file a follow up for that.
Re-tested the latest DSL changes with the data_stream_pattern config, still works. Thanks!
Merging this one.
@elastic/fleet-qasource-external please test that ILM continues to work. You will need to add manual test cases for the new data stream lifecycle (DSL) configuration when using a serverless project. The instructions are in the PR description. It would also be a good idea to do some exploratory testing around ILM and DSL.
@elastic/fleet-qasource-external we will want to test this for each Beat individually to confirm that:
@elastic/fleet-qasource-external The reference documentation for configuring ILM can be found in each Beat's reference configuration file:

ILM

beats/filebeat/filebeat.reference.yml Lines 2413 to 2435 in e322104

A sample ILM configuration is below which can be saved to a file for testing:

{
  "policy": {
    "phases": {
      "hot": {
        "min_age": "0ms",
        "actions": {
          "rollover": {
            "max_age": "10d",
            "max_primary_shard_size": "50gb"
          }
        }
      }
    }
  }
}

A sample ILM configuration for use with

# ====================== Index Lifecycle Management (ILM) ======================
setup.ilm.enabled: false
setup.ilm.policy_name: "mypolicy"
setup.ilm.policy_file: "ilm_template.json"
#setup.ilm.check_exists: true
setup.ilm.overwrite: true

To set up ILM you will first need to generate an API key, which can be done from Stack Management. Note that the API key has to have the Beats format. The loaded lifecycle policies can be viewed in Stack Management. The default ILM policy for filebeat will be named

DSL

beats/filebeat/filebeat.reference.yml Lines 2437 to 2462 in e322104

To set up DSL you will need to generate an API key, which can be done from the Serverless project security page:

A sample DSL configuration is below which can be saved to a file for testing:

{"data_retention": "5d"}

If you run

# ======================== Data Stream Lifecycle (DSL) =========================
setup.dsl.enabled: true
setup.dsl.data_stream_pattern: "filebeat-*"
setup.dsl.policy_file: "dsl_policy.json"
# setup.dsl.check_exists: true
setup.dsl.overwrite: true
Hi @cmacknz We have completed the testing by installing all beats on a stateful 8.12.0 SNAPSHOT Kibana cloud environment and an 8.12.0 SNAPSHOT serverless environment. Beats Installed:
Build details:
Please let us know if we are missing any scenario that needs to be covered here. Thanks!!
Thanks for those tests, looks great!
* make serverless integration tests run
* update deps
* linter, error handling
* still fixing error handling
* fixing old formatting verbs
* still finding format verbs
* add docs, fix typos
* initial functional pass
* fix setup, config
* fix naming of config section
* add headers
* make linter happy
* still making linter happy
* tinkering with tests
* still fixing tests
* revert file
* tinker with export
* fix logging in tests
* fix load checking in setup
* fix url in integration test
* fix commented out test line
* stil tinkering with integraton test
* fix bad init in tests, add more check to ES handler
* add init checks for client handler, add more unit tests
* make template loader serverless aware
* change naming, error handling, rework config system
* fix up integration tests
* clean up load tests
* stil making linter happy
* simplify manager init, fix tests, update docs
* minor test fixes
* clean up tests
* clean up typos, remove legacy error handling
* expand logging
* logging, error handling changes
* change error messages
* update lifetimes for serverless elasticsearch
* fix integration tests
* change error handling, clean up log messages
* tinker with DSL config name
* update docs
* fix name example
This is a rather large PR that updates beats to provide DSL support and setup on serverless. Some of the parts are:
setup.dsl.* section

As an added note: the index management code is pretty complex and labyrinthine, and while I tried to simplify some things, it's still (needlessly) complex. I'm fairly certain that it would be possible to significantly simplify a lot of this code, but due to the tight deadlines of serverless support, it probably won't happen as part of this PR.

Checklist
CHANGELOG.next.asciidoc or CHANGELOG-developer.next.asciidoc.

How to test this PR locally
When I started working on this, my hope was that I could just glue on DSL support without touching the existing ILM code, but that wasn't the case. This means we'll need to test the new DSL support as well as the existing ILM support.

Here's a high-level overview of suggested tests, which will need to be performed against both serverless and stateful instances:

* [beat] setup --index-management against a fresh ES instance, with all default settings
* [beat] setup --index-management against a fresh ES instance, with a custom policy file
* [beat] setup --index-management against a fresh ES instance, with both ILM and DSL enabled in the config; ensure beats fails
* [beat] setup --index-management against an existing ES instance with existing datastream config, all default settings; ensure we don't overwrite
* [beat] setup --index-management against an existing ES instance with existing datastream config, with overwrite enabled; ensure we have overwritten data
* [beat] setup --index-management with no setup.* flags set in the config; ensure defaults are properly set up
* overwrite enabled and a custom policy on the second setup. Ensure the policy is successfully updated.

Tests to run over serverless only:

* Set setup.template.name and setup.dsl.policy_name to different values, run setup --index-management. A warning message should be printed.
* Set setup.template.name and setup.dsl.policy_name to the same value, run setup --index-management, then run again with a custom policy and setup.dsl.overwrite set to true

To create a custom policy for testing:

For DSL: create a file with {"data_retention": "71d"} and point to it with the setup.dsl.policy_file value.
For ILM: ILM policies are a little more complex, see here for an example; point to it with the setup.ilm.policy_file value.

Validating setup

To validate the setup --index-management, there are two ES endpoints to check: _data_stream points to the data stream, and contains a lifecycle section that should contain a valid lifecycle policy (see the above section for an example). The other endpoint is _index_template, which should contain a matching index template that links to the data stream.

Known issues

config library in elastic-agent-libs (issue forthcoming), if a user has both setup.ilm.* and setup.dsl.* config values, one will need to be explicitly set to disabled, even if the other is explicitly set to enabled
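
The two checks described under "Validating setup" can be sketched in Python as a pure function over the parsed JSON responses. The response shapes used here are assumptions based on the Elasticsearch `GET _data_stream` and `GET _index_template` APIs: a `data_streams` list whose entries carry `name`, `template`, and `lifecycle`, and an `index_templates` list whose entries carry `name`. The function name and the sample data are hypothetical.

```python
def validate_setup(data_stream_resp: dict, index_template_resp: dict,
                   expected_retention: str) -> list:
    """Return a list of problems; an empty list means setup looks valid."""
    problems = []
    # Names of all index templates returned by GET _index_template.
    template_names = {
        t.get("name") for t in index_template_resp.get("index_templates", [])
    }
    streams = data_stream_resp.get("data_streams", [])
    if not streams:
        problems.append("no data streams returned")
    for ds in streams:
        name = ds.get("name", "<unnamed>")
        lifecycle = ds.get("lifecycle")
        # Check 1: the data stream carries the expected lifecycle policy.
        if not lifecycle:
            problems.append(f"{name}: missing lifecycle section")
        elif lifecycle.get("data_retention") != expected_retention:
            problems.append(
                f"{name}: data_retention is {lifecycle.get('data_retention')!r}")
        # Check 2: the template the data stream points at actually exists.
        if ds.get("template") not in template_names:
            problems.append(f"{name}: no matching index template")
    return problems


# Hypothetical responses, trimmed to the fields the checks read.
ds_resp = {"data_streams": [{"name": "filebeat-8.12.0",
                             "template": "filebeat-8.12.0",
                             "lifecycle": {"enabled": True, "data_retention": "71d"}}]}
tmpl_resp = {"index_templates": [{"name": "filebeat-8.12.0", "index_template": {}}]}
print(validate_setup(ds_resp, tmpl_resp, "71d"))
```

Keeping the check free of HTTP calls means the same function can be fed saved responses from both serverless and stateful runs.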
Config principles
Beats now has to care about both ILM and DSL config, as well as the upstream elasticsearch. Beats should behave based on these principles:
Undefined behavior
Right now, there are some edge cases where I'm not sure about the correct behavior.
overwrite enabled will fail. This is because the initial index management setup injects the lifecycle policy directly into the template policy. However, to update the DSL policy, we must make a separate REST call to an endpoint that includes the template name. If the user has set a custom template name but not a policy name, the code will revert to a default (and incorrect) template name. Should the user be expected to set template.name and policy_name? Should the code silently defer to template.name? Should the initial setup fail and tell the user to correct their config if one is set but not the other?
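
The three options raised above can be compared side by side with a small Python sketch. Everything in it is hypothetical (the function, the strategy names, and the default name), and it takes no position on which behavior Beats should adopt.

```python
from typing import Optional

DEFAULT_NAME = "filebeat"  # hypothetical default used when nothing is set


def resolve_dsl_target(template_name: Optional[str], policy_name: Optional[str],
                       strategy: str) -> str:
    """Pick the name used in the DSL update call under a given strategy.

    strategy is one of:
      "require_both"      - the user must set both names
      "defer_to_template" - silently fall back to the template name
      "fail_on_mismatch"  - error if exactly one of the two is set
    """
    if strategy == "require_both":
        if not (template_name and policy_name):
            raise ValueError("set both template.name and policy_name")
        return policy_name
    if strategy == "defer_to_template":
        return policy_name or template_name or DEFAULT_NAME
    if strategy == "fail_on_mismatch":
        # Setting one name without the other is treated as a config error.
        if bool(template_name) != bool(policy_name):
            raise ValueError("template.name and policy_name must be set together")
        return policy_name or DEFAULT_NAME
    raise ValueError(f"unknown strategy: {strategy}")


# A custom template name with no policy name: the case described above.
print(resolve_dsl_target("custom", None, "defer_to_template"))
```

Only the deferring strategy lets the custom-template case succeed without user action; the other two surface the problem at setup time instead of at the later overwrite.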