Invalid policy changes can be accepted because of ephemeral variable state #2887

Open
faec opened this issue Jun 15, 2023 · 1 comment
Labels: bug (Something isn't working), Team:Elastic-Agent (Label for the Agent team)

faec commented Jun 15, 2023

Agent's Coordinator regenerates its component model whenever it receives a change to its policy or variables. Whether this generation succeeds depends on both the variables and the policy -- some policy updates may succeed with one set of variables but not another. An example where this becomes a serious problem is the following (abbreviated) input config:
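A minimal Python sketch of that trigger structure may help. It is not Agent's Go code: Coordinator, on_policy_change, on_vars_change and toy_generate are hypothetical names. The point is only that either kind of update re-runs generation against the latest (policy, variables) pair, so the same policy can succeed or fail depending on which variables happen to be current.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Hypothetical generator signature: (policy, variables) -> component model.
# It raises ValueError when the pair cannot produce a valid model.
GenerateFn = Callable[[dict, Dict[str, str]], List[dict]]


@dataclass
class Coordinator:
    generate: GenerateFn
    policy: dict = field(default_factory=dict)
    variables: Dict[str, str] = field(default_factory=dict)
    model: Optional[List[dict]] = None
    error: Optional[str] = None

    def on_policy_change(self, policy: dict) -> None:
        self.policy = policy
        self._regenerate()

    def on_vars_change(self, variables: Dict[str, str]) -> None:
        self.variables = variables
        self._regenerate()

    def _regenerate(self) -> None:
        # Both triggers funnel into the same step, so success depends on
        # whichever policy and variables are current at that moment.
        try:
            self.model = self.generate(self.policy, self.variables)
            self.error = None
        except ValueError as exc:
            # Keep the last good model but record the failure.
            self.error = str(exc)


def toy_generate(policy: dict, variables: Dict[str, str]) -> List[dict]:
    # Toy rule: the policy is only valid while the "port" variable is numeric.
    port = variables.get("port", "0")
    if not port.isdigit():
        raise ValueError(f"invalid port {port!r}")
    return [{"id": policy["id"], "port": int(port)}]


coord = Coordinator(toy_generate)
coord.on_policy_change({"id": "input-1"})     # succeeds with the default port
coord.on_vars_change({"port": "not-a-port"})  # same policy, now fails
```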

...
host: "${kubernetes.pod.ip}:1234"
condition: env_input_enabled = "true"

This always expands to an invalid policy because the condition contains an EQL syntax error: env_input_enabled is written as a bare name rather than a variable reference. To check the input flag, the user needs to write ${env_input_enabled} = "true" instead.
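To make the distinction concrete, here is a toy validator for conditions of the form operand = operand. It is a deliberately tiny stand-in written in Python for illustration, not Agent's EQL grammar: each operand must be a ${var} reference or a double-quoted string (with no spaces or = inside), so a bare name like env_input_enabled is rejected before any variable values are known. check_condition and is_valid_condition are hypothetical names.

```python
import re

# Toy grammar (NOT real EQL): <operand> (= | ==) <operand>
_COMPARISON = re.compile(r'^\s*([^\s=]+)\s*(==|=)\s*([^\s=]+)\s*$')
_VAR_REF = re.compile(r'^\$\{[A-Za-z0-9_.]+\}$')
_STRING = re.compile(r'^"[^"]*"$')


def check_condition(cond: str) -> None:
    """Raise ValueError if cond does not fit the toy grammar."""
    m = _COMPARISON.match(cond)
    if m is None:
        raise ValueError(f"cannot parse condition {cond!r}")
    for operand in (m.group(1), m.group(3)):
        if not (_VAR_REF.match(operand) or _STRING.match(operand)):
            # A bare name is neither a variable reference nor a literal.
            raise ValueError(
                f"bare identifier {operand!r}; did you mean ${{{operand}}}?"
            )


def is_valid_condition(cond: str) -> bool:
    try:
        check_condition(cond)
        return True
    except ValueError:
        return False


print(is_valid_condition('env_input_enabled = "true"'))    # False
print(is_valid_condition('${env_input_enabled} = "true"'))  # True
```

Because this check only looks at the text of the condition, it gives the same answer whatever values the variables later take.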

Now suppose this policy is sent to an Agent that doesn't yet know its value for kubernetes.pod.ip (or whatever other context variable the config depends on). Agent silently skips any input with a missing variable, and it stops checking the rest of the policy as soon as it finds one, so the condition field is never validated. The policy change therefore generates a valid component model that omits this input, and Agent reports it to Fleet as successful.

If the Kubernetes metadata is then refreshed, producing new variables, Agent tries again to generate its component model and fails when it reaches the condition field. It then enters an unhealthy state no matter what values the previously missing variables take.
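The sequence above can be reproduced with a toy Python model of the current, lazy processing order. Everything here is hypothetical (substitute, lazy_render, the MissingVar exception, and the simplified rule that a condition's left-hand side must be a ${var} reference); it only mirrors the behaviour described in this issue: fields are handled in order, the first missing variable drops the whole input, and condition is only examined once every earlier field has resolved.

```python
import re
from typing import Dict, Optional

_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


class MissingVar(Exception):
    """Raised when a ${...} reference has no value yet."""


def substitute(raw: str, variables: Dict[str, str]) -> str:
    def replace(m):
        name = m.group(1)
        if name not in variables:
            raise MissingVar(name)
        return variables[name]
    return _REF.sub(replace, raw)


def check_condition(cond: str) -> None:
    # Simplified toy rule: the left-hand side must be a ${var} reference.
    lhs = cond.split("=", 1)[0].strip()
    if not _REF.fullmatch(lhs):
        raise ValueError(f"invalid condition {cond!r}")


def lazy_render(inp: Dict[str, str],
                variables: Dict[str, str]) -> Optional[Dict[str, str]]:
    out = {}
    for key, raw in inp.items():  # dicts keep insertion order
        try:
            if key == "condition":
                check_condition(raw)  # only reached if earlier fields resolved
            out[key] = substitute(raw, variables)
        except MissingVar:
            return None  # input silently skipped; later fields never checked
    return out


bad_input = {
    "host": "${kubernetes.pod.ip}:1234",
    "condition": 'env_input_enabled = "true"',
}

# 1. No pod IP yet: the input is dropped and the policy change "succeeds".
print(lazy_render(bad_input, {}))  # None

# 2. Kubernetes metadata arrives: now the bad condition is reached.
try:
    lazy_render(bad_input, {"kubernetes.pod.ip": "10.0.0.5"})
except ValueError as exc:
    print("unhealthy:", exc)
```

Any value for kubernetes.pod.ip leads to the same failure in step 2, which matches the "no matter what the values are" outcome described above.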

The core problem here is that our AST processing that generates the component model depends on the current values of the variables -- this error could be detected and reported when we first receive the policy change, but we only verify the parts of the policy that are in active use. Instead, we should validate/preprocess the whole policy regardless of what the variables are, leaving the variable substitution for last, so we know that we can still produce a well-formed component model for any variables we are given. (This doesn't guarantee that the resulting components will always be healthy, but it guarantees that we at least have an unambiguous configuration to give them.)
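One way to structure the proposed fix is a two-phase pipeline, sketched here in Python. validate_policy and render are hypothetical names and the condition rule is the same simplified toy used for illustration, not real EQL parsing. Phase 1 checks every input, including ones whose variables are not yet known, so a bad policy can be rejected when the change arrives; phase 2 performs substitution last, where the only remaining failure mode is a missing variable, which just postpones that input.

```python
import re
from typing import Dict, List

_REF = re.compile(r"\$\{([A-Za-z0-9_.]+)\}")


def validate_policy(inputs: List[Dict[str, str]]) -> List[str]:
    """Phase 1: variable-independent checks over the WHOLE policy."""
    errors = []
    for i, inp in enumerate(inputs):
        for key, raw in inp.items():
            # Every "${" must start a well-formed reference.
            if raw.count("${") != len(_REF.findall(raw)):
                errors.append(f"input {i}: malformed reference in {key!r}")
        cond = inp.get("condition")
        if cond is not None:
            lhs = cond.split("=", 1)[0].strip()
            if not _REF.fullmatch(lhs):  # toy rule, not real EQL
                errors.append(f"input {i}: invalid condition {cond!r}")
    return errors


def render(inputs: List[Dict[str, str]],
           variables: Dict[str, str]) -> List[Dict[str, str]]:
    """Phase 2: substitution only; a missing variable skips just that input."""
    rendered = []
    for inp in inputs:
        names = {n for raw in inp.values() for n in _REF.findall(raw)}
        if not names.issubset(variables):
            continue  # already validated in phase 1, so waiting is safe
        rendered.append(
            {k: _REF.sub(lambda m: variables[m.group(1)], v) for k, v in inp.items()}
        )
    return rendered


bad = [{"host": "${kubernetes.pod.ip}:1234", "condition": 'env_input_enabled = "true"'}]
good = [{"host": "${kubernetes.pod.ip}:1234", "condition": '${env_input_enabled} = "true"'}]

print(len(validate_policy(bad)))  # 1: rejected up front, no variables needed
print(validate_policy(good))      # []
print(render(good, {}))           # []: input waits for its variables
```

The design choice is that phase 1 never consults variables, so its verdict on a policy cannot change when Kubernetes metadata is refreshed later.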

Note: this issue had different symptoms prior to 8.8. In older versions, invalid EQL syntax wasn't reported as an error, but instead silently evaluated to false (changed in this PR). In that case, this policy wouldn't report an explicit error, but would instead silently skip the configured input no matter what variables were set.
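The behavioural difference can be illustrated with a toy Python evaluator that takes a strict flag. evaluate_condition and strict are hypothetical, and the grammar is a simplified rule (a ${var} reference compared with a quoted string), not Agent's EQL implementation. strict=False stands in for the pre-8.8 behaviour, where a parse error quietly became false; strict=True stands in for reporting it as an error.

```python
import re
from typing import Dict

# Toy grammar: ${name} (= | ==) "literal"
_COND = re.compile(r'^\s*\$\{([A-Za-z0-9_.]+)\}\s*==?\s*"([^"]*)"\s*$')


def evaluate_condition(cond: str, variables: Dict[str, str], strict: bool) -> bool:
    m = _COND.match(cond)
    if m is None:
        if strict:
            # Newer behaviour: surface the syntax error.
            raise ValueError(f"invalid condition {cond!r}")
        # Older behaviour: evaluate to false, so the input is silently skipped.
        return False
    name, literal = m.groups()
    return variables.get(name) == literal


env = {"env_input_enabled": "true"}
print(evaluate_condition('env_input_enabled = "true"', env, strict=False))    # False
print(evaluate_condition('${env_input_enabled} = "true"', env, strict=True))  # True
```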

Related issues:

faec added the bug and Team:Elastic-Agent labels on Jun 15, 2023.
@elasticmachine

Pinging @elastic/elastic-agent (Team:Elastic-Agent)
