Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Fleet] Support changing the default log level per policy (or globally) #158861

Closed
cmacknz opened this issue Jun 1, 2023 · 22 comments · Fixed by #180607
Closed

[Fleet] Support changing the default log level per policy (or globally) #158861

cmacknz opened this issue Jun 1, 2023 · 22 comments · Fixed by #180607
Assignees
Labels
QA:Validated Issue has been validated by QA Team:Fleet Team label for Observability Data Collection Fleet team

Comments

@cmacknz
Copy link
Member

cmacknz commented Jun 1, 2023

Currently in Fleet the agent log level can only be changed per agent. It would be more convenient to be able to change the log level for every agent assigned to a specific policy. Often this is done to reduce the amount of logs generated by agents to make monitoring more cost effective (for example defaulting all agents to the warning level instead of info).

The agent policy exposes the ability to do this but it isn't possible to change it in Fleet. https://github.com/elastic/elastic-agent/blob/fdc46bffaa744f7916d64112b1b52d836f25f6d1/elastic-agent.reference.yml#L175-L177

# Sets log level. The default log level is info.
# Available log levels are: error, warning, info, debug
#agent.logging.level: info

See some requests for this in the comments on elastic/elastic-agent#2212 (comment).

@cmacknz cmacknz added the Team:Fleet Team label for Observability Data Collection Fleet team label Jun 1, 2023
@elasticmachine
Copy link
Contributor

Pinging @elastic/fleet (Team:Fleet)

@octavioranieri
Copy link

+1
The current workaround is to use the API.
The below script does that in bulk for all the online agents.
https://gist.github.com/octavioranieri/116cdb48eab548eb9fc38ae0f79332ca

@nimarezainia
Copy link
Contributor

#158861 just for reference.

@nimarezainia
Copy link
Contributor

@cmacknz any concern about changing this at the policy level and causing a drastic increase in log volume?

@cmacknz
Copy link
Member Author

cmacknz commented Jan 25, 2024

That is a concern, I think most uses of this feature typically want to lower the log level globally. That won't prevent people from accidentally turning on debug logs for 3000 agents, but we could prevent that in the product.

I think it would be reasonable to forbid or warn about enabling the debug level at the policy level. That is the only one that will cause problems.

If we allowed setting the log level per integration that could be another mitigation for this scenario https://github.com/elastic/ingest-dev/issues/2405.

@defensivedepth
Copy link

Yes, please let us lower the log level globally.

@kpollich
Copy link
Member

kpollich commented Apr 9, 2024

To be explicit, what we're looking to add here is a dropdown in the agent policy settings for the log level, e.g.

image

The dropdown should have the same options as the log level selector on the agents logs UI, and should default to info

image

This dropdown should set the agent.logging.level property on the agent policy saved object, and that property should be included in the final compiled policy. This means updating the agent policy SO schema, updating the agent policy API to honor this new value, and ensuring Fleet's OpenAPI spec includes it as well.

@kpollich kpollich assigned criamico and unassigned kaanyalti Apr 9, 2024
@juliaElastic
Copy link
Contributor

We could also add this config with the settings framework, which will be merged soon: https://github.com/elastic/ingest-dev/issues/2471
We would need to add an enum type to the config that is rendered as an EuiSelect.

@juliaElastic
Copy link
Contributor

There is an open issue on elastic-agent side about this setting, we should confirm if this actually works to be set from agent policy level: elastic/elastic-agent#2851

@criamico
Copy link
Contributor

criamico commented Apr 10, 2024

There is an open issue on elastic-agent side about this setting, we should confirm if this actually works to be set from agent policy level: elastic/elastic-agent#2851

I looked in the elastic agent ticket and it seems that it was never done. Also from elastic/elastic-agent#4252 (comment) it seems that the priority of this change was a bit uncertain.

@jlind23 am I good to start working on it even though the elastic agent side of it is not done?
We can hide the dropdown under a feature flag until the elastic agent side is completed, just in case this goes in before FF and the elastic agent part is not done by FF.

@jlind23
Copy link
Contributor

jlind23 commented Apr 11, 2024

@criamico hidding this behind a feature is anyway a good idea so from my perspective you are good to move forward with this one.
Please consider extending @juliaElastic's work on the settings framework,

@criamico
Copy link
Contributor

criamico commented Apr 12, 2024

I need some further clarifications about this task. Trying to summarize them here.

Question 1 - Overrides

Last year we added overrides to the agent policy, is it correct to assume that if logging_level is defined at agent level and we introduce an override of agent.logging.level as described here this should take precedence?
@nchaulet @kpollich

Question 2 - Reset functionality

In the related elastic agent ticket I noticed this comment explaining that we need some way to reset the "agent level" logging so that the "agent policy" level logging can be used. Quoting:

  1. Have a sensible default log level (info)
  2. Set the log level for a group of agents (policy)
  3. Increase the log verbosity on a single agent as a one-off to debug an issue, without affecting other agents on the policy

IMO the precedence order should be:

Default to 1
2 always overrides 1
3 always overrides 2

And this other comment says that we need to implement some sort of reset functionality in Fleet.
It's not totally clear to me how it should work and if it's needed as part of this work, so I'd appreciate to have some guidance about it.
@nimarezainia @kpollich

@kpollich
Copy link
Member

Last year we added #159414 to the agent policy, is it correct to assume that if logging_level is defined at agent level and we introduce an override of agent.logging.level as described #159414 (comment) this should take precedence?

I would expect the override value to take precedence over a policy level setting.

And this other #158861 (comment) says that we need to implement some sort of reset functionality in Fleet.
It's not totally clear to me how it should work and if it's needed as part of this work, so I'd appreciate to have some guidance about it.

This is the first time I'm seeing the proposal to have a reset functionality related to temporarily increasing the log verbosity of a single agent. I think right now we have the log level selector alongside the agent logs component, which can change the log level for a single agent. However there's no concept of resetting that log level back to the policy-level setting, and I don't think we've planned for that as part of this issue.

cc @jlind23

@cmacknz
Copy link
Member Author

cmacknz commented Apr 12, 2024

However there's no concept of resetting that log level back to the policy-level setting, and I don't think we've planned for that as part of this issue.

This means if a user ever uses the per agent log level selection then the policy has permanently lost control of the agent log level for that agent. I think there is a near 100% chance every user will toggle the log level for debugging at least once, which will lead to a very inconsistent experience and likely SDHs about why they still see unexpected log messages.

I think we need this concept to exist as part of this overall feature.

@kpollich
Copy link
Member

Thanks @cmacknz I see what you mean. To support a "reset" or "unset" type action we might be able to create an action with a payload of

type: 'SETTINGS',
data: {
  log_level: null,
},

Though I'm not sure if Fleet Server will handle this right now or if we need to add logic to unset the agent-specific log level.

I do agree this should be figured out before we land this feature as we don't users to have agents that ignore the policy-level setting forever because they toggled the log level from the agent logs UI in Fleet. This is completely nonobvious and very detrimental if we don't address the ability to set the agent's log level back to being controlled by its policy.

@criamico
Copy link
Contributor

criamico commented Apr 12, 2024

@kpollich My changes to add the new agent policy setting are ready. The new setting is hidden with a flag, so we can probably ship that change and create a separate ticket to handle the reset changes. What do you think?

@nimarezainia
Copy link
Contributor

@kpollich @criamico thank you for this detail. I am going to modify the title of this issue. We do not want this issue to change the default log level (I don't think that is being done here just wanted to reiterate). Here we need the ability for the user to set the log level for a group of agents from the policy level.

As discussed above also the user may override this log level at the agent level. Fleet should honor that override.
With the changes above, how would the user reset all the agents' log level to match what's in the policy?

btw the log level content is currently under review (here), we will switch the default log level once we have sensible/useful set of logs at the default level.

@criamico
Copy link
Contributor

With the changes above, how would the user reset all the agents' log level to match what's in the policy?

@nimarezainia as I wrote in #158861 (comment), the changes above only add the new dropdown in agent policy settings, but it is currently hidden under a flag. There's no mechanism to reset the agents' log level yet.

I opened a ticket to track the remaining work: #180778

criamico added a commit that referenced this issue Apr 15, 2024
Closes #158861

## Summary
Expose agent logging level in agent policy settings. 
The new setting is added via the new settings framework and will show up
under "advanced settings". It's currently hidden until the agent
supports it.

### Testing
Enable the settings config:
#180597 (comment)
- Go to the agent policy settings form
- Under advanced settings there is a new dropdown "Agent logging level".
Choose a value and save the policy
- The new value should be retained after saving
- Go to the agent policies tab and select action "View policy"
- The new field should be visible under `agent.logging.level`

<details>
  <summary> Screenshots</summary>
  
![Screenshot 2024-04-12 at 16 21
46](https://github.com/elastic/kibana/assets/16084106/b3083bb5-703a-44c6-a00d-da9d64a2b083)
![Screenshot 2024-04-12 at 16 21
52](https://github.com/elastic/kibana/assets/16084106/bd934262-86f0-4b11-b7b4-d4be72f00715)
![Screenshot 2024-04-12 at 16 22
28](https://github.com/elastic/kibana/assets/16084106/4a352c82-f274-4779-9718-85135f84b0c8)
![Screenshot 2024-04-12 at 16 27
48](https://github.com/elastic/kibana/assets/16084106/11e20051-f91b-44c1-b795-76ef6804cf78)

</details>

### Checklist
- [ ] Any text added follows [EUI's writing
guidelines](https://elastic.github.io/eui/#/guidelines/writing), uses
sentence case text and includes [i18n
support](https://github.com/elastic/kibana/blob/main/packages/kbn-i18n/README.md)
- [ ] [Unit or functional
tests](https://www.elastic.co/guide/en/kibana/master/development-tests.html)
were updated or added to match the most common scenarios

---------

Co-authored-by: Kibana Machine <[email protected]>
@criamico
Copy link
Contributor

@nimarezainia
Based on the above discussion I think that this feature is not going to enabled in 8.14.0 as it is not ready yet. FF is today and I don't think that we have the time to finish #180778. Is it ok to delay it to the next release?

@nimarezainia
Copy link
Contributor

@nimarezainia Based on the above discussion I think that this feature is not going to enabled in 8.14.0 as it is not ready yet. FF is today and I don't think that we have the time to finish #180778. Is it ok to delay it to the next release?

@criamico yes that is ok. Thank you for your diligence on this. We are working to find different ways the log levels can be reduced in 8.14. We will tee up #180778 for a follow on release.

criamico added a commit that referenced this issue Apr 17, 2024
## Summary
Part of #158861
#180607 added a new log level
selector in Agent Policy settings. However there were some small bugs
with it. This PR addresses all of them:

- Extra dot in copy text
- A broken link was displayed - Made this link optional
- When clicking "Cancel", all the other fields on the page reset back to
their original values but the log level doesn't. The reason is the no
default was set for the select

### Before
![image
(19)](https://github.com/elastic/kibana/assets/16084106/bc310642-5425-4413-8cfa-aff03557f2eb)

### After
![Screenshot 2024-04-17 at 10 50
04](https://github.com/elastic/kibana/assets/16084106/30fc6797-7297-496e-9ec7-209e04a44bba)


### Testing 
- Enable `hidden: false`
[here](https://github.com/elastic/kibana/blob/bc9cd862f04430f4e50e7da2a11f16bc80756e9c/x-pack/plugins/fleet/common/settings/agent_policy_settings.ts#L133)
- Change the log level and then click "cancel" on the bottom of the
page, it should reset back to the default value ("info" if the policy is
new, the previous saved value if the policy already had this value set)
- No extra dot and link should be visible
@amolnater-qasource
Copy link

amolnater-qasource commented Jul 26, 2024

Hi Team,

We have executed 04 testcases under the Feature test run for the 8.15.0 release at the link:

Changing the default log level per policy

Status:

PASS: 04

Build details:
VERSION: 8.15.0 BC4
BUILD: 76261
COMMIT: 9d62937
Artifact Link: https://staging.elastic.co/8.15.0-a7432175/summary-8.15.0.html

As the testing is completed on this feature, we are marking this as QA:Validated.

Please let us know if anything else is required from our end.
Thanks

@amolnater-qasource amolnater-qasource added QA:Validated Issue has been validated by QA and removed QA:Needs Validation Issue needs to be validated by QA labels Jul 26, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
QA:Validated Issue has been validated by QA Team:Fleet Team label for Observability Data Collection Fleet team
Projects
None yet
Development

Successfully merging a pull request may close this issue.