Excessive amount of logs in audit.log on CIS-hardened systems caused by grafana-agent accessing log files inside LXD containers #52
Comments
Can you please provide steps for deploying a usable cis_hardened (level2) system with juju?
Hi @przemeklal, I managed to relate grafana-agent to a CIS-hardened system (cis_level2_server) and I did not see this behavior. Could you see if you can reproduce the issue?
Deploying g-agent to a cis-hardened machine is not enough to reproduce it. You should also deploy LXCs on that machine and relate g-agent to the apps running in these containers:
Once g-agents inside these LXDs try to access stuff in /var/log, the auditd log spam starts one level below, on the LXD "host".
I am still having issues reproducing this nested LXD setup. If someone could reach out and schedule some time to walk me through it, that would be very helpful.
The issue seems to be solved by the classically confined snap. Waiting for approval in the snap store. https://forum.snapcraft.io/t/classic-version-for-the-grafana-agent-snap/40378?u=dstathis
Hi @dstathis,
@mr-cal The snap is now building correctly. I hope to have this fix fully published in the next day or two. |
Bug Description
On a CIS hardened (level 2) Charmed Openstack control node hosting 25 LXDs running Openstack control plane services, installing and running grafana-agent inside those LXDs caused massive amounts of logs to be written to audit.log on the host level (12G in less than a day, then it just ran out of disk space).
Pretty much all of these "new" entries in audit.log are reports of grafana-agent accessing /var/log/.../*log files. Typical entries in audit.log look like this one:
Logs from /var/log/aodh/aodh-evaluator.log (and all other files logged in audit.log) are searchable in Loki and everything else looks fine. There aren't any related errors reported in the logs of the grafana-agent running inside the LXD.
Additionally, not all of the files accessed by grafana-agent in the LXDs are reported in audit.log on the host level. The main difference seems to be the ownership of the log files and directories. For example, I see many logs reporting /var/log/aodh/*.log files, /var/log/barbican/*.log files, etc., but nothing about /var/log/juju/*.log or /var/log/syslog. Their ownership is as follows:
It seems that as long as files are owned by syslog:adm, grafana-agent's syscalls are not recorded. Accessing files owned by root, barbican (an OpenStack service user), or hacluster results in massive amounts of audit logs.
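To check which log files fall outside the syslog:adm pattern on a given machine, a quick shell sketch like the following can help (assumptions: GNU coreutils stat is available, and list_log_owners is just an illustrative helper name, not anything shipped by grafana-agent):

```shell
#!/bin/sh
# Illustrative helper: print "owner:group path" for every *.log file under
# a directory, so files not owned by syslog:adm stand out at a glance.
list_log_owners() {
    find "$1" -name '*.log' -exec stat -c '%U:%G %n' {} + 2>/dev/null | sort
}

# Example usage (run as root inside the LXD to see everything):
#   list_log_owners /var/log
```

Comparing this output between a file that triggers audit records and one that doesn't should confirm whether ownership is really the discriminating factor.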
This may or may not be related to group membership of these user accounts:
This massive audit.log spam may have catastrophic results, for example, if the CIS "4.1.2.3 Ensure system is disabled when audit logs are full" rule is in place, in the worst case it may just shut down the system after running out of space on the /var/log/audit partition.
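For context, that CIS rule is usually implemented through settings along these lines in /etc/audit/auditd.conf (illustrative excerpt; the exact values depend on the hardening profile applied):

```
# /etc/audit/auditd.conf (excerpt)
space_left_action = email          # warn when the audit partition starts filling up
action_mail_acct = root
admin_space_left_action = halt     # CIS 4.1.2.3: disable the system when audit logs are full
```

With admin_space_left_action set to halt, runaway audit.log growth doesn't just fill a partition, it takes the whole host down.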
The issue doesn't occur with filebeat, for example, so it might also be related to grafana-agent being a snap.
Is there anything that can be tweaked in grafana-agent snap that could help with this?
Also, my recommendation is to avoid relating grafana-agent to Loki in any CIS-hardened deployments until this is resolved.
To Reproduce
Deploy grafana-agent in any Openstack control-plane LXD container running on a CIS-hardened host, relate it to Loki and watch /var/log/audit/audit.log.
Environment
CIS-hardened Ubuntu 20.04
Charmed Openstack focal/ussuri
Relevant log output
Snippets are posted above. audit.log file sizes, for the sake of completeness:
-r--r----- 1 root adm 9.3G Jan 30 00:00 audit.log-20240130_000001
-rw-r----- 1 root adm    0 Jan 30 00:00 audit.log.1
-rw-r----- 1 root adm 2.1G Jan 30 06:28 audit.log
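To quantify which binary is actually generating the spam before the logs rotate away, a rough sketch like this works on a copy of audit.log (count_by_exe is a hypothetical helper; it relies only on the exe="..." field that auditd writes into standard SYSCALL records):

```shell
#!/bin/sh
# Hypothetical helper: count audit records per executable path, using the
# exe="..." field present in auditd SYSCALL records.
count_by_exe() {
    grep -o 'exe="[^"]*"' "$1" | sort | uniq -c | sort -rn
}

# Example usage:
#   count_by_exe /var/log/audit/audit.log
```

If the snap is the culprit, the top entry should be an executable path under /snap/grafana-agent/.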
Additional context
This is a potential blocker for grafana-agent deployments on CIS-hardened clouds.