Could not configure a source for OOM detection, disabling OOM events #2150

Closed
wongchao opened this issue Jan 18, 2019 · 25 comments · Fixed by #2553

Comments

@wongchao

rancher1.6

Ubuntu 16.04.2
4.4.0-62-generic
Docker version 17.12.1-ce, build 7390fc6
google/cadvisor:v0.32.0

volumes:
"/:/rootfs:ro",
"/var/run:/var/run:rw",
"/sys:/sys:ro",
"/var/lib/docker/:/var/lib/docker:ro",
"rancher-cni:/.r:ro"

logs:
2019/1/18 6:12:08 PM W0118 10:12:08.685620 1 manager.go:349] Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory

@dashpole
Collaborator

Maybe you need to mount /dev/kmsg into the container?

@CpuID

CpuID commented Feb 28, 2019

Tried that, same outcome. Tried it RO and RW.

@tommmlij

tommmlij commented Mar 8, 2019

Run the cadvisor container with privileged: true

@torhoehn

torhoehn commented Mar 9, 2019

@tommmlij That was working for me.

@lizibin

lizibin commented Apr 25, 2019

@wongchao I ran into the same problem too. Did you solve it?

@ZVilusinsky

ZVilusinsky commented Nov 27, 2019

privileged: true does not resolve it on CentOS 7.7.

Metrics flow and work without privileged, but following the running instructions results in: Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory
Mounting /dev instead of /dev/disk changes it to: Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: operation not permitted
Turning privileged on or off does not seem to affect it in any way.

Edit: well, found something. Even though the Compose reference mentions privileged and the feature exists, it is still not merged, so Swarm does not yet support privileged: moby/swarmkit#1030
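
The two messages above point at different causes: "no such file or directory" means the device node is absent inside the container, while "operation not permitted" means the node exists but the open is refused. A minimal sketch for telling the cases apart from a shell inside the container follows. The function name kmsg_status is made up for illustration and is not part of cAdvisor; the path defaults to /dev/kmsg.

```shell
# Hypothetical helper (not part of cAdvisor): report why opening a
# kmsg-like path would fail. Prints one of: missing, denied, ok.
kmsg_status() {
  path="${1:-/dev/kmsg}"
  if [ ! -e "$path" ]; then
    echo "missing"   # corresponds to "no such file or directory"
  elif ! ( : < "$path" ) 2>/dev/null; then
    echo "denied"    # the open itself failed, e.g. "operation not permitted"
  else
    echo "ok"
  fi
}

kmsg_status /dev/kmsg
```

The open is attempted in a subshell so that a failed redirection does not abort the calling shell.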

@jamesdboone

It is probably better to use the --device flag to expose the device to the container than to use privileged...

@shrinidhi111

@tommmlij How do I run it as privileged? I am already running it with sudo.

@dashpole
Collaborator

dashpole commented Mar 2, 2020

@katarzyna-z
Collaborator

@dashpole Do you think we should add --privileged to the example command for running cAdvisor in Docker in https://github.com/google/cadvisor/blob/master/README.md?

@katarzyna-z
Collaborator

I see that the issue doesn't appear if /dev/kmsg is added to the container, for example:

VERSION=v0.35.0 # use the latest release version from https://github.com/google/cadvisor/releases
sudo docker run \
  --volume=/:/rootfs:ro \
  --volume=/var/run:/var/run:ro \
  --volume=/sys:/sys:ro \
  --volume=/var/lib/docker/:/var/lib/docker:ro \
  --volume=/dev/disk/:/dev/disk:ro \
  --publish=8080:8080 \
  --name=cadvisor \
  --device=/dev/kmsg \
  gcr.io/google-containers/cadvisor:$VERSION
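
To confirm whether a change like this took effect, one option is to scan the container log for the warning from the original report. A small sketch, assuming the container is named cadvisor as in the command above; oom_warning_present is a hypothetical helper name.

```shell
# Hypothetical helper: exits 0 if the log text on stdin contains the
# warning quoted at the top of this issue, non-zero otherwise.
oom_warning_present() {
  grep -q "Could not configure a source for OOM detection"
}

# Usage against a running container (needs Docker and a container named
# "cadvisor"), left commented out so the snippet runs anywhere:
# docker logs cadvisor 2>&1 | oom_warning_present && echo "OOM events are disabled"
```

The 2>&1 is there because docker logs replays the container's stderr stream on stderr.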

@avimanyu786

On Docker Compose, the solution to this issue is to add the devices configuration option:

    devices:
      - /dev/kmsg:/dev/kmsg

It's no longer reported in the logs after I added this to my docker-compose file.

@WolfspiritM

WolfspiritM commented Mar 12, 2021

For anyone having the same problem trying to get this working in Docker Swarm: as Docker Swarm doesn't support devices yet, I've managed to get it running via:

version: "3.7"

services:
  cadvisor:
    image: docker
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock
    entrypoint: ["/bin/sh", "-c"]
    networks:
      - monitoring
    deploy:
      mode: global
    environment:
      - PARENT={{.Task.Name}}
      - CHILDNAME={{.Service.Name}}_sidecar.{{.Node.ID}}.{{.Task.ID}}
      - CADVISOR_VERSION=v0.37.5
    command:
      - |
        exec docker run -i --rm --network="container:$${PARENT}" \
          --volume=/:/rootfs:ro \
          --volume=/var/run:/var/run:ro \
          --volume=/sys:/sys:ro \
          --volume=/var/lib/docker/:/var/lib/docker:ro \
          --volume=/dev/disk/:/dev/disk:ro \
          --name $${CHILDNAME} \
          --privileged \
          --device=/dev/kmsg \
          gcr.io/cadvisor/cadvisor:$${CADVISOR_VERSION}

What this does is run a simple unprivileged container which has access to the Docker socket of the host. It then calls docker run for cadvisor and attaches cadvisor to the same network as the currently running container. That's all it does. If the service container is stopped, then because of the exec the SIGTERM is passed to docker run, causing both containers to be shut down cleanly, and the sidecar container gets removed because of the --rm, leaving no trace.

Because the cadvisor sidecar container is attached to the service container's network, it can then be reached from e.g. Prometheus via http://STACK_cadvisor:8080

Hope that helps someone.
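
The clean shutdown described above relies on exec: the entrypoint shell is replaced by docker run instead of starting it as a child process, so a SIGTERM sent to the service container's main process lands on docker run directly. The sketch below shows that property on its own, without Docker; pids_with_exec is a hypothetical name.

```shell
# Print the PID of a shell, then the PID of the command it exec's.
# exec replaces the running process instead of forking a child,
# so both printed PIDs are the same.
pids_with_exec() {
  sh -c 'echo $$; exec sh -c "echo \$\$"'
}

pids_with_exec
```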

@fleXible

fleXible commented Feb 20, 2022

The devices option suggested above for Docker Compose (devices: - /dev/kmsg:/dev/kmsg) still does not work with docker stack deploy, at least not with the current compose-file v3.8.

@khangdc2

@fleXible Add the flag privileged: true.

@fleXible

@khangdc2 How did I miss that? Oh, I remember: because neither privileged mode nor devices are currently supported with docker stack deploy in a Docker Swarm environment.

kunte0 added a commit to kunte0/Prometheus-Grafana-Monitor-Setup that referenced this issue May 26, 2022
fix bug added Device /dev/kmsg see google/cadvisor#2150 (comment)
@thetredev

Sorry to comment on this issue, but I came across it myself. I don't like starting containers privileged, which is why I dug a bit deeper and found this: https://unix.stackexchange.com/questions/390184/dmesg-read-kernel-buffer-failed-permission-denied

So, performing the following on the host system is probably the better solution (for Docker, at least):

echo "sysctl kernel.dmesg_restrict = 0" | sudo tee /etc/sysctl.d/99-cadvisor.conf
sudo sysctl -p /etc/sysctl.d/99-cadvisor.conf

@youngzyl

@thetredev Incredible solution. Privileged mode should not be configured when there are any potential configuration problems or bugs.

I got the error below when running sudo sysctl -p /etc/sysctl.d/99-cadvisor.conf:
sysctl: cannot stat /proc/sys/sysctl kernel/dmesg_restrict: No such file or directory

Following the reference at https://unix.stackexchange.com/questions/390184/dmesg-read-kernel-buffer-failed-permission-denied, I restarted the procps service (a reboot also works), which resolved the issue:
sudo service procps restart
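
The path in that error message hints at the cause: sysctl maps a key to a file under /proc/sys by turning dots into slashes, so the leading word sysctl in the file was read as part of the key. The sketch below only mimics that translation to make the error reproducible on paper; sysctl_key_to_path is a hypothetical helper and not how procps is implemented.

```shell
# Hypothetical helper: mimic how a sysctl key maps to a /proc/sys path
# (dots become slashes). For illustration only.
sysctl_key_to_path() {
  printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

sysctl_key_to_path "sysctl kernel.dmesg_restrict"
# prints: /proc/sys/sysctl kernel/dmesg_restrict
sysctl_key_to_path "kernel.dmesg_restrict"
# prints: /proc/sys/kernel/dmesg_restrict
```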

@thetredev

Glad I could provide some help! :)

@danilok

danilok commented Apr 6, 2023

I followed the steps from @thetredev but got the same error as @youngzyl. Looking at other sysctl.d files, the correct content of /etc/sysctl.d/99-cadvisor.conf should be only kernel.dmesg_restrict = 0, without sysctl at the beginning.

So

echo "kernel.dmesg_restrict = 0" | sudo tee /etc/sysctl.d/99-cadvisor.conf
sudo sysctl -p /etc/sysctl.d/99-cadvisor.conf

worked for me.
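
To check that the setting is active after reloading, the current value can be read back from /proc. A minimal sketch: dmesg_restrict_state is a hypothetical helper, and its optional argument exists only so it can be pointed at a different file when testing.

```shell
# Hypothetical helper: report the current kernel.dmesg_restrict setting.
# 0 means unprivileged readers are allowed; any other value is reported
# as restricted. Prints "unknown" if the file cannot be read.
dmesg_restrict_state() {
  file="${1:-/proc/sys/kernel/dmesg_restrict}"
  if [ ! -r "$file" ]; then
    echo "unknown"
    return 0
  fi
  case "$(cat "$file")" in
    0) echo "unrestricted" ;;
    *) echo "restricted" ;;
  esac
}

dmesg_restrict_state
```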

@thetredev

thetredev commented Apr 6, 2023

@danilok that looks way better than what I could look up. Thanks for sharing!

@RajS999

RajS999 commented May 16, 2023

I was able to use cadvisor a few hours ago. Now it refuses to start. The only thing I did differently was attach VS Code to some of the Docker containers. I wonder whether some disk access conflict is causing this, and whether privileged: true would make things worse.

Trying privileged: true since running the commands below did not resolve the issue:

 echo "kernel.dmesg_restrict = 0" | sudo tee /etc/sysctl.d/99-cadvisor.conf
 sudo sysctl -p /etc/sysctl.d/99-cadvisor.conf

@enricovian

Sorry for the comment on a closed issue, but I experienced the same problem.
Instead of starting the container in privileged mode, I managed to solve the issue by adding the SYS_ADMIN container capability.

Here's a snippet from my docker-compose.yml for reference:

cadvisor:
    ...
    devices:
      - /dev/kmsg:/dev/kmsg
    cap_add:
      - SYS_ADMIN

@levelsw

levelsw commented Jun 27, 2024

I tried all the solutions in this thread with my Docker Swarm stack; only the solution provided by @WolfspiritM worked for me.

@Tuxdude

Tuxdude commented Jul 4, 2024

@enricovian You don't need the full SYS_ADMIN capability. The SYSLOG capability, which is far more restricted, is all cAdvisor needs to access /dev/kmsg.
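
Put together with the devices entry from earlier in the thread, a Compose service using the narrower capability might look like the sketch below. It is written as a shell heredoc so it can be pasted into a terminal. The function name write_cadvisor_compose, the file name docker-compose.cadvisor.yml, and the pinned image tag are assumptions; the volume list is copied from the docker run examples above.

```shell
# Hypothetical helper: write a Compose file that grants cAdvisor only the
# /dev/kmsg device and the SYSLOG capability instead of privileged mode.
write_cadvisor_compose() {
  cat > "${1:-docker-compose.cadvisor.yml}" <<'EOF'
services:
  cadvisor:
    image: gcr.io/cadvisor/cadvisor:v0.37.5
    ports:
      - "8080:8080"
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:ro
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    devices:
      - /dev/kmsg:/dev/kmsg
    cap_add:
      - SYSLOG
EOF
}

# Usage (needs Docker Compose), left commented out:
# write_cadvisor_compose && docker compose -f docker-compose.cadvisor.yml up -d
```

According to the comments above, the devices entry is not honored by docker stack deploy, so this sketch targets plain Docker Compose only.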

@em1208

em1208 commented Sep 16, 2024

Unfortunately the Swarm sidecar solution from @WolfspiritM does not fully work for me: when I run docker stack rm <stack name>, it does not remove the sidecar container. Also, docker stack deploy creates a new sidecar container, which is not the expected outcome.

I decided not to configure the sidecar and just use

  cadvisor:
    image: gcr.io/cadvisor/cadvisor:latest
    volumes:
      - /:/rootfs:ro
      - /var/run:/var/run:rw
      - /var/run/docker.sock:/var/run/docker.sock:rw
      - /sys:/sys:ro
      - /var/lib/docker/:/var/lib/docker:ro
      - /dev/disk/:/dev/disk:ro
    deploy:
      mode: global

Unfortunately, this configuration does not solve the OOM detection problem, and I can still see this in the logs:

Could not configure a source for OOM detection, disabling OOM events: open /dev/kmsg: no such file or directory

I hope this helps someone and if I find a better solution I will post it here.
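
For the leftover sidecars mentioned above, one way to find them is to filter container names on the _sidecar. marker that the CHILDNAME template in the sidecar workaround produces. A sketch, assuming that naming scheme was kept; sidecar_names is a hypothetical helper, and removal is left as a manual step.

```shell
# Hypothetical helper: keep only container names (one per line on stdin)
# that contain the "_sidecar." marker from the CHILDNAME template.
sidecar_names() {
  grep '_sidecar\.' || true   # no match is not treated as an error
}

# Usage on a Swarm node (needs Docker), left commented out:
# docker ps -a --format '{{.Names}}' | sidecar_names
# Inspect the list, then remove each leftover with: docker rm -f NAME
```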
