Host metrics #293

Open
wants to merge 5 commits into
base: monitoring

Conversation

mwiencek
Member

This is based on #291.

The service definition was based on https://github.com/metabrainz/prometheus-exp/blob/main/node.sh

@mwiencek mwiencek changed the base branch from master to monitoring January 29, 2025 06:06
@mwiencek
Member Author

The only issue I have with this is the node-exporter logs being spammed with the following:

node-exporter  | ts=2025-01-29T06:07:03.163Z caller=stdlib.go:105 level=error msg="error gathering metrics: 21 error(s) occurred:\n* [from Gatherer #2] collected metric \"node_filesystem_device_error\" { label:{name:\"device\"  value:\"devpts\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"devpts\"}  label:{name:\"mountpoint\"  value:\"/dev/pts\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_readonly\" { label:{name:\"device\"  value:\"devpts\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"devpts\"}  label:{name:\"mountpoint\"  value:\"/dev/pts\"}  gauge:{value:1}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_size_bytes\" { label:{name:\"device\"  value:\"devpts\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"devpts\"}  label:{name:\"mountpoint\"  value:\"/dev/pts\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_free_bytes\" { label:{name:\"device\"  value:\"devpts\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"devpts\"}  label:{name:\"mountpoint\"  value:\"/dev/pts\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_avail_bytes\" { label:{name:\"device\"  value:\"devpts\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"devpts\"}  label:{name:\"mountpoint\"  value:\"/dev/pts\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_files\" { label:{name:\"device\"  value:\"devpts\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"devpts\"}  label:{name:\"mountpoint\"  value:\"/dev/pts\"}  
gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_files_free\" { label:{name:\"device\"  value:\"devpts\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"devpts\"}  label:{name:\"mountpoint\"  value:\"/dev/pts\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_device_error\" { label:{name:\"device\"  value:\"mqueue\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"mqueue\"}  label:{name:\"mountpoint\"  value:\"/dev/mqueue\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_readonly\" { label:{name:\"device\"  value:\"mqueue\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"mqueue\"}  label:{name:\"mountpoint\"  value:\"/dev/mqueue\"}  gauge:{value:1}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_size_bytes\" { label:{name:\"device\"  value:\"mqueue\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"mqueue\"}  label:{name:\"mountpoint\"  value:\"/dev/mqueue\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_free_bytes\" { label:{name:\"device\"  value:\"mqueue\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"mqueue\"}  label:{name:\"mountpoint\"  value:\"/dev/mqueue\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_avail_bytes\" { label:{name:\"device\"  value:\"mqueue\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"mqueue\"}  label:{name:\"mountpoint\"  value:\"/dev/mqueue\"}  gauge:{value:0}} was collected before 
with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_files\" { label:{name:\"device\"  value:\"mqueue\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"mqueue\"}  label:{name:\"mountpoint\"  value:\"/dev/mqueue\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_files_free\" { label:{name:\"device\"  value:\"mqueue\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"mqueue\"}  label:{name:\"mountpoint\"  value:\"/dev/mqueue\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_device_error\" { label:{name:\"device\"  value:\"proc\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"proc\"}  label:{name:\"mountpoint\"  value:\"/proc\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_readonly\" { label:{name:\"device\"  value:\"proc\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"proc\"}  label:{name:\"mountpoint\"  value:\"/proc\"}  gauge:{value:1}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_size_bytes\" { label:{name:\"device\"  value:\"proc\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"proc\"}  label:{name:\"mountpoint\"  value:\"/proc\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_free_bytes\" { label:{name:\"device\"  value:\"proc\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"proc\"}  label:{name:\"mountpoint\"  value:\"/proc\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric 
\"node_filesystem_avail_bytes\" { label:{name:\"device\"  value:\"proc\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"proc\"}  label:{name:\"mountpoint\"  value:\"/proc\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_files\" { label:{name:\"device\"  value:\"proc\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"proc\"}  label:{name:\"mountpoint\"  value:\"/proc\"}  gauge:{value:0}} was collected before with the same name and label values\n* [from Gatherer #2] collected metric \"node_filesystem_files_free\" { label:{name:\"device\"  value:\"proc\"}  label:{name:\"device_error\"  value:\"\"}  label:{name:\"fstype\"  value:\"proc\"}  label:{name:\"mountpoint\"  value:\"/proc\"}  gauge:{value:0}} was collected before with the same name and label values"

I don't know why this is happening or how to resolve it: devpts, for example, is already listed in --collector.filesystem.ignored-fs-types, so it shouldn't be collected at all.
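The message above packs 21 errors into a single line, which makes it hard to see what they have in common. Below is a minimal Python sketch for grouping them by mount point. The `summarize` helper, the `PATTERN` regex, and the shortened `SAMPLE` line are illustrative and not part of this PR; the regex assumes the escaped-quote layout seen in the log above.

```python
import re
from collections import Counter

# Matches one 'collected metric \"NAME\" { ... mountpoint ... }' entry in the
# raw log text, where every double quote is escaped with a backslash.
PATTERN = re.compile(
    r'collected metric \\"(?P<metric>[^"\\]+)\\"'
    r'.*?name:\\"mountpoint\\"\s+value:\\"(?P<mountpoint>[^"\\]*)\\"'
)

# Shortened, hand-trimmed sample in the same layout as the real log line.
SAMPLE = (
    r'msg="error gathering metrics: 2 error(s) occurred:'
    r'\n* [from Gatherer #2] collected metric \"node_filesystem_readonly\" '
    r'{ label:{name:\"device\"  value:\"devpts\"}  '
    r'label:{name:\"mountpoint\"  value:\"/dev/pts\"}  gauge:{value:1}} '
    r'was collected before with the same name and label values'
    r'\n* [from Gatherer #2] collected metric \"node_filesystem_files\" '
    r'{ label:{name:\"device\"  value:\"proc\"}  '
    r'label:{name:\"mountpoint\"  value:\"/proc\"}  gauge:{value:0}} '
    r'was collected before with the same name and label values"'
)


def summarize(log_line: str) -> Counter:
    """Count duplicate-metric errors per mount point in one log line."""
    return Counter(m.group("mountpoint") for m in PATTERN.finditer(log_line))


if __name__ == "__main__":
    for mountpoint, count in summarize(SAMPLE).items():
        print(f"{mountpoint}: {count} duplicate metric(s)")
```

Applied to the full line quoted above, this kind of grouping should show seven metrics each for /dev/pts, /dev/mqueue and /proc, so the 21 errors come down to three mount points.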

@reosarevok
Member

Other than the same error you mentioned, and the fact that I had to rebuild the dev-search image every time because it failed at (re)creating existing collections (which I understand is being fixed already), this seems to work great!

@yvanzo
Contributor

yvanzo commented Jan 31, 2025

I merged SIR dev stuff into master, rebased the target branch monitoring on it, and rebased the source branch host metrics on it to resolve conflicts.

- node-exporter-textfile-collector:/textfile-collector-directory
command:
- --path.rootfs=/host
- --collector.filesystem.ignored-fs-types="^(autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|hugetlbfs|mqueue|nsfs|tmpfs|overlay|proc|procfs|pstore|rpc_pipefs|securityfs|sysfs|tracefs)$"

FYI, not sure if it applies to the version we run yet, but:

According to https://github.com/prometheus/node_exporter/blob/810510e12b063690e6e52700a867676b93492e92/collector/filesystem_common.go#L262

"--collector.filesystem.ignored-fs-types is DEPRECATED and will be removed in 2.0.0, use --collector.filesystem.fs-types-exclude"

And according to https://github.com/prometheus/node_exporter/blob/810510e12b063690e6e52700a867676b93492e92/collector/filesystem_common.go#L233

"--collector.filesystem.ignored-mount-points is DEPRECATED and will be removed in 2.0.0, use --collector.filesystem.mount-points-exclude"

I checked the code, and I don't see a good reason for it to duplicate entries unless they are read twice (any symlinks or multiple mounts?). Besides, these filesystem types should be ignored anyway.
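One thing that may be worth ruling out, as a guess rather than a confirmed cause: if the list-form `command:` entries reach node-exporter without passing through a shell, the double quotes around the regex in the flag value would stay part of the pattern. The Python sketch below shows what such a pattern does. It assumes the exporter applies the value as an ordinary regular expression (Python's `re` stands in for Go's `regexp` here), and `is_excluded`, `UNQUOTED` and `QUOTED` are hypothetical names used only for illustration.

```python
import re

# Filesystem types copied from the flag value in the service definition.
FS_TYPES = (
    "autofs|binfmt_misc|cgroup|configfs|debugfs|devpts|devtmpfs|fusectl|"
    "hugetlbfs|mqueue|nsfs|tmpfs|overlay|proc|procfs|pstore|rpc_pipefs|"
    "securityfs|sysfs|tracefs"
)

# Pattern as intended, without surrounding quotes.
UNQUOTED = re.compile(f"^({FS_TYPES})$")

# Pattern as it would look if the literal double quotes were kept.
# A '"' before '^' (and after '$') can never match, so nothing is excluded.
QUOTED = re.compile(f'"^({FS_TYPES})$"')


def is_excluded(pattern: re.Pattern, fstype: str) -> bool:
    """Return True if the filesystem type matches the exclusion pattern."""
    return pattern.search(fstype) is not None


if __name__ == "__main__":
    for fstype in ("devpts", "mqueue", "proc"):
        print(fstype, is_excluded(UNQUOTED, fstype), is_excluded(QUOTED, fstype))
```

If the quotes do end up in the value, dropping them from the `command:` entry would be the thing to try; whether that is what happens in this setup has not been verified here.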

Member Author

@mwiencek mwiencek Feb 6, 2025

I see these in /etc/mtab (inside the node-exporter container):

devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
devpts /host/dev/pts devpts ro,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
devpts /host/mnt/hdd/docker/btrfs/subvolumes/c7cf67edc828a039fd362077c065f44799152030a7f79d252f2d274ad80f5460/dev/pts devpts ro,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0

The last one points to an empty dir. Not sure about symlinks. devpts wasn't the only filesystem type with issues, either...

For now I'll disable the filesystem collector since I don't think it's essential for monitoring Solr performance anyway.

Edit: Thanks for the hint about the deprecated flags -- updating them didn't help, but I'll rename them in the commented code.
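A possible explanation for the duplicates, sketched in Python under stated assumptions: if the exporter strips the `--path.rootfs` prefix (`/host`) from each mount point before using it as a label, then the container's own `/dev/pts` and the host's `/host/dev/pts` would end up with identical label sets. The `strip_rootfs` and `find_duplicates` helpers are hypothetical and only approximate that behaviour; the input lines are the `/etc/mtab` entries quoted above.

```python
from collections import Counter

ROOTFS = "/host"  # value of --path.rootfs in the service definition

MTAB = """\
devpts /dev/pts devpts rw,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
devpts /host/dev/pts devpts ro,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=000 0 0
devpts /host/mnt/hdd/docker/btrfs/subvolumes/c7cf67edc828a039fd362077c065f44799152030a7f79d252f2d274ad80f5460/dev/pts devpts ro,nosuid,noexec,relatime,gid=5,mode=620,ptmxmode=666 0 0
"""


def strip_rootfs(path: str, rootfs: str = ROOTFS) -> str:
    """Approximate rootfs prefix stripping (assumed exporter behaviour)."""
    if rootfs in ("", "/"):
        return path
    if path == rootfs:
        return "/"
    if path.startswith(rootfs + "/"):
        return path[len(rootfs):]
    return path


def find_duplicates(mtab: str) -> dict:
    """Return (device, fstype, mountpoint) label sets that occur more than once."""
    counts = Counter()
    for line in mtab.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        device, mountpoint, fstype = fields[:3]
        counts[(device, fstype, strip_rootfs(mountpoint))] += 1
    return {labels: n for labels, n in counts.items() if n > 1}


if __name__ == "__main__":
    for labels, n in find_duplicates(MTAB).items():
        print(labels, "appears", n, "times")
```

With these three entries, the only repeated label set is device `devpts`, fstype `devpts`, mountpoint `/dev/pts`, which matches the labels in the logged errors. That is consistent with the multiple-mounts theory but does not prove it.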

It currently spams the logs with errors of this form:

  collected metric [...] was collected before with the same name and label
  values

However, the metrics it references should already be ignored/excluded by the
options that I'm disabling in this commit, so it's not clear how to resolve
this.
@mwiencek mwiencek marked this pull request as ready for review February 6, 2025 16:00
@mwiencek mwiencek mentioned this pull request Feb 7, 2025

4 participants