alerts for weird things #19

Open
nikhiljha opened this issue May 19, 2020 · 3 comments

Comments

@nikhiljha
Member

nikhiljha commented May 19, 2020

hal

if something like that happens, there should be an alert ^

I don't know if there's a tool that generates alerts for patterns like that

@jvperrin
Member

Do we need an alert for something like this? IMO it only matters once we actually get close to running out of usable memory/disk/whatever, or once an application has a performance problem. Could we alert on those instead?

@ja5087
Member

ja5087 commented May 19, 2020

You can probably write a Prometheus query for that pattern, but I think it's fine since this isn't necessarily abnormal, just a large job.

@dkess
Member

dkess commented May 19, 2020

These kinds of alerts are currently configured in Prometheus puppet: https://github.com/ocf/puppet/blob/master/modules/ocf_prometheus/files/rules.d/node.rules.yaml

You can alert on most things, including a memory threshold or a "sustained rate of change" type thing (the DiskWillFillIn3Hours alert tries to do this). There's lots of room to experiment here, though a simple threshold is probably fine. There are probably best practices for this around on the internet too.
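For illustration, here is a rough sketch of what the two styles of rule mentioned above could look like in a Prometheus rules file. It is not copied from the linked node.rules.yaml: the group name, alert names, thresholds, and `for` durations are all made up for the example, and the metric names assume a recent node_exporter (`node_memory_MemAvailable_bytes`, `node_memory_MemTotal_bytes`, `node_filesystem_avail_bytes`).

```yaml
groups:
  - name: example-node-alerts  # hypothetical group name
    rules:
      # Simple threshold: fires when less than 10% of memory has been
      # available for 15 minutes straight. The 10% and 15m values are
      # illustrative assumptions, not tuned numbers.
      - alert: ExampleLowAvailableMemory  # hypothetical alert name
        expr: node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes < 0.10
        for: 15m
        labels:
          severity: warning
        annotations:
          summary: "Less than 10% of memory available on {{ $labels.instance }}"

      # "Sustained rate of change": predict_linear fits a line to the last
      # hour of samples and extrapolates 3 hours ahead. The alert fires if
      # the predicted free space drops below zero.
      - alert: ExampleDiskFillingSoon  # hypothetical alert name
        expr: predict_linear(node_filesystem_avail_bytes{fstype!="tmpfs"}[1h], 3 * 3600) < 0
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "{{ $labels.mountpoint }} on {{ $labels.instance }} may fill within 3 hours"
```

The `for` clause is what keeps a short spike (like one large job) from paging anyone: the condition has to hold continuously for that long before the alert fires.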

4 participants