forked from scylladb/scylla-monitoring
Commit
Add basic alert support to the dashboards (scylladb#267)

* Adding a general kill container script

  We are about to add an additional container, so it's a good time to remove the duplication from the kill container functionality.

  Signed-off-by: Amnon Heiman <[email protected]>
* remove the kill grafana and prometheus scripts
* add a script to start the alert manager
* add the alert manager datasource plugin
* set the prometheus datasource alert manager to the dashboards
* start and kill the alert manager container
* Base configuration for the prometheus and the alertmanager

  The base configuration was only added as a first step. We expect that users will change it to suit their own use cases.

  Signed-off-by: Amnon Heiman <[email protected]>
* base alertmanager rule configuration
* Add the prometheus rules to the prometheus container
* add alarm_table class to the types.json
* add an alarm table to the main dashboard

  The table was added here as a starting point. It will probably be moved and better formatted.
* set the alert manager address based on its container
* set the down time rules to 30s
* set the alert manager address dynamically
* make the prometheus config a template
* create the prometheus config file from template
* set severity to 1 instead of page
* Revert "add an alarm table to the main dashboard"

  This reverts commit ca69085.
* remove the sudo from kill-container
* Revert "add alarm_table class to the types.json"

  This reverts commit 0de7701.
* Add the alertmanager to the README
* alertmanager to optionally get its port from the command line
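
The commit message mentions turning the Prometheus config into a template and setting the Alertmanager address from its container. The diff below does not show the template itself, so the following is only a minimal sketch of that idea, assuming a hypothetical placeholder token `AM_ADDRESS`, a hypothetical helper `render_prometheus_config`, and hypothetical file and container names.

```bash
#!/usr/bin/env bash
# Hypothetical sketch: render a Prometheus config from a template by replacing
# an assumed AM_ADDRESS placeholder with the Alertmanager's host:port.
set -euo pipefail

render_prometheus_config() {
    local template="$1"    # template path (hypothetical)
    local output="$2"      # rendered config path
    local am_address="$3"  # Alertmanager host:port
    # '|' is the sed delimiter so the replacement may contain '/';
    # the address itself must not contain '|' or '&'.
    sed "s|AM_ADDRESS|${am_address}|g" "$template" > "$output"
}

# Example (hypothetical container name "aalert"): look up the container's IP
# on the default bridge network and render the config.
# am_ip=$(docker inspect --format '{{ .NetworkSettings.IPAddress }}' aalert)
# render_prometheus_config prometheus.yml.template prometheus.yml "$am_ip:9093"
```

Substituting at container start time, rather than hard-coding an address, is what lets the Alertmanager address follow whatever IP Docker assigns to its container.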
Showing 12 changed files with 293 additions and 98 deletions.
`@@ -0,0 +1,40 @@` (new file: the general kill-container script)

```bash
#!/usr/bin/env bash

usage="$(basename "$0") [-h] [-p container port] [-n optional name] [-b base name] -- kills existing Docker instances at given ports"

while getopts ':hb:p:n:' option; do
    case "$option" in
        h) echo "$usage"
           exit
           ;;
        p) PORT=$OPTARG
           ;;
        n) NAME=$OPTARG
           ;;
        b) BASE_NAME=$OPTARG
           ;;
        :) printf "missing argument for -%s\n" "$OPTARG" >&2
           echo "$usage" >&2
           exit 1
           ;;
        \?) printf "illegal option: -%s\n" "$OPTARG" >&2
            echo "$usage" >&2
            exit 1
            ;;
    esac
done

if [ -z "$NAME" ]; then
    # Without -n, a base name is required: an empty name filter
    # would match every container.
    if [ -z "$BASE_NAME" ]; then
        echo "$usage" >&2
        exit 1
    fi
    if [ -z "$PORT" ]; then
        NAME=$BASE_NAME
    else
        NAME=$BASE_NAME-$PORT
    fi
fi

# Kill the container if it is running.
if [ "$(docker ps -q -f name="$NAME")" ]; then
    docker kill "$NAME"
fi

# Remove the container (and its anonymous volumes) if it still exists.
if [[ "$(docker ps -aq --filter name="$NAME" 2> /dev/null)" != "" ]]; then
    docker rm -v "$NAME"
fi
```
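
The container-name resolution in the script reduces to three cases: an explicit `-n` name wins, otherwise the base name is used, suffixed with `-<port>` when `-p` is given. Isolating that rule as a function makes it easy to check. This is a sketch; the function `container_name` and the example base name `aprom` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper mirroring the kill script's naming rule:
# explicit name > base name + "-" + port > base name.
container_name() {
    local name="$1" base_name="$2" port="$3"
    if [ -n "$name" ]; then
        printf '%s\n' "$name"
    elif [ -n "$port" ]; then
        printf '%s-%s\n' "$base_name" "$port"
    else
        printf '%s\n' "$base_name"
    fi
}

# Usage: container_name "" aprom 9090   prints aprom-9090
```

Note that `docker ps -f name=...` matches on all or part of a container's name, so killing by a bare base name may also match port-suffixed containers that share the prefix.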
This file was deleted.
This file was deleted.
`@@ -0,0 +1,9 @@` (new file: Prometheus 1.x alerting rule)

```
# Alert for any instance that is unreachable for > 30 seconds.
ALERT InstanceDown
  IF up == 0
  FOR 30s
  LABELS { severity = "1" }
  ANNOTATIONS {
    summary = "Instance {{ $labels.instance }} down",
    description = "{{ $labels.instance }} of job {{ $labels.job }} has been down for more than 30 seconds.",
  }
```
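
`FOR 30s` means the alert does not fire on the first failed check: it is first *pending*, and becomes *firing* only once `up == 0` has held continuously for 30 seconds. The sketch below is a simplified model of that state machine, not Prometheus code; the function `alert_states` is hypothetical and assumes one `up` sample per rule evaluation, taken every `interval` seconds.

```bash
#!/usr/bin/env bash
# Simplified model of a rule with a FOR hold time.
# Args: evaluation interval (s), hold time (s), then one up value per evaluation.
# Prints inactive / pending / firing for each evaluation.
alert_states() {
    local interval="$1" for_secs="$2"
    shift 2
    local t=0 active_since=-1 up
    for up in "$@"; do
        if [ "$up" -eq 0 ]; then
            # Remember when the condition first became true.
            if [ "$active_since" -lt 0 ]; then
                active_since=$t
            fi
            if [ $((t - active_since)) -ge "$for_secs" ]; then
                echo firing
            else
                echo pending
            fi
        else
            # Condition cleared: the alert resets.
            active_since=-1
            echo inactive
        fi
        t=$((t + interval))
    done
}

# Usage: alert_states 10 30 1 0 0 0 0 1
# prints: inactive pending pending pending firing inactive (one per line)
```

Because the hold time is only checked at evaluation time, a coarser evaluation interval can delay firing beyond 30 seconds after the instance went down.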
`@@ -0,0 +1,115 @@` (new file: base Alertmanager configuration)

```yaml
global:
  # The smarthost and SMTP sender used for mail notifications.
  smtp_smarthost: 'localhost:25'
  smtp_from: '[email protected]'

# The root route on which each incoming alert enters.
route:
  # The root route must not have any matchers as it is the entry point for
  # all alerts. It needs to have a receiver configured so alerts that do not
  # match any of the sub-routes are sent to someone.
  receiver: 'team-X-mails'

  # The labels by which incoming alerts are grouped together. For example,
  # multiple alerts coming in for cluster=A and alertname=LatencyHigh would
  # be batched into a single group.
  group_by: ['alertname', 'cluster']

  # When a new group of alerts is created by an incoming alert, wait at
  # least 'group_wait' to send the initial notification.
  # This ensures that multiple alerts for the same group that start firing
  # shortly after one another are batched together in the first
  # notification.
  group_wait: 30s

  # After the first notification has been sent, wait 'group_interval' to send
  # a batch of new alerts that started firing for that group.
  group_interval: 5m

  # If an alert has been sent successfully, wait 'repeat_interval' before
  # resending it.
  repeat_interval: 3h

  # All the above attributes are inherited by all child routes and can be
  # overridden on each.

  # The child route trees.
  routes:
  # This route performs a regular expression match on alert labels to
  # catch alerts that are related to a list of services.
  - match_re:
      service: ^(foo1|foo2|baz)$
    receiver: team-X-mails

    # The service has a sub-route for critical alerts; any alerts
    # that do not match (i.e. severity != critical) fall back to the
    # parent node and are sent to 'team-X-mails'.
    routes:
    - match:
        severity: critical
      receiver: team-X-pager

  - match:
      service: files
    receiver: team-Y-mails

    routes:
    - match:
        severity: critical
      receiver: team-Y-pager

  # This route handles all alerts coming from a database service. If there's
  # no team to handle it, it defaults to the DB team.
  - match:
      service: database
    receiver: team-DB-pager
    # Also group alerts by affected database.
    group_by: [alertname, cluster, database]

    routes:
    - match:
        owner: team-X
      receiver: team-X-pager

    - match:
        owner: team-Y
      receiver: team-Y-pager

# Inhibition rules allow muting a set of alerts while another alert is
# firing.
# We use this to mute any warning-level notifications if the same alert is
# already critical.
inhibit_rules:
- source_match:
    severity: 'critical'
  target_match:
    severity: 'warning'
  # Apply inhibition if the alertname is the same.
  equal: ['alertname']

receivers:
- name: 'team-X-mails'
  email_configs:
  - to: '[email protected]'

- name: 'team-X-pager'
  email_configs:
  - to: '[email protected]'
  pagerduty_configs:
  - service_key: <team-X-key>

- name: 'team-Y-mails'
  email_configs:
  - to: '[email protected]'

- name: 'team-Y-pager'
  pagerduty_configs:
  - service_key: <team-Y-key>

- name: 'team-DB-pager'
  pagerduty_configs:
  - service_key: <team-DB-key>
```
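
To see how this routing tree and inhibition rule treat a given alert, the sketch below hard-codes the example config as a simplified model: child routes are tried in order, the first match wins (none of the routes set `continue`), its sub-routes are tried next, and the parent's receiver is used when no child matches. The functions `route_alert` and `is_inhibited` are hypothetical and only cover the `service`, `severity`, `owner`, and `alertname` labels used above.

```bash
#!/usr/bin/env bash
# Simplified model of the example routing tree.
# Args: service label, severity label, owner label (empty if absent).
route_alert() {
    local service="$1" severity="$2" owner="$3"
    local re='^(foo1|foo2|baz)$'
    if [[ "$service" =~ $re ]]; then
        if [ "$severity" = critical ]; then echo team-X-pager; else echo team-X-mails; fi
    elif [ "$service" = files ]; then
        if [ "$severity" = critical ]; then echo team-Y-pager; else echo team-Y-mails; fi
    elif [ "$service" = database ]; then
        case "$owner" in
            team-X) echo team-X-pager ;;
            team-Y) echo team-Y-pager ;;
            *)      echo team-DB-pager ;;  # no owner sub-route matched
        esac
    else
        # No child route matched: the root route's receiver handles it.
        echo team-X-mails
    fi
}

# Simplified inhibition check: a warning is muted while a critical alert
# with the same alertname is firing.
# Args: severity, alertname, then the alertnames of firing critical alerts.
is_inhibited() {
    local severity="$1" alertname="$2" critical
    shift 2
    [ "$severity" = warning ] || return 1
    for critical in "$@"; do
        if [ "$critical" = "$alertname" ]; then
            return 0
        fi
    done
    return 1
}
```

Under this model, the `InstanceDown` rule above (`severity = "1"`, no `service` label) would match no child route and go to the root receiver, `team-X-mails`, and it would never take part in the critical/warning inhibition.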