CRIMAPP-1448 Prometheus rules for RDS in Crime Datastore production #27759

Merged
@@ -0,0 +1,68 @@
# Prometheus Alerts
#
# https://user-guide.cloud-platform.service.justice.gov.uk/documentation/monitoring-an-app/how-to-create-alarms.html
#
# Note: each rule selects the production RDS instance by an exact match
# on its dbinstance_identifier label.
#
# To see the current alerts in this namespace:
# kubectl describe prometheusrule -n laa-criminal-applications-datastore-production
#
# Alerts will be sent to the Slack channel: #laa-crime-apply-alerts
#
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: prometheus-rules-rds
  namespace: laa-criminal-applications-datastore-production
  labels:
    role: alert-rules
    prometheus: cloud-platform
spec:
  groups:
    - name: rds-rules
      rules:
        - alert: Datastore-Production-RDS-HighCPUUtilization
          expr: aws_rds_cpuutilization_average{dbinstance_identifier="cloud-platform-42511ff2a5d7e782"} > 25
          for: 1m
          labels:
            severity: laa-crime-apply-alerts
          annotations:
            message: Datastore production - RDS CPU usage > 25%.
            dashboard_url: https://grafana.live.cloud-platform.service.justice.gov.uk/d/VR46pmwWk/aws-rds?orgId=1&var-datasource=P896B4444D3F0DAB8&var-region=default&var-dbinstanceidentifier=cloud-platform-42511ff2a5d7e782

        - alert: Datastore-Production-RDS-LowStorage
          expr: aws_rds_free_storage_space_average{dbinstance_identifier="cloud-platform-42511ff2a5d7e782"} < 1024*1024*1024
          for: 1m
          labels:
            severity: laa-crime-apply-alerts
          annotations:
            message: Datastore production - RDS storage capacity < 1GB.
            dashboard_url: https://grafana.live.cloud-platform.service.justice.gov.uk/d/VR46pmwWk/aws-rds?orgId=1&var-datasource=P896B4444D3F0DAB8&var-region=default&var-dbinstanceidentifier=cloud-platform-42511ff2a5d7e782

        - alert: Datastore-Production-RDS-HighReadLatency
          expr: aws_rds_read_latency_average{dbinstance_identifier="cloud-platform-42511ff2a5d7e782"} > 0.5
          for: 1m
          labels:
            severity: laa-crime-apply-alerts
          annotations:
            message: Datastore production - RDS read latency > 500ms.
            dashboard_url: https://grafana.live.cloud-platform.service.justice.gov.uk/d/VR46pmwWk/aws-rds?orgId=1&var-datasource=P896B4444D3F0DAB8&var-region=default&var-dbinstanceidentifier=cloud-platform-42511ff2a5d7e782

        - alert: Datastore-Production-RDS-HighWriteLatency
          expr: aws_rds_write_latency_average{dbinstance_identifier="cloud-platform-42511ff2a5d7e782"} > 0.5
          for: 1m
          labels:
            severity: laa-crime-apply-alerts
          annotations:
            message: Datastore production - RDS write latency > 500ms.
            dashboard_url: https://grafana.live.cloud-platform.service.justice.gov.uk/d/VR46pmwWk/aws-rds?orgId=1&var-datasource=P896B4444D3F0DAB8&var-region=default&var-dbinstanceidentifier=cloud-platform-42511ff2a5d7e782

        - alert: Datastore-Production-RDS-HighDatabaseConnections
          expr: aws_rds_database_connections_average{dbinstance_identifier="cloud-platform-42511ff2a5d7e782"} > 20
          for: 1m
          labels:
            severity: laa-crime-apply-alerts
          annotations:
            message: Datastore production - RDS number of database connections > 20.
            dashboard_url: https://grafana.live.cloud-platform.service.justice.gov.uk/d/VR46pmwWk/aws-rds?orgId=1&var-datasource=P896B4444D3F0DAB8&var-region=default&var-dbinstanceidentifier=cloud-platform-42511ff2a5d7e782
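
One way to check a rule like the CPU alert before it reaches the cluster is a `promtool test rules` unit test. The sketch below is not part of this PR and rests on assumptions: `promtool` reads plain Prometheus rule files rather than the `PrometheusRule` wrapper, so it assumes the content under `spec:` (the `groups:` list) has been copied into a hypothetical file named `rds-rules.yaml`, and the test file name is likewise hypothetical.

```yaml
# rds-rules-test.yaml (hypothetical file name)
# Run with: promtool test rules rds-rules-test.yaml
rule_files:
  # Assumption: a plain rule file holding the `groups:` list from `spec:`.
  - rds-rules.yaml

evaluation_interval: 1m

tests:
  - interval: 1m
    input_series:
      # Synthetic samples: CPU held at 30% (above the 25 threshold) from 0m to 5m.
      - series: 'aws_rds_cpuutilization_average{dbinstance_identifier="cloud-platform-42511ff2a5d7e782"}'
        values: '30+0x5'
    alert_rule_test:
      # With `for: 1m`, the alert is pending at 0m and should be firing by 2m.
      - eval_time: 2m
        alertname: Datastore-Production-RDS-HighCPUUtilization
        exp_alerts:
          - exp_labels:
              severity: laa-crime-apply-alerts
              dbinstance_identifier: cloud-platform-42511ff2a5d7e782
            exp_annotations:
              message: Datastore production - RDS CPU usage > 25%.
              dashboard_url: https://grafana.live.cloud-platform.service.justice.gov.uk/d/VR46pmwWk/aws-rds?orgId=1&var-datasource=P896B4444D3F0DAB8&var-region=default&var-dbinstanceidentifier=cloud-platform-42511ff2a5d7e782
```

The same pattern should extend to the other four rules by swapping the metric name, the sample values, and the expected annotations. Expected labels and annotations are compared exactly, so the `dashboard_url` has to be repeated in full.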