Merge pull request #25 from kabisa/renovate/kabisa-generic-monitor-datadog-1.x

Update Terraform kabisa/generic-monitor/datadog to v1
obeleh authored Jul 22, 2022
2 parents 07020a8 + 0d82ffe commit d4c6b8e
Showing 20 changed files with 34 additions and 40 deletions.
2 changes: 1 addition & 1 deletion .pre-commit-config.yaml
@@ -6,7 +6,7 @@ repos:
       - id: terraform-validate
       - id: tflint
   - repo: https://github.com/kabisa/terraform-datadog-pre-commit-hook
-    rev: "1.3.3"
+    rev: "1.3.6"
     hooks:
       - id: terraform-datadog-docs
         args:
35 changes: 16 additions & 19 deletions README.md
@@ -31,9 +31,6 @@ module "system" {
service_check_include_tags = ["host:myserver"]
}
```


@@ -43,22 +40,22 @@ Monitors:

 | Monitor name | Default enabled | Priority | Query |
 |-----------------|------|----|------------------------|
-| [Bytes Received](#bytes-received) | True | 3 | avg(last_30m):avg:system.net.bytes_rcvd{tag:xxx} by {${var.alert_by}} > 5000000 |
-| [Bytes Sent](#bytes-sent) | True | 3 | avg(last_30m):avg:system.net.bytes_sent{tag:xxx} by {${var.alert_by}} > 5000000 |
-| [CPU](#cpu) | True | 2 | avg(last_30m):avg:system.cpu.user{tag:xxx} by {${var.alert_by}} + avg:system.cpu.system{tag:xxx} by {${var.alert_by}} > 95 |
-| [Datadog Agent](#datadog-agent) | True | 2 | avg(${var.dd_agent_evaluation_period}):avg:datadog.agent.running{${local.dd_agent_filter}} by {${var.alert_by}} < 1 |
-| [Disk Free Bytes](#disk-free-bytes) | False | 2 | avg(last_5m):min:system.disk.free{tag:xxx} by {host,device} < 10000000000 |
-| [Disk Free Percent](#disk-free-percent) | True | 2 | avg(last_5m):100 * min:system.disk.free{tag:xxx} by {host,device} / min:system.disk.total{tag:xxx} by {host,device} < 10 |
-| [Disk In Use Percentage](#disk-in-use-percentage) | False | 2 | avg(last_5m):min:system.disk.in_use{tag:xxx} by {${var.alert_by}} * 100 > 90 |
-| [Disk Iowait](#disk-iowait) | True | 2 | avg(${var.disk_io_wait_evaluation_period}):avg:system.cpu.iowait{${local.disk_io_wait_filter}} by {${var.alert_by}} > ${var.disk_io_wait_critical} |
-| [Memory Free Bytes](#memory-free-bytes) | False | 2 | avg(last_5m):min:system.mem.usable{tag:xxx} by {${var.alert_by}} < 1000000000 |
-| [Memory Free Percent](#memory-free-percent) | True | 2 | avg(last_5m):min:system.mem.pct_usable{tag:xxx} by {${var.alert_by}} * 100 < 10 |
-| [Memory Usable Percent](#memory-usable-percent) | False | 2 | avg(last_5m):100 * min:system.mem.usable{tag:xxx} by {${var.alert_by}} / min:system.mem.total{tag:xxx} by {${var.alert_by}} < 10 |
-| [Packets In Errors](#packets-in-errors) | True | 3 | avg(last_15m):100 * max:system.net.packets_in.error{tag:xxx} by {${var.alert_by}} / ( max:system.net.packets_in.count{tag:xxx} by {${var.alert_by}} + 1000 ) > 1 |
-| [Packets Out Errors](#packets-out-errors) | True | 3 | avg(last_15m):100 * max:system.net.packets_out.error{tag:xxx} by {${var.alert_by}} / ( max:system.net.packets_out.count{tag:xxx} by {${var.alert_by}} + 1000 ) > 1 |
-| [Reboot](#reboot) | True | 3 | min(last_5m):derivative(max:system.uptime{tag:xxx} by {${var.alert_by}}) < 0 |
-| [Required Services](#required-services) | True | 2 | processes('${each.key}').over('tag:xxx').by('host').rollup('count').last('${lookup(each.value, "freshness_duration", var.required_services_default_freshness_duration)}') < ${lookup(each.value, "process_count", 1)} |
-| [Swap](#swap) | True | 3 | avg(${var.swap_percent_free_evaluation_period}):min:system.swap.pct_free{${local.swap_percent_free_filter}} by {${var.alert_by}} * 100 < ${var.swap_percent_free_critical} |
+| [Bytes Received](#bytes-received) | True | 3 | `avg(last_30m):avg:system.net.bytes_rcvd{tag:xxx} by {${var.alert_by}} > 5000000` |
+| [Bytes Sent](#bytes-sent) | True | 3 | `avg(last_30m):avg:system.net.bytes_sent{tag:xxx} by {${var.alert_by}} > 5000000` |
+| [CPU](#cpu) | True | 2 | `avg(last_30m):avg:system.cpu.user{tag:xxx} by {${var.alert_by}} + avg:system.cpu.system{tag:xxx} by {${var.alert_by}} > 95` |
+| [Datadog Agent](#datadog-agent) | True | 2 | `avg(${var.dd_agent_evaluation_period}):avg:datadog.agent.running{${local.dd_agent_filter}} by {${var.alert_by}} < 1` |
+| [Disk Free Bytes](#disk-free-bytes) | False | 2 | `avg(last_5m):min:system.disk.free{tag:xxx} by {host,device} < 10000000000` |
+| [Disk Free Percent](#disk-free-percent) | True | 2 | `avg(last_5m):100 * min:system.disk.free{tag:xxx} by {host,device} / min:system.disk.total{tag:xxx} by {host,device} < 10` |
+| [Disk In Use Percentage](#disk-in-use-percentage) | False | 2 | `avg(last_5m):min:system.disk.in_use{tag:xxx} by {${var.alert_by}} * 100 > 90` |
+| [Disk Iowait](#disk-iowait) | True | 2 | `avg(${var.disk_io_wait_evaluation_period}):avg:system.cpu.iowait{${local.disk_io_wait_filter}} by {${var.alert_by}} > ${var.disk_io_wait_critical}` |
+| [Memory Free Bytes](#memory-free-bytes) | False | 2 | `avg(last_5m):min:system.mem.usable{tag:xxx} by {${var.alert_by}} < 1000000000` |
+| [Memory Free Percent](#memory-free-percent) | True | 2 | `avg(last_5m):min:system.mem.pct_usable{tag:xxx} by {${var.alert_by}} * 100 < 10` |
+| [Memory Usable Percent](#memory-usable-percent) | False | 2 | `avg(last_5m):100 * min:system.mem.usable{tag:xxx} by {${var.alert_by}} / min:system.mem.total{tag:xxx} by {${var.alert_by}} < 10` |
+| [Packets In Errors](#packets-in-errors) | True | 3 | `avg(last_15m):100 * max:system.net.packets_in.error{tag:xxx} by {${var.alert_by}} / ( max:system.net.packets_in.count{tag:xxx} by {${var.alert_by}} + 1000 ) > 1` |
+| [Packets Out Errors](#packets-out-errors) | True | 3 | `avg(last_15m):100 * max:system.net.packets_out.error{tag:xxx} by {${var.alert_by}} / ( max:system.net.packets_out.count{tag:xxx} by {${var.alert_by}} + 1000 ) > 1` |
+| [Reboot](#reboot) | True | 3 | `min(last_5m):derivative(max:system.uptime{tag:xxx} by {${var.alert_by}}) < 0` |
+| [Required Services](#required-services) | True | 2 | `processes('${each.key}').over('tag:xxx').by('host').rollup('count').last('${lookup(each.value, "freshness_duration", var.required_services_default_freshness_duration)}') < ${lookup(each.value, "process_count", 1)}` |
+| [Swap](#swap) | True | 3 | `avg(${var.swap_percent_free_evaluation_period}):min:system.swap.pct_free{${local.swap_percent_free_filter}} by {${var.alert_by}} * 100 < ${var.swap_percent_free_critical}` |

 # Getting started developing
 [pre-commit](http://pre-commit.com/) was used to do Terraform linting and validating.
2 changes: 1 addition & 1 deletion bytes-received.tf
@@ -7,7 +7,7 @@ locals {

 module "bytes_received" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Bytes received"
   query = "avg(${var.bytes_received_evaluation_period}):avg:system.net.bytes_rcvd{${local.bytes_received_filter}} by {${var.alert_by}} > ${var.bytes_received_critical}"
2 changes: 1 addition & 1 deletion bytes-sent.tf
@@ -7,7 +7,7 @@ locals {

 module "bytes_sent" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Bytes sent"
   query = "avg(${var.bytes_sent_evaluation_period}):avg:system.net.bytes_sent{${local.bytes_sent_filter}} by {${var.alert_by}} > ${var.bytes_sent_critical}"
2 changes: 1 addition & 1 deletion cpu.tf
@@ -7,7 +7,7 @@ locals {

 module "cpu" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - High CPU"
   query = "avg(${var.cpu_evaluation_period}):avg:system.cpu.user{${local.cpu_filter}} by {${var.alert_by}} + avg:system.cpu.system{${local.cpu_filter}} by {${var.alert_by}} > ${var.cpu_critical}"
2 changes: 1 addition & 1 deletion dd-agent-data.tf
@@ -1,6 +1,6 @@
 module "dd_agent_data" {
   source = "kabisa/service-check-monitor/datadog"
-  version = "1.4.1"
+  version = "2.0.0"

   name = "System - Datadog data missing"
   metric_name = "datadog.agent.up"
2 changes: 1 addition & 1 deletion dd-agent.tf
@@ -7,7 +7,7 @@ locals {

 module "dd_agent" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Datadog agent not running"
   query = "avg(${var.dd_agent_evaluation_period}):avg:datadog.agent.running{${local.dd_agent_filter}} by {${var.alert_by}} < 1"
2 changes: 1 addition & 1 deletion disk-free-bytes.tf
@@ -7,7 +7,7 @@ locals {

 module "disk_free_bytes" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Disk Free (bytes)"
   query = "avg(${var.disk_free_bytes_evaluation_period}):min:system.disk.free{${local.disk_free_bytes_filter}} by {host,device} < ${var.disk_free_bytes_critical}"
2 changes: 1 addition & 1 deletion disk-free-percent.tf
@@ -7,7 +7,7 @@ locals {

 module "disk_free_percent" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Disk Free (percentage)"
   query = "avg(${var.disk_free_percent_evaluation_period}):100 * min:system.disk.free{${local.disk_free_percent_filter}} by {host,device} / min:system.disk.total{${local.disk_free_percent_filter}} by {host,device} < ${var.disk_free_percent_critical}"
2 changes: 1 addition & 1 deletion disk-in-use-percentage.tf
@@ -7,7 +7,7 @@ locals {

 module "disk_in_use_percentage" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "Disk In Use Percentage"
   query = "avg(${var.disk_in_use_percentage_evaluation_period}):min:system.disk.in_use{${local.disk_in_use_percentage_filter}} by {${var.alert_by}} * 100 > ${var.disk_in_use_percentage_critical}"
2 changes: 1 addition & 1 deletion disk-iowait.tf
@@ -7,7 +7,7 @@ locals {

 module "disk_io_wait" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Disk IO Wait"
   query = "avg(${var.disk_io_wait_evaluation_period}):avg:system.cpu.iowait{${local.disk_io_wait_filter}} by {${var.alert_by}} > ${var.disk_io_wait_critical}"
3 changes: 0 additions & 3 deletions examples/example.tf
@@ -8,6 +8,3 @@ module "system" {
   filter_str = "host:myserver"
   service_check_include_tags = ["host:myserver"]
 }
-
-
-
2 changes: 1 addition & 1 deletion memory-free-bytes.tf
@@ -7,7 +7,7 @@ locals {

 module "memory_free_bytes" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Memory Free Bytes"
   query = "avg(${var.memory_free_bytes_evaluation_period}):min:system.mem.usable{${local.memory_free_bytes_filter}} by {${var.alert_by}} < ${var.memory_free_bytes_critical}"
2 changes: 1 addition & 1 deletion memory-free-percent.tf
@@ -7,7 +7,7 @@ locals {

 module "memory_free_percent" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Memory Free Percent"
   # Note: system.mem.pct_usable is actually a fraction, not a percentage
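The note in memory-free-percent.tf explains why the README query for this monitor multiplies `system.mem.pct_usable` by 100 before comparing it with the threshold. A minimal sketch of that arithmetic, using made-up sample values; every local name below is hypothetical and not part of the module:

```hcl
locals {
  # Hypothetical sample reading: 8% of memory usable, reported as a fraction.
  example_pct_usable = 0.08

  # Threshold taken from the README query ("* 100 < 10").
  example_critical = 10

  # Scale the fraction to a percentage before comparing.
  example_usable_percent = local.example_pct_usable * 100

  # true for this sample, because 8 < 10.
  example_would_alert = local.example_usable_percent < local.example_critical
}
```

Without the `* 100`, the comparison `0.08 < 10` would hold for every possible reading, so the condition would never distinguish a healthy host from a starved one.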
2 changes: 1 addition & 1 deletion memory-usable-percent.tf
@@ -7,7 +7,7 @@ locals {

 module "memory_usable_percent" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "Memory Usable Percent"
   query = "avg(${var.memory_usable_percent_evaluation_period}):100 * min:system.mem.usable{${local.memory_usable_percent_filter}} by {${var.alert_by}} / min:system.mem.total{${local.memory_usable_percent_filter}} by {${var.alert_by}} < ${var.memory_usable_percent_critical}"
2 changes: 1 addition & 1 deletion packets-in-errors.tf
@@ -10,7 +10,7 @@ locals {

 module "packets_in_errors" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Packet In Errors"
   # The +1000 damps low packet rates: it prevents a handful of packet errors from skewing the percentage when, for example, only 100 packets are received/sent
2 changes: 1 addition & 1 deletion packets-out-errors.tf
@@ -7,7 +7,7 @@ locals {

 module "packets_out_errors" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Packet Out Errors"
   # The +1000 damps low packet rates: it prevents a handful of packet errors from skewing the percentage when, for example, only 100 packets are received/sent
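The comment in packets-in-errors.tf and packets-out-errors.tf describes the `+ 1000` in the denominator of the README queries. A small sketch of the effect, with made-up sample counts; the local names are hypothetical and only illustrate the arithmetic:

```hcl
locals {
  # Hypothetical sample: 5 errors out of only 100 packets.
  sample_errors  = 5
  sample_packets = 100

  # Without damping: 100 * 5 / 100 = 5 (%), above the README threshold of 1.
  undamped_percent = 100 * local.sample_errors / local.sample_packets

  # With the +1000 from the query: 100 * 5 / 1100, roughly 0.45 (%), below 1.
  damped_percent = 100 * local.sample_errors / (local.sample_packets + 1000)
}
```

At high traffic the extra 1000 barely matters: 5000 errors in 100000 packets gives roughly 4.95% with damping versus 5% without, so genuine error rates would still cross the threshold.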
2 changes: 1 addition & 1 deletion reboot.tf
@@ -7,7 +7,7 @@ locals {

 module "reboot" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Reboot detected"
   query = "min(last_5m):derivative(max:system.uptime{${local.reboot_filter}} by {${var.alert_by}}) < 0"
2 changes: 1 addition & 1 deletion required-services.tf
@@ -7,7 +7,7 @@ locals {

 module "required_services" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   for_each = var.required_services_config

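The Required Services query in the README reads per-service settings with `lookup(each.value, "process_count", 1)`, so a service entry without that key falls back to a threshold of 1. A hedged sketch of the fallback behaviour follows; the input shape and all local names are assumptions for illustration, not the module's actual variable definition:

```hcl
locals {
  # Hypothetical input, shaped like an assumed var.required_services_config.
  example_required_services = {
    nginx = { process_count = 2 }
    sshd  = {}
  }

  # lookup() returns the configured value when the key is present,
  # and the third argument otherwise.
  example_process_thresholds = {
    for name, cfg in local.example_required_services :
    name => lookup(cfg, "process_count", 1)
  }
  # nginx => 2, sshd => 1
}
```

The same pattern applies to `freshness_duration`, whose fallback in the README query is `var.required_services_default_freshness_duration`.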
2 changes: 1 addition & 1 deletion swap.tf
@@ -7,7 +7,7 @@ locals {

 module "swap_percent_free" {
   source = "kabisa/generic-monitor/datadog"
-  version = "0.7.5"
+  version = "1.0.0"

   name = "System - Swap percent free"
   query = "avg(${var.swap_percent_free_evaluation_period}):min:system.swap.pct_free{${local.swap_percent_free_filter}} by {${var.alert_by}} * 100 < ${var.swap_percent_free_critical}"
