feat: dns check (#91)
* feat: add dns check

* feat: implement the rest of the dns check

* feat: add dns metrics
* test: add unit tests for dns check methods

* feat: add dns check to check register

* chore: licensing headers

* chore: mv route name to constant

* fix: retry on lookup failures

* feat: mv metrics to dedicated file

* chore: adjust log messages

* test: add context cancelation test

* chore: rename test functions

* chore: rm unused check handlers

* test: create ctx for each test

* fix: handle injected global targets

* chore: PR review

* chore: adjust metric names
* test: add metric collector tests
* chore: rephrase logging message

* test: add more set config test cases

* docs: adjust metrics godocs

* docs: add dns check

* docs: add dns check to about this component section

* chore: add dns config to helm chart

* fix: loader default value runtime config path

* fix: use custom dialer

* chore: duration metric naming
lvlcn-t authored Jan 29, 2024
1 parent 9f1d0d3 commit 585c9c1
Showing 19 changed files with 1,081 additions and 216 deletions.
68 changes: 61 additions & 7 deletions README.md
@@ -25,6 +25,9 @@
- [Check: Latency](#check-latency)
- [Example configuration](#example-configuration-1)
- [Latency Metrics](#latency-metrics)
+- [Check: DNS](#check-dns)
+- [Example configuration](#example-configuration-2)
+- [DNS Metrics](#dns-metrics)
- [API](#api)
- [Metrics](#metrics)
- [Code of Conduct](#code-of-conduct)
@@ -38,15 +41,15 @@ executed periodically.

## About this component

-The `sparrow` performs several checks to monitor the health of the infrastructure and network from its point of view.
-The following checks are available:
+The `sparrow` performs several checks to monitor the health of the infrastructure and network from its point of view. The following checks are available:

-1. Health check - `health`: The `sparrow` is able to perform an HTTP-based (HTTP/1.1) health check to the provided
-   endpoints.
-   The `sparrow` will expose its own health check endpoint as well.
+1. [Health check](#check-health) - `health`: The `sparrow` is able to perform an HTTP-based (HTTP/1.1) health check to the provided endpoints. The `sparrow` will expose its own health check endpoint as well.

-2. Latency check - `latency`: The `sparrow` is able to communicate with other `sparrow` instances to calculate the time
-   a request takes to the target and back. The check is http (HTTP/1.1) based as well.
+2. [Latency check](#check-latency) - `latency`: The `sparrow` is able to communicate with other `sparrow` instances to calculate the time a request takes to the target and back. The check is http (HTTP/1.1) based as well.

+3. [DNS check](#check-dns) - `dns`: The `sparrow` is able to perform DNS resolution checks to monitor domain name system performance and reliability. The check has the ability to target specific domains or IPs for monitoring.
+
+Each check is designed to provide comprehensive insights into the various aspects of network and service health, ensuring robust monitoring and quick detection of potential issues.
+
## Installation

@@ -365,6 +368,57 @@ checks:
  - Description: Latency of targets in seconds
  - Labelled with `target`

+### Check: DNS
+
+Available configuration options:
+
+- `checks`
+  - `dns`
+    - `interval` (duration): Interval to perform the DNS check.
+    - `timeout` (duration): Timeout for the DNS check.
+    - `retry`
+      - `count` (integer): Number of retries for the DNS check.
+      - `delay` (duration): Initial delay between retries for the DNS check.
+    - `targets` (list of strings): List of targets to lookup. Needs to be a valid domain or IP. Can be
+      another `sparrow` instance. Automatically updated when a targetManager is configured.
+
+#### Example configuration
+
+```yaml
+checks:
+  dns:
+    interval: 10s
+    timeout: 30s
+    retry:
+      count: 3
+      delay: 1s
+    targets:
+      - www.example.com
+      - www.google.com
+```
+
+#### DNS Metrics
+
+- `sparrow_dns_status`
+  - Type: Gauge
+  - Description: Lookup status of targets
+  - Labelled with `target`
+
+- `sparrow_dns_check_count`
+  - Type: Counter
+  - Description: Count of DNS checks done
+  - Labelled with `target`
+
+- `sparrow_dns_duration`
+  - Type: Gauge
+  - Description: Duration of DNS resolution attempts
+  - Labelled with `target`
+
+- `sparrow_dns_duration`
+  - Type: Histogram
+  - Description: Histogram of response times for DNS checks
+  - Labelled with `target`
+
## API

The `sparrow` exposes an API for accessing the results of various checks. Each check registers its own endpoint at `/v1/metrics/{check-name}`. The API's definition is available at `/openapi`.
2 changes: 1 addition & 1 deletion chart/README.md
@@ -62,6 +62,6 @@ A Helm chart to install Sparrow
| serviceMonitor.interval | string | `"30s"` | Sets the scrape interval |
| serviceMonitor.labels | object | `{}` | Additional label added to the service Monitor |
| serviceMonitor.scrapeTimeout | string | `"5s"` | Sets the scrape timeout |
-| sparrowConfig | object | `{"loader":{"path":"/config/checksConfig.yaml","type":"file"},"name":"sparrow.com"}` | Sparrow configuration read on startup see: https://github.com/caas-team/sparrow/blob/main/docs/sparrow_run.md |
+| sparrowConfig | object | `{"loader":{"path":"/config/checks.yaml","type":"file"},"name":"sparrow.com"}` | Sparrow configuration read on startup see: https://github.com/caas-team/sparrow/blob/main/docs/sparrow_run.md |
| tolerations | list | `[]` | |

49 changes: 29 additions & 20 deletions chart/values.yaml
@@ -111,7 +111,7 @@ sparrowConfig:
  name: sparrow.com
  loader:
    type: file
-    path: /config/checksConfig.yaml
+    path: /config/checks.yaml
# name: the-sparrow.com
# api:
# address:
@@ -139,25 +139,34 @@ sparrowConfig:
# see: https://github.com/caas-team/sparrow?tab=readme-ov-file#checks
checksConfig:
  checks: {}
-# checks:
-# health:
-# interval: 15s
-# timeout: 10s
-# retry:
-# count: 3
-# delay: 1s
-# targets:
-# - "https://www.example.com/"
-# - "https://www.google.com/"
-# latency:
-# interval: 15s
-# timeout: 30s
-# retry:
-# count: 3
-# delay: 2s
-# targets:
-# - https://example.com/
-# - https://google.com/
+# checks:
+# health:
+# interval: 15s
+# timeout: 10s
+# retry:
+# count: 3
+# delay: 1s
+# targets:
+# - "https://www.example.com/"
+# - "https://www.google.com/"
+# latency:
+# interval: 15s
+# timeout: 30s
+# retry:
+# count: 3
+# delay: 2s
+# targets:
+# - https://example.com/
+# - https://google.com/
+# dns:
+# interval: 10s
+# timeout: 30s
+# retry:
+# count: 5
+# delay: 1s
+# targets:
+# - www.example.com
+# - www.google.com

# -- Configure a service monitor for prometheus-operator
serviceMonitor:
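The `dns` example added to `checksConfig` above carries `interval`, `timeout`, `retry.count`, `retry.delay` and `targets`. A minimal Go sketch of that shape with basic validation, assuming durations arrive as Go duration strings such as `10s` and that every target must be an IP literal or a syntactically valid host name. `dnsConfig`, `parseDNSConfig`, `validHostname` and the validation rules are hypothetical, not sparrow's actual types; in practice a YAML decoder would fill the raw values.

```go
package main

import (
	"errors"
	"fmt"
	"net"
	"strings"
	"time"
)

// dnsConfig mirrors the chart's commented `dns` example (hypothetical type).
type dnsConfig struct {
	Interval   time.Duration
	Timeout    time.Duration
	RetryCount int
	RetryDelay time.Duration
	Targets    []string
}

// validHostname applies simple RFC 1123-style label checks (an assumed rule set).
func validHostname(s string) bool {
	s = strings.TrimSuffix(s, ".")
	if s == "" || len(s) > 253 {
		return false
	}
	for _, label := range strings.Split(s, ".") {
		if len(label) == 0 || len(label) > 63 || label[0] == '-' || label[len(label)-1] == '-' {
			return false
		}
		for _, c := range label {
			if !(c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9' || c == '-') {
				return false
			}
		}
	}
	return true
}

// parseDNSConfig converts raw values into a dnsConfig and reports every problem found.
func parseDNSConfig(interval, timeout string, retryCount int, retryDelay string, targets []string) (dnsConfig, error) {
	cfg := dnsConfig{RetryCount: retryCount, Targets: targets}
	var errs []error
	var err error

	if cfg.Interval, err = time.ParseDuration(interval); err != nil || cfg.Interval <= 0 {
		errs = append(errs, fmt.Errorf("interval must be a positive duration, got %q", interval))
	}
	if cfg.Timeout, err = time.ParseDuration(timeout); err != nil || cfg.Timeout <= 0 {
		errs = append(errs, fmt.Errorf("timeout must be a positive duration, got %q", timeout))
	}
	if cfg.RetryDelay, err = time.ParseDuration(retryDelay); err != nil || cfg.RetryDelay < 0 {
		errs = append(errs, fmt.Errorf("retry.delay must be a non-negative duration, got %q", retryDelay))
	}
	if retryCount < 0 {
		errs = append(errs, fmt.Errorf("retry.count must not be negative, got %d", retryCount))
	}
	for _, t := range targets {
		if net.ParseIP(t) == nil && !validHostname(t) {
			errs = append(errs, fmt.Errorf("target %q is neither an IP nor a valid domain", t))
		}
	}
	return cfg, errors.Join(errs...) // nil when errs is empty
}

func main() {
	cfg, err := parseDNSConfig("10s", "30s", 5, "1s", []string{"www.example.com", "www.google.com"})
	fmt.Println(cfg.Interval, cfg.Timeout, cfg.RetryCount, err) // prints "10s 30s 5 <nil>"

	// A health-check style URL is not a valid DNS target under these rules.
	_, err = parseDNSConfig("10s", "30s", 3, "1s", []string{"https://www.example.com/"})
	fmt.Println(err)
}
```

Note that the sketch deliberately does not require `timeout` to be shorter than `interval`, since the README's own example uses `interval: 10s` with `timeout: 30s`.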
2 changes: 1 addition & 1 deletion go.mod
@@ -47,7 +47,7 @@ require (
	github.com/subosito/gotenv v1.6.0 // indirect
	go.uber.org/multierr v1.11.0 // indirect
	golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa // indirect
-	golang.org/x/sys v0.15.0 // indirect
+	golang.org/x/sys v0.16.0 // indirect
	golang.org/x/text v0.14.0 // indirect
	google.golang.org/protobuf v1.31.0 // indirect
	gopkg.in/ini.v1 v1.67.0 // indirect
4 changes: 2 additions & 2 deletions go.sum
@@ -106,8 +106,8 @@ go.uber.org/multierr v1.11.0/go.mod h1:20+QtiLqy0Nd6FdQB9TLXag12DsQkrbs3htMFfDN8
golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa h1:FRnLl4eNAQl8hwxVVC17teOw8kdjVDVAiFMtgUdTSRQ=
golang.org/x/exp v0.0.0-20231110203233-9a3e6036ecaa/go.mod h1:zk2irFbV9DP96SEBUUAy67IdHUaZuSnrz1n472HUCLE=
golang.org/x/sync v0.0.0-20181221193216-37e7f081c4d4/go.mod h1:RxMgew5VJxzue5/jJTE5uejpjVlOe/izrB70Jof72aM=
-golang.org/x/sys v0.15.0 h1:h48lPFYpsTvQJZF4EKyI4aLHaev3CxivZmv7yZig9pc=
-golang.org/x/sys v0.15.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
+golang.org/x/sys v0.16.0 h1:xWw16ngr6ZMtmxDyKyIgsE93KNKz5HKmMa3b8ALHidU=
+golang.org/x/sys v0.16.0/go.mod h1:/VUhepiaJMQUp4+oa/7Zr1D23ma6VTLIYjOOTFZPUcA=
golang.org/x/text v0.14.0 h1:ScX5w1eTa3QqT8oi6+ziP7dTV1S2+ALU0bI+0zXKWiQ=
golang.org/x/text v0.14.0/go.mod h1:18ZOQIKpY8NJVqYksKHtTdi31H5itFRjB5/qKTNYzSU=
golang.org/x/xerrors v0.0.0-20191204190536-9bdfabe68543/go.mod h1:I/5z698sn9Ka8TeJc9MKroUUfqBBauWjQqLJ2OPfmY0=
5 changes: 0 additions & 5 deletions pkg/checks/checks.go
@@ -24,7 +24,6 @@ import (
	"github.com/getkin/kin-openapi/openapi3"
	"github.com/prometheus/client_golang/prometheus"

-	"github.com/caas-team/sparrow/pkg/api"
	"github.com/caas-team/sparrow/pkg/checks/types"
)

@@ -48,10 +47,6 @@ type Check interface {
	SetConfig(ctx context.Context, config any) error
	// Schema returns an openapi3.SchemaRef of the result type returned by the check
	Schema() (*openapi3.SchemaRef, error)
-	// RegisterHandler Allows the check to register a handler on sparrows http server at runtime
-	RegisterHandler(ctx context.Context, router *api.RoutingTree)
-	// DeregisterHandler allows the check to deregister a handler on sparrows http server at runtime
-	DeregisterHandler(ctx context.Context, router *api.RoutingTree)
	// GetMetricCollectors allows the check to provide prometheus metric collectors
	GetMetricCollectors() []prometheus.Collector
}
101 changes: 0 additions & 101 deletions pkg/checks/checks_moq.go

Some generated files are not rendered by default.
