chore: deploy plg stack and update podMonitor (#1453)
1 parent c32df6c, commit e072b01. Showing 22 changed files with 549 additions and 30 deletions.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,91 @@ | ||
## How to Deploy PLG Stack on Kubernetes | ||
|
||
The **PLG stack** refers to Promtail, Loki, and Grafana: Promtail collects logs from container log files and pushes them to the Loki service, which Grafana then queries to display logs in its log panel. | ||
|
||
### Install Loki Stack | ||
|
||
This tutorial shows how to deploy them using the `loki-stack` Helm chart. | ||
The Loki stack is a lightweight log aggregation solution from Grafana. | ||
|
||
**Step 1.** Add the Grafana Helm chart repository and update it: | ||
|
||
```bash | ||
# Add Grafana's Helm chart repository and refresh the local chart index: | ||
helm repo add grafana https://grafana.github.io/helm-charts | ||
helm repo update | ||
``` | ||
|
||
**Step 2.** Install Loki Stack: | ||
|
||
If you already have Prometheus and Grafana installed, you can deploy the Loki stack with the following values: | ||
|
||
```yaml | ||
# cat values.yaml | ||
loki: | ||
  enabled: true | ||
  url: http://loki-stack.logging:3100 | ||
  image: | ||
    tag: 2.9.3 # set image tag to 2.8.10 or higher to fix the issue 'Failed to load log volume for this query' | ||
  persistence: | ||
    enabled: true # set to true to persist logs | ||
|
||
promtail: | ||
  enabled: true | ||
  config: | ||
    clients: | ||
      - url: http://loki-stack.logging:3100/loki/api/v1/push # set loki url, don't forget the `namespace` of loki service | ||
``` | ||
```bash | ||
# Deploy the Loki stack to the logging namespace. Customize values.yaml as needed. | ||
helm upgrade --install loki-stack grafana/loki-stack -n logging --create-namespace -f values.yaml | ||
``` | ||
|
||
For more details, refer to the [loki-stack chart](https://github.com/grafana/helm-charts/tree/main/charts/loki-stack). | ||
|
||
> [!IMPORTANT] | ||
> If you deploy the stack with Loki 2.6.1, you may encounter the error 'Failed to load log volume for this query'. | ||
> To fix it, upgrade Loki to 2.8.10 or higher, as discussed in this [issue](https://github.com/grafana/grafana/issues/84144). | ||
|
||
**Step 3.** Check Status: | ||
|
||
```bash | ||
kubectl get pods -n logging | ||
``` | ||
|
||
All the pods should be in the `Running` state. | ||
|
||
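The status check above can also be scripted. The following is a minimal sketch, assuming the default table output of `kubectl get pods` (columns `NAME READY STATUS RESTARTS AGE`); the function name `not_running_pods` is hypothetical and not part of this repository.

```bash
# Print the names of pods whose STATUS column is not "Running".
# Reads the default `kubectl get pods` table on stdin and skips the header row.
# The function name is hypothetical and used only for illustration.
not_running_pods() {
  awk 'NR > 1 && $3 != "Running" { print $1 }'
}

# Example against a live cluster (commented out so the sketch can be sourced safely):
# kubectl get pods -n logging | not_running_pods
```

An empty result means every listed pod reports `Running`. If you prefer to block until the pods are ready, `kubectl wait --for=condition=Ready pods --all -n logging --timeout=120s` is an alternative; the timeout value here is only an example.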
### Configure Loki in Grafana | ||
|
||
#### Step 1. Add Loki Data Source to Grafana | ||
|
||
Open the Grafana dashboard in your browser, go to `Home` > `Connections` > `Data Sources` > `Add new data source` > `Loki`, and fill in the following details: | ||
|
||
- **Name**: Loki | ||
- **URL**: `http://loki-stack.logging:3100/`, where `logging` is the namespace where Loki is deployed. | ||
|
||
Click `Save & Test` to save the data source. | ||
|
||
Then click `Home` > `Explore` and choose `Loki` as the data source to filter labels and run queries. For example, run `{namespace="default",stream="stdout"}` to see the logs. | ||
|
||
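The same query can be sent to Loki directly over its HTTP API, which helps when checking that logs arrive before debugging Grafana. This is a minimal sketch under stated assumptions: the Service is named `loki-stack` (inferred from the URL used in this guide), it is reachable through a local port-forward, and the helper name `loki_query` is hypothetical.

```bash
# Base URL is an assumption: it presumes a local port-forward such as
#   kubectl port-forward -n logging svc/loki-stack 3100:3100
LOKI_URL="${LOKI_URL:-http://localhost:3100}"

# Hypothetical helper: run a LogQL query against Loki's query_range endpoint
# (using Loki's default time range) and print the raw JSON response.
loki_query() {
  local query="$1" limit="${2:-10}"
  curl -G -s "${LOKI_URL}/loki/api/v1/query_range" \
    --data-urlencode "query=${query}" \
    --data-urlencode "limit=${limit}"
}

# Example (requires a reachable Loki, so it is left commented out):
# loki_query '{namespace="default",stream="stdout"}' 5
```

Using `--data-urlencode` avoids having to escape the braces and quotes of the LogQL selector by hand.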
If you encounter the `Failed to load log volume for this query` error, upgrade Loki to 2.8.10 or higher. | ||
|
||
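The data source from Step 1 can also be declared as a Grafana provisioning file instead of being added through the UI. The sketch below only prints such a file; the helper name `loki_datasource_yaml` and the target path in the usage comment are assumptions about your setup, not something this repository ships.

```bash
# Hypothetical helper: print a Grafana data source provisioning file for Loki.
# The default URL mirrors the value used in this guide; pass another URL to override it.
loki_datasource_yaml() {
  local url="${1:-http://loki-stack.logging:3100}"
  cat <<EOF
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: ${url}
EOF
}

# Example: write the file where a default Grafana install looks for provisioning files
# (the path is an assumption; Helm-based installs usually mount it differently):
# loki_datasource_yaml > /etc/grafana/provisioning/datasources/loki.yaml
```

Keeping the data source in a file makes the setup reproducible across Grafana restarts and reinstalls.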
#### Step 2. Import a Loki Dashboard for Logs | ||
|
||
You can import a Loki dashboard to visualize logs in Grafana or create your own dashboard. | ||
|
||
More dashboards can be found at [Grafana Dashboards](https://grafana.com/grafana/dashboards). | ||
|
||
### Example: Collect Logs for MySQL Cluster | ||
|
||
1. Create a MySQL cluster: | ||
|
||
```bash | ||
kubectl create -f examples/mysql/cluster.yaml | ||
``` | ||
|
||
2. Open Grafana and import a dashboard to visualize logs. For example, you can import the following dashboard: | ||
|
||
- <https://grafana.com/grafana/dashboards/16966-container-log-dashboard/> | ||
|
||
3. Choose the namespace and stream to filter the logs shown in the log panel. |
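
Filtering by namespace and stream corresponds to a LogQL stream selector like the one used in `Explore` earlier. As a small sketch, the hypothetical helper below builds that selector from two values; it assumes your logs carry the `namespace` and `stream` labels used in this guide.

```bash
# Hypothetical helper: build a LogQL stream selector from a namespace and a stream.
# The stream defaults to stdout when the second argument is omitted.
logql_selector() {
  local namespace="$1" stream="${2:-stdout}"
  printf '{namespace="%s",stream="%s"}\n' "$namespace" "$stream"
}

# Example: selector for error output in the default namespace.
# logql_selector default stderr   # prints {namespace="default",stream="stderr"}
```

The printed selector can be pasted into the Grafana `Explore` query field as is.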
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,90 @@ | ||
apiVersion: monitoring.coreos.com/v1 | ||
kind: PrometheusRule | ||
metadata: | ||
  name: mongo-alert-rules | ||
  labels: | ||
    release: prometheus | ||
spec: | ||
  groups: | ||
    - name: MongodbExporter | ||
      rules: | ||
        - alert: MongodbDown | ||
          expr: "max_over_time(mongodb_up[1m]) == 0" | ||
          for: 0m | ||
          labels: | ||
            severity: critical | ||
          annotations: | ||
            summary: "MongoDB is down" | ||
            description: 'MongoDB instance is down\n VALUE = {{ $value }}\n LABELS = {{ $labels }}' | ||
|
||
        - alert: MongodbRestarted | ||
          expr: "mongodb_instance_uptime_seconds < 60" | ||
          for: 0m | ||
          labels: | ||
            severity: info | ||
          annotations: | ||
            summary: "MongoDB has just been restarted (< 60s)" | ||
            description: 'MongoDB has just been restarted {{ $value | printf "%.1f" }} seconds ago\n LABELS = {{ $labels }}' | ||
|
||
        - alert: MongodbReplicaMemberUnhealthy | ||
          expr: "max_over_time(mongodb_rs_members_health[1m]) == 0" | ||
          for: 0m | ||
          labels: | ||
            severity: critical | ||
          annotations: | ||
            summary: "MongoDB replica member is unhealthy" | ||
            description: 'MongoDB replica member is not healthy\n VALUE = {{ $value }}\n LABELS = {{ $labels }}' | ||
|
||
        - alert: MongodbReplicationLag | ||
          expr: '(mongodb_rs_members_optimeDate{member_state="PRIMARY"} - on (pod) group_right mongodb_rs_members_optimeDate{member_state="SECONDARY"}) / 1000 > 10' | ||
          for: 0m | ||
          labels: | ||
            severity: critical | ||
          annotations: | ||
            summary: "MongoDB replication lag (> 10s)" | ||
            description: 'MongoDB replication lag is more than 10s\n VALUE = {{ $value }}\n LABELS = {{ $labels }}' | ||
|
||
        - alert: MongodbReplicationHeadroom | ||
          expr: 'sum(avg(mongodb_mongod_replset_oplog_head_timestamp - mongodb_mongod_replset_oplog_tail_timestamp)) - sum(avg(mongodb_rs_members_optimeDate{member_state="PRIMARY"} - on (pod) group_right mongodb_rs_members_optimeDate{member_state="SECONDARY"})) <= 0' | ||
          for: 0m | ||
          labels: | ||
            severity: critical | ||
          annotations: | ||
            summary: "MongoDB replication headroom (<= 0)" | ||
            description: 'MongoDB replication headroom is <= 0\n VALUE = {{ $value }}\n LABELS = {{ $labels }}' | ||
|
||
        - alert: MongodbNumberCursorsOpen | ||
          expr: 'mongodb_ss_metrics_cursor_open{csr_type="total"} > 10 * 1000' | ||
          for: 2m | ||
          labels: | ||
            severity: warning | ||
          annotations: | ||
            summary: "MongoDB number of open cursors (> 10k)" | ||
            description: 'Too many cursors opened by MongoDB for clients (> 10k)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}' | ||
|
||
        - alert: MongodbCursorsTimeouts | ||
          expr: "increase(mongodb_ss_metrics_cursor_timedOut[1m]) > 100" | ||
          for: 2m | ||
          labels: | ||
            severity: warning | ||
          annotations: | ||
            summary: "MongoDB cursor timeouts (> 100/minute)" | ||
            description: 'Too many cursors are timing out (> 100/minute)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}' | ||
|
||
        - alert: MongodbTooManyConnections | ||
          expr: 'avg by(pod) (rate(mongodb_ss_connections{conn_type="current"}[1m])) / avg by(pod) (sum (mongodb_ss_connections) by(pod)) * 100 > 80' | ||
          for: 2m | ||
          labels: | ||
            severity: warning | ||
          annotations: | ||
            summary: "MongoDB too many connections (> 80%)" | ||
            description: 'Too many connections (> 80%)\n VALUE = {{ $value }}\n LABELS = {{ $labels }}' | ||
|
||
        - alert: MongodbVirtualMemoryUsage | ||
          expr: "(sum(mongodb_ss_mem_virtual) BY (pod) / sum(mongodb_ss_mem_resident) BY (pod)) > 100" | ||
          for: 2m | ||
          labels: | ||
            severity: warning | ||
          annotations: | ||
            summary: "MongoDB virtual memory usage high" | ||
            description: "High memory usage: the quotient of (mem_virtual / mem_resident) is more than 100\n VALUE = {{ $value }}\n LABELS = {{ $labels }}" |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,81 @@ | ||
apiVersion: monitoring.coreos.com/v1 | ||
kind: PrometheusRule | ||
metadata: | ||
  name: mysql-alert-rules | ||
  labels: | ||
    release: prometheus | ||
spec: | ||
  groups: | ||
    - name: MysqldExporter | ||
      rules: | ||
        - alert: MysqlDown | ||
          expr: "max_over_time(mysql_up[1m]) == 0" | ||
          for: 0m | ||
          labels: | ||
            severity: critical | ||
          annotations: | ||
            summary: "MySQL is down" | ||
            description: "MySQL is down. (instance: {{ $labels.pod }})" | ||
|
||
        - alert: MysqlRestarted | ||
          expr: "mysql_global_status_uptime < 60" | ||
          for: 0m | ||
          labels: | ||
            severity: info | ||
          annotations: | ||
            summary: "MySQL has just been restarted (< 60s)" | ||
            description: 'MySQL has just been restarted {{ $value | printf "%.1f" }} seconds ago. (instance: {{ $labels.pod }})' | ||
|
||
        - alert: MysqlTooManyConnections | ||
          expr: "sum(max_over_time(mysql_global_status_threads_connected[1m]) / mysql_global_variables_max_connections) BY (namespace,app_kubernetes_io_instance,pod) * 100 > 80" | ||
          for: 2m | ||
          labels: | ||
            severity: warning | ||
          annotations: | ||
            summary: "MySQL has too many connections (> 80%)" | ||
            description: '{{ $value | printf "%.2f" }} percent of MySQL connections are in use. (instance: {{ $labels.pod }})' | ||
|
||
        - alert: MysqlConnectionErrors | ||
          expr: "sum(increase(mysql_global_status_connection_errors_total[1m])) BY (namespace,app_kubernetes_io_instance,pod) > 0" | ||
          for: 2m | ||
          labels: | ||
            severity: warning | ||
          annotations: | ||
            summary: "MySQL connection errors" | ||
            description: 'MySQL has connection errors and the value is {{ $value | printf "%.2f" }}. (instance: {{ $labels.pod }})' | ||
|
||
        - alert: MysqlHighThreadsRunning | ||
          expr: "sum(max_over_time(mysql_global_status_threads_running[1m]) / mysql_global_variables_max_connections) BY (namespace,app_kubernetes_io_instance,pod) * 100 > 60" | ||
          for: 2m | ||
          labels: | ||
            severity: warning | ||
          annotations: | ||
            summary: "MySQL high threads running (> 60%)" | ||
            description: '{{ $value | printf "%.2f" }} percent of MySQL connections are in running state. (instance: {{ $labels.pod }})' | ||
|
||
        - alert: MysqlSlowQueries | ||
          expr: "sum(increase(mysql_global_status_slow_queries[1m])) BY (namespace,app_kubernetes_io_instance,pod) > 0" | ||
          for: 2m | ||
          labels: | ||
            severity: info | ||
          annotations: | ||
            summary: "MySQL slow queries" | ||
            description: 'MySQL server has {{ $value | printf "%.2f" }} slow queries. (instance: {{ $labels.pod }})' | ||
|
||
        - alert: MysqlInnodbLogWaits | ||
          expr: "sum(rate(mysql_global_status_innodb_log_waits[5m])) BY (namespace,app_kubernetes_io_instance,pod) > 10" | ||
          for: 2m | ||
          labels: | ||
            severity: warning | ||
          annotations: | ||
            summary: "MySQL InnoDB log waits (> 10)" | ||
            description: 'MySQL InnoDB log writes are stalling and the value is {{ $value | printf "%.2f" }}. (instance: {{ $labels.pod }})' | ||
|
||
        - alert: MysqlInnodbBufferPoolHits | ||
          expr: "sum(rate(mysql_global_status_innodb_buffer_pool_reads[5m]) / rate(mysql_global_status_innodb_buffer_pool_read_requests[5m])) BY (namespace,app_kubernetes_io_instance,pod) * 100 > 5" | ||
          for: 2m | ||
          labels: | ||
            severity: warning | ||
          annotations: | ||
            summary: "MySQL InnoDB high read requests rate hitting disk (> 5%)" | ||
            description: 'High number of logical reads that InnoDB could not satisfy from the buffer pool, and had to read directly from disk. The value is {{ $value | printf "%.2f" }} percent. (instance: {{ $labels.pod }})' |
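
To make the percentage thresholds in these rules concrete, the sketch below reproduces the arithmetic of `MysqlTooManyConnections` (connected threads divided by `max_connections`, times 100, compared with 80) outside Prometheus. The helper names and the sample numbers are hypothetical; the real alert is evaluated by Prometheus on the exporter metrics.

```bash
# Hypothetical helper: percentage of max_connections currently in use,
# formatted with two decimals like the alert's printf "%.2f".
connection_usage_percent() {
  awk -v connected="$1" -v max="$2" 'BEGIN { printf "%.2f", connected / max * 100 }'
}

# Hypothetical helper: exit 0 when the value is strictly above the limit, 1 otherwise.
above_threshold() {
  awk -v value="$1" -v limit="$2" 'BEGIN { exit !(value > limit) }'
}

# Example with made-up numbers: 130 connected threads out of 151 allowed.
# usage="$(connection_usage_percent 130 151)"   # 86.09
# above_threshold "$usage" 80 && echo "would fire after the 2m 'for' window"
```

The same ratio-times-100 pattern applies to `MysqlHighThreadsRunning` (threshold 60) and `MysqlInnodbBufferPoolHits` (threshold 5).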