Commit
Document the new ci-monitoring stack (#34823)
* Document the new ci-monitoring stack

* Update clusters/app.ci/ci-grafana/README.md

Co-authored-by: Bruno Barcarol Guimarães <[email protected]>

* Update clusters/app.ci/openshift-user-workload-monitoring/README.md

Co-authored-by: Bruno Barcarol Guimarães <[email protected]>

Co-authored-by: Bruno Barcarol Guimarães <[email protected]>
hongkailiu and bbguimaraes authored Dec 15, 2022
1 parent eced59f commit b21c662
Showing 2 changed files with 41 additions and 0 deletions.
24 changes: 24 additions & 0 deletions clusters/app.ci/ci-grafana/README.md
# CI-Grafana

This folder contains the manifests for Grafana managed by [grafana-operator](https://github.com/grafana-operator/grafana-operator).
The grafana-operator is installed via [Operator Hub](https://console-openshift-console.apps.ci.l2s4.p1.openshiftapps.com/operatorhub) into `namespace/ci-grafana` and managed by operator-lifecycle-manager (OLM).

## Dashboards

The dashboards for Grafana are generated from [mixins](../openshift-user-workload-monitoring/mixins) with the command:

> make -C clusters/app.ci/openshift-user-workload-monitoring/mixins all

The generated dashboards are stored in [mixins/grafana_dashboards_out](../openshift-user-workload-monitoring/mixins/grafana_dashboards_out).
The `jsonnet` objects are kept there as well, since CI validation is simpler when all the mixin-generated manifests stay together.
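
For orientation, each generated dashboard is wrapped in a `GrafanaDashboard` custom resource that the operator reconciles into the Grafana instance. A minimal sketch (the name, labels, and dashboard body below are illustrative, and the API group shown is the one used by grafana-operator 4.x):

```yaml
apiVersion: integreatly.org/v1alpha1
kind: GrafanaDashboard
metadata:
  name: example-dashboard        # illustrative name
  namespace: ci-grafana
  labels:
    app: grafana                 # must match the Grafana CR's dashboardLabelSelector
spec:
  json: |
    {
      "title": "Example",
      "panels": []
    }
```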

## Staging

We no longer have a staging Grafana instance for developing dashboards.
With grafana-operator, we can apply a generated dashboard to the production instance to preview it.

The current grafana-operator (4.8.0 at the time of writing) manages only one Grafana instance.
To run a staging instance in another namespace, we would have to [deploy everything all over again](https://kubernetes.slack.com/archives/C019A1KTYKC/p1670534010925499).
This limitation is expected to be fixed in version 5+, at which point we can bring up a staging instance if needed.

17 changes: 17 additions & 0 deletions clusters/app.ci/openshift-user-workload-monitoring/README.md
# Metrics and Alerts

This folder contains the manifests for user-workload-monitoring (UWM) based on [monitoring user-defined project on OSD cluster](https://docs.openshift.com/dedicated/osd_cluster_admin/osd_monitoring/osd-understanding-the-monitoring-stack.html) and managed by cluster-monitoring-operator (CMO).

The ServiceMonitors and PodMonitors define the scraping targets for Prometheus. The cluster console provides a UI for running queries against the metrics from those targets.
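
As a sketch of the shape, a `ServiceMonitor` selects Services by label and tells Prometheus which port to scrape (all names below are illustrative):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: ServiceMonitor
metadata:
  name: example-component        # illustrative name
  namespace: ci
spec:
  selector:
    matchLabels:
      app: example-component     # must match the Service's labels
  endpoints:
  - port: metrics                # name of the Service port exposing /metrics
    interval: 30s
```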

The alerts are generated by [mixins](mixins/) with the following command:

> make -C clusters/app.ci/openshift-user-workload-monitoring/mixins all

The generated manifests are stored in [mixins/prometheus_out](mixins/prometheus_out).
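
The generated alerts take the form of `PrometheusRule` manifests. A minimal hand-written sketch of that shape (the rule name, expression, and labels are illustrative, not taken from the generated output):

```yaml
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: example-alerts           # illustrative name
  namespace: ci
spec:
  groups:
  - name: example
    rules:
    - alert: ExampleTargetDown
      expr: up{job="example-component"} == 0   # illustrative expression
      for: 10m
      labels:
        severity: warning
      annotations:
        summary: example-component target is down
```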

The list of required tools is [here](../supplemental-ci-images/validation-images/dashboards/dashboards-validation.yaml).

## Add an alert on Prow jobs

The metrics and alerts defined here are for the TP team. CI users have [a more convenient way](https://docs.ci.openshift.org/docs/how-tos/notification/) to get Slack notifications, e.g., on Prow job failures.
