The Cluster Monitoring Operator manages and updates the Prometheus-based cluster monitoring stack deployed on top of OpenShift.
It contains the following components:
- Prometheus Operator
- Prometheus
- Alertmanager cluster for cluster- and application-level alerting
- kube-state-metrics
- node_exporter
The deployed Prometheus Operator is intended to be used only for cluster-level monitoring.
As such, the deployed Prometheus instance (prometheus-k8s) is responsible for monitoring and alerting on cluster and OpenShift components; it should not be extended to monitor user applications.
Important: By default, the Prometheus Operator managed by the Cluster Monitoring Operator only looks for ServiceMonitor resources in the openshift-monitoring namespace.
Users who want to use Prometheus for application monitoring on OpenShift should consider using OLM (Operator Lifecycle Manager) to deploy a separate Prometheus Operator and set up new Prometheus instances to monitor and alert on their applications.
Alertmanager is a cluster-global component for handling alerts generated by all Prometheus instances deployed in that cluster.
Metrics are collected from the following components:
- kube-state-metrics
- node_exporter
- Kubelets
- API server
- Prometheus (just prometheus-k8s for now)
- Alertmanager
To send additional metrics via telemetry, add a selector that matches the desired time series to manifests/0000_50_cluster_monitoring_operator_04-config.yaml.
Documentation on the data sent can be found in the data collection documentation.
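As a rough illustration of what "a selector that matches the time series" means, the Go sketch below keeps only the series whose labels satisfy at least one selector. It is a simplified model, not the telemetry implementation: it assumes selectors written in the PromQL style `{__name__="up", job="apiserver"}` and supports only equality matchers (real PromQL selectors also allow `!=`, `=~` and `!~`). The types `Labels` and `Selector` and the function `Filter` are hypothetical names introduced for this example, and the series and selector values are illustrative only.

```go
package main

import "fmt"

// Labels is a time series' label set; the metric name is stored
// under the reserved "__name__" label, as in Prometheus.
type Labels map[string]string

// Selector is a hypothetical, simplified series selector that supports
// only equality matchers, e.g. {__name__="up", job="apiserver"}.
type Selector map[string]string

// Matches reports whether every equality matcher in the selector is
// satisfied by the series' labels. A missing label reads as "", which
// mirrors PromQL, where label="" matches series without that label.
func (s Selector) Matches(series Labels) bool {
	for name, want := range s {
		if series[name] != want {
			return false
		}
	}
	return true
}

// Filter returns the series matched by at least one selector, i.e. the
// set that would be forwarded under this simplified model.
func Filter(series []Labels, selectors []Selector) []Labels {
	var out []Labels
	for _, ls := range series {
		for _, sel := range selectors {
			if sel.Matches(ls) {
				out = append(out, ls)
				break // one matching selector is enough
			}
		}
	}
	return out
}

func main() {
	// Illustrative series; not real cluster data.
	series := []Labels{
		{"__name__": "up", "job": "apiserver"},
		{"__name__": "up", "job": "kubelet"},
		{"__name__": "node_cpu_seconds_total", "mode": "idle"},
	}
	// Hypothetical selector list equivalent to {__name__="up", job="apiserver"}.
	selectors := []Selector{{"__name__": "up", "job": "apiserver"}}
	for _, ls := range Filter(series, selectors) {
		fmt.Println(ls["__name__"], ls["job"])
	}
}
```

Running it prints `up apiserver`: only the first series satisfies both matchers. Adding a broader selector such as `{"__name__": "up"}` would also forward the kubelet series, which is why selectors should be as specific as the data you actually intend to send.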
- Monitor etcd
- Adapt alerts inherited from Tectonic with OpenShift operational knowledge
- Unit tests: make test-unit
- End-to-end tests: make test-e2e
Before a new OpenShift release, pin the dependencies to their release branches:
- In kube-prometheus, cut a release.
- In this repo, set the "version" in jsonnet/jsonnetfile.json to the release branch of each dependency.