This section of the tutorial focuses specifically on how to collect your infrastructure metrics with the OpenTelemetry Operator. Infrastructure metrics collection consists of a few components, each of which is introduced below. Parts of this document are based on "Important Components for Kubernetes" by the OpenTelemetry authors, which is licensed under CC BY 4.0. Some parts have been adjusted for the purpose of this tutorial.
Many Kubernetes-related components in this part of the tutorial use the Kubernetes API and therefore require proper permissions to work correctly. In most cases, you should give the service account running the collector the necessary permissions via a ClusterRole. As we go through this section of the tutorial, we will create the appropriate service accounts and cluster roles. You can inspect them yourself in this file.
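For illustration, a ClusterRole granting the collectors read access to the resources used in this section might look roughly like the sketch below. The exact rules and the service account name here are assumptions; the manifests in the linked file are authoritative.

```yaml
# Illustrative RBAC sketch only; the tutorial ships its own manifests.
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: otel-collector              # assumed name
rules:
  # read-only access to the core objects the receivers query
  - apiGroups: [""]
    resources: ["pods", "nodes", "nodes/stats", "nodes/proxy", "namespaces", "services", "endpoints", "events"]
    verbs: ["get", "list", "watch"]
  - apiGroups: ["apps"]
    resources: ["deployments", "daemonsets", "statefulsets", "replicasets"]
    verbs: ["get", "list", "watch"]
  # needed for scraping the API server and cAdvisor metrics endpoints
  - nonResourceURLs: ["/metrics", "/metrics/cadvisor"]
    verbs: ["get"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: otel-collector              # assumed name
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: otel-collector
subjects:
  - kind: ServiceAccount
    name: otel-collector            # assumed; use the service account the operator creates
    namespace: observability-backend
```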
Applying the YAML below will install the OpenTelemetry Collectors configured to receive and scrape all the metrics needed for monitoring your Kubernetes cluster. Go ahead and run the following:
kubectl apply -f https://raw.githubusercontent.com/pavolloffay/kubecon-na-2023-opentelemetry-kubernetes-metrics-tutorial/main/backend/06-collector-k8s-cluster-metrics.yaml
This will create new instances of the OpenTelemetry Collector (in statefulset and daemonset mode) and related objects, configured for Kubernetes metrics collection. Check your setup by running:
kubectl get -n observability-backend pod
Your output should look similar to this:
NAME READY STATUS RESTARTS AGE
otel-k8s-cluster-metrics-agent-collector-zc6cz 1/1 Running 0 16m
otel-k8s-cluster-metrics-collector-0 1/1 Running 0 16m
otel-k8s-cluster-metrics-collector-1 1/1 Running 0 16m
otel-k8s-cluster-metrics-collector-2 1/1 Running 0 16m
otel-k8s-cluster-metrics-targetallocator-5f5b954d7d-6sbvh 1/1 Running 0 16m
otel-k8s-cluster-metrics-targetallocator-5f5b954d7d-jnpsb 1/1 Running 0 16m
You're ready to receive metrics from your Kubernetes cluster! Let's go through each component of the collector configuration and see what it does.
Each Kubernetes node runs a kubelet that includes an API server. The Kubeletstats Receiver connects to that kubelet via the API server to collect metrics about the node and the workloads running on it. Due to the nature of this component, we recommend running it as a daemonset on each node.
There are different methods for authentication, but typically a service account is used (as is also the case for this tutorial). By default, metrics will be collected for pods and nodes, but you can configure the receiver to collect container and volume metrics as well. The receiver also allows configuring how often the metrics are collected. Inspect the following section of the configuration.
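The sketch below shows the general shape of that section, assuming service-account authentication, a kubelet endpoint derived from the node name, and the default metric groups plus containers; the tutorial's actual values are in the linked collector configuration.

```yaml
receivers:
  kubeletstats:
    collection_interval: 20s                      # assumed interval; adjust to taste
    auth_type: serviceAccount                     # authenticate with the mounted service account token
    endpoint: https://${env:K8S_NODE_NAME}:10250  # assumes K8S_NODE_NAME is injected via the downward API
    insecure_skip_verify: true
    metric_groups:                                # pod and node are collected by default;
      - node                                      # adding container (or volume) collects more
      - pod
      - container
```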
For specific details about which metrics are collected, see Default Metrics. For specific configuration details, see Kubeletstats Receiver.
Open the Kubelet Dashboard and you'll see information about the pod of your choice.
The Kubernetes Cluster Receiver collects metrics and entity events about the cluster as a whole using the Kubernetes API server. Use this receiver to answer questions about pod phases, node conditions, and other cluster-wide concerns. Since the receiver gathers telemetry for the cluster as a whole, only one instance of it is needed across the cluster to collect all the data.
There are different methods for authentication, but typically a service account is used (as is also the case for this tutorial). For node conditions, the receiver only collects `Ready` by default, but it can be configured to collect more. The receiver can also be configured to report a set of allocatable resources, such as `cpu` and `memory`. The `k8s_cluster` receiver looks as follows:
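A minimal sketch is shown below; the collection interval and the extra node condition are assumptions, and the tutorial's exact settings are in the linked collector configuration.

```yaml
receivers:
  k8s_cluster:
    auth_type: serviceAccount
    collection_interval: 10s          # assumed interval
    node_conditions_to_report:
      - Ready                         # collected by default
      - MemoryPressure                # example of an additional condition
    allocatable_types_to_report:
      - cpu
      - memory
```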
To learn more about the metrics that are collected, see Default Metrics. For configuration details, see Kubernetes Cluster Receiver.
The Host Metrics Receiver collects metrics from a host using a variety of scrapers. There are a number of scrapers that collect metrics for particular parts of the system. Overview of the available scrapers:
| Scraper | Supported OSs | Description |
|---|---|---|
| `cpu` | All except Mac | CPU utilization metrics |
| `disk` | All except Mac | Disk I/O metrics |
| `load` | All | CPU load metrics |
| `filesystem` | All | File system utilization metrics |
| `memory` | All | Memory utilization metrics |
| `network` | All | Network interface I/O metrics & TCP connection metrics |
| `paging` | All | Paging/swap space utilization and I/O metrics |
| `processes` | Linux, Mac | Process count metrics |
| `process` | Linux, Windows, Mac | Per-process CPU, memory, and disk I/O metrics |
There is some overlap with the Kubeletstats Receiver, so if you decide to use both, it may be worth disabling the duplicate metrics, as shown in the sketch below.
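Individual metrics can be switched off per receiver. The sketch below is hypothetical: the metric name is only a placeholder, so check both receivers' documentation for the metrics that actually overlap in your setup.

```yaml
receivers:
  kubeletstats:
    metrics:
      # hypothetical example: disable a metric that hostmetrics already provides
      k8s.node.network.io:
        enabled: false
```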
To collect the actual node's metrics rather than those of the collector container, make sure to mount the `hostfs` volume. You can inspect the configuration to see how the `hostfs` volume is mounted. The configuration for `hostmetrics`, in its simplest form, looks as follows:
```yaml
receivers:
  hostmetrics:
    root_path: /hostfs
    collection_interval: 10s
    scrapers:
      cpu:
      load:
      memory:
      disk:
      filesystem:
      network:
```
However, the full-fledged configuration for our tutorial requires some extra scraper configuration and additional metrics to be enabled. To inspect the full `hostmetrics` configuration, see the relevant part of the collector configuration.
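For reference, when the collector is managed by the OpenTelemetry Operator, the host filesystem can be mounted through the OpenTelemetryCollector resource. The sketch below is illustrative, assuming a daemonset-mode collector; the resource name and the trimmed pipeline are not the tutorial's actual manifest.

```yaml
apiVersion: opentelemetry.io/v1alpha1
kind: OpenTelemetryCollector
metadata:
  name: hostmetrics-example            # illustrative name only
spec:
  mode: daemonset                      # one collector instance per node
  volumeMounts:
    - name: hostfs
      mountPath: /hostfs
      readOnly: true
  volumes:
    - name: hostfs
      hostPath:
        path: /                        # expose the node's root filesystem to the collector
  config: |
    receivers:
      hostmetrics:
        root_path: /hostfs
        collection_interval: 10s
        scrapers:
          cpu:
          memory:
          filesystem:
    exporters:
      debug: {}                        # stand-in exporter; the tutorial exports to its own backend
    service:
      pipelines:
        metrics:
          receivers: [hostmetrics]
          exporters: [debug]
```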
Let's take a look at our dashboard now. Open the node dashboard and you'll see information about the node of your choice.
Kubernetes components emit metrics in Prometheus format, and Kubernetes has built-in support for hundreds of useful metrics that help you understand the health of containers, pods, nodes, services, and internal system components such as `kube-controller-manager`, `kube-proxy`, `kube-apiserver`, `kube-scheduler`, and the `kubelet`. Most of the metrics for these key components come embedded with the kubelet. For some specific metrics related to these components, deployments of exporters such as `kube-state-metrics`, `node-exporter`, and the Blackbox Exporter are required.
For our tutorial, we set up the OpenTelemetry collector to scrape these embedded metrics. The Prometheus upstream repository provides a helpful reference for configuring scraping, which you can find here. It contains the necessary configurations for discovering pods and services in your Kubernetes cluster. Our Prometheus receiver scrape configuration includes the following scrape jobs (a sketch of one such job follows the list):
- kubernetes-apiservers: This job pulls in metrics from the API servers.
- kubernetes-nodes: It collects metrics specific to Kubernetes nodes.
- kubernetes-pods: All pods with annotations for scraping and port specifications are scraped.
- kubernetes-service-endpoints: All service endpoints with annotations for scraping and port specifications are scraped.
- kubernetes-cadvisor: This job captures metrics from cAdvisor, providing container metrics.
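As an illustration of how one of these jobs is defined inside the Prometheus receiver, the sketch below shows a reduced kubernetes-pods job; the relabeling rules in the tutorial's actual configuration are more extensive, and the annotation names follow the usual prometheus.io convention.

```yaml
receivers:
  prometheus:
    config:
      scrape_configs:
        - job_name: kubernetes-pods
          kubernetes_sd_configs:
            - role: pod                # discover every pod via the Kubernetes API
          relabel_configs:
            # keep only pods that opt in via the prometheus.io/scrape annotation
            - source_labels: [__meta_kubernetes_pod_annotation_prometheus_io_scrape]
              action: keep
              regex: "true"
            # let pods override the scrape port via the prometheus.io/port annotation
            - source_labels: [__address__, __meta_kubernetes_pod_annotation_prometheus_io_port]
              action: replace
              regex: ([^:]+)(?::\d+)?;(\d+)
              replacement: $$1:$$2     # '$' is doubled because the collector expands ${...} itself
              target_label: __address__
```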
OpenTelemetry collector configuration -
To view the list of all scrape jobs and the count of active targets for each job, you can access Grafana Explore.
To view Prometheus metrics for the Kubernetes API server, you can access the k8s API Server Dashboard.