
Some ServiceMonitors deployed by the garden operator have label prometheus: seed instead of prometheus: garden #11270

Open
plkokanov opened this issue Jan 31, 2025 · 3 comments · May be fixed by #11318
Labels: area/monitoring (monitoring-related, including availability monitoring and alerting), kind/bug


@plkokanov
Contributor

How to categorize this issue?

/area monitoring
/kind bug

What happened:
The gardener-operator deploys some ServiceMonitors in the garden cluster with the label prometheus: seed. However, the garden Prometheus, which collects metrics for components in the garden cluster, only selects ServiceMonitors with the label prometheus: garden, so metrics for the corresponding components are not collected. Additionally, these ServiceMonitors are named with the seed- prefix instead of garden-:

kubectl -n garden get ServiceMonitors
NAME                                             AGE
aggregate-fluent-bit                             16m
aggregate-fluent-bit-output-plugin               16m
aggregate-vali                                   16m
cache-etcd-druid                                 16m
garden-alertmanager-garden                       9m51s
garden-gardener-admission-controller             11m
garden-gardener-apiserver                        14m
garden-gardener-controller-manager               11m
garden-gardener-dashboard                        9m52s
garden-gardener-discovery-server                 11m
garden-gardener-metrics-exporter                 11m
garden-gardener-operator                         16m
garden-gardener-scheduler                        11m
garden-virtual-garden-etcd-events                16m
garden-virtual-garden-etcd-main                  16m
garden-virtual-garden-kube-apiserver             15m
garden-virtual-garden-kube-controller-manager    14m
--------------------------------------------------------
seed-gardener-resource-manager                   16m
seed-vpa-admission-controller                    16m
seed-vpa-recommender                             16m
--------------------------------------------------------
shoot-virtual-garden-gardener-resource-manager   14m
kubectl -n garden get servicemonitors -l prometheus=seed
NAME                             AGE
seed-gardener-resource-manager   18m
seed-vpa-admission-controller    18m
seed-vpa-recommender             18m
kubectl -n garden get prometheus garden -o json | jq '.spec.serviceMonitorSelector'
{
  "matchLabels": {
    "prometheus": "garden"
  }
}
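The mismatch follows from how a matchLabels selector works: an object is selected only if it carries every listed key with exactly the listed value. The following minimal Python sketch models just that rule (not matchExpressions or the namespace selector) against the selector and ServiceMonitors shown above. The prometheus: garden label on garden-gardener-apiserver is an assumption, since only the prometheus=seed labels are printed.

```python
def matches_labels(match_labels: dict[str, str], labels: dict[str, str]) -> bool:
    """Return True if every key/value pair in match_labels is present in labels
    (matchLabels semantics of a Kubernetes label selector)."""
    return all(labels.get(key) == value for key, value in match_labels.items())


# serviceMonitorSelector of the garden Prometheus, as printed by the jq command above.
garden_selector = {"prometheus": "garden"}

# Labels of some ServiceMonitors from the listing above. The seed-* labels come from
# the "-l prometheus=seed" query; the garden-* label is an assumption.
service_monitors = {
    "garden-gardener-apiserver": {"prometheus": "garden"},  # assumed
    "seed-gardener-resource-manager": {"prometheus": "seed"},
    "seed-vpa-admission-controller": {"prometheus": "seed"},
    "seed-vpa-recommender": {"prometheus": "seed"},
}

selected = [name for name, labels in service_monitors.items()
            if matches_labels(garden_selector, labels)]
skipped = [name for name in service_monitors if name not in selected]

print("selected:", selected)
print("skipped:", skipped)
```

Only the ServiceMonitor carrying prometheus: garden is selected; the three seed-* ones are skipped even though they live in the same namespace.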

What you expected to happen:
The ServiceMonitors are deployed with the correct labels, so that the garden Prometheus running in the garden cluster can scrape metrics for the corresponding components.

How to reproduce it (as minimally and precisely as possible):

  1. Set up a local development environment by following the gardener-operator local development guide
  2. Create a Garden
  3. Observe that the ServiceMonitors listed above are created with the label prometheus: seed instead of prometheus: garden
  4. Port-forward to the prometheus-garden service with kubectl -n garden port-forward service/prometheus-garden 9090:80, open the Prometheus UI at http://localhost:9090/, and note that metrics for gardener-resource-manager, vpa-admission-controller, and vpa-recommender are missing from the garden Prometheus.
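Step 4 can also be checked from a script instead of the UI. This is a rough sketch, assuming the port-forward from step 4 is running and that Prometheus Operator names scrape pools after the ServiceMonitor (typically serviceMonitor/<namespace>/<name>/<endpoint>). It reads the standard Prometheus /api/v1/targets endpoint and uses a crude substring match on component names, so treat its result as a hint rather than a definitive check. The helper names are hypothetical.

```python
import json
import urllib.request

# Reachable through the port-forward from step 4 (assumption: default local setup).
TARGETS_URL = "http://localhost:9090/api/v1/targets"


def fetch_targets(url: str = TARGETS_URL) -> dict:
    """Fetch and parse the JSON response of the Prometheus /api/v1/targets endpoint."""
    with urllib.request.urlopen(url) as response:
        return json.load(response)


def missing_components(targets: dict, components: list[str]) -> list[str]:
    """Return the components whose name appears in no active scrape pool."""
    pools = {target["scrapePool"] for target in targets["data"]["activeTargets"]}
    return [c for c in components if not any(c in pool for pool in pools)]
```

Usage, with the port-forward active: missing_components(fetch_targets(), ["gardener-resource-manager", "vpa-admission-controller", "vpa-recommender"]) should list all three components while the bug is present.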

Anything else we need to know?:
The name and labels used for these ServiceMonitors are calculated using the following function:

When the garden cluster is also a seed cluster, gardener-resource-manager (GRM) and vpa-{recommender,admission-controller} are not deployed as part of the Seed reconciliation. The seed's Prometheus instance, however, is deployed, and it starts scraping the seed-gardener-resource-manager, seed-vpa-admission-controller, and seed-vpa-recommender ServiceMonitors that were created during the Garden reconciliation.
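Because the function referenced above is not reproduced here, the following Python sketch is hypothetical. It assumes the ServiceMonitor name is "<prometheus name>-<component>" and the label is prometheus: <prometheus name>, which is consistent with every name/label pair listed in this issue (for example aggregate-fluent-bit with prometheus=aggregate, cache-etcd-druid with prometheus=cache). Under that assumption, a single wrong input ("seed" instead of "garden") produces both the seed- prefix and the prometheus: seed label, and a seed Prometheus in the same cluster that selects prometheus: seed would pick these ServiceMonitors up.

```python
# Hypothetical helper: both the name prefix and the "prometheus" label are derived
# from one Prometheus instance name.
def service_monitor_name_and_labels(prometheus_name: str, component: str) -> tuple[str, dict[str, str]]:
    return f"{prometheus_name}-{component}", {"prometheus": prometheus_name}


COMPONENTS = ["gardener-resource-manager", "vpa-admission-controller", "vpa-recommender"]

# Current behaviour during the Garden reconciliation: "seed" is used for both parts.
current = [service_monitor_name_and_labels("seed", c) for c in COMPONENTS]
# Expected behaviour: "garden" fixes the name prefix and the label together.
expected = [service_monitor_name_and_labels("garden", c) for c in COMPONENTS]

for (old_name, old_labels), (new_name, new_labels) in zip(current, expected):
    print(f"{old_name} {old_labels} -> {new_name} {new_labels}")
```

Deriving both values from one input keeps names and labels consistent, but it also means the two symptoms cannot be fixed independently.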

Environment:

  • Gardener version:
  • Kubernetes version (use kubectl version):
  • Cloud provider or hardware configuration:
  • Others:
@gardener-prow (bot) added the area/monitoring and kind/bug labels on Jan 31, 2025
@plkokanov
Contributor Author

/assign

@ialidzhikov
Member

/assign @vitanovs

@plkokanov
Contributor Author

While looking into this in more detail with @vitanovs, we found that the following ServiceMonitors are not being scraped either, because of their labels:

aggregate-fluent-bit                             14m     prometheus=aggregate,resources.gardener.cloud/managed-by=gardener
aggregate-fluent-bit-output-plugin               14m     prometheus=aggregate,resources.gardener.cloud/managed-by=gardener
aggregate-shoot-prometheus                       5m11s   prometheus=aggregate,resources.gardener.cloud/managed-by=gardener
aggregate-vali                                   14m     prometheus=aggregate,resources.gardener.cloud/managed-by=gardener
cache-etcd-druid                                 14m     prometheus=cache,resources.gardener.cloud/managed-by=gardener
shoot-virtual-garden-gardener-resource-manager   12m     prometheus=shoot
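These rows look like the name, age, and labels columns printed by kubectl get servicemonitors --show-labels (an assumption). A small Python sketch that parses such rows and reports every ServiceMonitor that the garden Prometheus selector (prometheus: garden) would skip, together with the prometheus label it carries instead:

```python
from typing import Optional

# Rows copied from the listing above: NAME, AGE, comma-separated LABELS.
ROWS = """\
aggregate-fluent-bit                             14m     prometheus=aggregate,resources.gardener.cloud/managed-by=gardener
aggregate-fluent-bit-output-plugin               14m     prometheus=aggregate,resources.gardener.cloud/managed-by=gardener
aggregate-shoot-prometheus                       5m11s   prometheus=aggregate,resources.gardener.cloud/managed-by=gardener
aggregate-vali                                   14m     prometheus=aggregate,resources.gardener.cloud/managed-by=gardener
cache-etcd-druid                                 14m     prometheus=cache,resources.gardener.cloud/managed-by=gardener
shoot-virtual-garden-gardener-resource-manager   12m     prometheus=shoot
"""


def parse_labels(column: str) -> dict[str, str]:
    """Parse a "k1=v1,k2=v2" labels column; kubectl prints <none> for no labels."""
    if column == "<none>":
        return {}
    return dict(pair.split("=", 1) for pair in column.split(","))


def unselected(rows: str, selector: dict[str, str]) -> dict[str, Optional[str]]:
    """Map each ServiceMonitor not matched by the selector to its prometheus label."""
    result: dict[str, Optional[str]] = {}
    for line in rows.splitlines():
        if not line.strip():
            continue
        name, _age, label_column = line.split()
        labels = parse_labels(label_column)
        if not all(labels.get(k) == v for k, v in selector.items()):
            result[name] = labels.get("prometheus")
    return result


for name, value in unselected(ROWS, {"prometheus": "garden"}).items():
    print(f"{name}: prometheus={value}")
```

All six rows are reported, grouped by the aggregate, cache, and shoot values of their prometheus label.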
