OpenCost metrics interfere with OpenShift's "degraded control plane" detection? #249

Open
kastl-ars opened this issue Jan 23, 2025 · 7 comments

@kastl-ars
Contributor

Dear OpenCost maintainers,

Since last week, we have noticed that our OpenShift cluster shows a degradation warning, reporting that only 50% of the apiservers are responding.

This seems to be related to metrics exposed by OpenCost: Prometheus scrapes them, and they are then returned by the query used for this degradation detection.

We have explicitly disabled the emission of pod annotations, namespace annotations and KSM v1 metrics, and the warning vanished.

  opencost:
    metrics:
      serviceMonitor:
        enabled: true
      kubeStateMetrics:
        emitPodAnnotations: false
        emitNamespaceAnnotations: false
        emitKsmV1Metrics: false

As a result, the following environment variables appeared in the deployment:

        - name: EMIT_POD_ANNOTATIONS_METRIC
          value: 'false'
        - name: EMIT_NAMESPACE_ANNOTATIONS_METRIC
          value: 'false'
        - name: EMIT_KSM_V1_METRICS
          value: 'false'

I would like to see this added to the documentation that @mittal-ishaan was working on IIRC.

The query that went wrong was this:

count(kube_pod_labels{label_app="openshift-kube-apiserver", label_apiserver="true", namespace="openshift-kube-apiserver" })

Before we introduced the workaround described above, this returned 6 pods while only 3 were actually running. Hence the degradation warning, as only 50% appeared to be working...
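
To see where the extra series come from, the count can be broken down by scrape job and compared with the number of distinct pods. This is only a diagnostic sketch: it assumes the duplicate series carry a different `job` label than the ones from the cluster's own kube-state-metrics, which nothing in this thread confirms.

```promql
# Break the series count down by scrape job.
# Two jobs with 3 series each would explain the total of 6.
count by (job) (kube_pod_labels{label_app="openshift-kube-apiserver", label_apiserver="true", namespace="openshift-kube-apiserver"})

# Count distinct pods instead of series, so duplicated series are not counted twice.
count(count by (pod) (kube_pod_labels{label_app="openshift-kube-apiserver", label_apiserver="true", namespace="openshift-kube-apiserver"}))
```

If the first query lists a second job next to the kube-state-metrics one, that would point to the same pods being reported twice rather than to pods that are actually missing.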

Kind Regards,
Johannes

@kastl-ars
Contributor Author

Hmmm, this seems to completely break any cost calculation in OpenCost. After applying these settings, no metrics are visible anymore. I have enabled emitNamespaceAnnotations again; let's see if this changes anything...

@kastl-ars
Contributor Author

kastl-ars commented Jan 24, 2025

Even after re-enabling emitNamespaceAnnotations (by removing the line that set the attribute to false in the values.yaml) and, a little later, emitPodAnnotations, I still cannot see any costs in OpenCost for the last couple of hours. Removing the emitKsmV1Metrics: false setting makes OpenCost show values again almost instantaneously, but the OpenShift degradation warning is back as well...

@kastl-ars
Contributor Author

As just stated in #252, I am not sure whether this issue should rather go to the opencost repository. With the knowledge I have today, it looks to me less like a problem of disabling some settings on OpenShift and more like a general problem of OpenCost not working on OpenShift without interfering with OpenShift itself?

@mittal-ishaan
Contributor

Hi @kastl-ars,
I will look into this issue. Can you tell me which OpenCost version you are using right now?

@kastl-ars
Contributor Author

Thank you! We are using the latest chart version 1.43.1.

@kastl-ars
Contributor Author

The more pressing issue would be #252, as a wrong CPU count sounds more problematic. But my guess is that they are related...

@mittal-ishaan
Contributor

Thank you.
Sure, let me check that too.
