Add Prometheus metrics #419
This is great! I had one inline question, and there are other similar examples where I didn't repeat the same question.
As an aside, have we thought about other providers / services besides prometheus? Would we be able to use OpenTelemetry to achieve the same thing without being tied specifically to prometheus?
_ = promauto.NewGaugeFunc(prometheus.GaugeOpts{
	Namespace: promNamespace,
	Subsystem: promSubsystem,
	Name:      "jobs_running",
	Help:      "Current number of running jobs according to deduper",
}, func() float64 { return float64(jobsRunningGaugeFunc()) })
With the _ = assignment, is this merely a template for adding the rest of the vars below?
It's _ merely because this gauge doesn't need to be referred to within the package later on: promauto registers the metric, then the value is obtained during metric scrape through the callback (unlike most of the other metrics, where the value is set, incremented, added, or observed on the metric).
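To illustrate why discarding the return value is safe, here is a minimal stdlib-only sketch of the idea. It does not use client_golang; the registry type, newGaugeFunc, scrape, and the running map are all hypothetical names invented for this illustration. The point it shows is that the registry holds the callback, so the value is computed at scrape time and the caller needs no handle.

```go
package main

import (
	"fmt"
	"sync"
)

// registry is a hypothetical stand-in for a metrics registry. It is NOT the
// client_golang API; it only models "register a callback, read it at scrape".
type registry struct {
	mu     sync.Mutex
	gauges map[string]func() float64
}

func newRegistry() *registry {
	return &registry{gauges: make(map[string]func() float64)}
}

// newGaugeFunc registers fn under name and returns it as a handle. Callers
// may discard the handle, because the registry itself keeps fn reachable.
func (r *registry) newGaugeFunc(name string, fn func() float64) func() float64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	r.gauges[name] = fn
	return fn
}

// scrape evaluates every registered callback at the moment it is called.
func (r *registry) scrape() map[string]float64 {
	r.mu.Lock()
	defer r.mu.Unlock()
	out := make(map[string]float64, len(r.gauges))
	for name, fn := range r.gauges {
		out[name] = fn()
	}
	return out
}

func main() {
	reg := newRegistry()
	running := map[string]struct{}{} // hypothetical deduper state

	// The handle is discarded: nothing ever calls Set/Inc on this gauge.
	_ = reg.newGaugeFunc("jobs_running", func() float64 {
		return float64(len(running))
	})

	running["job-a"] = struct{}{}
	running["job-b"] = struct{}{}

	// The callback runs now, so it sees the two jobs added after registration.
	fmt.Println(reg.scrape()["jobs_running"])
}
```

By contrast, a counter or histogram that is incremented or observed from elsewhere in the package has to be kept in a named variable so that code can reach it.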
Excellent, thank you for the explanation!
Not particularly. I'm more familiar with Prometheus than anything else 🤷
I believe so. The types of metrics defined here are a subset of the types supported in OTel. Moving across shouldn't be too difficult given the OTel Go SDK has a Prometheus Exporter.
What
Adds a Prometheus metrics endpoint, and a whole heap of metrics.
Why
Fixes #102.
Addresses the metric part of #278.
Among other reasons, extra observability into the controller will likely be needed to keep digging into #302.
Show me the charts
Two ways to get the number of jobs the deduper thinks are running:
Available limiter tokens:
Median time between querying a job and scheduling it: