Metrics API usability: instrument naming and do we need all of them? #1368
-
As a user, what has been most useful to me are examples, ideally several, so that everyone can relate to at least one. With Prometheus, for example, it already took some explaining to show how you can measure CPU utilization (a percentage) with a counter. But it was also a very useful example, because CPU usage is something every software engineer understands as a measurable thing. Showing how "CPU seconds used" transforms into a percentage illustrated both the use of non-integer counters for timing and how to use rate windowing to average over different time frames, on demand, from the same source metric. Another useful source of examples is the weather. Everyone who has been outside lately can relate to it, and it has lots of quantities with different properties: temperature goes up and down, and you can really only measure it at a point in time. Rain does not un-fall, so you can represent it with a counter that only goes up. Moisture on the ground does go up (through rain) and down (through evaporation), but you can represent it with two counters.
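To make the CPU example concrete, here is a minimal Python sketch of turning a cumulative "CPU seconds used" counter into a utilization percentage. It assumes samples are `(timestamp, cumulative CPU seconds)` pairs and that the counter never resets; the name `cpu_utilization_percent` is hypothetical. It is a simplified version of what Prometheus's `rate()` computes, without that function's counter-reset handling and extrapolation.

```python
from typing import List, Tuple

# Each sample is (unix_timestamp_seconds, cumulative_cpu_seconds).
# Assumption: the counter only goes up and never resets.
Sample = Tuple[float, float]


def cpu_utilization_percent(samples: List[Sample], window_s: float) -> float:
    """Average CPU utilization, as a percent of one core, over the last window_s seconds."""
    end_t, end_v = samples[-1]
    # Use the oldest sample that still falls inside the window as the start point.
    in_window = [s for s in samples if s[0] >= end_t - window_s]
    start_t, start_v = in_window[0]
    if end_t == start_t:
        raise ValueError("need at least two samples in the window")
    # CPU seconds used per wall-clock second, i.e. a fraction of one core.
    rate = (end_v - start_v) / (end_t - start_t)
    return rate * 100.0
```

With samples every 15 seconds, e.g. `[(0, 0.0), (15, 3.0), (30, 12.0), (45, 13.5), (60, 15.0)]`, a 60-second window gives 25% and a 30-second window gives 10%: the same source metric, averaged over different windows chosen at query time.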
-
I think @matthiasr's example captures a good point: an UpDownCounter can (always?) be represented with two Counters, with the ups and downs counted separately. Taking an example for UpDownCounter from the API spec of "count queue size by instrumenting
The added benefit of option 2 is that you can see the churn in queue size, even with a long duration between metric points: "there were 100 enqueues and 99 dequeues in the last 5 minutes" vs. "the queue size is 1". The downside, of course, is producing a second metric. Maybe it is also easier to overflow a counter this way?
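A minimal Python sketch of the "two counters" representation, using a hypothetical in-process `QueueMetrics` class (not an OpenTelemetry API): the queue size an UpDownCounter would report is recovered as the difference of two monotonic counters, and comparing two snapshots shows the churn.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class QueueMetrics:
    """Hypothetical queue metrics tracked as two monotonic counters."""
    enqueued_total: int = 0  # only ever increases
    dequeued_total: int = 0  # only ever increases

    def on_enqueue(self, n: int = 1) -> None:
        self.enqueued_total += n

    def on_dequeue(self, n: int = 1) -> None:
        self.dequeued_total += n

    def queue_size(self) -> int:
        # The value an UpDownCounter would hold: ups minus downs.
        return self.enqueued_total - self.dequeued_total


def churn(before: QueueMetrics, after: QueueMetrics) -> Tuple[int, int]:
    """Enqueues and dequeues that happened between two snapshots."""
    return (after.enqueued_total - before.enqueued_total,
            after.dequeued_total - before.dequeued_total)
```

After 100 enqueues and 99 dequeues starting from an empty `QueueMetrics()`, `queue_size()` returns 1 while `churn(start, m)` returns `(100, 99)`. Python integers are unbounded, so this sketch does not speak to the overflow question.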
-
The naming of instruments is not intuitive, even to those familiar with metrics. See the list of current instruments:
Counter
UpDownCounter
ValueRecorder
SumObserver
UpDownSumObserver
ValueObserver
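One way to read the six names, paraphrasing the draft spec as I understand it: each is a combination of three properties, namely synchronous (called from user code) vs. asynchronous (reported from a callback), additive (values combine into a sum) vs. grouping (individual measurements), and monotonic (the sum only goes up). The Python sketch below encodes that reading; `Kind` and `instrument_for` are hypothetical helpers, not part of any API.

```python
from typing import NamedTuple


class Kind(NamedTuple):
    synchronous: bool  # True: called inline by user code; False: reported from a callback
    additive: bool     # True: values combine into a sum; False: individual measurements
    monotonic: bool    # True: the sum can only increase


# Assumed mapping, paraphrased from the draft spec.
INSTRUMENTS = {
    "Counter":           Kind(True,  True,  True),
    "UpDownCounter":     Kind(True,  True,  False),
    "ValueRecorder":     Kind(True,  False, False),
    "SumObserver":       Kind(False, True,  True),
    "UpDownSumObserver": Kind(False, True,  False),
    "ValueObserver":     Kind(False, False, False),
}


def instrument_for(synchronous: bool, additive: bool, monotonic: bool) -> str:
    """Return the instrument name for a combination of properties."""
    wanted = Kind(synchronous, additive, monotonic)
    for name, kind in INSTRUMENTS.items():
        if kind == wanted:
            return name
    raise ValueError(f"no instrument for {wanted}")
```

Monotonicity is only defined for additive instruments, which is why three properties yield six instruments rather than eight.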
Can we improve on this and have better compatibility with prior art?
Widely adopted, GA metrics libraries already exist, for example Prometheus, Micrometer, and Dropwizard Metrics. It's not clear what the intentions are for interoperating with prior art beyond the never-GA OpenTracing and OpenCensus libraries.
Do we need all these types in practice?
A minimal API surface for the 1.0 release would be better than defining too much API, perhaps based more on theory than on practice. API can be added later, but removing it is much harder.
We touched on this topic at the Metrics Workshop, and several people agreed that we need to focus on what users actually need. Let me cc a few people who I know have shared opinions on the topic:
@shakuzen @brian-brazil @RichiH @bogdandrutu @alolita @jmacd @tedsuo @juliusv @rakyll @ebullient @tomwilkie @matthiasr @jsuereth