-
Notifications
You must be signed in to change notification settings - Fork 290
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix: Inaccurate Bucket Intervals for block_interval_seconds Metric #1307 #1308
base: main
Are you sure you want to change the base?
fix: Inaccurate Bucket Intervals for block_interval_seconds Metric #1307 #1308
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would almost feel like using a gauge is better than a histogram to see the block interval.
Adjust buckets as per @cmwaters suggestion. Co-authored-by: Callum Waters <[email protected]>
Hi @cmwaters, I think Histogram for this metric is overkill, but if we were to change it, I believe a Summary metric type would make more sense. A Gauge is useful for point-in-time status. A Summary metric gives us better visibility of data over time. This makes summaries more suited for understanding trends and variations in block intervals. I'm open to changing the metric type; it's not hard to do, and the previous state of this metric made the resulting data in Prometheus useless anyway. But with the change in this PR, the histogram metric type is now also serviceable. :) (I posted this reply in the wrong place yesterday, reposting it on the main thread instead.) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I think we'll merge this as is and look to revise it at a later date. We should also keep in mind that if we were to ever make changes to the block time than this would need to be updated
Description
This fix sets appropriate histogram buckets for the
celestia_consensus_block_interval_seconds_bucket
Prometheus metric.It addresses issue #1307
I also took a shot at improving the HELP description of the metric. I don't know if the metric records the time based on when the node sees the block or if it measures the time between blocks based on block timestamps. If it is the latter, I should adjust the HELP description to reflect that.
Here's a screenshot of what a Grafana graph looks like before/after updating this metric. Based on mochanet data.
Metrics before the change look like this;
Metrics after the fix look like this
PR checklist
I have not completed the checklist tasks. I can do these tomorrow where relevant
.changelog
(we useunclog to manage our changelog)
docs/
orspec/
) and code comments