You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
SSV Node emits many metrics, collecting these metrics might be a heavy task or require a lot of resources to store. Some users have expressed that they would like to only receive certain metrics. Currently it can be turned off by passing the metrics port as 0 but this will disable all metrics reporting as a whole.
Task
Introduce a metrics granularity system. Where some metrics can be turned off and not show up in the metrics endpoint.
Ideas
Do it based on prefixes or regex. NOTE: this might require to change some metric names to b able to create such system.
Example: passing a metrics filter with value ssv:consensus will only publish metrics that start with this prefix. ssv:* will do all metrics that is under ssv.
I think this is a good idea. I advise against the : delimiter for metrics, it's usually used for recording rules and not Prometheus metrics themselves. Recording rules are just metrics computed from other metrics by Prometheus.
The main thing to watch out for is cardinality, we can't have many possible values for a given label. See this article on cardinality. some_metric{peer="0x1234"} <- this is an anti-pattern that should be avoided where possible.
Historically, the way some Ethereum clients have solved this, is to limit via flag the amount of validators you are keeping metrics for, with some sensible default.
After all, the most important part is that we are flushing metrics, which I think is currently not the case. The default behavior of Prometheus is to persist metrics in-memory, there's no "garbage collection". If at some point you see some peer 123 and update a metric, and this peer goes offline and comes online with some other peer ID 321, the current /metrics endpoint would register:
At time0:
metric{peer=123} 0
At time1:
metric{peer=123} 0
metric{peer=321} 0
Now picture that happening with many peers and many metrics... it sends Prometheus memory consumption into overdrive. We need to use something like DeletePartialMatch to keep an up to date snapshot of peer metrics.
Description
SSV Node emits many metrics, collecting these metrics might be a heavy task or require a lot of resources to store. Some users have expressed that they would like to only receive certain metrics. Currently it can be turned off by passing the metrics port as 0 but this will disable all metrics reporting as a whole.
Task
Introduce a metrics granularity system. Where some metrics can be turned off and not show up in the metrics endpoint.
Ideas
Example: passing a metrics filter with value
ssv:consensus
will only publish metrics that start with this prefix.ssv:*
will do all metrics that is underssv
.Call for feedback
Long before we implement anything like this, would love to hear input from the team, especially @zktaiga @oleg-ssvlabs @vaclav-ssvlabs
The text was updated successfully, but these errors were encountered: