nearcore has a lot of metrics which are scraped by Prometheus and presented on Grafana dashboards. These metrics provide some insight into what's going on in the binary, but they are not very precise. All metrics are currently aggregated over 1-minute periods, which is around 60 blocks. Aggregating over this many blocks provides high-level information, but it doesn't give much insight into individual chunks.
It would be great to have detailed metrics for every chunk: how many transactions were processed, how much gas was burned at each stage of chunk application, how many receipts were forwarded, which limits were hit, and so on. I would love to be able to use a neard command or a debug-ui page to view detailed chunk application metrics for any chosen chunk.
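To make the request more concrete, here is a rough sketch of what a per-chunk record could look like. Everything in it is hypothetical: `ChunkApplyStats`, its field names, and the stage and limit names are made up for illustration and are not existing nearcore types.

```rust
/// Hypothetical per-chunk record. None of these names exist in nearcore;
/// they only mirror the kinds of metrics listed above.
#[derive(Debug, Clone, Default, PartialEq)]
pub struct ChunkApplyStats {
    pub block_height: u64,
    pub shard_id: u64,
    pub transactions_processed: u64,
    /// Gas burned at each stage of chunk application, in processing order.
    /// Stage names are free-form strings in this sketch.
    pub gas_burned_by_stage: Vec<(String, u64)>,
    pub receipts_forwarded: u64,
    /// Names of the limits that were hit while applying the chunk.
    pub limits_hit: Vec<String>,
}

impl ChunkApplyStats {
    /// Total gas burned across all stages.
    pub fn total_gas_burned(&self) -> u64 {
        self.gas_burned_by_stage.iter().map(|(_, gas)| *gas).sum()
    }

    /// One-line summary, e.g. for printing from a test or a debug command.
    pub fn summary(&self) -> String {
        format!(
            "height={} shard={} txs={} gas={} forwarded={} limits_hit={:?}",
            self.block_height,
            self.shard_id,
            self.transactions_processed,
            self.total_gas_burned(),
            self.receipts_forwarded,
            self.limits_hit,
        )
    }
}
```

A record like this could be filled in during chunk application and looked up later by block height and shard.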
Advantages:
Easier to analyse runtime performance and receipt flow: Currently I often find myself adding random logs throughout the runtime to understand which receipts are being processed and how, but this is pretty annoying. It would be much better to view generic chunk application metrics and get the information from there. There are a lot of performance and scalability improvements on the roadmap, and per-chunk metrics would be very helpful for them.
Easier to debug issues: With aggregated metrics it's often hard to say what happened at the level of an individual block or chunk. Per-chunk metrics could provide additional insight into issues that only show up for a block or two. For example, we could see that the witness size was huge for one chunk on an epoch boundary.
Availability in tests: Prometheus metrics are not available in integration tests, which makes the tests hard to debug. Per-chunk metrics could be stored in the local database and printed out during tests, providing more information about what's going on in the test.
Custom aggregators: Grafana works fine, but I personally don't like the PromQL language. It's hard to make it do what I want, and I'm never sure whether I'm aggregating things the way I intended. With per-chunk metrics it would be easy to write custom aggregation code in Rust and present the results in some way.
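As an illustration of the custom aggregation idea, the sketch below sums gas per shard and finds the chunk with the largest witness over a slice of per-chunk records. `ChunkRecord` and both functions are hypothetical names invented for this example, and the sketch assumes the records have already been loaded from wherever they end up being stored.

```rust
use std::collections::BTreeMap;

/// Hypothetical minimal per-chunk record; not a nearcore type.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkRecord {
    pub block_height: u64,
    pub shard_id: u64,
    pub gas_burned: u64,
    pub witness_size_bytes: u64,
}

/// Sums gas burned per shard over the given chunks.
pub fn total_gas_by_shard(records: &[ChunkRecord]) -> BTreeMap<u64, u64> {
    let mut totals = BTreeMap::new();
    for record in records {
        *totals.entry(record.shard_id).or_insert(0) += record.gas_burned;
    }
    totals
}

/// Returns the chunk with the largest witness, or `None` for an empty slice.
pub fn largest_witness(records: &[ChunkRecord]) -> Option<&ChunkRecord> {
    records.iter().max_by_key(|record| record.witness_size_bytes)
}
```

With something like `largest_witness`, the epoch-boundary witness spike mentioned above would show up as a single record rather than being smoothed away in a 1-minute aggregate.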