Per-chunk metrics #12758

jancionear · 2025-01-20T13:05:16Z

nearcore has a lot of metrics which are scraped by Prometheus and presented on Grafana dashboards. These metrics provide some insight into what's going on in the binary, but they are not very precise. All metrics are currently aggregated over 1-minute periods, which is around 60 blocks. Aggregating over this many blocks provides high-level information, but it doesn't give much insight into individual chunks.

It would be great to have detailed metrics for every chunk: how many transactions were processed, how much gas was burned at each stage of chunk application, how many receipts were forwarded, which limits were hit, and so on. I would love to be able to use a neard command or a debug-ui page to view detailed chunk application metrics for any chosen chunk.

Advantages:

Easier to analyse runtime performance and receipt flow: Currently I often find myself adding random logs throughout the runtime to understand which receipts are being processed and how, but this is pretty annoying. It would be much better to view generic chunk application metrics and get information from there. There are a lot of performance and scalability improvements on the roadmap, and per-chunk metrics would be very helpful for that.
Easier to debug issues: With aggregated metrics it's often hard to say what happened at the level of an individual block or chunk. These metrics could provide additional insight into issues that happen for a block or two. (For example, we could see that witness size was huge for one chunk at an epoch boundary.)
Availability in tests: Prometheus metrics are not available in integration tests, which makes the tests hard to debug. Per-chunk metrics could be stored in the local database and printed out during tests, providing more information about what's going on in the test.
Custom aggregators: Grafana works fine, but I personally don't like the PromQL language. It's hard to make it do what I want, and I'm never really sure if I'm aggregating things the way I wanted to. With per-chunk metrics it would be easy to write custom aggregation code in Rust and present that in some way.
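
To make the "custom aggregators" point concrete, here is a minimal sketch of what aggregation over per-chunk records could look like in plain Rust. Everything in it is an assumption for illustration: `ChunkStats`, its fields, `Summary` and `summarize` are hypothetical names, not existing nearcore types or APIs, and the field set is just the examples listed above (transactions, gas, forwarded receipts, witness size).

```rust
/// Hypothetical per-chunk record. The field names are illustrative only
/// and do not correspond to an existing nearcore struct.
#[derive(Debug, Clone, PartialEq)]
pub struct ChunkStats {
    pub height: u64,
    pub shard_id: u64,
    pub transactions_processed: u64,
    pub gas_burnt: u64,
    pub receipts_forwarded: u64,
    pub witness_size_bytes: u64,
}

/// Hypothetical summary over a window of chunks.
#[derive(Debug, PartialEq)]
pub struct Summary {
    pub chunks: usize,
    pub mean_gas_burnt: f64,
    /// Height of the chunk with the largest witness in the window.
    pub largest_witness_height: u64,
    pub largest_witness_bytes: u64,
}

/// Aggregates a window of per-chunk records.
/// Returns `None` for an empty window, since a mean is undefined there.
pub fn summarize(stats: &[ChunkStats]) -> Option<Summary> {
    // `max_by_key` yields `None` on an empty slice, which also guards
    // the division below. On ties it returns the last maximal element.
    let largest = stats.iter().max_by_key(|s| s.witness_size_bytes)?;

    // Sum in u128 so that a long window of large gas values cannot overflow.
    let total_gas: u128 = stats.iter().map(|s| s.gas_burnt as u128).sum();

    Some(Summary {
        chunks: stats.len(),
        mean_gas_burnt: total_gas as f64 / stats.len() as f64,
        largest_witness_height: largest.height,
        largest_witness_bytes: largest.witness_size_bytes,
    })
}
```

Unlike a 1-minute average, a summary like this keeps the height of the outlier chunk, so a single oversized witness can be traced back to the exact chunk that produced it. The same function could be called from a test to print stats for the chunks the test produced, assuming the records are readable from the local database.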


walnut-the-cat · 2025-01-21T14:20:05Z

Could you add high level engineering cost?

jancionear · 2025-01-21T17:00:50Z

Could you add high level engineering cost?

I'd say about a week of work to get the basic functionality going. Things like integration with debug-ui or custom aggregators can be added later.

jancionear added the A-observability (Area: observability) and C-enhancement (Category: An issue proposing an enhancement or a PR with one.) labels Jan 20, 2025

jancionear self-assigned this Jan 20, 2025

jancionear mentioned this issue Jan 24, 2025

feat: add chunk application stats #12797

Open

github-actions bot mentioned this issue Feb 1, 2025

Monthly issue metrics report #12859

Open
