Performance: Replay Stage #4270

bw-solana · 2025-01-03T20:26:27Z

The top level of replay stage is completely serialized: each iteration marches through a long sequence of steps, including:

1. Generate new bank forks
2. Replay active banks
3. Reset dead slots
4. Check for newly confirmed slots (from gossip)
5. Ingest verified gossip votes (important for fork choice)
6. Remove duplicated slots from fork choice
7. Compute bank stats
8. Compute slot stats
9. Select fork based on heaviest bank/subtree
10. Select vote and reset forks
11. Heaviest fork failures
12. Vote on a fork
13. Reset onto a fork (if necessary)
14. Dump then repair correct slots
15. Retransmit latest unpropagated leader slot
16. Maybe start leader
17. Wait for signal (from blockstore)
18. Report timing metrics
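To make the "completely serialized" shape concrete, here is a minimal sketch of a loop that runs each step in order and accumulates per-step timings. The `Step` enum, `STEPS`, and `run_iteration` are hypothetical names invented for illustration and cover only a few of the steps above; this is not the actual replay stage code.

```rust
use std::collections::BTreeMap;
use std::time::{Duration, Instant};

/// Hypothetical labels for a handful of the steps listed above.
#[derive(Debug, Clone, Copy, PartialEq, Eq, PartialOrd, Ord)]
enum Step {
    GenerateNewBankForks,
    ReplayActiveBanks,
    ComputeBankStats,
    SelectFork,
    Vote,
}

/// The fixed order in which one loop iteration visits the steps.
const STEPS: [Step; 5] = [
    Step::GenerateNewBankForks,
    Step::ReplayActiveBanks,
    Step::ComputeBankStats,
    Step::SelectFork,
    Step::Vote,
];

/// Run every step once, strictly one after another, and add each step's
/// elapsed time to `timings`. A slow step delays every step after it.
fn run_iteration<F: FnMut(Step)>(mut do_step: F, timings: &mut BTreeMap<Step, Duration>) {
    for step in STEPS {
        let start = Instant::now();
        do_step(step);
        *timings.entry(step).or_default() += start.elapsed();
    }
}
```

Calling `run_iteration` repeatedly with the same `timings` map gives a per-step total, which is the kind of breakdown that the final "report timing metrics" step would publish.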

It should be spending most of its time on step 2 above (replaying active banks). It would be nice to separate things so that banks are replayed in a thread pool while the rest of the management work is performed separately.
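One possible shape for that separation, sketched with only the standard library: worker threads pull slots from a job channel and send results back, so the management thread is free between submitting work and collecting results. `Slot`, `replay_bank`, and `replay_in_pool` are hypothetical stand-ins, not Agave APIs, and a real design would also have to respect parent/child ordering between banks, which this sketch ignores.

```rust
use std::sync::{mpsc, Arc, Mutex};
use std::thread;

type Slot = u64;

/// Stand-in for replaying one bank (assumption: the real work would
/// execute the slot's entries and could fail).
fn replay_bank(slot: Slot) -> Result<Slot, String> {
    Ok(slot)
}

/// Replay `slots` on `num_workers` threads and return the results in
/// completion order (not submission order).
fn replay_in_pool(slots: Vec<Slot>, num_workers: usize) -> Vec<Result<Slot, String>> {
    let (job_tx, job_rx) = mpsc::channel::<Slot>();
    let job_rx = Arc::new(Mutex::new(job_rx));
    let (done_tx, done_rx) = mpsc::channel();

    let workers: Vec<_> = (0..num_workers.max(1))
        .map(|_| {
            let job_rx = Arc::clone(&job_rx);
            let done_tx = done_tx.clone();
            thread::spawn(move || loop {
                // The lock is held only while taking the next job.
                let job = job_rx.lock().unwrap().recv();
                match job {
                    Ok(slot) => {
                        if done_tx.send(replay_bank(slot)).is_err() {
                            break; // nobody is collecting results anymore
                        }
                    }
                    Err(_) => break, // job channel closed: no more work
                }
            })
        })
        .collect();
    drop(done_tx); // only the workers hold result senders now

    for slot in slots {
        job_tx.send(slot).unwrap();
    }
    drop(job_tx); // lets workers exit once the queue drains

    // The management thread could do its other work here before collecting.
    let results: Vec<_> = done_rx.iter().collect();
    for worker in workers {
        worker.join().unwrap();
    }
    results
}
```

For example, `replay_in_pool(vec![3, 1, 2], 2)` returns three `Ok` results in whatever order the workers finish.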

The current time split on mainnet looks something like:
57% replaying, 38% idle (waiting for shreds), 5% on other
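As a rough worked example using the approximate figures above, and assuming the three categories do not overlap: of the non-idle time (57% + 5% = 62%), replay accounts for 57/62, or about 92%, and everything else for about 8%. The helper name `busy_share` is made up for this sketch.

```rust
/// Fraction of non-idle time spent replaying, given each category's
/// percentage of total wall-clock time.
fn busy_share(replay_pct: f64, other_pct: f64) -> f64 {
    replay_pct / (replay_pct + other_pct)
}
```

With the numbers above, `busy_share(57.0, 5.0)` is 57/62. Under the same assumption, moving all of the "other" work off the replay path would shrink the busy portion of the loop by at most about 8%.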

Generating bank forks is one activity prone to spikes (maybe lock contention?).
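One way to test the lock contention guess is to measure how long a caller waits to acquire the lock, separately from the work done while holding it. The sketch below assumes the shared state sits behind a `std::sync::RwLock`; `write_timed` is a hypothetical helper, not an existing function.

```rust
use std::sync::{RwLock, RwLockWriteGuard};
use std::time::{Duration, Instant};

/// Take the write lock and report how long the caller waited for it.
/// A large wait relative to the step's total time points at contention.
fn write_timed<T>(lock: &RwLock<T>) -> (RwLockWriteGuard<'_, T>, Duration) {
    let start = Instant::now();
    let guard = lock.write().unwrap();
    (guard, start.elapsed())
}
```

Recording the returned wait next to the step's overall duration would show whether the spikes come from waiting on the lock or from the work itself.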

bw-solana added this to Agave Performance Jan 3, 2025

bw-solana moved this to Backlog in Agave Performance Jan 3, 2025
