Performance: Replay Stage #4270

Open
bw-solana opened this issue Jan 3, 2025 · 0 comments


bw-solana commented Jan 3, 2025

The top level of replay stage is completely serialized: each iteration of the main loop marches through a long list of steps, including:

  1. Generate new bank forks
  2. Replay active banks
  3. Reset dead slots
  4. Check for newly confirmed slots (from gossip)
  5. Ingest verified gossip votes (important for fork choice)
  6. Remove duplicated slots from fork choice
  7. Compute bank stats
  8. Compute slot stats
  9. Select fork based on heaviest bank/subtree
  10. Select vote and reset forks
  11. Heaviest fork failures
  12. Vote on a fork
  13. Reset onto a fork (if necessary)
  14. Dump then repair correct slots
  15. Retransmit latest unpropagated leader slot
  16. Maybe start leader
  17. Wait for signal (from blockstore)
  18. Report timing metrics

It should be spending most of its time on step 2 above. It would be nice to separate things so that replaying banks is done in a thread pool while the rest of the management tasks are performed separately.
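
A rough sketch of that separation, using only `std` threads and channels. This is not the actual replay stage code: `BankTask`, `ReplayResult`, `replay_bank`, and `replay_in_pool` are hypothetical names made up for illustration, and the "replay" is a placeholder. The idea is that worker threads do the replay work and send results back over a channel, so the management thread only has to drain results.

```rust
use std::sync::mpsc;
use std::thread;

/// Hypothetical stand-in for a bank that is waiting to be replayed.
struct BankTask {
    slot: u64,
    num_transactions: u64,
}

/// Hypothetical stand-in for the outcome of replaying one bank.
#[derive(Debug, PartialEq)]
struct ReplayResult {
    slot: u64,
    transactions_replayed: u64,
}

/// Placeholder for the expensive replay work (step 2 in the list above).
fn replay_bank(task: &BankTask) -> ReplayResult {
    ReplayResult {
        slot: task.slot,
        transactions_replayed: task.num_transactions,
    }
}

/// Replays `tasks` on `num_workers` threads and returns results sorted by slot.
fn replay_in_pool(tasks: Vec<BankTask>, num_workers: usize) -> Vec<ReplayResult> {
    let (result_tx, result_rx) = mpsc::channel();

    // Distribute the banks round-robin across the workers.
    let mut buckets: Vec<Vec<BankTask>> = (0..num_workers.max(1)).map(|_| Vec::new()).collect();
    let n = buckets.len();
    for (i, task) in tasks.into_iter().enumerate() {
        buckets[i % n].push(task);
    }

    let handles: Vec<_> = buckets
        .into_iter()
        .map(|bucket| {
            let tx = result_tx.clone();
            thread::spawn(move || {
                for task in &bucket {
                    tx.send(replay_bank(task)).expect("receiver alive");
                }
            })
        })
        .collect();
    // Drop the original sender so the receiver sees the channel close
    // once every worker has finished.
    drop(result_tx);

    // In a real design the management thread would interleave its other
    // steps (fork choice, voting, ...) with draining this channel.
    let mut results: Vec<ReplayResult> = result_rx.iter().collect();
    for handle in handles {
        handle.join().expect("worker panicked");
    }
    results.sort_by_key(|r| r.slot);
    results
}

fn main() {
    let tasks = (1..=4)
        .map(|slot| BankTask { slot, num_transactions: slot * 10 })
        .collect();
    for r in replay_in_pool(tasks, 2) {
        println!("slot {} replayed {} transactions", r.slot, r.transactions_replayed);
    }
}
```

The sketch ignores ordering constraints between parent and child banks, which a real implementation would have to respect.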

The current time split on mainnet looks something like this: 57% replaying, 38% idle (waiting for shreds), and 5% on everything else.
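
For reference, a split like this can be derived by accumulating per-phase durations across loop iterations and dividing by the total. A minimal sketch, assuming a hypothetical `LoopTiming` struct (not the actual replay timing metrics type); the durations in `main` are illustrative values chosen to match the split above, not measured data.

```rust
use std::time::Duration;

/// Hypothetical accumulator for time spent in each phase of the loop.
#[derive(Default)]
struct LoopTiming {
    replay: Duration,
    idle: Duration,
    other: Duration,
}

impl LoopTiming {
    fn total(&self) -> Duration {
        self.replay + self.idle + self.other
    }

    /// Returns (replay %, idle %, other %), or `None` if nothing was recorded.
    fn percentages(&self) -> Option<(f64, f64, f64)> {
        let total = self.total().as_secs_f64();
        if total == 0.0 {
            return None;
        }
        let pct = |d: Duration| 100.0 * d.as_secs_f64() / total;
        Some((pct(self.replay), pct(self.idle), pct(self.other)))
    }
}

fn main() {
    // Illustrative numbers only, picked to reproduce the quoted split.
    let timing = LoopTiming {
        replay: Duration::from_millis(570),
        idle: Duration::from_millis(380),
        other: Duration::from_millis(50),
    };
    if let Some((replay, idle, other)) = timing.percentages() {
        println!("replay {replay:.0}%, idle {idle:.0}%, other {other:.0}%");
    }
}
```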

Generating bank forks is one activity prone to spikes (maybe lock contention?).
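
One way to test the lock contention guess would be to time how long lock acquisition takes. The sketch below assumes the shared state sits behind a `std::sync::RwLock`; `timed_write` and `contended_wait` are hypothetical helpers, and the `Vec<u64>` is just a stand-in for the real shared structure.

```rust
use std::sync::{mpsc, Arc, RwLock, RwLockWriteGuard};
use std::thread;
use std::time::{Duration, Instant};

/// Acquires the write lock and reports how long the caller waited for it.
fn timed_write<T>(lock: &RwLock<T>) -> (RwLockWriteGuard<'_, T>, Duration) {
    let start = Instant::now();
    let guard = lock.write().expect("lock poisoned");
    (guard, start.elapsed())
}

/// Simulates contention: a reader holds the lock for `hold` while the
/// calling thread tries to take the write lock. Returns the writer's wait.
fn contended_wait(hold: Duration) -> Duration {
    // Stand-in for the shared structure; the real type is not modeled here.
    let shared = Arc::new(RwLock::new(Vec::<u64>::new()));
    let (ready_tx, ready_rx) = mpsc::channel();

    let reader = {
        let shared = Arc::clone(&shared);
        thread::spawn(move || {
            let _guard = shared.read().expect("lock poisoned");
            // Signal that the read lock is held, then keep holding it.
            ready_tx.send(()).expect("main thread alive");
            thread::sleep(hold);
        })
    };

    // Wait until the reader definitely holds the lock before writing.
    ready_rx.recv().expect("reader alive");
    let (mut forks, waited) = timed_write(&*shared);
    forks.push(1);
    drop(forks);

    reader.join().expect("reader panicked");
    waited
}

fn main() {
    let waited = contended_wait(Duration::from_millis(50));
    println!("writer waited {:?} for the lock", waited);
}
```

If the wait time recorded this way lines up with the spikes, contention is a likely cause; if not, the time is being spent elsewhere in the step.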

Projects
Status: Backlog