
Benchmark Prometheus restart scenarios (for WAL + snapshot cost and timing) #820

Open
bwplotka opened this issue Jan 31, 2025 · 2 comments

Comments

@bwplotka
Member

bwplotka commented Jan 31, 2025

I propose we restart the Prometheus instances during standard prombench runs, e.g.:

  • graceful restart (kubectl delete pod) after 3h of the prombench run.
  • forceful restart (kubectl delete pod --grace-period=0) after 6h of the prombench run (so 3h after the first restart).

This lets us test important Prometheus features such as WAL checkpoints and memory snapshots during replay, which in the past caused resource spikes and can take some time. We also plan more work to improve this flow, so reliable metrics would be nice to have.

This killing logic could perhaps be implemented in the scaler, which already has access to the Kube API.
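
To make the proposed schedule concrete, here is a rough sketch of how the restart plan could be expressed. It is only an illustration, not how the scaler works today: the `Restart` type, `SCHEDULE`, and both helper functions are hypothetical names, and the sketch builds `kubectl` argument lists instead of calling the Kube API. Note that kubectl only accepts `--grace-period=0` together with `--force`, so the forceful case adds that flag.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Restart:
    after_hours: float  # hours since the start of the prombench run
    graceful: bool      # True: normal pod delete, False: forced delete


# Schedule from the proposal: graceful restart at 3h, forceful restart at 6h.
SCHEDULE = [
    Restart(after_hours=3, graceful=True),
    Restart(after_hours=6, graceful=False),
]


def kubectl_delete_args(pod: str, namespace: str, graceful: bool) -> List[str]:
    """Build the kubectl command line that deletes (restarts) one pod."""
    args = ["kubectl", "delete", "pod", pod, "--namespace", namespace]
    if not graceful:
        # kubectl requires --force when the grace period is set to 0.
        args += ["--grace-period=0", "--force"]
    return args


def due_restarts(elapsed_hours: float, already_done: int) -> List[Restart]:
    """Return scheduled restarts that are due and have not been executed yet.

    `already_done` is the number of schedule entries executed so far.
    """
    return [r for r in SCHEDULE[already_done:] if elapsed_hours >= r.after_hours]
```

A caller would check `due_restarts(...)` periodically and run the resulting command (for example with `subprocess.run`) for each Prometheus pod under test; the pod and namespace names are whatever the benchmark run uses.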

On top of that I would ensure we:

  • Add a dashboard panel for a startup time metric (if no such metric exists, we might want to add one, e.g. time to readiness).
  • Add vertical lines/thresholds in dashboards to show that the drop in all metrics is expected, or maybe another panel/metric? (This could perhaps be done with events.)
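
As a stopgap until a proper startup metric exists, time to readiness could be approximated from the outside by polling Prometheus' `/-/ready` endpoint after the pod is deleted. The sketch below is an assumption about how that might look, not existing prombench code: `http_ready` and `time_to_ready` are hypothetical helpers, and the clock and sleep functions are injectable so the loop can be tested without waiting.

```python
import time
import urllib.request
from typing import Callable


def http_ready(url: str, timeout: float = 2.0) -> bool:
    """Return True if the readiness endpoint answers with HTTP 200."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:
        # Covers connection errors, timeouts, and non-2xx responses (HTTPError).
        return False


def time_to_ready(
    probe: Callable[[], bool],
    poll_interval: float = 1.0,
    deadline: float = 3600.0,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> float:
    """Poll `probe` until it returns True and return the elapsed seconds."""
    start = clock()
    while True:
        if probe():
            return clock() - start
        if clock() - start >= deadline:
            raise TimeoutError(f"not ready after {deadline} seconds")
        sleep(poll_interval)
```

Usage would be something like `time_to_ready(lambda: http_ready("http://localhost:9090/-/ready"))`, where the address is an assumption (9090 is the default Prometheus port). The result measures time since polling started, so it is only meaningful if polling begins right when the restart is triggered, and its resolution is limited by `poll_interval`.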

WDYT? @bboreham @kakkoyun

@bwplotka
Member Author

bwplotka commented Jan 31, 2025

I just used this technique manually for the metadata-in-WAL feature to check whether the number of metadata WAL records makes a difference during replay:

prometheus/prometheus#15907

@bwplotka changed the title from "Benchmark Prometheus restart scenarios (for WAL + checkpointing cost and timing)" to "Benchmark Prometheus restart scenarios (for WAL + snapshot cost and timing)" on Jan 31, 2025

@bboreham
Member

I agree with the basic idea, but doesn’t the current startup also rebuild Prometheus?
We need to separate the rebuild from the restart to get proper timing.
