I propose we restart the Prometheus instances during the standard prombench runs, e.g.:
graceful restart (kubectl delete pod) after 3h of the prombench run.
forceful restart (kubectl delete pod --grace-period=0) after 6h of the prombench run (so 3h after the first restart).
This allows us to test important Prometheus features like WAL checkpoints and memory snapshots during replay, which in the past caused resource spikes and can take some time. We also have more work planned to improve this flow, so reliable metrics would be nice to have.
This killing logic could perhaps be implemented in the scaler, which already has access to the Kube API.
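A minimal sketch (not existing scaler code) of what such a restart routine could look like with client-go, assuming an in-cluster config; the namespace, pod name, and timings below are placeholders for illustration only:

```go
// Hypothetical restart routine, NOT existing scaler code: namespace, pod name,
// and timings are placeholders for illustration only.
package main

import (
	"context"
	"log"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/rest"
)

func main() {
	// The scaler already runs inside the benchmark cluster, so in-cluster config is enough.
	cfg, err := rest.InClusterConfig()
	if err != nil {
		log.Fatalf("loading in-cluster config: %v", err)
	}
	client, err := kubernetes.NewForConfig(cfg)
	if err != nil {
		log.Fatalf("creating clientset: %v", err)
	}

	const (
		namespace = "prombench"         // assumed namespace
		podName   = "prometheus-test-0" // assumed pod name
	)
	ctx := context.Background()
	pods := client.CoreV1().Pods(namespace)

	// Graceful restart after 3h: the default grace period lets Prometheus shut down cleanly.
	time.Sleep(3 * time.Hour)
	if err := pods.Delete(ctx, podName, metav1.DeleteOptions{}); err != nil {
		log.Printf("graceful delete failed: %v", err)
	}

	// Forceful restart 3h later (6h into the run): grace period 0 kills the pod
	// immediately, forcing a full WAL replay on the next startup.
	time.Sleep(3 * time.Hour)
	zero := int64(0)
	if err := pods.Delete(ctx, podName, metav1.DeleteOptions{GracePeriodSeconds: &zero}); err != nil {
		log.Printf("forceful delete failed: %v", err)
	}
}
```

Deleting through the API with GracePeriodSeconds set to 0 corresponds to the forceful kubectl delete above, while the plain delete corresponds to the graceful one.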
On top of that, I would ensure we:
Add a dashboard panel for the startup time metric (if such a metric does not exist, we might want to add one, e.g. time to readiness; see the sketch after this list).
Add some vertical lines/thresholds in the dashboards to show that the drop in all metrics is expected, or maybe another panel/metric? (This could perhaps be done with some events?)
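For the time-to-readiness idea, here is a purely hypothetical sketch of how such a metric could be exposed with client_golang, e.g. from a small helper polling Prometheus' /-/ready endpoint; the metric name, target URL, and polling interval are all assumptions, not existing prombench code:

```go
// Hypothetical "time to readiness" exporter, not existing prombench code;
// metric name, target URL, and polling interval are assumptions.
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promhttp"
)

var timeToReady = prometheus.NewGauge(prometheus.GaugeOpts{
	Name: "prombench_prometheus_time_to_readiness_seconds", // assumed name
	Help: "Seconds from probe start until /-/ready first returned 200.",
})

func main() {
	prometheus.MustRegister(timeToReady)
	start := time.Now()

	go func() {
		// Poll readiness once per second; record the elapsed time on the first 200.
		for {
			resp, err := http.Get("http://localhost:9090/-/ready") // assumed Prometheus address
			if err == nil {
				resp.Body.Close()
				if resp.StatusCode == http.StatusOK {
					timeToReady.Set(time.Since(start).Seconds())
					return
				}
			}
			time.Sleep(time.Second)
		}
	}()

	http.Handle("/metrics", promhttp.Handler())
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

A dashboard panel could then plot this gauge directly.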
WDYT? @bboreham @kakkoyun