This happens from time to time: the HTTP requests total metric differs across Prometheuses. Looking at the current loadgen-querier code, I think we can easily hit the case where one Prometheus is slow enough that it not only makes groups go out of sync (queries issued at different times than on the other Prometheus), but, more importantly, all queries in the group take longer in total than the configured interval, so the load on that one Prometheus decreases. As a result it might look like that Prometheus is doing just fine, when in fact it is only delaying loadgen scheduling.
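For illustration, here is a minimal sketch of the kind of group loop that produces this effect. This is not the actual loadgen-querier code; the `runGroup` function, the endpoint path, and all parameters are made-up assumptions:

```go
package loadgen

import (
	"net/http"
	"net/url"
	"time"
)

// runGroup is a hypothetical group loop: the next iteration only starts after
// all queries of the previous one have returned, so a Prometheus that is
// slower than the group interval silently receives less load.
func runGroup(target string, queries []string, interval time.Duration) {
	for {
		start := time.Now()
		for _, q := range queries {
			// Each query blocks until the target answers (or errors out).
			resp, err := http.Get(target + "/api/v1/query?query=" + url.QueryEscape(q))
			if err == nil {
				resp.Body.Close()
			}
		}
		// If the queries together took longer than the interval, there is
		// nothing left to sleep and the schedule simply slips: fewer requests
		// per second hit the slow Prometheus, and its request rate drops
		// without any visible error.
		if elapsed := time.Since(start); elapsed < interval {
			time.Sleep(interval - elapsed)
		}
	}
}
```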
A couple of things we could do (a rough sketch follows the list):
1. Have a metric that shows loadgen delays caused by a (potentially) slow Prometheus.
2. Cancel groups that run for too long (context deadline equal to the group interval). This makes the load "easier" on a slow Prometheus, which might hide the perf issues.
3. Start new group queries regardless of whether the previous group iteration has finished or not. This gives us the most "fair" and equal load on both Prometheuses, but it will lead to cascading failures on both the slow Prometheus and loadgen itself spamming goroutines.
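To make the options concrete, here is a rough sketch of how (1), (2) and (3) could look if squeezed into a single runner. The metric name, `runGroup`, and the HTTP details are illustrative assumptions, not the actual loadgen-querier implementation:

```go
package loadgen

import (
	"context"
	"net/http"
	"net/url"
	"time"

	"github.com/prometheus/client_golang/prometheus"
)

// (1) A metric exposing how often a group iteration overruns its interval,
// i.e. how much loadgen itself is being slowed down by a slow target.
var groupOverruns = prometheus.NewCounterVec(
	prometheus.CounterOpts{
		Name: "loadgen_group_interval_overruns_total",
		Help: "Group iterations that took longer than the configured interval.",
	},
	[]string{"target", "group"},
)

func init() {
	prometheus.MustRegister(groupOverruns)
}

func runGroup(target, group string, queries []string, interval time.Duration) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for range ticker.C {
		// (3) Start every iteration on schedule, regardless of whether the
		// previous one has finished, so both targets see equal offered load.
		go func() {
			start := time.Now()
			// (2) Bound each iteration by the group interval so queries against
			// a slow Prometheus get cancelled instead of piling up.
			ctx, cancel := context.WithTimeout(context.Background(), interval)
			defer cancel()
			for _, q := range queries {
				req, err := http.NewRequestWithContext(ctx, http.MethodGet,
					target+"/api/v1/query?query="+url.QueryEscape(q), nil)
				if err != nil {
					continue
				}
				if resp, err := http.DefaultClient.Do(req); err == nil {
					resp.Body.Close()
				}
			}
			if time.Since(start) > interval {
				groupOverruns.WithLabelValues(target, group).Inc()
			}
		}()
	}
}
```

In practice (2) and (3) would probably not be combined as-is, since (2) softens exactly the overload that (3) is meant to expose.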
It feels like (1) is some improvement (comparing delays across clients), but isn't this equivalent to the server's total requests counter rate going down? 🤔
Doing (3) and allowing the whole Prometheus to slowly time things out and/or OOM/get slower is a fair approach, but how do we avoid starving the client (if the client itself gets slower or OOMs, it might "recover" the Prometheus situation)?
What's stopping us from assuming that a dropping request count == Prometheus is too slow? cc @bboreham
Discussed in https://cloud-native.slack.com/archives/C07TT6DTQ02/p1737726558098399