Cron schedule off-by-one error in desiredReplicas scaling #5263
Comments
If the reported metric is correct, then we should probably check the HPA. It is doing the final calculation; maybe it's a rounding problem?
I see no errors on the HPA itself. I also increased maxReplicas to 90 to rule that out as a potential edge case.
I think this issue is related to rounding and tolerance. Once you've reached 88 instances, you'll never reach 89, because the variation (1012m over the 1000m target) is a ratio deviation of only 0.012 (1.2%), while the HPA tolerance is 10%.
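To make that concrete, here is a minimal Go sketch of the calculation the HPA controller performs for an average-value metric, assuming the default 10% tolerance. It is a simplification for illustration, not the exact upstream controller code (the real controller also handles readiness, missing metrics, and stabilization).

```go
// Minimal sketch of the HPA replica calculation for an average-value metric.
// The 10% tolerance is the Kubernetes default; treat this as illustrative only.
package main

import (
	"fmt"
	"math"
)

const tolerance = 0.10 // default HPA tolerance (10%)

// desiredReplicas keeps the current count when the usage ratio is within
// tolerance of 1.0, otherwise scales by the ratio and rounds up.
func desiredReplicas(current int32, metricValue, targetValue float64) int32 {
	usageRatio := metricValue / targetValue
	if math.Abs(1.0-usageRatio) <= tolerance {
		return current // deviation within tolerance: keep current replicas
	}
	return int32(math.Ceil(usageRatio * float64(current)))
}

func main() {
	// Values from this issue: 88 replicas, metric 1012m against a 1000m target.
	fmt.Println(desiredReplicas(88, 1.012, 1.0)) // prints 88, not 89
}
```

With the values from this issue the deviation is 0.012, well inside the 0.10 tolerance, so the controller keeps the replica count at 88 rather than rounding up to 89.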
A candidate for Troubleshooting/FAQ?
Could be.
Understood. I appreciate this is a known (or at least expected) issue; happy to take a stab at updating the docs if that would be helpful.
@OfficiallySomeGuy that would be great!
Reference kedacore/keda#5263, adding some more documentation to guide debugging and highlight known issues. Signed-off-by: OfficiallySomeGuy <[email protected]>
Raised kedacore/keda-docs#1298; apologies for the delay. I'd request we leave this issue open to track the fact that this is unexpected behaviour.
This issue has been automatically marked as stale because it has not had recent activity. It will be closed in 7 days if no further activity occurs. Thank you for your contributions.
This issue has been automatically closed due to inactivity. |
Report
We are seeing an issue where KEDA does not consistently set the correct desiredReplicas in the ScaledObject HPA. We are using this with a fallback metric which should scale based on load if the cron schedule fails to behave as expected. We run a large deployment of multiple Kubernetes clusters and see this come up occasionally in several of them.

Below is a redacted YAML output of the ScaledObject, which shows our cron schedule set to 89 during the US peak.
And the HPA is below
The interesting thing I see about the HPA is that the metric for s0-cron-America-Los_Angeles-504xx1-5-518xx1-5 correctly identifies that it needs to be 1.012x the current desiredReplicas (1.012 x 88 ≈ 89):

"s0-cron-America-Los_Angeles-504xx1-5-518xx1-5" (target average value): 1012m / 1
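As a quick sanity check on that arithmetic, here is a tiny Go sketch that uses only the numbers from the HPA output above (nothing else is assumed):

```go
// Works through the arithmetic above using only the numbers from the HPA
// status: a reported 1012m average against a 1000m target with 88 replicas.
package main

import "fmt"

func main() {
	const (
		currentReplicas = 88.0
		metricMilli     = 1012.0 // observed average value (1012m)
		targetMilli     = 1000.0 // target average value (1000m = 1)
	)
	ratio := metricMilli / targetMilli
	implied := ratio * currentReplicas
	// Prints: ratio 1.012 implies 89.06 replicas (currently 88)
	fmt.Printf("ratio %.3f implies %.2f replicas (currently %.0f)\n", ratio, implied, currentReplicas)
}
```

So the metric implies roughly 89 pods; why the HPA nonetheless holds at 88 comes down to the tolerance behaviour discussed in the comments above.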
The conditions on the HPA also seem reasonable
I'm looking for some help or advice debugging this further. Unfortunately, a large number of the clusters we run are on Kubernetes 1.21 and therefore need this older KEDA version, but I can't find a similar issue when scouring the known issues or the changelog, so I believe this may also be an issue in the current mainline.
Expected Behavior
The HPA desiredReplicas is correctly set to 89.
Actual Behavior
The HPA sets desiredReplicas to 88 (off by one), even though the metric appears correct.
Steps to Reproduce the Problem
This issue appears to be intermittent; however, we are seeing it on multiple clusters.
Logs from KEDA operator
With debug logging enabled, KEDA seems to be running happily
KEDA Version
2.8.1
Kubernetes Version
< 1.23
Platform
Other
Scaler Details
No response
Anything else?
No response