Old unreachable revision is causing new pods to get created when it should scale down #14115
FYI - this is probably a duplicate of #13677
We can only guarantee a resource is ready once `status.observedGeneration` matches `metadata.generation`. I'd rather we not change the semantics of `IsReady`.
Thanks @dprotaso for checking.
That's fine. The alternative that I had in mind is to change https://github.com/knative/serving/blob/v0.37.2/pkg/reconciler/autoscaling/kpa/kpa.go#L169 from `if sks.IsReady() {` to `if sks.Status.GetCondition(nv1alpha1.ServerlessServiceConditionReady).IsTrue() {`. Then we can leave `IsReady` unchanged.
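For readability, here is a sketch of that alternative in context (simplified; the `MarkSKSReady` call stands in for whatever the reconciler does in that branch, and `nv1alpha1` is assumed to be the imported networking/v1alpha1 package):

```go
// Before: gates on IsReady(), which also compares metadata.generation
// with status.observedGeneration, so a just-updated SKS counts as not ready.
if sks.IsReady() {
	pa.Status.MarkSKSReady()
}

// After (proposed alternative): consult only the Ready condition, so a
// pending generation bump no longer flips the PodAutoscaler to
// SKSReady=Unknown.
if sks.Status.GetCondition(nv1alpha1.ServerlessServiceConditionReady).IsTrue() {
	pa.Status.MarkSKSReady()
}
```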
Let me think of the consequences of that change and get back to you.
The scenario at least sounds different. And there the PodAutoscaler seems to stay in Unknown status persistently for a longer time, while here it is only in that status for a millisecond at most. That reminds me that I forgot to mention this: eventually, in my scenario, an old revision would get fully cleaned up even without any code change. It just takes a few or many minutes, and in the meantime many pods have been created and deleted. A much higher value of …
/area autoscale
What version of Knative?
Expected Behavior
When an old revision is unreachable, no more pods should be created for it
Actual Behavior
Many new pods are created for the old revision if it has a large `minScale`.
Steps to Reproduce the Problem
1. `kn quickstart kind` (the issue also happens outside of KinD)
2. `kn service create testsvc --image ghcr.io/knative/helloworld-go:latest --scale 9`
3. Watch the pods with `kubectl` or a tool like `k9s`
4. `kn service update testsvc --scale 1`
After the second revision is ready, it starts to terminate the old pods, but also brings up many additional pods for the old revision. Depending on how long pods are stuck in Terminating, you can reach more than 60 pods for the old revision.
The problem is the `IsReady` function of the ServerlessService, which returns false if the generation in metadata differs from the one in status. Due to it temporarily not being ready, the PodAutoscaler is set to SKSReady=Unknown here.
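For reference, the lifecycle helper behaves roughly like this (a simplified sketch, not a verbatim copy of the serving source):

```go
// Simplified sketch of ServerlessService.IsReady: it returns false both
// when the Ready condition is not true and when status.observedGeneration
// still lags behind metadata.generation, i.e. right after a spec update.
func (sks *ServerlessService) IsReady() bool {
	return sks.Generation == sks.Status.ObservedGeneration &&
		serverlessServiceCondSet.Manage(&sks.Status).IsHappy()
}
```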
The PodAutoscaler then goes Ready=False because SKSReady is part of Ready here.
The revision inherits the PodAutoscaler status and sets Active=Unknown here.
This causes the PodAutoscaler's Reachability to be set to Unknown here.
Once the PodAutoscaler is no longer marked as unreachable, the scaling boundary is set to the min value from the annotation again here. For an unreachable revision, it would otherwise always use min = 0 so that scale-down happens. But now it increases the minimum to 9 again.
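The interaction with minScale can be summarized in a small sketch (simplified types and names, not the actual serving code):

```go
package sketch

// Reachability mirrors the PodAutoscaler spec field; the constants are
// simplified stand-ins for the real API types.
type Reachability string

const (
	ReachabilityUnknown     Reachability = ""
	ReachabilityReachable   Reachability = "Reachable"
	ReachabilityUnreachable Reachability = "Unreachable"
)

// lowerScaleBound sketches how the lower bound is derived: only an
// explicitly Unreachable revision may ignore its minScale annotation.
func lowerScaleBound(annotationMin int32, r Reachability) int32 {
	if r == ReachabilityUnreachable {
		return 0 // unreachable revisions are allowed to scale down fully
	}
	// Both Reachable and Unknown keep the annotation value, so a brief
	// flap to Unknown re-raises the bound back to 9 and new pods appear.
	return annotationMin
}
```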
I will open a PR in a few minutes that changes `IsReady` of the ServerlessService to only check the condition. This fixes the problem and I have not seen a negative impact. Though, it undoes the work of "IsReady should take ObservedGeneration into account" #8004, which sounds to me as if it tried to resolve an abstract issue while I have a real one here. An alternative could be to leave `IsReady` unchanged and look at the condition directly in kpa.go. So, I need somebody with experience to carefully assess whether there are side effects.
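In code, the PR's direction amounts to dropping the generation comparison from the helper (again just a sketch mirroring the one above, not the final diff):

```go
// Proposed IsReady: consult only the Ready condition and ignore whether
// status.observedGeneration has caught up, reverting the #8004 semantics.
func (sks *ServerlessService) IsReady() bool {
	return serverlessServiceCondSet.Manage(&sks.Status).IsHappy()
}
```

The trade-off, as discussed above, is that callers may then briefly act on a status that still describes the previous generation of the spec.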