aws-node-termination-handler pod is stuck in pending right after "kops rolling-update cluster --yes" #16870
I have tried killing the running pod, and now I again have one pod running and one pending:
Hi @stl-victor-sudakov Most of the time, that means the new controller nodes have not joined the cluster properly, which is why the scheduler could not deploy the service on the target nodes.
@nuved I think I have already posted the error message above, but I don't mind repeating. The relevant part of "kubectl -n kube-system describe pod aws-node-termination-handler-577f866468-bj4gd" is:

Events:
  Type     Reason            Age                    From               Message
  ----     ------            ----                   ----               -------
  Warning  FailedScheduling  13m (x6180 over 2d3h)  default-scheduler  0/5 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 4 node(s) didn't match Pod's node affinity/selector. preemption: 0/5 nodes are available: 1 node(s) didn't have free ports for the requested pod ports, 4 Preemption is not helpful for scheduling.

There is actually only one control node in the cluster. Is there any additional information I could provide? UPD: the complete "describe pod" output can be seen here: https://termbin.com/0sy6 (so as not to clutter the conversation with excessive output).

Well, that means there are not enough nodes. You should make sure all nodes are up and ready ("kubectl get nodes -o wide"). I guess one of the controllers has an issue.
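The FailedScheduling message above suggests the old pod is still holding a host port on the only node that matches the pod's node selector, while the remaining four nodes are excluded by affinity. A few commands to confirm this (a sketch; the pod name is the one from this issue, adjust to your cluster):

```shell
# Show where each handler pod is (or isn't) running;
# the old pod and the Pending pod should target the same node
kubectl -n kube-system get pods -o wide | grep aws-node-termination-handler

# Inspect the ports the pod requests; a hostPort here explains the
# "didn't have free ports" part of the scheduler message
kubectl -n kube-system get pod aws-node-termination-handler-577f866468-bj4gd \
  -o jsonpath='{.spec.containers[*].ports}'
```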
It is a single-control-plane cluster. Also:
There is exactly one instance i-02cf4b0fed779eb54 in the control-plane-us-west-2c.masters.dev2XXXXX AWS autoscaling group; it is healthy according to AWS.
Probably you just need to adjust the ReplicaSet manually and set it to 1. I'm not sure how you can change the replica count via kops, but it should work.
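If you want to try the replica-count route, the Deployment can be scaled directly with kubectl; note this is only a sketch, and kOps may reconcile the count back on the next update:

```shell
# Scale the Deployment down so the old pod releases its host port,
# then back up to get a fresh pod from the current ReplicaSet
kubectl -n kube-system scale deployment aws-node-termination-handler --replicas=0
kubectl -n kube-system scale deployment aws-node-termination-handler --replicas=1
```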
Manually deleting the ReplicaSet that had contained the old aws-node-termination-handler pod did the trick (the pod was finally replaced), but this should happen automatically and should not prevent the "kops rolling-update cluster" command from running smoothly.
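The fix that worked can be sketched as follows (the ReplicaSet name below is a placeholder; pick the one that still owns the old pod):

```shell
# List the handler's ReplicaSets; during a stuck rollout two will exist,
# the old one still owning the running pod
kubectl -n kube-system get rs | grep aws-node-termination-handler

# Delete the stale ReplicaSet; its pod is removed, freeing the host port
# so the scheduler can place the new pod on the control-plane node
kubectl -n kube-system delete rs <old-replicaset-name>
```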
It solved my problem! Many thanks!
@axclever Actually it was not my idea, I received this advice in the kOps office hours. It is a problem which manifests itself only in clusters with a single control-plane node. |
/kind bug

1. What kops version are you running? The command "kops version" will display this information.

1.30.1

2. What Kubernetes version are you running? "kubectl version" will print the version if a cluster is running, or provide the Kubernetes version specified as a kops flag.

v1.29.3

3. What cloud provider are you using?

AWS

4. What commands did you run? What is the simplest way to reproduce this issue?

kops upgrade cluster --name XXX --kubernetes-version 1.29.9 --yes
kops --name XXX update cluster --yes --admin
kops --name XXX rolling-update cluster --yes

5. What happened after the commands executed?

The cluster did not pass validation at the very beginning of the upgrade procedure. When I looked up why the pod was pending, I found the FailedScheduling event shown above in "describe pod aws-node-termination-handler-577f866468-mmlx7". There is another aws-node-termination-handler- pod running at the moment (the old one).

6. What did you expect to happen?

I expected the cluster to be upgraded to Kubernetes 1.29.9.

7. Please provide your cluster manifest. Execute "kops get --name my.example.com -o yaml" to display your cluster manifest. You may want to remove your cluster name and other sensitive information.

8. Please run the commands with the most verbose logging by adding the "-v 10" flag. Paste the logs into this report, or into a gist and provide the gist link here.

Please see the validation log above.

9. Anything else do we need to know?

Now I would like to know how to recover from this situation and how to get rid of the aws-node-termination-handler-577f866468-mmlx7 pod, which is now left in the Pending state.