Karpenter provisioned nodes become "NotReady" #4200
Comments
Discussion context (cross-referencing): https://kubernetes.slack.com/archives/C02SFFZSA2K/p1688048847670199
Is the node going NotReady by itself, with pods evicted and then Karpenter removing the node, or does Karpenter start to deprovision the node and then cordon/drain it? I'm trying to determine whether it's the node going bad, or just standard consolidation.
@tzneal I'm not sure. We're currently not collecting kubectl / node-level logs, so let me know if that's relevant and I'll make sure to capture them next time, since this issue is a recurring problem for us. Or is there a different source for retrieving this information? If so, I'd appreciate guidance on how to fetch it.
Yes, when the node goes NotReady, can you capture the logs with this log collector tool and supply them? It may contain sensitive data, so you can submit it as a support ticket: https://github.com/awslabs/amazon-eks-ami/tree/master/log-collector-script/linux
We've analyzed this issue internally, and I don't think it's directly related to Karpenter. The nodes have been terminating due to resource over-utilization, both memory and CPU. In both cases the kubelet process would become non-responsive, which would eventually lead to the node being marked unhealthy and then recycled by Karpenter. The problem is this being a […]. To explain why it only started to appear with the introduction of Karpenter: the […]. This made it practically unusable for us in default mode; to work around it, we've resorted to hard-coding the lower bound of the node selection for the provisioner, which works for us (for now) and is sketched below.
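A minimal sketch of what hard-coding that lower bound can look like, using Karpenter's well-known instance-size labels in the Provisioner requirements (the thresholds here are illustrative, not our actual values):

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  requirements:
    # Skip the smallest instance types so nodes have headroom for bursty pods.
    # Threshold values are illustrative only.
    - key: karpenter.k8s.aws/instance-cpu
      operator: Gt
      values: ["7"]        # i.e. at least 8 vCPUs
    - key: karpenter.k8s.aws/instance-memory
      operator: Gt
      values: ["16383"]    # MiB, i.e. at least 16 GiB
  providerRef:
    name: default
```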
The long-term solution, I believe, should be the ability to influence the provisioner's resource calculation via a "profile", so that you would have, as an example, something like the hypothetical sketch below.
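A purely hypothetical illustration of that idea (nothing like this exists in Karpenter today; the field name and values below are invented):

```yaml
# Hypothetical sketch only: "resourceProfile" is not a real Provisioner field.
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  # A "conservative" profile would keep extra headroom when choosing an
  # instance type; an "aggressive" one would bin-pack as tightly as possible.
  resourceProfile: conservative
```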
@tzneal what's your take on this? (btw, cool script. ty!)
I think the best way to get around this is to set requests and limits on pods to larger values. You can refer to this issue if you want to dynamically size the kubeReserved resources based on the instance type.
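If the goal is simply more headroom on Karpenter-launched nodes, one option is to reserve resources for the kubelet and system daemons via the Provisioner. This is a sketch assuming the v1alpha5 kubeletConfiguration fields; the reserve sizes are illustrative:

```yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
  name: default
spec:
  kubeletConfiguration:
    # Carve out capacity for the kubelet and OS daemons so bursting pods are
    # less likely to starve the kubelet itself. Sizes are illustrative.
    kubeReserved:
      cpu: "200m"
      memory: "1Gi"
      ephemeral-storage: "3Gi"
    systemReserved:
      cpu: "100m"
      memory: "500Mi"
    evictionHard:
      memory.available: "5%"
  providerRef:
    name: default
```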
Generally, you should be setting requests and limits, as the kubelet will allow workloads to burst up to the configured limits value. In the case where users don't define limits, workloads will consume as much resource as they need. kubeReserved and systemReserved are taken into account when calculating how much resource can fit on a node, not the maximum resource a pod may use: https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#requests-and-limits. This can result in some workloads consuming more resource than the scheduler is expecting. The requests and limits on the node would suggest that the pods could have been bursting beyond the capacity of what is available. When provisioning nodes, Karpenter considers the resource requests, not the resource limits. Setting systemReserved may help; however, the customer should consider adjusting the resource requests and limits for their pods. If a container exceeds its memory request and the node it runs on becomes short of memory overall, it is likely that the Pod the container belongs to will be evicted. A container might or might not be allowed to exceed its CPU limit for extended periods of time; however, container runtimes don't terminate Pods or containers for excessive CPU usage. If these pods were bursting and using more memory than requested, we can see the behavior described by the customer.
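On the workload side, a minimal example of a container with explicit requests and limits, so the scheduler's (and Karpenter's) picture matches worst-case usage (names and values are placeholders):

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: example-dev-app        # placeholder name
spec:
  containers:
    - name: app
      image: public.ecr.aws/docker/library/nginx:1.25   # placeholder image
      resources:
        # What the scheduler and Karpenter use to fit the pod onto a node.
        requests:
          cpu: "250m"
          memory: "512Mi"
        # Hard ceiling: memory above this is OOM-killed, CPU above is throttled.
        limits:
          cpu: "500m"
          memory: "512Mi"
```

Setting the memory limit equal to the request keeps the node from being overcommitted on memory, which is the failure mode described above.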
Description
Observed Behavior:
We are running an EKS 1.24 dev cluster consisting of 6 nodes:

- m5.4xlarge, aimed at dev deployment workloads
- r6i.large, tainted for sts

The environment is moderately volatile, consisting of CI jobs and other CronJobs as well as ~20-30 dev environments (using namespaces), each consisting of ~10 Deployment (scale=1) and 1 DaemonSet objects, totalling ~15 pods.
We've introduced Karpenter 0.28.1 to assist with dynamic load, so that we can provision additional capacity.
The problem we are seeing is that while the "EKS Node Group" based nodes remain stable, the Karpenter nodes tend to go NotReady, which then turns into the taint
node.kubernetes.io/unschedulable:NoSchedule
after which the node gets killed by the Karpenter controller.
Expected Behavior:
Karpenter provisioned nodes should not become Unschedulable.
Reproduction Steps (Please include YAML):
I have managed to catch a snapshot of a node going through this cycle. Please see a gist of
kubectl describe node
here, with two snapshots: the first when the node is marked NoSchedule and the second when it is already in the process of being terminated by Karpenter: https://gist.github.com/maximveksler/48e303dc5782c90d7c6d4b5b167351f2

Provider spec:

For the complete installation steps, please see https://gist.github.com/maximveksler/38ec0cefa0ca2acccab748e71e5aebc0
Versions:

- Chart Version: 0.28.1
- Kubernetes Version (kubectl version): EKS 1.24