Non-working consolidation starting from v0.32 with multiple NodePools with metadata labels #1106
Comments
@nantiferov The team has done a patch release to fix this issue in v0.32.8 (aws/karpenter-provider-aws#5816). Can you try upgrading to v0.32.8 and see if this issue is fixed for you?
Thank you @engedaam for the update. Right now one of the clusters is updated to 0.35.2, which has this fix included, and it still has errors like this in the logs. I will try to update another cluster (which is now on 0.32.8) to v0.32.8 to see if there are differences with consolidation.
Can you share the
Added. It was probably created by an old version of Karpenter. Is that the reason for these errors?
Yes, that would seem so. Can you roll the node and see if that fixes your issue?
Thanks, I will re-roll them and check for errors and consolidation.
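For reference, rolling a Karpenter-managed node generally means deleting the Node (or NodeClaim) object so that Karpenter's finalizer drains it, terminates the instance, and provisions a replacement under the current NodePool settings. A minimal sketch of that, with a hypothetical node name:

```shell
# Optionally drain first for a gentler rollout:
kubectl drain ip-10-0-1-23.ec2.internal --ignore-daemonsets --delete-emptydir-data

# Deleting the node triggers Karpenter's termination finalizer,
# which terminates the EC2 instance and provisions a replacement:
kubectl delete node ip-10-0-1-23.ec2.internal
```

These commands require a live cluster, so run them against a non-critical node first.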
Can confirm that after re-provisioning nodes from the NodePool with the new version of Karpenter, the related errors disappeared from the logs. Consolidation also looks like it is working better; I will observe it over the next weeks.
Description
Observed Behavior:
Apparently something changed in v0.32 that leaves both old Provisioners and new NodePools unable to consolidate initially launched instances into smaller/cheaper ones. I tried updating to the latest v0.35.2 with the same results.
A quick search through the issues in this repo and in https://github.com/aws/karpenter-provider-aws didn't turn up anything similar.
My use case is that I need the EBS volume size to be relative to the total memory on the EC2 instance. Since that isn't directly possible, I create a couple of EC2NodeClass/NodePool pairs with different volumeSize values in blockDeviceMappings.
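As an illustrative sketch of that pattern (all names and sizes below are hypothetical, not the reporter's actual manifests; subnet/security-group selectors and the paired NodePool are omitted for brevity):

```yaml
# Hypothetical EC2NodeClass: one of several variants that differ
# only in root volume size, each paired with its own NodePool.
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: eng-8gb              # hypothetical name
spec:
  amiFamily: AL2
  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 20Gi     # sized for instances with ~8Gi of memory
        volumeType: gp3
```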
Expected Behavior:
Before v0.32, Karpenter consolidated launched nodes into smaller ones according to node utilisation. Right now it seems that nodes from both old Provisioners and new NodePools are not consolidated.
In the Karpenter logs I see this:

```json
{"level":"ERROR","time":"2024-03-15T16:35:54.861Z","logger":"controller.nodeclaim.consistency","message":"check failed, expected 20Gi of resource ephemeral-storage, but found 12561388Ki (59.9% of expected)","commit":"8b2d1d7","nodeclaim":"eng-8gb-7v6zh"}
```
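The consistency check compares the capacity expected on the NodeClaim with what the kubelet actually registered on the node. A quick way to compare the two by hand (the nodeclaim name is taken from the log line above; the node name is a placeholder):

```shell
# Expected ephemeral-storage capacity recorded on the NodeClaim:
kubectl get nodeclaim eng-8gb-7v6zh -o jsonpath='{.status.capacity.ephemeral-storage}'

# Capacity the kubelet actually registered on the corresponding node:
kubectl get node <node-name> -o jsonpath='{.status.capacity.ephemeral-storage}'
```

If the two values disagree (as in the 20Gi vs 12561388Ki mismatch above), the consistency controller logs the error shown.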
and this in the node's `kubectl describe` output:

Reproduction Steps (Please include YAML):
The YAML below is based on the current setup and slightly simplified (tags removed, only 2 kept).
EC2NodeClass
NodePool
And then a deployment or two to schedule on these nodes:
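A hypothetical sketch of such a workload (the label key/value, image, and resource requests are assumptions, not the reporter's actual manifest), pinned to one NodePool via a NodePool-applied metadata label:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-app          # hypothetical name
spec:
  replicas: 3
  selector:
    matchLabels:
      app: example-app
  template:
    metadata:
      labels:
        app: example-app
    spec:
      nodeSelector:
        pool: eng-8gb        # hypothetical NodePool metadata label
      containers:
        - name: app
          image: public.ecr.aws/nginx/nginx:latest
          resources:
            requests:
              cpu: 500m
              memory: 1Gi
```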
So before v0.32, Karpenter launched slightly bigger nodes initially and then, after ~30 minutes, optimised them. Right now it launches suboptimal nodes and keeps them, with the aforementioned errors in the logs and messages in the events.
Versions:
- Kubernetes version (`kubectl version`): v1.28.5-eks-5e0fdde