Add consolidationPolicy: Underweight #1829
Comments
In general, this is how Karpenter should behave. Can you walk through the exact scenario you are seeing? When Karpenter performs its simulations, it considers the highest-weight NodePool first (which should be your spot NodePool). Once it finds a scheduling decision, it will consolidate as long as the new instance type is cheaper. All of this should hold in your scenario, where you are moving back from on-demand to spot.
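To make the decision described above concrete, here is a minimal Python sketch. It is a deliberately simplified model, not Karpenter's actual (Go) implementation: `NodePool`, `Node`, `replacement_pool`, `should_consolidate`, the prices, and the pod names are all hypothetical, and "Empty"/"Underutilized" are this thread's shorthand for the consolidation policies. Under these assumptions it also shows why a node that still runs pods is never replaced when the policy is "Empty".

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    name: str
    weight: int   # higher weight = more preferred NodePool
    price: float  # hypothetical hourly price of the instance this pool would launch
    allowed_pods: frozenset = field(default_factory=frozenset)  # pods this pool can host


@dataclass
class Node:
    pool: NodePool
    pods: frozenset
    price: float  # hypothetical hourly price of the running instance


def replacement_pool(node, nodepools):
    """Return the highest-weight NodePool that can host every pod on `node`, or None."""
    for pool in sorted(nodepools, key=lambda p: p.weight, reverse=True):
        if node.pods <= pool.allowed_pods:
            return pool
    return None


def should_consolidate(node, nodepools, policy):
    """Simplified replace decision; `policy` is "Empty" or "Underutilized" (thread shorthand)."""
    if policy == "Empty":
        # Only nodes with no workload pods are removed; a busy node is never replaced.
        return not node.pods
    pool = replacement_pool(node, nodepools)
    # Replace only if some pool can host all the pods AND its instance is cheaper.
    return pool is not None and pool.price < node.price


spot = NodePool("spot", 100, 0.04, frozenset({"api", "worker"}))
on_demand = NodePool("on-demand", 10, 0.10, frozenset({"api", "worker"}))
pools = [on_demand, spot]
node = Node(on_demand, frozenset({"api"}), 0.10)

print(should_consolidate(node, pools, "Underutilized"))  # True
print(should_consolidate(node, pools, "Empty"))          # False
```

In this model the on-demand node only moves back to spot when the policy allows non-empty nodes to be replaced, which matches the maintainer's description and the reporter's observation below.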
/triage needs-information
The behavior we observed was that the lower-weight nodes (our on-demand nodes) were never consolidated. Over time we end up with more lower-weight nodes than higher-weight nodes, because the on-demand nodes don't experience spot interruptions, so most of our nodes are actually the lower-weight on-demand nodes. That said, I wouldn't expect consolidation to work as things are: our consolidationPolicy is set to "Empty", so it would seem like a bug to me if Karpenter were consolidating those nodes for cost reasons. We cannot use the consolidationPolicy "Underutilized" because it results in massive node turnover, which adds significantly to our AWS Config costs. I'm proposing a third reason to consolidate: a higher-weight node is available for the pods to be scheduled on. That seems distinctly different from "Underutilized", which should consolidate when the total requested CPU/memory could be reduced by shuffling pods and shutting down nodes.
This issue has been inactive for 14 days. StaleBot will close this stale issue after 14 more days of inactivity.
@koreyGambill I feel we have been running into a similar situation; what seems to work for us at the moment is to use … That said, we have also felt that a consolidation policy covering the case of "I have both spot and on-demand instances in this NodePool and I only ever want to consolidate on-demand nodes back to spot instances" is missing.
Description
What problem are you trying to solve?
We've created fallback on-demand NodePools with a lower scheduling weight than our spot-instance NodePools (AWS). When spot instances are hard to find, Karpenter launches our fallback on-demand EC2 instances, but it never consolidates back to spot instances, so this ends up being really expensive. I would love an official setting that lets Karpenter consolidate based on NodePool weight preferences rather than just utilization.
In this feature, if all the pods on a low-weight node are compatible with a higher-weight NodePool, Karpenter should create the higher-weight node and reschedule the pods onto it. For us this would reduce costs, but in general it makes sense that users would want their higher-weight NodePools to be used. I would expect this to still obey the `consolidateAfter` setting. Something like this could work in the YAML.
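The proposed check can be sketched in a few lines of Python. This is a hypothetical model of the requested "Underweight" behavior, not an existing Karpenter policy or API: `NodePool`, `Node`, `underweight_target`, and the `seconds_since_pod_change` field (a stand-in for however `consolidateAfter` is actually tracked) are all assumptions. It ignores price and utilization entirely and only asks whether a strictly higher-weight NodePool could host every pod on the node; it also does not model whether spot capacity is actually available when the replacement is launched.

```python
from dataclasses import dataclass, field


@dataclass
class NodePool:
    name: str
    weight: int  # higher weight = more preferred NodePool
    allowed_pods: frozenset = field(default_factory=frozenset)  # pods this pool can host


@dataclass
class Node:
    pool: NodePool
    pods: frozenset
    seconds_since_pod_change: float  # hypothetical stand-in for consolidateAfter bookkeeping


def underweight_target(node, nodepools, consolidate_after_s):
    """Hypothetical "Underweight" check: return a strictly higher-weight NodePool
    that can host every pod on `node`, or None. Price and utilization are ignored."""
    if node.seconds_since_pod_change < consolidate_after_s:
        return None  # still inside the consolidateAfter window
    for pool in sorted(nodepools, key=lambda p: p.weight, reverse=True):
        if pool.weight <= node.pool.weight:
            break  # remaining pools are no more preferred than the node's own pool
        if node.pods <= pool.allowed_pods:
            return pool
    return None


spot = NodePool("spot", 100, frozenset({"api", "worker"}))
on_demand = NodePool("on-demand", 10, frozenset({"api", "worker", "gpu-job"}))
pools = [on_demand, spot]

fallback_node = Node(on_demand, frozenset({"api", "worker"}), 600)
print(underweight_target(fallback_node, pools, 300).name)  # spot
```

Because the pools are visited in descending weight order, the loop can stop as soon as it reaches a pool that is not strictly preferred over the node's current one, so a node already on the highest-weight pool is never a candidate.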
How important is this feature to you?
Low-Medium: I have a workaround (setting the `on-demand` NodePool to expire after 4 hours), but it has a couple of drawbacks.