Support handling custom kube reserved values #1803
Comments
CC @bwagner5
We'll be setting ephemeral storage accurately soon. But agreed on Karpenter respecting a custom kube-reserved and, in general, overhead values for nodes.
Would you take a PR to use the GKE logic when AWS_ENI_LIMITED_POD_DENSITY=false?
We're going to respect custom kube-reserved values, and we might just do it as part of a related change. Changing the memoryReserved logic might surprise our other users, so I'm hesitant to make that change.
@stevehipwell I guess respecting a static value wouldn't help you much since you still need it to be a dynamic calc?
@suket22 having it available as part of
I'm yet to see any documentation as to how the value was calculated, and the goalposts have moved since then, so I'm not sure why you're sticking with it as a calculation? Karpenter is pre-v1, so each release is technically a breaking change. But new logic could be added under a new env variable to act as a pair to AWS_ENI_LIMITED_POD_DENSITY.

@bwagner5 doesn't Karpenter know the instance type when it calculates the reserved values? I think it currently uses the number of CPUs.
Yes, Karpenter knows the instance type and all specs of it, but it would need to be a calculation within the controller.
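For context, a rough sketch of the pods-based memory reservation applied by the EKS optimised AMI's bootstrap script (and, as described in this thread, mirrored by Karpenter's current overhead logic), assuming maxPods is derived from the instance type's ENI limits; the function name and example pod counts are illustrative.

```go
package main

import "fmt"

// eksMemoryToReserveMiB follows the EKS optimised AMI's pods-based formula:
// 255 MiB plus 11 MiB per pod, with the pod count derived from ENI limits.
func eksMemoryToReserveMiB(maxPods int) int {
	return 11*maxPods + 255
}

func main() {
	// ENI-limited pod density on an m5.large allows 29 pods.
	fmt.Println(eksMemoryToReserveMiB(29)) // 574 MiB
	// With prefix delegation and the 110-pod large-cluster limit,
	// the same formula reserves considerably more.
	fmt.Println(eksMemoryToReserveMiB(110)) // 1465 MiB
}
```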
@bwagner5 you're absolutely right! So I need the alternative algorithm and to make it match my user data.
Good point on a static value in the provisioner not helping here. But does a calculation inside Karpenter help either? It sounds to me that the right place for this alternative calculation is within the EKS AMI itself. In the short term, I think the right approach for Karpenter is probably letting you set these values within a UserData script.
@suket22 if Karpenter can calculate the CPU reservation from the instance's CPUs, then it can calculate the memory reservation too, as it's a function of instance memory. No matter what calculation is used, you're going to need either a deterministic calculation based on CPUs & memory that can be run on both sides, or a way to pass the calculated values back from the instance.
@bwagner5 where have we got to with this? Not being able to configure our nodes to use the GKE calculation for kube-reserved memory is blocking our GA adoption of Karpenter.
After seeing the new spec for AKS reservations I've realised that this implementation should be flexible. I've got a proposed API for handling kube-reserved configuration as part of the provisioner spec.

Current

Only supports a single node size per pool.

kubelet:
  kubeReserved:
    cpu: 200m
    memory: 100Mi

CPU Core Based

Supports the current EKS algorithm.

kubelet:
  kubeReserved:
    cpuCalc:
      reserved: ["600m", "100m", "50m", "50m", "25m"]

Memory Pod Based

Supports the current EKS algorithm.

kubelet:
  kubeReserved:
    memoryCalc:
      mode: pods
      multiple: 11m
      constant: 255m
      max:
      min:

New AKS pattern.

kubelet:
  kubeReserved:
    memoryCalc:
      mode: pods
      multiple: 20m
      constant: 50m
      max: 25%
      min:

Memory System Memory Based

Supports the GKE algorithm (example shows up to 16GB memory).

kubelet:
  kubeReserved:
    memoryCalc:
      mode: memory
      reserved: ["256Mi", "256Mi", "256Mi", "256Mi", "205Mi", "205Mi", "205Mi", "205Mi", "102Mi", "102Mi", "102Mi", "102Mi", "102Mi", "102Mi", "102Mi", "102Mi"]
      max:
      min: 255Mi

NOTE - I updated this comment to fix the API to avoid ambiguous types.
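For illustration only, one possible reading of the proposed memoryCalc fields, assuming reserved is a per-GiB array, pods mode means multiple * maxPods + constant, and min/max clamp the result in MiB (the percentage max in the AKS example isn't modelled); the Go types and names below are hypothetical and not part of the proposal.

```go
package main

import "fmt"

// MemoryCalc is a hypothetical Go shape for the proposed kubeReserved.memoryCalc
// block; the field semantics below are assumptions, not the author's definition.
type MemoryCalc struct {
	Mode     string    // "pods" or "memory"
	Multiple float64   // MiB reserved per pod (pods mode)
	Constant float64   // MiB base reservation (pods mode)
	Reserved []float64 // MiB reserved per GiB of instance memory (memory mode)
	Min, Max float64   // clamp bounds in MiB; zero means unset
}

// ReserveMiB computes kube-reserved memory for an instance with the given
// max pod count and memory size.
func (c MemoryCalc) ReserveMiB(maxPods, memoryGiB int) float64 {
	var reserved float64
	switch c.Mode {
	case "pods":
		reserved = c.Multiple*float64(maxPods) + c.Constant
	case "memory":
		for gib := 0; gib < memoryGiB && len(c.Reserved) > 0; gib++ {
			idx := gib
			if idx >= len(c.Reserved) {
				idx = len(c.Reserved) - 1 // assume the last entry repeats
			}
			reserved += c.Reserved[idx]
		}
	}
	if c.Min > 0 && reserved < c.Min {
		reserved = c.Min
	}
	if c.Max > 0 && reserved > c.Max {
		reserved = c.Max
	}
	return reserved
}

func main() {
	// The GKE-style example above: 256Mi/GiB for the first 4 GiB,
	// 205Mi/GiB for the next 4, 102Mi/GiB for the next 8.
	gke := MemoryCalc{
		Mode: "memory",
		Reserved: []float64{256, 256, 256, 256, 205, 205, 205, 205,
			102, 102, 102, 102, 102, 102, 102, 102},
		Min: 255,
	}
	fmt.Println(gke.ReserveMiB(110, 8)) // 1844 MiB on an 8 GiB node
}
```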
Just doing a cursory look over the proposed API, this strikes me as really difficult for someone to reason about how Karpenter is actually going to calculate these values. I still find myself leaning towards a static value approach (something similar to, but maybe not looking exactly like, the instance type overrides approach) where some other component is responsible for reconciling those static values in a way that allows scenarios like the ones that you have called out above.
@jonathan-innis the problem here is that we need to model a complex system, so the API will be complex; if you need this functionality then I think you'd be able to understand it.
Yes, that could work; are you thinking of named algorithms? In a dynamic reserved scenario it'd be pretty easy to get this working in AL2, but for Bottlerocket we need a bootstrap container (bottlerocket-os/bottlerocket#2010). Ideally this would be published by Bottlerocket, but Karpenter could create its own; within AWS this would have trivial cost, as it should just be the public ECR image with an alternate entrypoint. All of that said, I've had some more thoughts on this whole area and I think the following logic describes the whole system.
I think the current EKS memory calculation is probably the worst of all worlds, as it's based on ENI limits for pod density and on Docker as the container runtime. I think the GKE memory calculation covers most of the above logic but isn't perfect; it's based on Docker, all dynamic reservations are in kube-reserved, and small nodes may still not have enough memory reserved. I think the current CPU calculation (they're all the same) could probably be simplified, as I don't think it's really a function of available cores.

TL;DR - Maybe static values would work on nodes over a certain size, but we'd still need a pattern for small nodes unless using pod overhead.
We're currently experimenting with running the following static values:

systemReserved:
  cpu: 100m
  memory: 100Mi
  ephemeral-storage: 1Gi
kubeReserved:
  cpu: 100m
  memory: 1465Mi
  ephemeral-storage: 1Gi
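For reference (assuming these values are passed straight through to the kubelet), they correspond to flags along the lines of --system-reserved=cpu=100m,memory=100Mi,ephemeral-storage=1Gi and --kube-reserved=cpu=100m,memory=1465Mi,ephemeral-storage=1Gi.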
Tell us about your request
I'd like Karpenter to work correctly when not using the EKS optimised AMI calculations for --kube-reserved (as already discussed in #1490). The current logic in Karpenter is incorrect, as it assumes that pod count is directly related to the number of ENIs on an instance and that the kube-reserved memory calculation is a function of pods rather than memory.

I'd like Karpenter to use the GKE memory calculation when AWS_ENI_LIMITED_POD_DENSITY=false, as the CPU limit calculation already comes from this source.

Karpenter should probably also set the value for ephemeral storage too.
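For reference, a sketch of the GKE memory calculation referred to above, based on GKE's published reservation brackets; this Go function is illustrative and is not Karpenter or GKE code.

```go
package main

import "fmt"

// gkeKubeReservedMiB sketches GKE's published memory reservation formula:
// 255 MiB for machines with less than 1 GiB of memory, otherwise 25% of the
// first 4 GiB, 20% of the next 4 GiB, 10% of the next 8 GiB, 6% of the next
// 112 GiB, and 2% of anything above 128 GiB.
func gkeKubeReservedMiB(memoryGiB float64) float64 {
	if memoryGiB < 1 {
		return 255
	}
	brackets := []struct {
		sizeGiB  float64
		fraction float64
	}{
		{4, 0.25},
		{4, 0.20},
		{8, 0.10},
		{112, 0.06},
	}
	reserved, remaining := 0.0, memoryGiB
	for _, b := range brackets {
		use := b.sizeGiB
		if remaining < use {
			use = remaining
		}
		reserved += use * 1024 * b.fraction
		remaining -= use
		if remaining <= 0 {
			return reserved
		}
	}
	return reserved + remaining*1024*0.02 // memory above 128 GiB
}

func main() {
	fmt.Printf("%.0f MiB\n", gkeKubeReservedMiB(8))  // ~1843 MiB
	fmt.Printf("%.0f MiB\n", gkeKubeReservedMiB(16)) // ~2662 MiB
}
```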
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
We run EKS nodes using either custom networking with IPv4 IP prefixes, or IPv6 IP prefixes, which means that IP addresses aren't scarce. We follow the K8s large clusters guide, so we have a maximum of 110 pods with custom kube-reserved values based on the GKE calculations, and we need Karpenter to respect these settings.
Are you currently working around this issue?
I'm hoping that Karpenter doesn't break with custom values set, and if it does we'll move back to Cluster Autoscaler.
Additional context
I've been using this configuration in production ever since the AWS VPC CNI supported IP prefixes and it's solved a number of issues which we had with nodes before making this change.
Attachments
n/a