Add support for limiting max pods per node to 110 #1490
FYI @bwagner5.
Setting
We should definitely document that...
So jumping the gun on the docs, does this switch to using 110 pods? Also jumping the gun on some other comments I made, does Karpenter use custom kube-reserved calculations?
Apologies, I was jumping the gun a bit :) Yes, it does use 110 pods when AWS_ENI_LIMITED_POD_DENSITY is set to false (https://github.com/aws/karpenter/blob/e37909b223fd7625b57b59d28681561895643f09/pkg/cloudprovider/aws/launchtemplate.go#L364-L368). You are correct that the kube-reserved calculation still uses the ENI limit. Let me change this issue back to track that and I'll create a new one for the AWS_ENI_LIMITED_POD_DENSITY docs.
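For anyone following along, here is a minimal sketch of what that toggle amounts to. This is not Karpenter's actual code; the function names are illustrative, and the ENI formula is the commonly documented AWS one (ENIs × (IPs per ENI − 1) + 2):

```go
package main

import "fmt"

// eniLimitedMaxPods is the classic AWS formula:
// maxPods = ENIs * (IPv4 addresses per ENI - 1) + 2
// (one IP per ENI is reserved for the primary address; +2 covers host-networking pods).
func eniLimitedMaxPods(enis, ipsPerENI int) int {
	return enis*(ipsPerENI-1) + 2
}

// maxPodsFor mirrors the behaviour discussed above: when ENI-limited pod
// density is disabled, fall back to the Kubernetes large-cluster default of 110.
func maxPodsFor(eniLimited bool, enis, ipsPerENI int) int {
	if !eniLimited {
		return 110
	}
	return eniLimitedMaxPods(enis, ipsPerENI)
}

func main() {
	// m5.large: 3 ENIs x 10 IPs -> 29 pods when ENI-limited, 110 otherwise.
	fmt.Println(maxPodsFor(true, 3, 10))  // 29
	fmt.Println(maxPodsFor(false, 3, 10)) // 110
}
```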
@bwagner5 I think you were right to rename this issue. If the flag sets max pods to 110, there should probably be a new issue for changing the overhead calculation and setting the kubelet flags with the correct overheads? On a related topic, is it documented somewhere which node overrides you provide to bootstrap.sh or the BottleRocket TOML? I've got a list of functionality that we currently have to use a LaunchTemplate for that should probably be covered by native Karpenter config. I'd be happy to contribute towards implementing some of these if they're not already done.
We would love to work towards that list of needed LT parameters! I have a PR out that hopefully can get merged today that makes adding parameters to AMI families much easier (and also supports block device mappings). We'll probably go full steam ahead on getting LT parameters within Karpenter once that is merged. #1420
I don't think we're settled on what approach we want to take, but I'd want us to try and fix this in the EKS-optimized AMI and not have Karpenter repeat the calculation and set it on the kubelet. The issue you've cut in the EKS-optimized AMI repo is what needs to be addressed first, IMO. It needs extensive testing, and the blast radius of making changes there is pretty large: new overheads being too large will lead to less dense nodes and higher EC2 costs, and overheads being too low (which I think is what you're running into) will lead to other node stability issues. The BottleRocket AMI has just copied over the numbers that the EKS-optimized AMIs use, so whatever new calculations we do can probably be ported there too. @ellistarn and I have been discussing this a bunch, and we're honestly just looking for resourcing to come up with better values. We absolutely intend on redoing our calculation for kube-reserved CPU and memory.
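For reference, here is a sketch of the GKE-style memory-reservation tiers that keep coming up in this thread. The percentages are the ones in the public GKE documentation; whether Karpenter or the EKS AMI should adopt them is exactly what's being debated, so treat this as an illustration rather than a proposal:

```go
package main

import "fmt"

// gkeReservedMemoryMiB applies the publicly documented GKE memory-reservation
// tiers to total node memory (in MiB): 255 MiB for sub-1GiB machines, otherwise
// 25% of the first 4 GiB, 20% of the next 4, 10% of the next 8, 6% of the next
// 112, and 2% of anything above 128 GiB.
func gkeReservedMemoryMiB(totalMiB float64) float64 {
	if totalMiB < 1024 {
		return 255
	}
	tiers := []struct {
		upToMiB float64 // cumulative upper bound of the tier
		rate    float64 // fraction reserved within the tier
	}{
		{4 * 1024, 0.25},
		{8 * 1024, 0.20},
		{16 * 1024, 0.10},
		{128 * 1024, 0.06},
		{1 << 62, 0.02}, // everything above 128 GiB
	}
	reserved, prev := 0.0, 0.0
	for _, t := range tiers {
		if totalMiB <= prev {
			break
		}
		span := totalMiB
		if span > t.upToMiB {
			span = t.upToMiB
		}
		reserved += (span - prev) * t.rate
		prev = t.upToMiB
	}
	return reserved
}

func main() {
	// A 16 GiB node: 25% of 4 GiB + 20% of 4 GiB + 10% of 8 GiB ≈ 2662 MiB reserved.
	fmt.Printf("%.0f MiB\n", gkeReservedMemoryMiB(16*1024))
}
```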
@bwagner5 I added the high-value ones to #1327 and opened #1495 for the AMI version. Is that enough? @suket22 have you seen how old the AMI issue is? If you look around on other issues and PRs you'll see that I've been asking this question for even longer. Furthermore, if you look at the source of the AL2 calculations you'll see that they copied the GKE values for some but not all of it (due to the ENI pod limit). My issue has the actual K8s docs linked and explains why this should all have been re-worked when the CNI started supporting prefixes (if not before). Finally, I'm suggesting it as an optional parameter, but I can say that the current logic is actually incorrect and can result in stuck nodes; I'd expect most users to be overriding these values by default.
I think I called this out in the other issue, but while I agree with your overall sentiment, keep in mind the bootstrap script has no notion of which version of the CNI is running, so we can't make a change that benefits just those that have PrefixDelegation enabled. If we move to the GKE numbers for reserved memory, we will break a ton of customers (like before). I agree with you that it needs re-work, and it's been a long time; there's just been a lack of resourcing on the EKS-optimized AMI front.
@suket22 as an optional variable, the consumer knows which mode to use. The Karpenter RBAC could also be extended to automate this by looking up the CNI and its settings. Re the reserved memory, it wouldn't break customers; it would actually prevent potential issues (we've hit this a number of times and always specify the GKE values manually). What it would do is lower pod density, which would be comparable to GKE but still higher than AKS with its large eviction threshold. I'm pretty sure the 110 number is based on the underlying OS kernel as well as capacity for single components such as the kubelet and DaemonSets. I get why the ENI limits were lower than this, but I can't see how a greater number can be justified.
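To put rough numbers on the prefix-delegation point: each prefix attached to an ENI address slot is a /28 (16 addresses), so the theoretical pod capacity quickly dwarfs anything the node can realistically run, which is where a 110-style cap comes in. A sketch under those assumptions (the formula follows the public VPC CNI prefix-delegation docs, not Karpenter's code):

```go
package main

import "fmt"

// prefixDelegatedCapacity: with prefix delegation, every secondary IP slot on
// an ENI holds a /28 prefix (16 addresses), so the raw capacity is
//   ENIs * (slots per ENI - 1) * 16 + 2
func prefixDelegatedCapacity(enis, slotsPerENI int) int {
	return enis*(slotsPerENI-1)*16 + 2
}

func main() {
	// m5.large again: 3 ENIs x 10 slots -> 434 theoretical pods,
	// which is why capping at the upstream-recommended 110 is attractive.
	raw := prefixDelegatedCapacity(3, 10)
	capped := raw
	if capped > 110 {
		capped = 110
	}
	fmt.Println(raw, capped) // 434 110
}
```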
@bwagner5 where are we with this? Is it just the documentation that's missing?
This is mostly a docs issue, but I'd still like to move away from a boolean flag to switch max-pods. @suket22 has a PR out (#1720) for bottlerocket config support and is working on AL2 as well (I realize you're not using either of these, but it should carry over to generic bootstraps as well). We'd like to get to the point where the user is in full control over these params and Karpenter respects them in packing decisions when passed via the
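For anyone wondering what "respects them in packing decisions" means in practice: bin-packing has to treat pod count as a constraint alongside CPU and memory. A toy sketch, nothing here is Karpenter's actual packing code:

```go
package main

import "fmt"

// podsThatFit returns how many identical pods fit on a node once both the
// resource headroom and the kubelet max-pods limit are taken into account.
func podsThatFit(allocatableCPU, allocatableMemMiB, podCPU, podMemMiB float64, maxPods int) int {
	byCPU := int(allocatableCPU / podCPU)
	byMem := int(allocatableMemMiB / podMemMiB)
	n := byCPU
	if byMem < n {
		n = byMem
	}
	if maxPods < n {
		n = maxPods
	}
	return n
}

func main() {
	// 16 vCPU / 64 GiB node with 100m / 128Mi pods: resources alone would allow
	// ~160 pods, but a max-pods of 110 becomes the binding constraint.
	fmt.Println(podsThatFit(16, 64*1024, 0.1, 128, 110)) // 110
}
```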
@bwagner5 I was literally just looking at the docs around this functionality, as I've been trying to see what's currently missing from Karpenter that we can't live without.
Provisioner-level
@stevehipwell Let us know if there's more here, but this should be good to go!
Tell us about your request
I'd like Karpenter to support setting a max pods per node limit, specifically 110 based on the K8s large clusters guide.
Tell us about the problem you're trying to solve. What are you trying to do, and why is it hard?
The current EKS logic for max pods contradicts the official Kubernetes guidance, and with ENI prefixes a node can theoretically support a huge number of pods. This would also make the kube-reserved calculation much simpler, as all nodes would use the same value.
Are you currently working around this issue?
n/a
Additional context
Anything else we should know?
Attachments
n/a