
Karpenter nodes aren't getting proper max-pods set with prefix-delegation on #6102

Closed
taer opened this issue Apr 26, 2024 · 7 comments
Labels
bug Something isn't working

Comments


taer commented Apr 26, 2024

Description

This is similar to issue #2273.

I have an EKS cluster with Karpenter, and I currently have kubelet.maxPods hardcoded in my NodePool:

       kubelet:
         maxPods: 100

I'm trying to remove that since I'm 100% positive that one day it will bite me. :)

I launched a workload and Karpenter spun up a c7g.medium. The max-pods calculator says:

$ ./max-pods-calculator.sh --instance-type c7g.medium --cni-version 1.18.0-eksbuild.1 --cni-prefix-delegation-enabled 
98

My node describe shows

Capacity:
  cpu:                1
  ephemeral-storage:  20414Mi
  hugepages-1Gi:      0
  hugepages-2Mi:      0
  hugepages-32Mi:     0
  hugepages-64Ki:     0
  memory:             1873556Ki
  pods:               8

8 instead of 98. The aws-node daemonset has the right env var set:

│       ENABLE_PREFIX_DELEGATION:               true                                                                                                          │

The managed node group in this cluster is running an m6g.medium, and its node describe shows 98 max pods, just as the calculator suggests it should. The managed node and the Karpenter node are running the exact same AMI.
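
For what it's worth, both numbers are consistent with the standard ENI math, assuming these .medium instance types expose 2 ENIs with 4 IPv4 addresses per ENI (an assumption on my part, but the observed values line up exactly):

# Without prefix delegation: ENIs * (IPv4 per ENI - 1) + 2
2 * (4 - 1) + 2      = 8    # what the Karpenter node registered with
# With prefix delegation: each secondary IP slot becomes a /28 prefix (16 addresses)
2 * (4 - 1) * 16 + 2 = 98   # what the calculator reports

(The +2 is commonly attributed to host-networked system pods like aws-node and kube-proxy.) So the Karpenter node looks like it was bootstrapped with the non-prefix-delegation value.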

I am NOT using custom networking, so the reserved-ENI consideration isn't relevant. If I set maxPods in the spec, it works, but I'd rather have maxPods computed dynamically from the hardware Karpenter provisions.

The vpc-cni addon is installed via the terraform-eks module with config like this:

    vpc-cni = {
      # Specify the VPC CNI addon should be deployed before compute to ensure
      # the addon is configured before data plane compute resources are created
      # See README for further details
      before_compute = true
      most_recent    = true # To ensure access to the latest settings provided
      configuration_values = jsonencode({
        env = {
          # Reference docs https://docs.aws.amazon.com/eks/latest/userguide/cni-increase-ip-addresses.html
          ENABLE_PREFIX_DELEGATION = "true"
          WARM_PREFIX_TARGET       = "1"
        }
      })
    }
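
To double-check that the env var actually lands on the running daemonset (the managed addon deploys it as aws-node in kube-system), something like this works:

kubectl -n kube-system get daemonset aws-node \
  -o jsonpath='{.spec.template.spec.containers[0].env}'

and indeed ENABLE_PREFIX_DELEGATION is true there, as shown above.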

That works for the managed nodes, so I'm not sure why it isn't applying to my Karpenter one.
Nothing of note in the node's events:

Events:
  Type     Reason                   Age                From                   Message
  ----     ------                   ----               ----                   -------
  Normal   Starting                 13m                kube-proxy             
  Normal   Starting                 13m                kubelet                Starting kubelet.
  Warning  InvalidDiskCapacity      13m                kubelet                invalid capacity 0 on image filesystem
  Normal   NodeHasSufficientMemory  13m (x2 over 13m)  kubelet                Node ip-172-22-179-43.ec2.internal status is now: NodeHasSufficientMemory
  Normal   NodeHasNoDiskPressure    13m (x2 over 13m)  kubelet                Node ip-172-22-179-43.ec2.internal status is now: NodeHasNoDiskPressure
  Normal   NodeHasSufficientPID     13m (x2 over 13m)  kubelet                Node ip-172-22-179-43.ec2.internal status is now: NodeHasSufficientPID
  Normal   Synced                   13m                cloud-node-controller  Node synced successfully
  Normal   RegisteredNode           13m                node-controller        Node ip-172-22-179-43.ec2.internal event: Registered Node ip-172-22-179-43.ec2.internal in Controller
  Normal   NodeAllocatableEnforced  13m                kubelet                Updated Node Allocatable limit across pods
  Normal   NodeReady                13m                kubelet                Node ip-172-22-179-43.ec2.internal status is now: NodeReady
  Normal   DisruptionBlocked        12m                karpenter              Cannot disrupt Node: Nominated for a pending pod
  Normal   Unconsolidatable         7m57s              karpenter              SpotToSpotConsolidation is disabled, can't replace a spot node with a spot node

Any suggestions? Happy to provide more data if needed.

Thanks!

taer added the bug and needs-triage labels Apr 26, 2024

taer commented Apr 26, 2024

It's a Bottlerocket node, if that helps. I learned more about Bottlerocket and found the kubelet config:

# cat config
---
kind: KubeletConfiguration
apiVersion: kubelet.config.k8s.io/v1beta1
address: 0.0.0.0
authentication:
  anonymous:
    enabled: false
  webhook:
    cacheTTL: 2m0s
    enabled: true
  x509:
    clientCAFile: "/etc/kubernetes/pki/ca.crt"
authorization:
  mode: Webhook
  webhook:
    cacheAuthorizedTTL: 5m0s
    cacheUnauthorizedTTL: 30s
clusterDomain: cluster.local
clusterDNS:
- 10.100.0.10
kubeReserved:
  cpu: "60m"
  memory: "343Mi"
  ephemeral-storage: "1Gi"
kubeReservedCgroup: "/runtime"
cpuCFSQuota: true
cpuManagerPolicy: none
podPidsLimit: 1048576
providerID: aws:///us-east-1b/i-022b42fadeaac6565
resolvConf: "/run/netdog/resolv.conf"
hairpinMode: hairpin-veth
readOnlyPort: 0
cgroupDriver: systemd
cgroupRoot: "/"
runtimeRequestTimeout: 15m
protectKernelDefaults: true
serializeImagePulls: false
seccompDefault: false
serverTLSBootstrap: true
tlsCipherSuites:
- TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256
volumePluginDir: "/var/lib/kubelet/plugins/volume/exec"
maxPods: 8
staticPodPath: "/etc/kubernetes/static-pods/"

There's the 8. I'll keep digging, but Bottlerocket makes that so fun. :)
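
In the meantime, it looks like the value could be pinned through Bottlerocket's TOML userData on the EC2NodeClass instead of through the NodePool's kubelet block. A sketch, untested; settings.kubernetes.max-pods is my reading of the Bottlerocket settings docs:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: default
spec:
  amiFamily: Bottlerocket
  # Bottlerocket userData is TOML; Karpenter merges it with its own bootstrap settings
  userData: |
    [settings.kubernetes]
    max-pods = 98

Of course that just moves the hardcoding from the NodePool to the node class, so it doesn't solve the dynamic part.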


taer commented Apr 26, 2024

Is it perhaps related to bottlerocket-os/bottlerocket#1721?

engedaam commented

Can you share your EC2NodeClass? Do you have userData defined?


taer commented May 1, 2024

No userData defined:

apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  annotations:
    argocd.argoproj.io/tracking-id: >-
      addon-k8s-wl-dev-use1-default-karpenter:karpenter.k8s.aws/EC2NodeClass:kube-system/default
    karpenter.k8s.aws/ec2nodeclass-hash: '4041356734480352423'
  creationTimestamp: '2024-03-11T22:34:57Z'
  finalizers:
    - karpenter.k8s.aws/termination
  generation: 4
  name: default
  resourceVersion: '27363788'
  uid: d3229e0d-2367-42d6-be9e-a7a403e41c81
spec:
  amiFamily: Bottlerocket
  metadataOptions:
    httpEndpoint: enabled
    httpProtocolIPv6: disabled
    httpPutResponseHopLimit: 2
    httpTokens: required
  role: Karpenter-k8s-wl-dev-use1-default-20240311223008495900000014
  securityGroupSelectorTerms:
    - tags:
        karpenter.sh/discovery: k8s-wl-dev-use1-default
  subnetSelectorTerms:
    - tags:
        private: '1'
  tags:
    Name: karpenter-node-k8s-wl-dev-use1-default
    created-by: karpneter
    env: dev
    karpenter.sh/discovery: k8s-wl-dev-use1-default
status:
  amis:
    - id: ami-0d31d8d1285f91827
      name: bottlerocket-aws-k8s-1.29-nvidia-x86_64-v1.19.4-4f0a078e
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: Exists
    - id: ami-0d31d8d1285f91827
      name: bottlerocket-aws-k8s-1.29-nvidia-x86_64-v1.19.4-4f0a078e
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.k8s.aws/instance-accelerator-count
          operator: Exists
    - id: ami-09a453d0790846390
      name: bottlerocket-aws-k8s-1.29-nvidia-aarch64-v1.19.4-4f0a078e
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: Exists
    - id: ami-09a453d0790846390
      name: bottlerocket-aws-k8s-1.29-nvidia-aarch64-v1.19.4-4f0a078e
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: karpenter.k8s.aws/instance-accelerator-count
          operator: Exists
    - id: ami-0c6262d1506cc714c
      name: bottlerocket-aws-k8s-1.29-aarch64-v1.19.4-4f0a078e
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: DoesNotExist
        - key: karpenter.k8s.aws/instance-accelerator-count
          operator: DoesNotExist
    - id: ami-09c57287987b90676
      name: bottlerocket-aws-k8s-1.29-x86_64-v1.19.4-4f0a078e
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - amd64
        - key: karpenter.k8s.aws/instance-gpu-count
          operator: DoesNotExist
        - key: karpenter.k8s.aws/instance-accelerator-count
          operator: DoesNotExist
  instanceProfile: k8s-wl-dev-use1-default_15843455441266977890
  securityGroups:
    - id: sg-06df60c1a6afb34ca
      name: k8s-wl-dev-use1-default-node-20240311221953005500000003
  subnets:
    - id: subnet-XXX
      zone: us-east-1b
    - id: subnet-XXX
      zone: us-east-1d
    - id: subnet-XXX
      zone: us-east-1a


taer commented May 1, 2024

Part of my NodePool:

apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  annotations:
    argocd.argoproj.io/tracking-id: >-
      addon-k8s-wl-dev-use1-default-karpenter:karpenter.sh/NodePool:kube-system/default
    karpenter.sh/nodepool-hash: '15409987441336338835'
  creationTimestamp: '2024-03-11T22:34:57Z'
  generation: 3
  name: default
  resourceVersion: '27361396'
  uid: 80076e6e-cb1b-43f7-b4a9-e8b371cab394
spec:
  disruption:
    budgets:
      - nodes: 10%
    consolidationPolicy: WhenUnderutilized
    expireAfter: 720h
  limits:
    memory: 1000Gi
  template:
    metadata:
      labels:
        created-by: karpenter
    spec:
      kubelet:
        maxPods: 100

That last kubelet.maxPods block is what I'm trying to remove in favor of the value becoming dynamic. I think this is related to the Bottlerocket issue I linked above.
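
For completeness, the end state I'm after is just dropping that block, i.e. something like (a sketch of the relevant part of the template only):

  template:
    metadata:
      labels:
        created-by: karpenter
    spec:
      # kubelet.maxPods removed; expecting max-pods to be derived from the instance type
      nodeClassRef:
        name: default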


engedaam commented May 6, 2024

Yes, after reading the issue, it would seem this is related to max-pods not being set properly with the CNI's prefix delegation on Bottlerocket OS: bottlerocket-os/bottlerocket#1721

engedaam removed the needs-triage label May 6, 2024
engedaam commented

Closing in favor of tracking the upstream issue.
