Non-working consolidation starting from v0.32 with multiple NodePools with metadata labels #1106

Closed
nantiferov opened this issue Mar 15, 2024 · 7 comments
Labels: kind/bug, needs-triage

Comments

nantiferov commented Mar 15, 2024

Description

Observed Behavior:

Apparently something changed in v0.32 that prevents both old Provisioners and new NodePools from consolidating initially launched instances into smaller/cheaper ones. I tried updating to the latest v0.35.2 with the same results.

A quick search through the issues in this repo and in https://github.com/aws/karpenter-provider-aws didn't turn up anything similar.

My use case is that I need the EBS volume size to be proportional to the total memory of the EC2 instance. Since that isn't possible directly, I create a couple of EC2NodeClass/NodePool pairs with different volumeSize values in blockDeviceMappings.

Expected Behavior:

Before v0.32, Karpenter consolidated launched nodes into smaller ones according to node utilisation. Right now it seems that nodes from both old Provisioners and new NodePools are not consolidated.

In the Karpenter logs I see this:

{"level":"ERROR","time":"2024-03-15T16:35:54.861Z","logger":"controller.nodeclaim.consistency","message":"check failed, expected 20Gi of resource ephemeral-storage, but found 12561388Ki (59.9% of expected)","commit":"8b2d1d7","nodeclaim":"eng-8gb-7v6zh"}

and in the node describe output this:

Events:
  Type    Reason            Age                    From       Message
  ----    ------            ----                   ----       -------
  Normal  Unconsolidatable  42m (x223 over 3d11h)  karpenter  Can't replace with a cheaper node
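
For reference, the event above comes from kubectl describe node on the affected node, and the consistency-check error from the Karpenter controller logs; assuming the controller runs as a deployment named karpenter in the karpenter namespace, roughly:

kubectl describe node <node-name>
kubectl -n karpenter logs deployment/karpenter | grep consistency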

Reproduction Steps (Please include YAML):

The YAML below is based on the current setup and slightly simplified (tags removed, only 2 kept).

EC2NodeClass

---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: eng-12gb # EBS volume size
spec:
  amiFamily: AL2
  role: Karpenter-clustername

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 12Gi # add 4Gb to instance memory for host OS
        volumeType: gp3

  securityGroupSelectorTerms:
    - tags:
        Name: clustername-node

  subnetSelectorTerms:
    - tags:
        Name: SomeSG-a
    - tags:
        Name: SomeSG-b
    - tags:
        Name: SomeSG-c
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: eng-20gb # EBS volume size
spec:
  amiFamily: AL2
  role: Karpenter-clustername

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 20Gi # add 4Gb to instance memory for host OS
        volumeType: gp3

  securityGroupSelectorTerms:
    - tags:
        Name: clustername-node

  subnetSelectorTerms:
    - tags:
        Name: SomeSG-a
    - tags:
        Name: SomeSG-b
    - tags:
        Name: SomeSG-c

NodePool

---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: eng-8gb # EC2 total memory
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never

  limits:
    cpu: "50"
    memory: 100Gi

  template:
    metadata:
      labels:
        some: label
    spec:
      nodeClassRef:
        name: eng-12gb # metadata.name from EC2NodeClass
      kubelet:
        systemReserved:
          memory: 300Mi
      taints:
        - key: something
          value: special
          effect: NoSchedule
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - t4g
        - key: karpenter.k8s.aws/instance-memory
          operator: In
          values:
            - "8192"
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - eu-central-1a
            - eu-central-1b
            - eu-central-1c
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: eng-16gb # EC2 total memory
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never

  limits:
    cpu: "50"
    memory: 100Gi

  template:
    metadata:
      labels:
        some: label
    spec:
      nodeClassRef:
        name: eng-20gb # metadata.name from EC2NodeClass
      kubelet:
        systemReserved:
          memory: 1Gi
      taints:
        - key: something
          value: special
          effect: NoSchedule
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - t4g
        - key: karpenter.k8s.aws/instance-memory
          operator: In
          values:
            - "16384"
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - eu-central-1a
            - eu-central-1b
            - eu-central-1c
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand

And then a deployment or two to schedule on these nodes:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: some
                    operator: In
                    values:
                      - label
      tolerations:
        - key: something
          value: special
          effect: NoSchedule
      containers:
        - name: nginx
          image: nginx:stable
          resources:
            limits:
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 1Gi

So before v0.32, Karpenter launched slightly bigger nodes at the beginning and then optimised them after ~30 minutes. Right now it launches suboptimal nodes and keeps them, with the aforementioned errors in the logs and messages in the events.

Versions:

  • Chart Version: karpenter-v0.32.7
  • Kubernetes Version (kubectl version): v1.28.5-eks-5e0fdde
nantiferov added the kind/bug and needs-triage labels on Mar 15, 2024
nantiferov changed the title from "Non-working consolidation starting from v0.32 with multiple NodePools with same taints" to "Non-working consolidation starting from v0.32 with multiple NodePools with metadata labels" on Mar 15, 2024
engedaam (Contributor) commented Mar 18, 2024

@nantiferov The team has released a patch for this issue in v0.32.8 (aws/karpenter-provider-aws#5816). Can you try upgrading to v0.32.8 and see if that fixes the issue for you?
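
For reference, a typical chart upgrade looks something like this (assuming the standard OCI chart location and the usual release/namespace names; adjust to your install):

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version v0.32.8 \
  --namespace karpenter \
  --reuse-values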

nantiferov (Author) commented:

Thank you @engedaam for the update.

Right now one of the clusters is updated to 0.35.2, which includes this fix, and it still has errors like this in the logs:

{"level":"ERROR","time":"2024-03-19T05:55:57.667Z","logger":"controller.nodeclaim.consistency","message":"check failed, expected 20Gi of resource ephemeral-storage, but found 12561388Ki (59.9% of expected)","commit":"8b2d1d7","nodeclaim":"eng-8gb-9v2sj"}

I will try to update another cluster (which is currently on v0.32.7) to v0.32.8 to see if consolidation behaves differently.

engedaam (Contributor) commented:

Can you share the NodeClaim that was created by Karpenter where you are seeing this error?

nantiferov (Author) commented:

Added.

It was probably created by an old version of Karpenter. Is that the reason for these errors?

apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "5873637210801609490"
    karpenter.k8s.aws/tagged: "true"
    karpenter.sh/managed-by: karpenter-eu-central-1
    karpenter.sh/nodepool-hash: "9384252991043097688"
  creationTimestamp: "2024-03-12T12:57:23Z"
  finalizers:
  - karpenter.sh/termination
  generateName: eng-8gb-
  generation: 1
  labels:
    some: label
    karpenter.k8s.aws/instance-category: t
    karpenter.k8s.aws/instance-cpu: "2"
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "false"
    karpenter.k8s.aws/instance-family: t4g
    karpenter.k8s.aws/instance-generation: "4"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "8192"
    karpenter.k8s.aws/instance-network-bandwidth: "512"
    karpenter.k8s.aws/instance-size: large
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/nodepool: eng-8gb
    kubernetes.io/arch: arm64
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: t4g.large
    topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1c
  name: eng-8gb-9v2sj
  ownerReferences:
  - apiVersion: karpenter.sh/v1beta1
    blockOwnerDeletion: true
    kind: NodePool
    name: eng-8gb
    uid: d742e836-1419-4c05-b7e6-7fab470ffac9
  resourceVersion: "273778814"
  uid: 92d71c0d-b266-46a9-9926-1d8df1accfac
spec:
  kubelet:
    systemReserved:
      memory: 300Mi
  nodeClassRef:
    name: eng-12gb
  requirements:
  - key: kubernetes.io/os
    operator: In
    values:
    - linux
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values:
    - t4g
  - key: kubernetes.io/arch
    operator: In
    values:
    - arm64
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - eu-central-1b
    - eu-central-1c
  - key: karpenter.k8s.aws/instance-memory
    operator: In
    values:
    - "8192"
  - key: something
    operator: In
    values:
    - special
  - key: karpenter.sh/nodepool
    operator: In
    values:
    - eng-8gb
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - t4g.large
  resources:
    requests:
      cpu: 305m
      ephemeral-storage: 128Mi
      memory: 428Mi
      pods: "6"
  taints:
  - effect: NoSchedule
    key: something
    value: special
status:
  allocatable:
    cpu: 1930m
    ephemeral-storage: 17Gi
    memory: 5754Mi
    pods: "35"
  capacity:
    cpu: "2"
    ephemeral-storage: 20Gi
    memory: 7518Mi
    pods: "35"
  conditions:
  - lastTransitionTime: "2024-03-12T12:58:10Z"
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-03-12T12:57:26Z"
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-03-12T12:58:10Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-03-12T12:57:56Z"
    status: "True"
    type: Registered
  imageID: ami-0df7e537324849d34
  nodeName: ip-xxx-xxx-xxx-xxx.eu-central-1.compute.internal
  providerID: aws:///eu-central-1c/i-instance-id

engedaam (Contributor) commented:

Yes, that would seem so. Can you roll the node and see if that fixes your issue?
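
One way to roll a Karpenter-managed node is to delete its NodeClaim; the karpenter.sh/termination finalizer should drain the node and terminate the instance, and Karpenter provisions replacement capacity for any pods that still need it. A rough sketch using the NodeClaim from above:

# roll a single node by deleting its NodeClaim
kubectl delete nodeclaim eng-8gb-9v2sj

# or roll every node owned by the pool
kubectl delete nodeclaims -l karpenter.sh/nodepool=eng-8gb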

nantiferov (Author) commented:

Thanks, I will re-provision them and check for errors and consolidation behaviour.

nantiferov (Author) commented:

I can confirm that after re-provisioning the nodes from the NodePool with the new version of Karpenter, the related errors disappeared from the logs. Consolidation also looks like it's working better; I will keep observing it over the next weeks.
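
For anyone checking the same thing, a rough way to verify (again assuming the karpenter namespace and deployment name):

# the consistency-check errors should no longer appear
kubectl -n karpenter logs deployment/karpenter | grep "check failed"

# and nodes should no longer report Unconsolidatable events
kubectl get events -A | grep Unconsolidatable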
