Non-working consolidation starting from v0.32 with multiple NodePools with metadata labels #1106

Closed
nantiferov opened this issue Mar 15, 2024 · 7 comments
Labels: kind/bug, needs-triage

Comments

nantiferov commented Mar 15, 2024

Description

Observed Behavior:

Apparently something changed in v0.32 that prevents both old Provisioners and new NodePools from consolidating initially launched instances into smaller/cheaper ones. I tried updating to the latest v0.35.2 with the same results.

A quick search through the issues in this repo and in https://github.com/aws/karpenter-provider-aws didn't turn up anything similar.

My use case is that I need the EBS volume size to be proportional to the total memory of the EC2 instance. Since that isn't possible directly, I create a couple of EC2NodeClass/NodePool pairs with different volumeSize values in blockDeviceMappings.

Expected Behavior:

Before v0.32, Karpenter consolidated launched nodes into smaller ones according to node utilisation. Right now it seems that nodes from both old Provisioners and new NodePools are not consolidated.

In the Karpenter logs I see this:

{"level":"ERROR","time":"2024-03-15T16:35:54.861Z","logger":"controller.nodeclaim.consistency","message":"check failed, expected 20Gi of resource ephemeral-storage, but found 12561388Ki (59.9% of expected)","commit":"8b2d1d7","nodeclaim":"eng-8gb-7v6zh"}

and in the node describe output this:

Events:
  Type    Reason            Age                    From       Message
  ----    ------            ----                   ----       -------
  Normal  Unconsolidatable  42m (x223 over 3d11h)  karpenter  Can't replace with a cheaper node
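
For reference, the event above comes from kubectl describe node on the affected node, and the consistency-check error from the Karpenter controller logs; assuming the controller runs as a deployment named karpenter in the karpenter namespace, roughly:

kubectl describe node <node-name>
kubectl -n karpenter logs deployment/karpenter | grep consistency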

Reproduction Steps (Please include YAML):

The YAML below is based on the current setup and slightly simplified (tags removed, only 2 kept).

EC2NodeClass

---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: eng-12gb # EBS volume size
spec:
  amiFamily: AL2
  role: Karpenter-clustername

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 12Gi # add 4Gb to instance memory for host OS
        volumeType: gp3

  securityGroupSelectorTerms:
    - tags:
        Name: clustername-node

  subnetSelectorTerms:
    - tags:
        Name: SomeSG-a
    - tags:
        Name: SomeSG-b
    - tags:
        Name: SomeSG-c
---
apiVersion: karpenter.k8s.aws/v1beta1
kind: EC2NodeClass
metadata:
  name: eng-20gb # EBS volume size
spec:
  amiFamily: AL2
  role: Karpenter-clustername

  blockDeviceMappings:
    - deviceName: /dev/xvda
      ebs:
        volumeSize: 20Gi # add 4Gb to instance memory for host OS
        volumeType: gp3

  securityGroupSelectorTerms:
    - tags:
        Name: clustername-node

  subnetSelectorTerms:
    - tags:
        Name: SomeSG-a
    - tags:
        Name: SomeSG-b
    - tags:
        Name: SomeSG-c

NodePool

---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: eng-8gb # EC2 total memory
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never

  limits:
    cpu: "50"
    memory: 100Gi

  template:
    metadata:
      labels:
        some: label
    spec:
      nodeClassRef:
        name: eng-12gb # metadata.name from EC2NodeClass
      kubelet:
        systemReserved:
          memory: 300Mi
      taints:
        - key: something
          value: special
          effect: NoSchedule
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - t4g
        - key: karpenter.k8s.aws/instance-memory
          operator: In
          values:
            - "8192"
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - eu-central-1a
            - eu-central-1b
            - eu-central-1c
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand
---
apiVersion: karpenter.sh/v1beta1
kind: NodePool
metadata:
  name: eng-16gb # EC2 total memory
spec:
  disruption:
    consolidationPolicy: WhenUnderutilized
    expireAfter: Never

  limits:
    cpu: "50"
    memory: 100Gi

  template:
    metadata:
      labels:
        some: label
    spec:
      nodeClassRef:
        name: eng-20gb # metadata.name from EC2NodeClass
      kubelet:
        systemReserved:
          memory: 1Gi
      taints:
        - key: something
          value: special
          effect: NoSchedule
      requirements:
        - key: kubernetes.io/arch
          operator: In
          values:
            - arm64
        - key: karpenter.k8s.aws/instance-family
          operator: In
          values:
            - t4g
        - key: karpenter.k8s.aws/instance-memory
          operator: In
          values:
            - "16384"
        - key: topology.kubernetes.io/zone
          operator: In
          values:
            - eu-central-1a
            - eu-central-1b
            - eu-central-1c
        - key: kubernetes.io/os
          operator: In
          values:
            - linux
        - key: karpenter.sh/capacity-type
          operator: In
          values:
            - on-demand

And then a deployment or two to schedule on these nodes:

---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: some-deployment
spec:
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
              - matchExpressions:
                  - key: some
                    operator: In
                    values:
                      - label
      tolerations:
        - key: something
          value: special
          effect: NoSchedule
      containers:
        - name: nginx
          image: nginx:stable
          resources:
            limits:
              memory: 1Gi
            requests:
              cpu: 100m
              memory: 1Gi

So before v0.32, Karpenter launched slightly bigger nodes at the beginning and then optimised them after ~30 minutes. Right now it launches suboptimal nodes and keeps them, with the aforementioned errors in the logs and messages in the events.

Versions:

  • Chart Version: karpenter-v0.32.7
  • Kubernetes Version (kubectl version): v1.28.5-eks-5e0fdde
nantiferov added the kind/bug and needs-triage labels on Mar 15, 2024
nantiferov changed the title from "Non-working consolidation starting from v0.32 with multiple NodePools with same taints" to "Non-working consolidation starting from v0.32 with multiple NodePools with metadata labels" on Mar 15, 2024
engedaam (Contributor) commented Mar 18, 2024

@nantiferov The team has released a patch for this issue in v0.32.8 (aws/karpenter-provider-aws#5816). Can you try upgrading to v0.32.8 and see if that fixes the issue for you?
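
For reference, a typical chart upgrade looks something like this (assuming the standard OCI chart location and the usual release/namespace names; adjust to your install):

helm upgrade --install karpenter oci://public.ecr.aws/karpenter/karpenter \
  --version v0.32.8 \
  --namespace karpenter \
  --reuse-values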

nantiferov (Author) commented:

Thank you @engedaam for the update.

Right now one of the clusters is updated to 0.35.2, which includes this fix, and it still has errors like this in the logs:

{"level":"ERROR","time":"2024-03-19T05:55:57.667Z","logger":"controller.nodeclaim.consistency","message":"check failed, expected 20Gi of resource ephemeral-storage, but found 12561388Ki (59.9% of expected)","commit":"8b2d1d7","nodeclaim":"eng-8gb-9v2sj"}

I will try to update another cluster (which is currently on v0.32.7) to v0.32.8 to see if consolidation behaves differently.

engedaam (Contributor) commented:

Can you share the NodeClaim that was created by Karpenter where you are seeing this error?

nantiferov (Author) commented:

Added.

It was probably created by an old version of Karpenter. Is that the reason for these errors?

apiVersion: karpenter.sh/v1beta1
kind: NodeClaim
metadata:
  annotations:
    karpenter.k8s.aws/ec2nodeclass-hash: "5873637210801609490"
    karpenter.k8s.aws/tagged: "true"
    karpenter.sh/managed-by: karpenter-eu-central-1
    karpenter.sh/nodepool-hash: "9384252991043097688"
  creationTimestamp: "2024-03-12T12:57:23Z"
  finalizers:
  - karpenter.sh/termination
  generateName: eng-8gb-
  generation: 1
  labels:
    some: label
    karpenter.k8s.aws/instance-category: t
    karpenter.k8s.aws/instance-cpu: "2"
    karpenter.k8s.aws/instance-encryption-in-transit-supported: "false"
    karpenter.k8s.aws/instance-family: t4g
    karpenter.k8s.aws/instance-generation: "4"
    karpenter.k8s.aws/instance-hypervisor: nitro
    karpenter.k8s.aws/instance-memory: "8192"
    karpenter.k8s.aws/instance-network-bandwidth: "512"
    karpenter.k8s.aws/instance-size: large
    karpenter.sh/capacity-type: on-demand
    karpenter.sh/nodepool: eng-8gb
    kubernetes.io/arch: arm64
    kubernetes.io/os: linux
    node.kubernetes.io/instance-type: t4g.large
    topology.kubernetes.io/region: eu-central-1
    topology.kubernetes.io/zone: eu-central-1c
  name: eng-8gb-9v2sj
  ownerReferences:
  - apiVersion: karpenter.sh/v1beta1
    blockOwnerDeletion: true
    kind: NodePool
    name: eng-8gb
    uid: d742e836-1419-4c05-b7e6-7fab470ffac9
  resourceVersion: "273778814"
  uid: 92d71c0d-b266-46a9-9926-1d8df1accfac
spec:
  kubelet:
    systemReserved:
      memory: 300Mi
  nodeClassRef:
    name: eng-12gb
  requirements:
  - key: kubernetes.io/os
    operator: In
    values:
    - linux
  - key: karpenter.k8s.aws/instance-family
    operator: In
    values:
    - t4g
  - key: kubernetes.io/arch
    operator: In
    values:
    - arm64
  - key: topology.kubernetes.io/zone
    operator: In
    values:
    - eu-central-1b
    - eu-central-1c
  - key: karpenter.k8s.aws/instance-memory
    operator: In
    values:
    - "8192"
  - key: something
    operator: In
    values:
    - special
  - key: karpenter.sh/nodepool
    operator: In
    values:
    - eng-8gb
  - key: karpenter.sh/capacity-type
    operator: In
    values:
    - on-demand
  - key: node.kubernetes.io/instance-type
    operator: In
    values:
    - t4g.large
  resources:
    requests:
      cpu: 305m
      ephemeral-storage: 128Mi
      memory: 428Mi
      pods: "6"
  taints:
  - effect: NoSchedule
    key: something
    value: special
status:
  allocatable:
    cpu: 1930m
    ephemeral-storage: 17Gi
    memory: 5754Mi
    pods: "35"
  capacity:
    cpu: "2"
    ephemeral-storage: 20Gi
    memory: 7518Mi
    pods: "35"
  conditions:
  - lastTransitionTime: "2024-03-12T12:58:10Z"
    status: "True"
    type: Initialized
  - lastTransitionTime: "2024-03-12T12:57:26Z"
    status: "True"
    type: Launched
  - lastTransitionTime: "2024-03-12T12:58:10Z"
    status: "True"
    type: Ready
  - lastTransitionTime: "2024-03-12T12:57:56Z"
    status: "True"
    type: Registered
  imageID: ami-0df7e537324849d34
  nodeName: ip-xxx-xxx-xxx-xxx.eu-central-1.compute.internal
  providerID: aws:///eu-central-1c/i-instance-id

engedaam (Contributor) commented:

Yes, that would seem so. Can you roll the node and see if that fixes your issue?
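
One way to roll a Karpenter-managed node is to delete its NodeClaim; the karpenter.sh/termination finalizer should drain the node and terminate the instance, and Karpenter provisions replacement capacity for any pods that still need it. A rough sketch using the NodeClaim from above:

# roll a single node by deleting its NodeClaim
kubectl delete nodeclaim eng-8gb-9v2sj

# or roll every node owned by the pool
kubectl delete nodeclaims -l karpenter.sh/nodepool=eng-8gb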

nantiferov (Author) commented:

Thanks, I will re-provision them and check for errors and consolidation behaviour.

nantiferov (Author) commented:

I can confirm that after re-provisioning the nodes from the NodePool with the new version of Karpenter, the related errors disappeared from the logs. Consolidation also looks like it's working better; I will keep observing it over the next weeks.
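
For anyone checking the same thing, a rough way to verify (again assuming the karpenter namespace and deployment name):

# the consistency-check errors should no longer appear
kubectl -n karpenter logs deployment/karpenter | grep "check failed"

# and nodes should no longer report Unconsolidatable events
kubectl get events -A | grep Unconsolidatable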
