fix: Ensure persistent volumes are detached before deleting node #1294

Merged
merged 3 commits into from
Aug 2, 2024

AndrewSirenko
Contributor

@AndrewSirenko AndrewSirenko commented Jun 4, 2024

Fixes #N/A

Description

Fixes delays of 6+ minutes when disrupted EBS-backed stateful workloads start on their new node.

For more context, see the RFC for solving 6+ minute delays for disrupted stateful workloads.

TLDR:

For a stateful pod to migrate smoothly from a terminating node to a new node, the following must happen in order:

  1. Consolidation event starts
  2. Stateful pods must terminate
  3. EBS CSI Node pod must unmount all filesystems (NodeUnpublish & NodeUnstage RPCs)
  4. EBS CSI Controller pod must detach all volumes from instance
  5. Karpenter terminates EC2 Instance
  6. Karpenter ensures Node object deleted from Kubernetes

Problems:
A. If step 3 doesn't happen, there is currently a 6+ minute delay in stateful pod migration, because Kubernetes cannot confirm that the volume is no longer attached and mounted on the old instance.
B. If step 4 doesn't happen, the new stateful pod can't start until the consolidated instance is terminated, which auto-detaches its volumes (1+ minute delay).

Solution:
Before deleting the node, wait until the volumeattachment objects associated with drainable pods and non-multi-attach volumes have been deleted.
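
To make the filtering rule concrete, here is a minimal Go sketch of it. This is not the code in this PR: the `VolumeAttachment` and `VolumeInfo` types, their fields, and the `blockingAttachments` and `describe` helpers are hypothetical stand-ins that use only the standard library. It assumes that a volume with no known info should block deletion.

```go
package main

import "fmt"

// VolumeAttachment is a simplified, hypothetical stand-in for the
// storage.k8s.io/v1 VolumeAttachment object.
type VolumeAttachment struct {
	Name     string
	NodeName string // node the volume is attached to
	PVName   string // PersistentVolume backing the attachment
}

// VolumeInfo is a hypothetical summary of what is known about a volume.
// Its zero value describes a volume that blocks node deletion.
type VolumeInfo struct {
	MultiAttach     bool // volume may be attached to several nodes at once
	UndrainableOnly bool // volume is only used by pods that will not be drained
}

// blockingAttachments returns the attachments on nodeName that should delay
// node deletion. Multi-attach volumes and volumes used only by non-drainable
// pods are skipped; everything else on the node blocks.
func blockingAttachments(nodeName string, vas []VolumeAttachment, vols map[string]VolumeInfo) []VolumeAttachment {
	var blocking []VolumeAttachment
	for _, va := range vas {
		if va.NodeName != nodeName {
			continue
		}
		info := vols[va.PVName] // missing entry yields the zero value: blocking
		if info.MultiAttach || info.UndrainableOnly {
			continue
		}
		blocking = append(blocking, va)
	}
	return blocking
}

// describe summarizes the wait state, e.g. for a log line.
func describe(nodeName string, blocking []VolumeAttachment) string {
	if len(blocking) == 0 {
		return fmt.Sprintf("node %s: no blocking volume attachments", nodeName)
	}
	return fmt.Sprintf("node %s: waiting on %d volume attachment(s)", nodeName, len(blocking))
}

func main() {
	vas := []VolumeAttachment{
		{Name: "va-1", NodeName: "node-a", PVName: "pv-data"},
		{Name: "va-2", NodeName: "node-a", PVName: "pv-shared"},
		{Name: "va-3", NodeName: "node-b", PVName: "pv-other"},
	}
	vols := map[string]VolumeInfo{"pv-shared": {MultiAttach: true}}
	fmt.Println(describe("node-a", blockingAttachments("node-a", vas, vols)))
}
```

In this sketch only `va-1` delays deletion of `node-a`: `va-2` is backed by a multi-attach volume and `va-3` belongs to a different node.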

How was this change tested?

Manual: Create a statefulset and a nodepool. Have nodes expire every 3 minutes. Check that stateful pods migrate to the new node and start running in under a minute.

Also tested that we do not block deletion when there are stateful workloads that tolerate all taints, or when the Node's terminationGracePeriod has elapsed.

Additional Notes

Note 1: Read permissions for volumeattachments must be added to clusterrole-core.yaml in karpenter-provider-aws:

  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes", "volumeattachments"]
    verbs: ["get", "watch", "list"]

Note 2: Separate PR to add e2e tests in karpenter-provider-aws: aws/karpenter-provider-aws#6484

Note 3: The RFC decided that we will block node deletion on all volumeattachments, regardless of CSI Driver. In the future, we may decide to inject a list of CSI Drivers via the cloud provider instead.

Note 4: In rare cases, the EBS CSI Node pod can be killed before it unmounts volumes. The solution would be for Karpenter (or some reliable automation) to taint the node with nodeshutdown:NoExecute once the node is terminated, as discussed in the RFC. The consensus in the design meeting was that this could be added later if customers run into it.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Jun 4, 2024
@k8s-ci-robot k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 4, 2024
pkg/controllers/node/termination/controller.go (outdated, resolved)
pkg/controllers/node/termination/controller.go (outdated, resolved)
pkg/controllers/node/termination/controller.go (outdated, resolved)
pkg/controllers/node/termination/controller.go (outdated, resolved)
@AndrewSirenko
Contributor Author

AndrewSirenko commented Jun 4, 2024

In the future, we should consider adding an e2e test to karpenter-provider-aws that verifies that a stateful workload on a consolidating node is migrated and starts running on the second node in under 6 minutes, to prevent regressions. @jmdeal mentioned that the EBS CSI Driver is already installed in the e2e environment. Thoughts?

@coveralls

coveralls commented Jun 4, 2024

Pull Request Test Coverage Report for Build 9370530435

Details

  • 49 of 67 (73.13%) changed or added relevant lines in 5 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage decreased (-0.04%) to 77.922%

| Changes Missing Coverage | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/controllers/node/termination/terminator/terminator.go | 18 | 20 | 90.0% |
| pkg/operator/operator.go | 0 | 3 | 0.0% |
| pkg/utils/node/node.go | 7 | 12 | 58.33% |
| pkg/controllers/node/termination/controller.go | 15 | 23 | 65.22% |

| Totals | Coverage Status |
|---|---|
| Change from base Build 9356953903 | -0.04% |
| Covered Lines | 8319 |
| Relevant Lines | 10676 |

💛 - Coveralls

pkg/controllers/node/termination/controller.go (outdated, resolved)
pkg/controllers/node/termination/controller.go (outdated, resolved)
pkg/utils/node/node.go (outdated, resolved)
pkg/controllers/node/termination/controller.go (outdated, resolved)
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 6, 2024

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

@github-actions github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2024
@jmdeal
Member

jmdeal commented Jul 2, 2024

/remove-lifecycle stale

@k8s-ci-robot k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 2, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 5, 2024
@AndrewSirenko AndrewSirenko changed the title from "[WIP] fix: Delay termination of node until volumeattachments are deleted." to "fix: Ensure persistent volumes are detached before deleting node" Jul 5, 2024
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2024
@AndrewSirenko AndrewSirenko changed the title from "fix: Ensure persistent volumes are detached before deleting node" to "[WIP] fix: Ensure persistent volumes are detached before deleting node" Jul 5, 2024
@k8s-ci-robot k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2024
@AndrewSirenko AndrewSirenko force-pushed the waitVa branch 2 times, most recently from a8804ee to 497eb96 Compare July 5, 2024 19:54
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 5, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 5, 2024
@AndrewSirenko AndrewSirenko force-pushed the waitVa branch 4 times, most recently from 9ba6e3f to 9087fc7 Compare July 5, 2024 21:51
pkg/utils/volumeattachment/volumeattachment.go (outdated, resolved)
pkg/utils/volumeattachment/volumeattachment.go (outdated, resolved)
pkg/utils/volumeattachment/volumeattachment.go (outdated, resolved)
pkg/controllers/node/termination/controller.go (outdated, resolved)
@k8s-ci-robot k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2024
@k8s-ci-robot k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2024
Member

@jonathan-innis jonathan-innis left a comment

/lgtm
/approve

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2024
@k8s-ci-robot
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndrewSirenko, jonathan-innis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2024
@jonathan-innis
Member

/hold Wait for E2E tests to complete in the CloudProvider repo

@k8s-ci-robot k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 1, 2024
@k8s-ci-robot k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2024
Member

@jonathan-innis jonathan-innis left a comment

/lgtm

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2024
@jonathan-innis
Member

/unhold e2es are passing in the AWS provider repo

@AndrewSirenko
Contributor Author

/unhold

^^

@k8s-ci-robot k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2024
@k8s-ci-robot k8s-ci-robot merged commit 71ffe05 into kubernetes-sigs:main Aug 2, 2024
11 checks passed
BEvgeniyS pushed a commit to BEvgeniyS/karpenter that referenced this pull request Sep 16, 2024
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/L Denotes a PR that changes 100-499 lines, ignoring generated files.

5 participants