fix: Ensure persistent volumes are detached before deleting node #1294

AndrewSirenko · 2024-06-04T15:43:59Z

Fixes #N/A

Description

Fixes delays of 6+ minutes that disrupted EBS-backed stateful workloads experience before they can start on their new node.

For more context, see the RFC for solving 6+ minute delays for disrupted stateful workloads.

TLDR:

For a stateful pod to migrate smoothly from the terminating node to a new node, the following must happen in order:

0. Consolidation event starts
1. Stateful pods must terminate
2. EBS CSI Node pod must unmount all filesystems (NodeUnpublish & NodeUnstage RPCs)
3. EBS CSI Controller pod must detach all volumes from instance
4. Karpenter terminates EC2 Instance
5. Karpenter ensures Node object deleted from Kubernetes

Problems:
A. If step 2 (the EBS CSI Node pod unmounting all filesystems) doesn't happen, stateful pod migration is delayed by 6+ minutes today, because Kubernetes cannot be sure the volume is no longer attached and mounted to the instance.
B. If step 3 (the EBS CSI Controller pod detaching all volumes) doesn't happen, the new stateful pod can't start until the consolidated instance is terminated, which auto-detaches the volumes (1+ min delay).

Solution:
Before deleting the node, wait until the volumeattachment objects associated with drainable pods and non-multi-attach volumes have been deleted.
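
The filtering described in the solution can be sketched roughly as follows. This is a minimal illustration, not the code from this PR: the `attachment` struct and the `blockingAttachments` function are hypothetical stand-ins for the Kubernetes VolumeAttachment type and for whatever helper the termination controller uses, and the set of ignored volumes (those used by non-drainable pods or by multi-attach volumes) is assumed to be computed elsewhere.

```go
package main

import "fmt"

// attachment is a hypothetical, trimmed-down stand-in for a Kubernetes
// VolumeAttachment: it records which persistent volume is attached to which node.
type attachment struct {
	Name     string
	NodeName string
	PVName   string
}

// blockingAttachments returns the attachments on nodeName that should delay
// node deletion. Volumes listed in ignoredPVs (assumed here to be those used
// by non-drainable pods or by multi-attach volumes) never block.
func blockingAttachments(all []attachment, nodeName string, ignoredPVs map[string]bool) []attachment {
	var blocking []attachment
	for _, a := range all {
		if a.NodeName != nodeName {
			continue // attached to a different node
		}
		if ignoredPVs[a.PVName] {
			continue // not expected to detach before the node goes away
		}
		blocking = append(blocking, a)
	}
	return blocking
}

func main() {
	all := []attachment{
		{Name: "va-1", NodeName: "node-a", PVName: "pv-data"},
		{Name: "va-2", NodeName: "node-a", PVName: "pv-shared"},
		{Name: "va-3", NodeName: "node-b", PVName: "pv-other"},
	}
	ignored := map[string]bool{"pv-shared": true}

	pending := blockingAttachments(all, "node-a", ignored)
	fmt.Printf("%d attachment(s) still block deletion of node-a\n", len(pending))
}
```

In this sketch the node would be deleted only once `blockingAttachments` returns an empty slice; a real controller would requeue and check again rather than loop.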

How was this change tested?

Manual: Create statefulset + nodepool. Have nodes expire every 3 minutes. Check that stateful pods migrate to new node and start running in under a minute.

Also tested that we do not block deletion when there are stateful workloads that tolerate all taints, or when the Node's terminationGracePeriod has elapsed.
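
The non-blocking cases tested above can be pictured as a small decision function. This is only a sketch under stated assumptions: `shouldWaitForDetach` and its parameters are illustrative names rather than the controller's actual API, the pending count is assumed to already exclude volumes of pods that tolerate all taints (since those pods are not drainable), and a grace period of zero is used here to mean "none configured".

```go
package main

import (
	"fmt"
	"time"
)

// shouldWaitForDetach reports whether node deletion should keep waiting for
// volumes to detach.
//   - pending: number of attachments that still block deletion (assumed to
//     exclude volumes of pods that tolerate all taints).
//   - elapsed: time since node termination began.
//   - gracePeriod: hypothetical termination grace period; zero or negative
//     means no grace period is configured.
func shouldWaitForDetach(pending int, elapsed, gracePeriod time.Duration) bool {
	if pending == 0 {
		return false // nothing left to wait for
	}
	if gracePeriod > 0 && elapsed >= gracePeriod {
		return false // grace period elapsed: stop blocking deletion
	}
	return true
}

func main() {
	fmt.Println(shouldWaitForDetach(1, 30*time.Second, 5*time.Minute)) // true
	fmt.Println(shouldWaitForDetach(1, 6*time.Minute, 5*time.Minute))  // false
	fmt.Println(shouldWaitForDetach(0, 0, 0))                          // false
}
```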

Additional Notes

Note 1: Must add read permissions for volumeattachments to clusterrole-core.yaml in karpenter-provider-aws

  - apiGroups: ["storage.k8s.io"]
    resources: ["storageclasses", "csinodes", "volumeattachments"]
    verbs: ["get", "watch", "list"]

Note 2: Separate PR to add e2e tests in karpenter-provider-aws: aws/karpenter-provider-aws#6484

Note 3: The RFC decided that we will block node deletion on all volumeattachments, regardless of CSI driver. In the future, we may decide to inject a list of CSI drivers via the cloud provider instead.

Note 4: In rare cases the EBS CSI Node pod may be killed before it unmounts volumes. The solution would be for Karpenter (or some other reliable automation) to taint the node with nodeshutdown:NoExecute once the node is terminated, as discussed in the RFC. The consensus in the design meeting was that this could be added later if customers run into it.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

AndrewSirenko · 2024-06-04T16:12:28Z

In the future we should consider adding an e2e test to karpenter-provider-aws that checks that a stateful workload on a consolidating node is migrated and starts running on the second node in under 6 minutes, to prevent regressions. @jmdeal mentioned that the EBS CSI Driver is already installed in the e2e environment. Thoughts?

coveralls · 2024-06-04T16:29:59Z

Pull Request Test Coverage Report for Build 9370530435

Details

49 of 67 (73.13%) changed or added relevant lines in 5 files are covered.
No unchanged relevant lines lost coverage.
Overall coverage decreased (-0.04%) to 77.922%

Changes Missing Coverage

| File | Covered Lines | Changed/Added Lines | % |
|---|---|---|---|
| pkg/controllers/node/termination/terminator/terminator.go | 18 | 20 | 90.0% |
| pkg/operator/operator.go | 0 | 3 | 0.0% |
| pkg/utils/node/node.go | 7 | 12 | 58.33% |
| pkg/controllers/node/termination/controller.go | 15 | 23 | 65.22% |

Totals
Change from base Build 9356953903:	-0.04%
Covered Lines:	8319
Relevant Lines:	10676

💛 - Coveralls

github-actions · 2024-06-20T12:01:44Z

This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.

jmdeal · 2024-07-02T16:17:31Z

/remove-lifecycle stale

jonathan-innis

/lgtm
/approve

k8s-ci-robot · 2024-08-01T22:54:42Z

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: AndrewSirenko, jonathan-innis

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

~~OWNERS~~ [jonathan-innis]

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

jonathan-innis · 2024-08-01T22:54:52Z

/hold Wait for E2E tests to complete in the CloudProvider repo

jonathan-innis

/lgtm

jonathan-innis · 2024-08-02T19:21:05Z

/unhold e2es are passing in the AWS provider repo

AndrewSirenko · 2024-08-02T19:31:16Z

/unhold

^^

k8s-ci-robot added the do-not-merge/work-in-progress (Indicates that a PR should not merge because it is a work in progress.) and cncf-cla: yes (Indicates the PR's author has signed the CNCF CLA.) labels Jun 4, 2024

k8s-ci-robot requested review from jackfrancis and tallaxes June 4, 2024 15:44

k8s-ci-robot added the size/L Denotes a PR that changes 100-499 lines, ignoring generated files. label Jun 4, 2024

AndrewSirenko commented Jun 4, 2024

AndrewSirenko mentioned this pull request Jun 4, 2024

Volume still hang on Karpenter Node Consolidation/Termination kubernetes-sigs/aws-ebs-csi-driver#1955

Closed

jmdeal reviewed Jun 4, 2024

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jun 6, 2024

github-actions bot added the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jun 20, 2024

k8s-ci-robot removed the lifecycle/stale Denotes an issue or PR has remained open with no activity and has become stale. label Jul 2, 2024

AndrewSirenko force-pushed the waitVa branch from 1408d6c to 51ea296 Compare July 5, 2024 18:35

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 5, 2024

AndrewSirenko changed the title ~~[WIP] fix: Delay termination of node until volumeattachments are deleted.~~ fix: Ensure persistent volumes are detached before deleting node Jul 5, 2024

k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2024

AndrewSirenko changed the title ~~fix: Ensure persistent volumes are detached before deleting node~~ [WIP] fix: Ensure persistent volumes are detached before deleting node Jul 5, 2024

k8s-ci-robot added the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Jul 5, 2024

AndrewSirenko force-pushed the waitVa branch 2 times, most recently from a8804ee to 497eb96 Compare July 5, 2024 19:54

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 5, 2024

AndrewSirenko force-pushed the waitVa branch from 497eb96 to 1902bc0 Compare July 5, 2024 19:59

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 5, 2024

AndrewSirenko force-pushed the waitVa branch 4 times, most recently from 9ba6e3f to 9087fc7 Compare July 5, 2024 21:51

AndrewSirenko force-pushed the waitVa branch 2 times, most recently from 2e30250 to f33653a Compare July 26, 2024 20:30

jonathan-innis reviewed Jul 30, 2024

jonathan-innis reviewed Jul 30, 2024

AndrewSirenko force-pushed the waitVa branch from f33653a to 56aad01 Compare July 30, 2024 23:58

k8s-ci-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 30, 2024

AndrewSirenko force-pushed the waitVa branch from 56aad01 to c52bbf8 Compare July 31, 2024 00:02

k8s-ci-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Jul 31, 2024

jonathan-innis approved these changes Aug 1, 2024

k8s-ci-robot assigned jonathan-innis Aug 1, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 1, 2024

k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Aug 1, 2024

k8s-ci-robot added the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 1, 2024

AndrewSirenko and others added 3 commits August 2, 2024 09:04

fix: Ensure volumes are detached before deleting node

56f2403

add functional tests

e29a9ff

fixup! add functional tests

2999c22

jonathan-innis force-pushed the waitVa branch from e8c5fe7 to 2999c22 Compare August 2, 2024 16:04

k8s-ci-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2024

jonathan-innis reviewed Aug 2, 2024

k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Aug 2, 2024

k8s-ci-robot removed the do-not-merge/hold Indicates that a PR should not merge because someone has issued a /hold command. label Aug 2, 2024

k8s-ci-robot merged commit 71ffe05 into kubernetes-sigs:main Aug 2, 2024
11 checks passed

jmdeal mentioned this pull request Aug 15, 2024

PersistentVolumes stuck after node consolidation / termination #944

Closed

AndrewSirenko mentioned this pull request Aug 26, 2024

Update faq.md with Karpenter best practices kubernetes-sigs/aws-ebs-csi-driver#2131

Merged

BEvgeniyS pushed a commit to BEvgeniyS/karpenter that referenced this pull request Sep 16, 2024

fix: Ensure persistent volumes are detached before deleting node (kub…

37ca1c2

…ernetes-sigs#1294) Co-authored-by: Jason Deal <[email protected]>

willthames mentioned this pull request Sep 23, 2024

Drift replacement stuck due to "Cannot disrupt NodeClaim" #1684

Closed
