fix: Ensure persistent volumes are detached before deleting node #1294
Conversation
In the future we should consider adding an e2e test to
Pull Request Test Coverage Report for Build 9370530435
💛 - Coveralls
This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.
/remove-lifecycle stale
/lgtm
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: AndrewSirenko, jonathan-innis. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing /approve in a comment.
/hold Wait for E2E tests to complete in the CloudProvider repo
/lgtm
/unhold e2es are passing in the AWS provider repo
/unhold ^^
Fixes #N/A
Description
Fixes 6+ minute delays for disrupted EBS-backed stateful workloads when starting on their new node.
For more context, see the RFC for solving 6+ minute delays for disrupted stateful workloads.
TLDR:
In order for a stateful pod to smoothly migrate from the terminating node to its new node...
Problems:
A. If step 2 doesn't happen, today there is a 6+ minute delay in stateful pod migration, because Kubernetes assumes the volume may still be attached and mounted to the old instance (6+ min delay)
B. If step 3 doesn't happen, the new stateful pod can't start until the consolidated instance is terminated, which auto-detaches its volumes (1+ min delay)
Solution:
Wait for the VolumeAttachment objects associated with drainable pods and non-multi-attach volumes to be deleted before deleting the node.
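A minimal sketch of that wait, assuming client-go; the function name waitForVolumeDetach, its parameters, and the 2-second poll interval are illustrative, and the drainable-pod and multi-attach filtering is omitted, so this is not the PR's actual implementation:

```go
package example

import (
	"context"
	"time"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// waitForVolumeDetach polls until no VolumeAttachment objects reference the
// node (or the timeout expires), so node deletion only proceeds once the CSI
// driver has confirmed the detach.
func waitForVolumeDetach(ctx context.Context, kube kubernetes.Interface, nodeName string, timeout time.Duration) error {
	return wait.PollUntilContextTimeout(ctx, 2*time.Second, timeout, true, func(ctx context.Context) (bool, error) {
		vas, err := kube.StorageV1().VolumeAttachments().List(ctx, metav1.ListOptions{})
		if err != nil {
			return false, err
		}
		for _, va := range vas.Items {
			if va.Spec.NodeName == nodeName {
				// A volume on this node is still attached (or its detach is
				// not yet confirmed), so keep waiting before deleting the node.
				return false, nil
			}
		}
		return true, nil
	})
}
```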
How was this change tested?
Manual: Create a StatefulSet + NodePool. Have nodes expire every 3 minutes. Check that stateful pods migrate to the new node and start running in under a minute.
Also tested that we do not block deletion when there are stateful workloads that tolerate all taints, or when the node's terminationGracePeriod has elapsed.
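A hedged sketch of how the "under a minute" check could be automated with client-go; measureMigration, the namespace/selector parameters, and the 10-minute polling ceiling are made up for illustration and are not part of this PR:

```go
package example

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/apimachinery/pkg/util/wait"
	"k8s.io/client-go/kubernetes"
)

// measureMigration waits for a pod matching the label selector to be Running
// on a node other than oldNode and returns how long that took.
func measureMigration(ctx context.Context, kube kubernetes.Interface, namespace, selector, oldNode string) (time.Duration, error) {
	start := time.Now()
	err := wait.PollUntilContextTimeout(ctx, 5*time.Second, 10*time.Minute, true, func(ctx context.Context) (bool, error) {
		pods, err := kube.CoreV1().Pods(namespace).List(ctx, metav1.ListOptions{LabelSelector: selector})
		if err != nil {
			return false, err
		}
		for _, p := range pods.Items {
			// Done once the stateful pod is scheduled to a different node and Running.
			if p.Spec.NodeName != "" && p.Spec.NodeName != oldNode && p.Status.Phase == corev1.PodRunning {
				return true, nil
			}
		}
		return false, nil
	})
	if err != nil {
		return 0, err
	}
	return time.Since(start), nil
}
```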
Additional Notes
Note 1: Must add read permissions for volumeattachments to clusterrole-core.yaml in karpenter-provider-aws.
Note 2: Separate PR to add e2e tests in karpenter-provider-aws: aws/karpenter-provider-aws#6484
Note 3: It was decided in the RFC that we will block node deletion for all volumeattachments, regardless of CSI driver. In the future, we may decide to inject a list of CSI drivers via the cloud provider instead.
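A hypothetical sketch of that alternative: only treat attachments from an injected set of CSI driver names (the VolumeAttachment attacher field, e.g. ebs.csi.aws.com) as blocking. The function and parameter names are illustrative, not a committed API:

```go
package example

import storagev1 "k8s.io/api/storage/v1"

// filterBlockingAttachments keeps only the attachments on the given node whose
// attacher (CSI driver name) is in the cloud-provider-injected set.
func filterBlockingAttachments(vas []storagev1.VolumeAttachment, nodeName string, blockingAttachers map[string]bool) []storagev1.VolumeAttachment {
	var blocking []storagev1.VolumeAttachment
	for _, va := range vas {
		if va.Spec.NodeName == nodeName && blockingAttachers[va.Spec.Attacher] {
			blocking = append(blocking, va)
		}
	}
	return blocking
}
```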
Note 4: There might be some rare cases where the EBS CSI node pod can get killed before it unmounts volumes. The solution would be Karpenter (or some reliable automation) tainting the node with nodeshutdown:NoExecute once the node is terminated, as discussed in the RFC. In the design meeting, the consensus was that this could be added later if customers run into it.

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.