test: Ensure no 6+ minute delays for disrupted stateful workloads #6484
✅ Deploy Preview for karpenter-docs-prod canceled.
Pull Request Test Coverage Report for Build 10222622868
💛 - Coveralls
@AndrewSirenko please also remember that StatefulSets are not the only workloads that should be handled. We should also handle Deployments that mount a volume (single-replica Deployments).
I can remove "statefulset" from the test title to make it clearer that this test implicitly covers both the StatefulSet and Deployment cases. Thanks! But I'm not sure a separate test using a Deployment instead of a StatefulSet would provide any additional coverage for Karpenter, because the underlying mechanism for node draining and workload disruption is the same in both cases. @jmdeal let me know if you think the additional test adds coverage or would just take up compute time.
/hold Until Karpenter PR 1294 is merged
4c5f815 to 1486e04
/hold Testing e2e tests against kubernetes-sigs/karpenter#1294
/karpenter snapshot
x-ref: Testing e2e tests against kubernetes-sigs/karpenter#1294 stacked on top of #6614 here: https://github.com/aws/karpenter-provider-aws/pull/6615/commits
dabf639 to 6f8a8d3
/karpenter snapshot
0a76697 to 7f0247a
/karpenter snapshot
Snapshot successfully published to
LGTM 🚀
LGTM 🚀
7e6102a to 66d5e8b
LGTM 🚀
Fixes #N/A
Description
Add tests that validate Karpenter PR 1294's solution to 6+ minute delays for disrupted stateful workloads outlined in #6336.
Add two storage E2E tests that ensure:
Also adds read RBAC permissions for volumeattachment objects. (Let me know if this should be a separate fix PR.)
How was this change tested?
Running FOCUS="StatefulSets" make e2etests locally on a cluster with and without the modified Karpenter from Karpenter PR 1294.
NOTE: The test "should run a disrupted stateful workload on a new node within 5 minutes" currently fails because Karpenter PR 1294 has not been merged yet. This test will validate that the 6+ minute delay is eliminated.
NOTE-2: We are not testing whether the 1+ minute EBS DetachVolume & EC2 TerminateInstances race delay occurs, because the smaller time frame is likely to induce flakes.
Does this change impact docs?
By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.