
Automatically terminate EC2 instances when the Kubelet stops reporting #67

Open

jferris opened this issue Nov 16, 2021 · 1 comment

Labels: aws, enhancement (New feature or request)

jferris (Contributor) commented Nov 16, 2021

We have seen several instances where:

  • All pods on the node are marked as "Terminating"
  • The node's status shows that the kubelet has stopped posting node status
  • The EC2 health checks for the node are still passing

We would like to deploy a mechanism that automatically detects and terminates these instances rather than handling them manually.

https://mission-control.thoughtbot.com/branch/main/aws/book/debug/cluster-errors.html#kubelet-stopped-posting-node-status
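A minimal sketch of the detection side of such a mechanism. The function names and the node-dict shape below are illustrative, not an existing implementation: in a real deployment the node objects would come from the Kubernetes API (e.g. the official `kubernetes` Python client) and termination would go through boto3's `ec2.terminate_instances`; both calls are omitted here so the decision logic stands alone.

```python
def kubelet_stopped_reporting(node: dict) -> bool:
    """Return True when the node's Ready condition is "Unknown",
    which is what the node controller sets after the kubelet stops
    posting node status."""
    for condition in node.get("status", {}).get("conditions", []):
        if condition.get("type") == "Ready":
            # "Unknown" means the control plane lost contact with the
            # kubelet; "False" means the kubelet itself reported the
            # node as not ready, which is a different failure mode.
            return condition.get("status") == "Unknown"
    # A node with no Ready condition at all has never reported.
    return True


def instances_to_terminate(nodes: list) -> list:
    """Collect EC2 instance IDs of nodes whose kubelet went silent.

    Assumes the AWS cloud provider's providerID format:
    aws:///<availability-zone>/<instance-id>.
    """
    instance_ids = []
    for node in nodes:
        if kubelet_stopped_reporting(node):
            provider_id = node.get("spec", {}).get("providerID", "")
            if provider_id.startswith("aws://"):
                instance_ids.append(provider_id.rsplit("/", 1)[-1])
    return instance_ids
```

The IDs returned by `instances_to_terminate` would then be passed to the EC2 API for termination, letting the autoscaling group replace the node.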

jferris added the aws and enhancement labels on Nov 16, 2021
clarissalimab (Contributor) commented

This has been happening on one particular piece of infrastructure: the nodes are marked with an "Unknown" state and are not replaced automatically.
