NAT Gateways not recreated #16876
Comments
@zaneclaes It seems that some resources (EIPs) still reference the deleted NATGW. Could you delete those as well and retry? |
There are zero EIPs in the AWS account. They were deleted at the same time
as the NAT gateway. What gave that impression?
|
Something still references "nat-0460de55eeb540794" in the logs, possibly a route table, but such references should no longer exist. |
I have verified that the routing table and any resources I’m aware of that
are tied to the NAT were also deleted. The way I read the log statement,
kops thinks the EIP exists and is trying to describe the NAT gateway which
should be attached to it. It sounds like the Describe check is happening as
a child of the EIP provisioning, but a Describe is invalid on a deleted
gateway…
|
Yes, but kOps does not keep track of the resources. The NGW ID you see there comes from some other resource that used to reference it. |
According to the kOps source code, one of the AWS route tables for your cluster contains a route to the deleted NGW ID. The route likely has a state of |
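To illustrate the check described above, here is a minimal sketch that scans a route table listing for routes still pointing at a given NAT gateway ID. It assumes the input has the shape of an EC2 DescribeRouteTables response (for example, the dict returned by boto3's `describe_route_tables()`); the function name `routes_via_nat_gateway` and the sample IDs other than the NAT gateway ID from the log are hypothetical. This is not how kOps itself performs the lookup.

```python
from __future__ import annotations

from typing import Any


def routes_via_nat_gateway(response: dict[str, Any], nat_gateway_id: str) -> list[dict[str, Any]]:
    """Return the routes in a DescribeRouteTables-shaped response that target nat_gateway_id."""
    matches = []
    for table in response.get("RouteTables", []):
        for route in table.get("Routes", []):
            # Routes through a NAT gateway carry its ID in the NatGatewayId field.
            if route.get("NatGatewayId") == nat_gateway_id:
                matches.append({
                    "RouteTableId": table.get("RouteTableId"),
                    "Destination": route.get("DestinationCidrBlock")
                    or route.get("DestinationIpv6CidrBlock"),
                })
    return matches


# Hand-written sample data (hypothetical IDs), not output from a real account.
sample_route_tables = {
    "RouteTables": [
        {
            "RouteTableId": "rtb-private-example",
            "Routes": [
                {"DestinationCidrBlock": "10.0.0.0/16", "GatewayId": "local"},
                {"DestinationCidrBlock": "0.0.0.0/0", "NatGatewayId": "nat-0460de55eeb540794"},
            ],
        },
        {
            "RouteTableId": "rtb-public-example",
            "Routes": [
                {"DestinationCidrBlock": "0.0.0.0/0", "GatewayId": "igw-example"},
            ],
        },
    ]
}

stale_routes = routes_via_nat_gateway(sample_route_tables, "nat-0460de55eeb540794")
for entry in stale_routes:
    print(entry["RouteTableId"], entry["Destination"])
```

Any route table reported here would be a candidate for the stale reference mentioned in the error message; whether removing the route is sufficient for kops update cluster to proceed is not confirmed in this thread.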
Thanks for the clarifications; that's very helpful. I've cleared out all the routes in the account (by first removing them from the associated subnets) except the default Route Table for the cluster (which cannot be deleted as the default for the VPC). However |
Can you find any Elastic IPs for the cluster? They'll be tagged with your cluster name, but I'm not sure if they'll have an association, given that their NGW was deleted. Deleting those EIPs may help. |
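As a rough sketch of the lookup suggested above, the following filters an address listing down to the Elastic IPs tagged with a cluster name and reports whether each one still has an association. It assumes the input has the shape of an EC2 DescribeAddresses response (for example, from boto3's `describe_addresses()`). The tag key `KubernetesCluster` is an assumption about how the cluster's resources are tagged, and the function name and sample values are hypothetical.

```python
from __future__ import annotations

from typing import Any


def addresses_for_cluster(
    response: dict[str, Any],
    cluster_name: str,
    tag_key: str = "KubernetesCluster",  # assumed tag key; adjust to match your resources
) -> list[dict[str, Any]]:
    """Return Elastic IPs in a DescribeAddresses-shaped response tagged with cluster_name."""
    matches = []
    for address in response.get("Addresses", []):
        tags = {tag["Key"]: tag["Value"] for tag in address.get("Tags", [])}
        if tags.get(tag_key) == cluster_name:
            matches.append({
                "AllocationId": address.get("AllocationId"),
                "PublicIp": address.get("PublicIp"),
                # AssociationId is absent when the address is not attached to anything.
                "Associated": "AssociationId" in address,
            })
    return matches


# Hand-written sample data (hypothetical values), not output from a real account.
sample_addresses = {
    "Addresses": [
        {
            "AllocationId": "eipalloc-example1",
            "PublicIp": "203.0.113.10",
            "Tags": [{"Key": "KubernetesCluster", "Value": "my.example.com"}],
        },
        {
            "AllocationId": "eipalloc-example2",
            "PublicIp": "203.0.113.20",
            "AssociationId": "eipassoc-example2",
            "Tags": [{"Key": "KubernetesCluster", "Value": "other.example.com"}],
        },
    ]
}

for eip in addresses_for_cluster(sample_addresses, "my.example.com"):
    print(eip["AllocationId"], eip["PublicIp"], "associated" if eip["Associated"] else "unassociated")
```

An unassociated address tagged with the cluster name would match the situation described in the comment above.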
@rifelpet every time I delete all the Elastic IPs and then run Just to be clear:
When I run a
|
/kind bug
1. What kops version are you running? The command kops version will display this information.
1.30.1
2. What Kubernetes version are you running? kubectl version will print the version if a cluster is running or provide the Kubernetes version specified as a kops flag.
1.30.2
3. What cloud provider are you using?
AWS
4. What commands did you run? What is the simplest way to reproduce this issue?
Manually deleted the NAT gateways and EIPs on AWS (whoops).
Tried kops update cluster, but it does not detect the deletion; instead it spits out NAT gateway errors:
W1004 15:50:51.222631 35897 executor.go:141] error running task "ElasticIP/us-east" (7m29s remaining to succeed): error finding AssociatedNatGatewayRouteTable: error listing NatGateway "nat-0460de55eeb540794": operation error EC2: DescribeNatGateways, https response error StatusCode: 400, RequestID: 6af7f0d1-1b02-461f-be25-b83f6b4330c9, api error NatGatewayNotFound: The Nat gateway nat-0460de55eeb540794 was not found
5. What happened after the commands executed?
Cluster is no longer working, and no kops commands seem to fix it.
6. What did you expect to happen?
According to #6830 and #6518 this should be fixed.
7. Please provide your cluster manifest. Execute kops get --name my.example.com -o yaml to display your cluster manifest. You may want to remove your cluster name and other sensitive information.
8. Please run the commands with most verbose logging by adding the -v 10 flag. Paste the logs into this report, or in a gist and provide the gist link here.
9. Anything else do we need to know?
Fat-fingered NAT deletion... but I really don't want to rebuild the whole cluster 😢 🙏