Drop Node event when EC2 instance does not exist #753
Conversation
This issue is currently awaiting triage. If cloud-provider-aws contributors determine this is a relevant issue, they will accept it by applying the triage/accepted label. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.
Force-pushed from 1eef0b0 to 1ccb35f.
Something seems wonky with the CI, I'll look into it.
/retest
All failures are related to persistent volumes, doesn't seem related to this change.
It looks ok to me, but can we add a test for this new behavior in tagging_controller_test.go?
/retest
The CI is definitely hosed, same cases have been red in k/k since ~11/24: https://testgrid.k8s.io/presubmits-ec2#pull-kubernetes-e2e-ec2
Force-pushed from 1ccb35f to da5beab.
Force-pushed from 02a76b9 to e80c120.
Force-pushed from e80c120 to 1108878.
@tzneal added a couple unit test cases 👍
Seems ok, any idea on the CI problem?
Haven't had a chance to go down the rabbit hole. Looks like things broke when the … @dims do you have a guess?
@cartermckinnon no, i have not looked at this yet ..
@dims I'll try to get a fix up 👍
CI should be fixed by this: kubernetes-sigs/provider-aws-test-infra#232
/retest
LGTM
Thanks for picking this up! Did we also want to make the queue size visible with some logging around here? (Since dequeue latency isn't the most direct metric.)
@ndbaker1: changing LGTM is restricted to collaborators.
I plan to add a metric for this in a separate PR, because it'd be helpful for debugging in the future; but I think dequeue latency is still the more important metric to track and alarm on. There can be many events in the queue that are no-ops, and that doesn't necessarily have an impact on, e.g., how quickly a new Node is tagged.
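For illustration, a minimal sketch of what surfacing the work queue depth could look like, assuming the controller uses a standard client-go workqueue; the function name, log message, and interval are placeholders, not code from this PR:

```go
package tagging

import (
	"time"

	"k8s.io/client-go/util/workqueue"
	"k8s.io/klog/v2"
)

// logQueueDepth periodically logs how many items are sitting in the
// controller's work queue, so a growing backlog is visible even before
// dequeue latency starts to climb.
func logQueueDepth(queue workqueue.RateLimitingInterface, interval time.Duration, stopCh <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stopCh:
			return
		case <-ticker.C:
			klog.Infof("tagging controller work queue depth: %d", queue.Len())
		}
	}
}
```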
/approve
[APPROVALNOTIFIER] This PR is APPROVED. This pull-request has been approved by: dims, mmerkes, ndbaker1. The full list of commands accepted by this bot can be found here, and the pull request process is described here.
What type of PR is this?
/kind bug
What this PR does / why we need it:
This avoids unnecessary retries when the `ec2:CreateTags` call fails with an `InvalidInstanceId.NotFound` error. Excessive retries for each event can lead to a growing work queue that may increase dequeue latency dramatically.

If the Node is newly created, we requeue the event and retry, to handle the eventual consistency of this API. If the Node is not newly created, we drop the event.
This PR is only concerned with retries for a single event; all nodes still have the implicit "retry" that results from each update event (every ~5m).
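As a rough sketch of the decision described above (not the actual controller code; the string-based error matching, grace period, and function names are assumptions for illustration):

```go
package tagging

import (
	"strings"
	"time"
)

// Illustrative values only; the real controller relies on the AWS SDK's
// typed errors and its own notion of a "new" Node.
const (
	errCodeInstanceNotFound = "InvalidInstanceId.NotFound"
	newNodeGracePeriod      = time.Minute
)

// shouldRequeue decides whether a failed CreateTags event is worth
// retrying: if the instance doesn't exist and the Node is no longer new,
// retrying can't succeed, so the event is dropped instead of clogging the
// work queue.
func shouldRequeue(err error, nodeCreated, now time.Time) bool {
	if err == nil {
		return false // tagging succeeded; nothing to retry
	}
	if !strings.Contains(err.Error(), errCodeInstanceNotFound) {
		return true // other errors keep the usual retry behavior
	}
	// Instance not found: requeue only while the Node is new enough that
	// EC2's eventual consistency could explain the miss.
	return now.Sub(nodeCreated) < newNodeGracePeriod
}
```

A caller would drop the event when shouldRequeue returns false, relying on the periodic Node update events (roughly every 5 minutes) as the implicit retry.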
Which issue(s) this PR fixes:
Fixes #
Special notes for your reviewer:
Does this PR introduce a user-facing change?: