Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

⚠️ Do not deduplicate warnings by default #2953

Conversation

dlipovetsky
Copy link
Contributor

Controllers are long-running processes, and deduplication, as implemented, increases memory use.

With this change, duplicate warnings will appear in the log by default. However, this is safe, because Kubernetes rotates container logs by default.

If a specific controller sees many duplicate warnings, it can configure the handler to deduplicate them.

@k8s-ci-robot
Copy link
Contributor

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@k8s-ci-robot k8s-ci-robot added do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 18, 2024
@k8s-ci-robot k8s-ci-robot added the size/XS Denotes a PR that changes 0-9 lines, ignoring generated files. label Sep 18, 2024
@sbueringer
Copy link
Member

sbueringer commented Sep 19, 2024

Absolutely in favor of this change.

The current default is a memory leak and I think the behavior is also questionable

/lgtm
(in general, pending un-draft + potentially unit test changes/fixes)

/assign @alvaroaleman @vincepri

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 19, 2024
@k8s-ci-robot
Copy link
Contributor

LGTM label has been added.

Git tree hash: 9bc8e9a4c90d0b860976c2fffeff13e19c7e9c43

@alvaroaleman
Copy link
Member

The current default is a memory leak

It takes some memory but how is it a leak? The number of unique warnings should be pretty finite or not?

@sbueringer
Copy link
Member

sbueringer commented Sep 19, 2024

The current default is a memory leak

It takes some memory but how is it a leak? The number of unique warnings should be pretty finite or not?

Good point. Is the number of unique warnings finite? (i.e. is there a guarantee that they never contain variable parts like object names?)

(IIRC every webhook is able to return warnings, so we have more than just the warnings built into the kube-apiserver to consider)

But independent of that part, I think it's surprising behavior that we log every unique warning only once.

I mean I wonder in general who notices these warnings in controller logs, but only logging them once doesn't make it better :)

@alvaroaleman
Copy link
Member

I mean I wonder in general who notices these warnings in controller logs, but only logging them once doesn't make it better :)

Yeah I am a bit thorn on this. On the one hand if this was net new I would keep them on because as you said, its easy to miss them if they are only logged once. On the other you can make the argument that this is a silently breaking change and ppl might complain about now getting too many warnings that they don't care about logged.

@sbueringer
Copy link
Member

Yup. We can also consider defaulting to no warning handler. Then people can add it if they want

@dlipovetsky
Copy link
Contributor Author

dlipovetsky commented Sep 19, 2024

This came out a discussion in a cluster-api PR: kubernetes-sigs/cluster-api#11179 (comment)

I think deduplication can make it harder to spot warnings. This is what I said there:

If the client makes two identical API calls hours apart, and both return the same warning, I would expect to find both in the log.

Even worse is that, with the current implementation, if the warning is logged, and that part of the log is overwritten, then evidence of the warning would disappear, because the warning would never be logged again, for as long as the process runs.

@dlipovetsky
Copy link
Contributor Author

dlipovetsky commented Sep 19, 2024

Good point. Is the number of unique warnings finite? (i.e. is there a guarantee that they never contain variable parts like object names?)

The "message" is the deduplication key here. And the message does include the object name, at least in the warnings I have seen. For example: https://github.com/kubernetes/kubernetes/blob/432e8937cfb1965f863b8f841af61a8cad7db965/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L89

(The above was incorrect)

@alvaroaleman
Copy link
Member

https://github.com/kubernetes/kubernetes/blob/432e8937cfb1965f863b8f841af61a8cad7db965/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L89

From what I can see there it only includes the fieldpath and the value but not object name or am I missing it?

@dlipovetsky
Copy link
Contributor Author

https://github.com/kubernetes/kubernetes/blob/432e8937cfb1965f863b8f841af61a8cad7db965/staging/src/k8s.io/apiextensions-apiserver/pkg/registry/customresource/validator.go#L89

From what I can see there it only includes the fieldpath and the value but not object name or am I missing it?

My mistake! You're right, the object name is not part of the message.

@dlipovetsky
Copy link
Contributor Author

To recap, a warning looks like this:

[KubeAPIWarningLogger] metadata.finalizers: "cluster.cluster.x-k8s.io": prefer a domain-qualified finalizer name to avoid accidental conflicts with other finalizer writers

In this warning, the field-path is metadata.finalizers, and "cluster.cluster.x-k8s.io" is the group-kind (no version).

Currently, warnings are de-duplicated by default, so only onewarning per field-path/group-kind combination shows up in the log for the lifetime of the manager process.

Reasons to stop de-duplicating by default:

  • Easy to miss warnings if they are logged only once.

Reasons to continue de-duplication by default:

  • Fewer warnings in the log, i.e. a less "noisy" log.

If there's no clear winner, then we can keep the existing behavior, and I'll close the PR.

@sbueringer
Copy link
Member

sbueringer commented Oct 31, 2024

I don't like the deduplication per default because one webhook in your cluster which returns non-deterministic warnings on the wrong resource is enough to ensure (probably) all CR-based comtrollers in the cluster go OOM.

As the benefit of deduplication is not that great, I don't think we should do it per default.

@dlipovetsky
Copy link
Contributor Author

Sounds good.

We can try to decrease the impact of the change. Some ideas:

  1. Write a user-friendly release note that explains how to enable de-duplication when creating the client.
  2. Announce the change in the notes of the next minor release, but only merge the change afterward, so that the default changes in two minor releases from now.

@sbueringer
Copy link
Member

sbueringer commented Nov 1, 2024

I would be fine with only 1. I think it's not a problematic change to disable the deduplication. I would be really surprised if a significant number of CR users a) know that we log warnings and that we de-duplicate them b) depend on this behavior in any way.

But let's see what @vincepri and @alvaroaleman think.

@alvaroaleman
Copy link
Member

But let's see what @vincepri and @alvaroaleman think.

Yeah, I am ok with changing it, too. Lets maybe mark this change as breaking since it changes behavior so that it shows up more prominently on the release notes, but other than that, I am fine with this.

@dlipovetsky dlipovetsky changed the title 🌱 Do not deduplicate warnings by default ⚠️ Do not deduplicate warnings by default Nov 5, 2024
Controllers are long-running processes, and deduplication, as
implemented, increases memory use.

With this change, duplicate warnings will appear in the log by default.
However, this is safe, because Kubernetes rotates container logs by
default.

If a specific controller sees many duplicate warnings, it can configure
the handler to deduplicate them.
@dlipovetsky dlipovetsky force-pushed the warning-do-not-deduplicate-by-default branch from 8632463 to 1eb0c53 Compare November 5, 2024 16:30
@dlipovetsky dlipovetsky marked this pull request as ready for review November 5, 2024 16:30
@k8s-ci-robot k8s-ci-robot removed the do-not-merge/work-in-progress Indicates that a PR should not merge because it is a work in progress. label Nov 5, 2024
@k8s-ci-robot
Copy link
Contributor

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: alvaroaleman, dlipovetsky

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@k8s-ci-robot k8s-ci-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Nov 5, 2024
@k8s-ci-robot k8s-ci-robot merged commit 3ead9eb into kubernetes-sigs:main Nov 5, 2024
9 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. lgtm "Looks good to me", indicates that a PR is ready to be merged. size/XS Denotes a PR that changes 0-9 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants