Scale tests might leak InmemoryCluster objects #11750

Open
chrischdi opened this issue Jan 24, 2025 · 5 comments

Comments

@chrischdi
Member

What steps did you take and what happened?

  • Ran the scale tests using the in-memory provider
  • Noticed that duplicate InMemoryCluster objects were created
$ k get inmemorycluster -A | awk '{print $1}'  | sort  | uniq -c | sort -nr | head -n 10
      2 scale-042
      2 scale-035
      2 scale-026
      2 scale-009
      1 scale-500
      1 scale-499
      1 scale-498

What did you expect to happen?

Only one InMemoryCluster object to exist per cluster.

Cluster API version

~ main

Kubernetes version

No response

Anything else you would like to add?

It is probably due to the scale test, which uses CreateOrUpdate.

An update drops all references, so KCP and the InMemoryCluster get re-created; see the sketch below.
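
A minimal sketch of the suspected failure mode, assuming the test applies the Cluster via a controllerutil.CreateOrUpdate-style helper (applyCluster and desired are illustrative names, not the actual test code):

```go
package main

import (
	"context"

	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
	"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// applyCluster is a hypothetical helper illustrating the suspected bug:
// rebuilding the whole spec inside the mutate function discards
// spec.controlPlaneRef and spec.infrastructureRef, which the topology
// controller set on the live object. The topology controller then sees a
// Cluster without references and creates fresh KCP / InMemoryCluster
// objects, leaking the old ones.
func applyCluster(ctx context.Context, c client.Client, desired *clusterv1.Cluster) error {
	cluster := &clusterv1.Cluster{
		ObjectMeta: metav1.ObjectMeta{
			Name:      desired.Name,
			Namespace: desired.Namespace,
		},
	}
	_, err := controllerutil.CreateOrUpdate(ctx, c, cluster, func() error {
		// BUG: overwriting the full spec wipes the server-set references.
		cluster.Spec = desired.Spec
		return nil
	})
	return err
}
```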

Probably related log when this happens:

failed to create patch helper for Cluster scale-009/scale-009: server side apply dry-run failed for original object: Internal error occurred: failed calling webhook "validation.cluster.cluster.x-k8s.io": failed to call webhook: Post "https://capi-webhook-service.capi-system.svc:443/validate-cluster-x-k8s-io-v1beta1-cluster?timeout=10s": context deadline exceeded
error reconciling the Cluster topology
sigs.k8s.io/cluster-api/internal/controllers/topology/cluster.(*Reconciler).reconcile
	/home/ubuntu/go/src/sigs.k8s.io/cluster-api/internal/controllers/topology/cluster/cluster_controller.go:422
sigs.k8s.io/cluster-api/internal/controllers/topology/cluster.(*Reconciler).Reconcile
	/home/ubuntu/go/src/sigs.k8s.io/cluster-api/internal/controllers/topology/cluster/cluster_controller.go:343
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Reconcile
	/home/ubuntu/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:116
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).reconcileHandler
	/home/ubuntu/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:303
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).processNextWorkItem
	/home/ubuntu/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:263
sigs.k8s.io/controller-runtime/pkg/internal/controller.(*Controller[...]).Start.func2.2
	/home/ubuntu/go/pkg/mod/sigs.k8s.io/[email protected]/pkg/internal/controller/controller.go:224
runtime.goexit
	/usr/local/go/src/runtime/asm_amd64.s:1700

Label(s) to be applied

/kind bug
/area e2e-testing

@k8s-ci-robot k8s-ci-robot added kind/bug Categorizes issue or PR as related to a bug. area/e2e-testing Issues or PRs related to e2e testing needs-priority Indicates an issue lacks a `priority/foo` label and requires one. needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2025
@chrischdi
Member Author

/triage accepted

/help

@k8s-ci-robot
Contributor

@chrischdi:
This request has been marked as needing help from a contributor.

Guidelines

Please ensure that the issue body includes answers to the following questions:

  • Why are we solving this issue?
  • To address this issue, are there any code changes? If there are code changes, what needs to be done in the code and what places can the assignee treat as reference points?
  • Does this issue have zero to low barrier of entry?
  • How can the assignee reach out to you for help?

For more details on the requirements of such an issue, please see here and ensure that they are met.

If this request no longer meets these requirements, the label can be removed
by commenting with the /remove-help command.

In response to this:

/triage accepted

/help

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@k8s-ci-robot k8s-ci-robot added triage/accepted Indicates an issue or PR is ready to be actively worked on. help wanted Denotes an issue that needs help from a contributor. Must meet "help wanted" guidelines. and removed needs-triage Indicates an issue or PR lacks a `triage/foo` label and requires one. labels Jan 24, 2025
@sbueringer
Member

sbueringer commented Jan 24, 2025

Thx for opening the issue!

Could it be that this also happens with a real infra provider and outside of scale tests?

I was wondering if this also happens when the Cluster update call fails after CP / InfraCluster creation. But maybe the cluster-shim protects us against that?
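
To make that failure window concrete, here is a simplified, hypothetical sketch of the create-then-reference ordering (illustrative names only, not the actual topology controller code):

```go
package main

import (
	"context"

	corev1 "k8s.io/api/core/v1"
	clusterv1 "sigs.k8s.io/cluster-api/api/v1beta1"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// reconcileTopologySketch illustrates the window described above: the
// referenced objects are created first, and only afterwards is the
// Cluster updated to point at them. This is a simplified sketch, not the
// actual Cluster API implementation.
func reconcileTopologySketch(ctx context.Context, c client.Client,
	cluster *clusterv1.Cluster, infraCluster client.Object) error {
	// 1. Create the InfraCluster (and, analogously, the ControlPlane).
	if err := c.Create(ctx, infraCluster); err != nil {
		return err
	}

	// 2. Update the Cluster to reference the new object.
	gvk := infraCluster.GetObjectKind().GroupVersionKind()
	cluster.Spec.InfrastructureRef = &corev1.ObjectReference{
		APIVersion: gvk.GroupVersion().String(),
		Kind:       gvk.Kind,
		Name:       infraCluster.GetName(),
		Namespace:  infraCluster.GetNamespace(),
	}

	// If this update fails (e.g. the webhook timeout in the log above),
	// the freshly created object is not yet referenced by the Cluster.
	// Unless a mechanism such as the cluster-shim adopts or cleans it up,
	// the next reconcile creates a new object and the old one leaks.
	return c.Update(ctx, cluster)
}
```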

@sbueringer
Copy link
Member

sbueringer commented Jan 24, 2025

Looked at the implementation; my impression is that the cluster-shim doesn't protect us against this kind of leak. But I could be wrong.

@chrischdi
Copy link
Member Author

Not sure about that.
