Question: sharding on kustomize controller only #1174

Open
masterphenix opened this issue May 29, 2024 · 4 comments
Comments

@masterphenix

masterphenix commented May 29, 2024

Hello,
One of our k8s clusters has more than 300 namespaces. We use Flux to manage the content of this cluster by means of:

  • 1 GitRepository source pointing to a single repository, laid out like this:
<cluster_name>
  |_<namespace1>
  |_<namespace2>
    |_ kustomization and resources
  • 1 Kustomization (kustomize.toolkit.fluxcd.io/v1) per namespace
  • a webhook to trigger reconciliation when a push is made to the Git repo
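
For readers following along, one of the per-namespace Kustomizations described above might look roughly like the sketch below. The object names, the flux-system namespace, the interval, and the concrete path are placeholders chosen for illustration; none of them come from the original post.

```yaml
# Illustrative sketch only: one Flux Kustomization per namespace,
# all of them referencing the same GitRepository source.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: namespace1              # placeholder: one object per namespace
  namespace: flux-system        # assumed namespace for the Flux objects
spec:
  interval: 10m                 # assumed value
  sourceRef:
    kind: GitRepository
    name: cluster-repo          # placeholder name for the single shared source
  path: "./my-cluster/namespace1"   # placeholder for <cluster_name>/<namespace1>
  prune: true
```

Because every Kustomization references the same source, a new revision of that source is seen by all of them, which matches the behaviour described in this post.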

Today, we are seeing delays after pushing changes to Git, because the kustomize-controller triggers a reconcile of every Kustomization (more than 300). As a result, a small change in a single namespace can take more than 5 minutes to be applied on the cluster, depending on the order in which the reconciliations run.

We considered using the sharding capability of the kustomize-controller, but it appears that, since Flux Kustomizations depend on the GitRepository source, the latter would also have to be sharded. This is an issue for us, since there is only 1 GitRepository for 300 Kustomizations.

Did we misunderstand something? Would you by any chance have any recommendations to help us lower these delays?

@stefanprodan
Member

You can shard only kustomize-controller, but before doing this, I suggest you increase the --concurrent flag and the resource limits, see https://fluxcd.io/flux/installation/configuration/vertical-scaling/.

A single kustomize-controller instance can run 1K reconciliations in about 3 minutes, see https://github.com/fluxcd/flux-benchmark/blob/main/RESULTS.md
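
As a rough sketch of the vertical-scaling suggestion, the patch below raises the concurrency and the resource limits of kustomize-controller through the bootstrap kustomization.yaml. It assumes the usual flux-system layout with gotk-components.yaml and gotk-sync.yaml, and the values (10 workers, 2 CPUs, 2Gi) are illustrative rather than tuned recommendations for this cluster.

```yaml
# Illustrative sketch: flux-system/kustomization.yaml (assumed bootstrap layout).
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Append --concurrent to the controller arguments (value is an assumption).
  - target:
      kind: Deployment
      name: kustomize-controller
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --concurrent=10
  # Raise the resource limits of the manager container (values are assumptions).
  - target:
      kind: Deployment
      name: kustomize-controller
    patch: |
      apiVersion: apps/v1
      kind: Deployment
      metadata:
        name: kustomize-controller
      spec:
        template:
          spec:
            containers:
              - name: manager
                resources:
                  limits:
                    cpu: 2000m
                    memory: 2Gi
```

More concurrent workers generally need more CPU and memory, so the two settings are usually raised together.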

@masterphenix
Author

Thank you for your quick answer! I'll look deeper into that, thanks a lot.

Regarding sharding, I did try to shard only kustomize-controller, but all resources were handled by the shard instead of the main controller, which made me believe that source-controller was supposed to be sharded as well. But I just found out, while replying to you, that we had activated sharding while keeping the --enable-leader-election flag, which, of course, is kinda incompatible with sharding. I tried without it, and sharding seems to work better ;-)
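
For context, assigning a Kustomization to a shard is typically done with the sharding.fluxcd.io/key label on the object itself, while its sourceRef keeps pointing at the unsharded GitRepository. The sketch below assumes a shard key of shard1 and reuses placeholder names; neither is taken from this thread.

```yaml
# Illustrative sketch: a Kustomization handed to a shard via its label.
apiVersion: kustomize.toolkit.fluxcd.io/v1
kind: Kustomization
metadata:
  name: namespace1                  # placeholder
  namespace: flux-system            # assumed
  labels:
    sharding.fluxcd.io/key: shard1  # assumed shard key
spec:
  interval: 10m                     # assumed value
  sourceRef:
    kind: GitRepository
    name: cluster-repo              # placeholder: the single, unsharded source
  path: "./my-cluster/namespace1"   # placeholder path
  prune: true
```

Kustomizations without the label would stay with the main controller, provided its label selector excludes labelled objects.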

@stefanprodan
Member

The --enable-leader-election flag is not incompatible with sharding: each shard elects a primary.

@masterphenix
Author

That's odd: as soon as I removed leader election, both controllers started behaving normally. Before that, I noticed that both the main controller and the shard tried to acquire the same lease. The main controller failed to win the election, and the shard behaved as the main controller: it totally disregarded the flag --watch-label-selector=sharding.fluxcd.io/key and handled all resources.

I just tried to reproduce that, but with no luck.
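
For anyone trying to reproduce this setup, a minimal sketch of the label selectors for a kustomize-controller-only sharding configuration is shown below. It assumes the bootstrap kustomization.yaml layout and a shard Deployment named kustomize-controller-shard1 that is defined among the resources; that name and the shard1 key are hypothetical and do not come from this thread.

```yaml
# Illustrative sketch: selectors for the main controller and one shard.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - gotk-components.yaml
  - gotk-sync.yaml
patches:
  # Main controller: only watch objects WITHOUT the sharding label.
  - target:
      kind: Deployment
      name: kustomize-controller
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --watch-label-selector=!sharding.fluxcd.io/key
  # Shard (hypothetical Deployment name): only watch objects labelled shard1.
  - target:
      kind: Deployment
      name: kustomize-controller-shard1
    patch: |
      - op: add
        path: /spec/template/spec/containers/0/args/-
        value: --watch-label-selector=sharding.fluxcd.io/key=shard1
```

If the main controller has no selector, or the shard's selector is not applied, both instances could end up watching the same objects, so the effective arguments of each Deployment are worth checking when debugging behaviour like the one described above.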
