
fix(kube-node-lease): remove objects created for kube-node-lease #4894

Closed
wants to merge 1 commit

Conversation

arnoldyahad

Fixes #4547

Hey,
It looks like about 2 months ago, resources for kube-node-lease were added to karpenter:
#4453

Those changes cause Karpenter not to work on clusters without a namespace called kube-node-lease.

As far as I can see, this namespace is only used for tests, as it appears only in:

karpenter/test/suites/beta/integration/lease_garbagecollection_test.go

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.

@arnoldyahad arnoldyahad requested a review from a team as a code owner October 23, 2023 12:02
@netlify

netlify bot commented Oct 23, 2023

Deploy Preview for karpenter-docs-prod canceled.

🔨 Latest commit: 5e47906
🔍 Latest deploy log: https://app.netlify.com/sites/karpenter-docs-prod/deploys/653660ed916d970008ad23ca

@njtran
Contributor

njtran commented Oct 23, 2023

Hey @arnoldyahad this was added as a way to mitigate an upstream bug until it was fixed upstream: kubernetes/kubernetes#109777. Are you running self-managed kubernetes? How do you manage node leases? Is it possible for you to create the namespace?
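
For context, the cleanup that kind of mitigation performs is deleting node leases whose Node object no longer exists. A rough, hypothetical sketch using plain client-go (the function name and logic are illustrative only, not Karpenter's actual controller):

```go
package leasegc

import (
	"context"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// garbageCollectNodeLeases deletes Lease objects in kube-node-lease whose
// corresponding Node object no longer exists (node leases are named after their node).
func garbageCollectNodeLeases(ctx context.Context, clientset kubernetes.Interface) error {
	leases, err := clientset.CoordinationV1().Leases("kube-node-lease").List(ctx, metav1.ListOptions{})
	if err != nil {
		return err
	}
	for _, lease := range leases.Items {
		_, err := clientset.CoreV1().Nodes().Get(ctx, lease.Name, metav1.GetOptions{})
		if err == nil {
			continue // node still exists, keep its lease
		}
		if !apierrors.IsNotFound(err) {
			return err
		}
		// Node is gone; remove the orphaned lease.
		if err := clientset.CoordinationV1().Leases("kube-node-lease").Delete(ctx, lease.Name, metav1.DeleteOptions{}); err != nil && !apierrors.IsNotFound(err) {
			return err
		}
	}
	return nil
}
```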

@ellistarn
Contributor

How do you manage node leases? Is it possible for you to create the namespace?

@njtran , should we just log (info) and continue if we don't find this namespace? We definitely shouldn't block Karpenter's operations on this.
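
A minimal sketch of what such a log-and-continue check could look like, using plain client-go; the function name and placement are hypothetical, not Karpenter's actual code:

```go
package leasegc

import (
	"context"
	"log"

	apierrors "k8s.io/apimachinery/pkg/api/errors"
	metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
	"k8s.io/client-go/kubernetes"
)

// nodeLeaseNamespaceExists reports whether the kube-node-lease namespace exists.
// When it is missing, it logs at info level so the caller can skip anything
// lease-related instead of blocking startup.
func nodeLeaseNamespaceExists(ctx context.Context, clientset kubernetes.Interface) (bool, error) {
	_, err := clientset.CoreV1().Namespaces().Get(ctx, "kube-node-lease", metav1.GetOptions{})
	if apierrors.IsNotFound(err) {
		log.Printf("INFO: namespace kube-node-lease not found, skipping node lease garbage collection")
		return false, nil
	}
	if err != nil {
		return false, err // other API errors still surface
	}
	return true, nil
}
```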

@njtran
Contributor

njtran commented Oct 23, 2023

@ellistarn yeah that'd be reasonable. @arnoldyahad Would that be sufficient for you? You could then remove the role and rolebinding from your own CI/CD as Karpenter wouldn't make the call.

@jonathan-innis
Contributor

I had a general question about whether this namespace is expected for all clusters, or under which circumstances we shouldn't expect to see it. Is it possible to put your node leases in a different namespace?

@arnoldyahad
Author

@ellistarn @njtran First of all, thanks for the quick reply.
We are using EKS 1.23 (planning to upgrade to 1.24),
but there are also old clusters (1.22) that we don't plan to upgrade.

It is possible for us to apply this workaround and just not use anything related to kube-node-lease, but that would mean we diverge from Karpenter and would need to re-apply the fix after every chart upgrade.

The problem here is that users can't use newer Karpenter versions (#4547) without the namespace, and if it's possible for Karpenter to work without it, those changes should be optional and perhaps only applied when a cluster already has this namespace.

If you are aware that this namespace is created automatically by k8s/EKS in later versions and is mandatory for cluster operations, please tell us and we will add it manually for now; but if it's not, it would be better not to enforce it when it isn't needed.

@jonathan-innis
Contributor

If you are aware that this namespace is created automatically by k8s/EKS in later versions and is mandatory for cluster operations, please tell us and we will add it manually for now; but if it's not, it would be better not to enforce it when it isn't needed.

AFAIK, I thought the feature for storing node leases in the kube-node-lease namespace, introduced by this KEP, became stable in 1.17: https://github.com/kubernetes/enhancements/tree/master/keps/sig-node/589-efficient-node-heartbeats. Do you know if the node lease heartbeats are being stored in another namespace? I don't believe this behavior is specific to EKS.

@arnoldyahad
Author

@jonathan-innis thanks for replying.

I checked our clusters for leases using:

 kubectl -n kube-system get lease

and I get information on all leases, so I don't know if the feature to keep them in a separate namespace is "enforced" somehow and we skipped it, or if it's an "opt-in" to have them in the kube-node-lease namespace.
Our clusters range from EKS 1.19 to 1.23.

@arnoldyahad
Author

@ellistarn @njtran
I confirmed with a teammate that this is not the only problem: even after removing the roles, Karpenter still tries to access that namespace and list resources:

{"severity":"INFO","timestamp":"2023-10-24T08:46:19.084316716Z","logger":"controller","caller":"cache/reflector.go:424","message":"k8s.io/[email protected]/tools/cache/reflector.go:169: failed to list *v1.Lease: leases.coordination.k8s.io is for
bidden: User \"system:serviceaccount:kube-system:karpenter-controller\" cannot list resource \"leases\" in API group \"coordination.k8s.io\" in the namespace \"kube-node-lease\"\n","commit":"322822a"}
{"severity":"ERROR","timestamp":"2023-10-24T08:46:19.084371722Z","logger":"controller","caller":"cache/reflector.go:140","message":"k8s.io/[email protected]/tools/cache/reflector.go:169: Failed to watch *v1.Lease: failed to list *v1.Lease: lease
s.coordination.k8s.io is forbidden: User \"system:serviceaccount:kube-system:karpenter-controller\" cannot list resource \"leases\" in API group \"coordination.k8s.io\" in the namespace \"kube-node-lease\"\n","commit":"322822a","stacktrace":"k8s
.io/client-go/tools/cache.DefaultWatchErrorHandler\n\tk8s.io/[email protected]/tools/cache/reflector.go:140\nk8s.io/client-go/tools/cache.(*Reflector).Run.func1\n\tk8s.io/[email protected]/tools/cache/reflector.go:224\nk8s.io/apimachinery/pkg/ut
il/wait.BackoffUntil.func1\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:157\nk8s.io/apimachinery/pkg/util/wait.BackoffUntil\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:158\nk8s.io/client-go/tools/cache.(*Reflector).Run\n\tk8s.io
/[email protected]/tools/cache/reflector.go:222\nk8s.io/client-go/tools/cache.(*controller).Run.(*Group).StartWithChannel.func2\n\tk8s.io/[email protected]/pkg/util/wait/wait.go:58\nk8s.io/apimachinery/pkg/util/wait.(*Group).Start.func1\n\tk8
s.io/[email protected]/pkg/util/wait/wait.go:75"}

might be related to:
https://github.com/aws/karpenter-core/blob/223dcd0fda49f05b3d7beebc45cdb843caaefd3c/pkg/operator/operator.go#L152

So just removing the permissions for that namespace, as I did in this PR, is not enough to solve this issue.
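
For reference, a controller-runtime based operator can scope its cached Lease informer to a single namespace, which would explain why the list/watch still targets kube-node-lease even with the chart's Role/RoleBinding removed. A minimal sketch, assuming controller-runtime v0.15+; this is illustrative and not necessarily Karpenter's exact operator.go configuration:

```go
package leasegc

import (
	coordinationv1 "k8s.io/api/coordination/v1"
	"k8s.io/apimachinery/pkg/fields"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/cache"
	"sigs.k8s.io/controller-runtime/pkg/client"
)

// newManager builds a manager whose cached Lease informer only lists/watches
// objects in kube-node-lease. An informer configured like this is the kind of
// thing that produces the "cannot list resource \"leases\" ... in the namespace
// \"kube-node-lease\"" errors above when the RBAC (or the namespace) is absent.
func newManager() (ctrl.Manager, error) {
	return ctrl.NewManager(ctrl.GetConfigOrDie(), ctrl.Options{
		Cache: cache.Options{
			ByObject: map[client.Object]cache.ByObject{
				&coordinationv1.Lease{}: {
					Field: fields.OneTermEqualSelector("metadata.namespace", "kube-node-lease"),
				},
			},
		},
	})
}
```

If the informer is scoped unconditionally like this, dropping the RBAC alone would indeed not stop the list/watch attempts, which matches the log output above.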

@jonathan-innis
Contributor

I get information on all leases, so I don't know if the feature to keep them in a separate namespace is "enforced" somehow and we skipped it

If you are getting information on the leases in that namespace, do you know why Karpenter is failing to start? It sounds like the namespace exists for the cluster. Or do you have an example of a cluster where the namespace doesn't exist, and that's the cluster Karpenter is failing to start on?

@jonathan-innis
Contributor

Hey @arnoldyahad! Any updates here? We want to drive toward a solution to your problem, but it's still a little unclear to me what cluster configuration causes this namespace not to exist.


This PR has been inactive for 14 days. StaleBot will close this stale PR after 14 more days of inactivity.


Successfully merging this pull request may close these issues.

Namespace kube-node-lease is now required?