
fetch SAs from apiserver #242

Closed
wants to merge 5 commits

Conversation

@modulitos (Contributor) commented Oct 21, 2024

Issue #, if available:
#174

Description of changes:
Enhances the implementation introduced in #236 so that we can fetch missing service accounts from the apiserver. Retains the existing guarantees that we won't fetch multiple service accounts concurrently, minimizing load on the apiserver.

This feature is still in shadow mode (off by default).
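The dedup guarantee described above (queueing fetch requests so the same service account is never fetched twice concurrently) can be sketched roughly as follows. All names here (`request`, `fetcher`, `get`) are hypothetical stand-ins, not the PR's actual identifiers; only the channel-plus-notifier pattern mirrors the change:

```go
package main

import (
	"fmt"
	"sync"
)

// request is a hypothetical simplification of the PR's cache.Request.
type request struct{ Namespace, Name string }

func (r request) CacheKey() string { return r.Namespace + "/" + r.Name }

type fetcher struct {
	mu       sync.Mutex
	pending  map[string]chan struct{} // key -> closed when the fetch completes
	requests chan request             // buffered queue drained by one consumer
	calls    int                      // stand-in counter for real apiserver Gets
}

func newFetcher() *fetcher {
	return &fetcher{
		pending:  make(map[string]chan struct{}),
		requests: make(chan request, 100),
	}
}

// run drains the queue; the real code would call the apiserver per request.
func (f *fetcher) run() {
	for req := range f.requests {
		f.mu.Lock()
		f.calls++ // pretend apiserver fetch
		done := f.pending[req.CacheKey()]
		delete(f.pending, req.CacheKey())
		f.mu.Unlock()
		close(done) // wake every waiter for this key
	}
}

// get returns a channel closed once the key has been fetched. Concurrent
// requests for an in-flight key share one channel, so the apiserver sees
// at most one request per key.
func (f *fetcher) get(req request) <-chan struct{} {
	f.mu.Lock()
	defer f.mu.Unlock()
	if done, ok := f.pending[req.CacheKey()]; ok {
		return done
	}
	done := make(chan struct{})
	f.pending[req.CacheKey()] = done
	f.requests <- req
	return done
}

func main() {
	f := newFetcher()
	// Queue several lookups for the same key before the consumer starts:
	// they all share one pending notifier, so only one fetch is issued.
	waiters := make([]<-chan struct{}, 5)
	for i := range waiters {
		waiters[i] = f.get(request{Namespace: "default", Name: "my-sa"})
	}
	go f.run()
	for _, w := range waiters {
		<-w
	}
	fmt.Println("fetches issued:", f.calls)
}
```

The per-key notifier map is what keeps apiserver load low: five pods referencing the same missing service account translate into a single fetch.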

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

@modulitos modulitos requested a review from a team as a code owner October 21, 2024 02:57
@modulitos modulitos force-pushed the notifications-enhancement branch 2 times, most recently from 5af4ac8 to 5020bec Compare October 21, 2024 04:00
@modulitos modulitos force-pushed the notifications-enhancement branch 2 times, most recently from ed8a585 to 0178602 Compare November 14, 2024 22:26
```go
}

go func() {
	for req := range saFetchRequests {
```
Contributor:

This would mean we are making only one request at a time to the apiserver, right?

Contributor Author (@modulitos):

Wrapped this in a goroutine - thanks for the catch 💯
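The fix described above (handling each queued fetch in its own goroutine, each with its own timeout, so a slow apiserver call does not stall requests behind it) might look roughly like this sketch; `fetchSA`, the latencies, and all other names are hypothetical stand-ins rather than the PR's code:

```go
package main

import (
	"context"
	"fmt"
	"sync"
	"time"
)

type request struct{ Namespace, Name string }

func (r request) CacheKey() string { return r.Namespace + "/" + r.Name }

// fetchSA is a hypothetical stand-in for the real apiserver call.
func fetchSA(ctx context.Context, req request) error {
	select {
	case <-time.After(5 * time.Millisecond): // simulated apiserver latency
		return nil
	case <-ctx.Done():
		return ctx.Err()
	}
}

// consume drains the queue, handling each request in its own goroutine so
// one slow fetch does not block the others.
func consume(reqs <-chan request) {
	var wg sync.WaitGroup
	for req := range reqs {
		wg.Add(1)
		go func(req request) {
			defer wg.Done()
			ctx, cancel := context.WithTimeout(context.Background(), time.Second)
			defer cancel() // mirrors the `defer cancel()` in the reviewed diff
			if err := fetchSA(ctx, req); err != nil {
				fmt.Println("fetch failed:", req.CacheKey(), err)
			}
		}(req)
	}
	wg.Wait()
}

func main() {
	reqs := make(chan request, 10)
	for i := 0; i < 3; i++ {
		reqs <- request{Namespace: "default", Name: fmt.Sprintf("sa-%d", i)}
	}
	close(reqs)
	consume(reqs)
	fmt.Println("done")
}
```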

```go
		defer cancel()

		klog.V(5).Infof("fetching SA: %s", req.CacheKey())
		saList, err := getter.ServiceAccounts(req.Namespace).List(
```
Contributor:

Why List? Why can't we use Get?

Contributor Author (@modulitos):

I moved this to Get.
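For reference, the shape of that change with client-go's typed service account client looks roughly like the following. This is a hedged sketch: the option structs shown are the standard signatures, but the actual arguments are in the PR's diff, not reproduced here.

```diff
-saList, err := getter.ServiceAccounts(req.Namespace).List(ctx, metav1.ListOptions{})
+sa, err := getter.ServiceAccounts(req.Namespace).Get(ctx, req.Name, metav1.GetOptions{})
```

Get retrieves exactly the one named object, which is cheaper for the apiserver than a List and avoids filtering the result client-side.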

@modulitos modulitos force-pushed the notifications-enhancement branch from a37069c to ccbb720 Compare November 18, 2024 19:45
@modulitos modulitos force-pushed the notifications-enhancement branch from ccbb720 to 1b19f6f Compare November 26, 2024 01:31
@modulitos modulitos force-pushed the notifications-enhancement branch from 2c19173 to 9a8d1ad Compare November 26, 2024 22:12
```diff
@@ -433,9 +433,10 @@ func (m *Modifier) buildPodPatchConfig(pod *corev1.Pod) *podPatchConfig {
 	}
 
 	// Use the STS WebIdentity method if set
-	request := cache.Request{Namespace: pod.Namespace, Name: pod.Spec.ServiceAccountName, RequestNotification: true}
+	gracePeriodEnabled := m.saLookupGraceTime > 0
+	request := cache.Request{Namespace: pod.Namespace, Name: pod.Spec.ServiceAccountName, RequestNotification: gracePeriodEnabled}
```
Contributor Author (@modulitos):

I added this change to toggle RequestNotification only when the grace period is enabled.

Previously, setting RequestNotification to true was basically a no-op when the feature was disabled, but now it triggers a fetch from the API server, so we only want it set when the feature is enabled.

```go
	if !found {
		notifier = make(chan struct{})
		n.handlers[req.CacheKey()] = notifier
		n.fetchRequests <- &req
```
Contributor:

We control the APIServer request rate through the size of the channel, but that has two downsides:

  1. The APIServer requests are not actually rate limited. There is no rate limiter in the channel consumption, and since the consumer starts a new goroutine for each request, more than 100 requests (for different namespace/name pairs) could be sent to the APIServer in the same second.
  2. If for some reason the channel consumer dies or the queue is full, the create function will block until the channel has capacity, which could delay pod creation for an arbitrarily long time.

A better choice could be to use a larger channel size to minimize the chance of blocked writes, together with a more robust consumer that limits the consumption rate. If an extremely high volume of queued requests means the API requests cannot be sent in time, the result is that either the cache syncs before the grace period expires and the pod is mutated, or it does not sync and the pod is not mutated — but with no prolonged delay to pod creation and no excessive requests to the APIServer.

@haoranleo (Contributor):

Moved the changes to #252

@haoranleo closed this Jan 10, 2025