feat(client): kube-api timeout requests #627

Open · wants to merge 1 commit into main

Conversation

genofire

No description provided.

@genofire
Author

@stefanprodan any feedback?

@stefanprodan
Member

stefanprodan commented Aug 17, 2023

All API calls that Flux makes use a Go context with a timeout; the timeout is set to the value supplied by users in each Flux CR via .spec.timeout. Having a global timeout is not something that I would consider.
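For context, a minimal sketch of the pattern described above (not the actual Flux code; the type names are placeholders): a reconciler derives a per-reconciliation timeout from the object's .spec.timeout and uses the resulting context for every API call it makes.

```go
// Sketch only: objectSpec stands in for a Flux CR spec; the real types live in the Flux API packages.
package main

import (
	"context"
	"fmt"
	"time"
)

type objectSpec struct {
	Timeout time.Duration // populated from .spec.timeout
}

func reconcile(ctx context.Context, spec objectSpec) error {
	// Derive a context that expires after the user-supplied timeout; every
	// API call made with it is cancelled once the deadline passes, so a
	// single reconciliation cannot hang forever.
	ctx, cancel := context.WithTimeout(ctx, spec.Timeout)
	defer cancel()
	return doAPICalls(ctx)
}

// doAPICalls is a placeholder for client calls that honor ctx cancellation.
func doAPICalls(ctx context.Context) error {
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(10 * time.Millisecond):
		return nil
	}
}

func main() {
	fmt.Println(reconcile(context.Background(), objectSpec{Timeout: time.Second}))
}
```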

@genofire
Author

genofire commented Aug 17, 2023

Where is this timeout set?

Here a context.Background() is used, which has no deadline or timeout:
https://github.com/fluxcd/pkg/pull/627/files#diff-0f7f807ccdd7a2d2ea8cde48b96efc18dac5566a2a0117d81a971c4439b4abc2R81
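A small standalone illustration of the point above (not the PR's diff): context.Background() carries no deadline, so a request started with it can wait indefinitely, while wrapping it with context.WithTimeout bounds it, which is roughly what this change proposes for kube-api requests.

```go
package main

import (
	"context"
	"fmt"
	"time"
)

func main() {
	bg := context.Background()
	_, ok := bg.Deadline()
	fmt.Println("background has deadline:", ok) // false: requests may block forever

	ctx, cancel := context.WithTimeout(bg, 30*time.Second)
	defer cancel()
	deadline, ok := ctx.Deadline()
	fmt.Println("wrapped has deadline:", ok, "at", deadline) // true: requests are bounded
}
```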


@genofire
Author

Sorry, I mean: where is the context created? (For my problem, inside source-controller for HelmRepositories.)

@stefanprodan
Member

@hiddeco @darkowlzz any idea how source-controller could get stuck forever when reconciling Helm repos?

@darkowlzz
Contributor

darkowlzz commented Aug 18, 2023

Going by the information in this comment fluxcd/source-controller#1173 (comment), it may not be stuck. It is more likely to have entered a considerably long exponential back-off after failing a few times. If the reconciliation really were stuck, manually triggering a reconciliation would not start a new one, because the controller-runtime workqueue groups jobs for the same object together and one object cannot be processed in parallel.

Also, as per the object status shared in fluxcd/source-controller#1173 (comment), the objects don't seem to be undergoing a reconciliation. Reconciling=ProgressingWithRetry means the reconciliation of the object has failed; when the next reconciliation starts, it will change to Reconciling=Progressing. Only if the object remains in that state for a long time can we say it is stuck.

It would help to gather more information about the behavior by isolating the scenario and sharing related logs, events and object status when this happens. If manually triggering reconciliation works, the object should be able to reconcile on its own after some time. The max retry delay can also be configured to reduce this back-off time using the flag:

--max-retry-delay duration The maximum amount of time for which an object being reconciled will have to wait before a retry. (default 15m0s)

But one of the comments mentioned that the secrets were available for 31 minutes, so a 15-minute max retry delay shouldn't be an issue in that case. Maybe you need to increase the concurrency of the controller, in case it's busy reconciling other objects, using the flag:

--concurrent int The number of concurrent resource reconciles. (default 4)
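To make the relationship between these flags and the controller behaviour concrete, here is a hedged sketch (the reconciler and reconciled type are placeholders, not source-controller's actual wiring) of how a controller-runtime controller maps --concurrent to MaxConcurrentReconciles and caps the workqueue's exponential failure back-off at --max-retry-delay:

```go
// Sketch only: the flag-to-option mapping is the point; everything else is a placeholder.
package controllers

import (
	"context"
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/util/workqueue"
	ctrl "sigs.k8s.io/controller-runtime"
	"sigs.k8s.io/controller-runtime/pkg/controller"
)

type exampleReconciler struct{}

func (r *exampleReconciler) Reconcile(ctx context.Context, req ctrl.Request) (ctrl.Result, error) {
	// A real reconciler fetches and reconciles the object here.
	return ctrl.Result{}, nil
}

// setup wires the two CLI flag values into controller-runtime options.
func setup(mgr ctrl.Manager, concurrent int, maxRetryDelay time.Duration) error {
	return ctrl.NewControllerManagedBy(mgr).
		For(&corev1.ConfigMap{}). // placeholder; source-controller watches HelmRepository here
		WithOptions(controller.Options{
			// --concurrent: how many objects are reconciled in parallel.
			MaxConcurrentReconciles: concurrent,
			// --max-retry-delay: upper bound of the per-object exponential
			// back-off applied after failed reconciliations.
			RateLimiter: workqueue.NewItemExponentialFailureRateLimiter(
				5*time.Millisecond, maxRetryDelay),
		}).
		Complete(&exampleReconciler{})
}
```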

@patsevanton

Are there any updates?
