
Retry failed connections to the APIserver #1044

Open
metral opened this issue Mar 25, 2020 · 1 comment
Labels
impact/quality impact/reliability Something that feels unreliable or flaky kind/enhancement Improvements or new features

Comments

@metral
Contributor

metral commented Mar 25, 2020

Problem description

During network blips, a common occurrence in CI with many parallel jobs running, the API server can become unreachable (see logs below), causing the update to fail. A follow-up update usually succeeds once the connectivity issue has resolved itself.

In the current implementation there are no retries in the event of an unreachable API server, since errors in this space tend to be user-driven (misconfigurations, or attempts to reach a deleted cluster) and do not warrant retries.

Errors & Logs

As you can see, certain k8s resources are created but the pods are not, which means the API server is reachable for some time, but then partway through we error out during what seems to be a network blip.

error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials

Log snippet:

[ s/eks/examples/role-kubeconfig ]  +  pulumi:providers:kubernetes role-kubeconfig-eks-k8s creating 
[ s/eks/examples/role-kubeconfig ]  +  pulumi-nodejs:dynamic:Resource role-kubeconfig-vpc-cni creating 
[ s/eks/examples/role-kubeconfig ]  +  pulumi:providers:kubernetes role-kubeconfig-eks-k8s created 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:core:ConfigMap role-kubeconfig-nodeAccess creating 
[ s/eks/examples/role-kubeconfig ]  +  pulumi-nodejs:dynamic:Resource role-kubeconfig-vpc-cni created 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:core:ConfigMap role-kubeconfig-nodeAccess creating 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:core:ConfigMap role-kubeconfig-nodeAccess created 
[ s/eks/examples/role-kubeconfig ]  +  aws:cloudformation:Stack role-kubeconfig-nodes creating 
[ s/eks/examples/role-kubeconfig ] @ Updating......
[ s/eks/examples/role-kubeconfig ]  +  aws:cloudformation:Stack role-kubeconfig-nodes created 
[ s/eks/examples/role-kubeconfig ]  +  pulumi:providers:kubernetes role-kubeconfig-provider creating 
[ s/eks/examples/role-kubeconfig ]  +  pulumi:providers:kubernetes role-kubeconfig-provider created 
[ s/eks/examples/role-kubeconfig ] @ Updating......
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:core:Namespace apps creating 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:core:Namespace apps creating 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:core:Namespace apps created 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:rbac.authorization.k8s.io:Role pulumi-devs creating 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:rbac.authorization.k8s.io:Role pulumi-devs creating 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:rbac.authorization.k8s.io:Role pulumi-devs created 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:rbac.authorization.k8s.io:RoleBinding pulumi-devs creating 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:rbac.authorization.k8s.io:RoleBinding pulumi-devs creating 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:rbac.authorization.k8s.io:RoleBinding pulumi-devs created 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:core:Pod nginx creating 
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:core:Pod nginx creating error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials
[ s/eks/examples/role-kubeconfig ]  +  kubernetes:core:Pod nginx **creating failed** error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials
[ s/eks/examples/role-kubeconfig ]  +  pulumi:pulumi:Stack role-kubeconfig-p-it-travis-job-role-kubec-06638e6d creating error: update failed
[ s/eks/examples/role-kubeconfig ]  +  pulumi:pulumi:Stack role-kubeconfig-p-it-travis-job-role-kubec-06638e6d **creating failed** 1 error; 4 messages
[ s/eks/examples/role-kubeconfig ]  
[ s/eks/examples/role-kubeconfig ] Diagnostics:
[ s/eks/examples/role-kubeconfig ]   kubernetes:core:Pod (nginx):
[ s/eks/examples/role-kubeconfig ]     error: configured Kubernetes cluster is unreachable: unable to load schema information from the API server: the server has asked for the client to provide credentials
[ s/eks/examples/role-kubeconfig ]  
[ s/eks/examples/role-kubeconfig ]   pulumi:pulumi:Stack (role-kubeconfig-p-it-travis-job-role-kubec-06638e6d):
[ s/eks/examples/role-kubeconfig ]     flag provided but not defined: -tracing
[ s/eks/examples/role-kubeconfig ]     Usage of tf-provider-flags:
[ s/eks/examples/role-kubeconfig ]       -get-provider-info
[ s/eks/examples/role-kubeconfig ]         	dump provider info as JSON to stdout
[ s/eks/examples/role-kubeconfig ]  
[ s/eks/examples/role-kubeconfig ]     error: update failed
[ s/eks/examples/role-kubeconfig ] 

Suggestions for a fix

Add a maximum of, say, 3 retry attempts against the API server before erroring out.

@lblackstone lblackstone self-assigned this Mar 27, 2020
@lblackstone lblackstone removed their assignment Jul 13, 2023
@lblackstone lblackstone added impact/reliability Something that feels unreliable or flaky kind/enhancement Improvements or new features impact/quality labels Jul 13, 2023
@PRIHLOP

PRIHLOP commented Jan 10, 2025

I agree with this suggestion.
I am hitting the same problem: there is no way to configure retries for connecting to the Kubernetes API.
It would probably be even better to make the retry count configurable.
