
Argo CD application for wiz-kubernetes-integration becomes OutOfSync after a while #273

Open
dhorner71 opened this issue Feb 28, 2024 · 8 comments

Comments


dhorner71 commented Feb 28, 2024

Some time after a successful install, Argo CD reports that the application (the wiz-kubernetes-integration Helm chart) is out of sync and unable to self-heal.

wiz-kubernetes-integration-wiz-admission-controller:
reported manifest diff that self-heal is unable to resolve:
rollme.webhookCert
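Charts that seed a `rollme`-style annotation with a random value on every template render will always diff under GitOps, so the application can never converge. If the goal is just to stop the perpetual OutOfSync, one common Argo CD mitigation is to ignore that field in the diff. A sketch only: the resource kind and annotation path here are assumptions and need to be checked against the actual diff Argo CD shows:

```yaml
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: wiz-kubernetes-integration
spec:
  # ... source/destination as already configured ...
  ignoreDifferences:
    - group: apps
      kind: Deployment        # assumption: the diff is on a Deployment's pod template
      jqPathExpressions:
        - .spec.template.metadata.annotations["rollme.webhookCert"]
```

Note this only suppresses the diff; it does not address the underlying cert-rotation mechanism the annotation drives.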

Argo CD sync logs:
deleting wiz-auto-modify-connector service account

Workaround:
Manually deleting the service account allows the sync to complete successfully. This step appears to trigger the integration job, which then properly reinstalls all the respective resources.
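The workaround above, expressed as commands (the namespace is an assumption; use whichever namespace the chart is installed into):

```shell
# Delete the stuck service account (namespace is hypothetical)
kubectl -n wiz delete serviceaccount wiz-auto-modify-connector

# Then trigger a sync, or let auto-sync pick it up
argocd app sync wiz-kubernetes-integration
```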

environment:
app.kubernetes.io/chartName: wiz-admission-controller
app.kubernetes.io/instance: wiz-kubernetes-integration
app.kubernetes.io/managed-by: Helm
app.kubernetes.io/name: wiz-admission-controller
app.kubernetes.io/version: '2.4'
helm.sh/chart: wiz-admission-controller-3.4.13
wiz helm chart 0.1.85
AWS EKS 1.27

dhorner71 (Author) commented:

We currently have three separate clusters, two of which exhibit this behavior. After resyncing those two by manually deleting that service account, all three clusters have been healthy and in sync for 24 hours. I will continue to monitor. We will be deploying to 4 other clusters in the near future, so I'll be able to report their status soon.

dhorner71 (Author) commented:

Another 24 hours with no symptoms. Closing the ticket.

dhorner71 (Author) commented:

4 of our 8 clusters are reporting out of sync in Argo CD this morning. I will research and post relevant logs.

dhorner71 reopened this Mar 20, 2024

dhorner71 commented Mar 20, 2024

wiz-kubernetes-integration-wiz-admission-controller logs:

{"level":"info","time":"2024-03-18T20:20:51.874568707Z","msg":"Auth data is expired, authenticating client","expiresAt":"2024-03-18T20:05:52.514527851Z","timeSinceExpired":"14m59.359973513s"}
{"level":"info","time":"2024-03-18T20:21:03.547774867Z","msg":"Auth data is expired, authenticating client","expiresAt":"2024-03-18T20:06:03.792072141Z","timeSinceExpired":"14m59.755684293s"}
{"level":"error","time":"2024-03-18T20:21:21.876081997Z","msg":"error posting token request to url=https://auth.app.wiz.io/oauth/token, status=, resp=","error":"Post \"https://auth.app.wiz.io/oauth/token\": dial tcp 75.2.83.126:443: i/o timeout"}
{"level":"error","time":"2024-03-18T20:21:21.876359448Z","msg":"Failed to reauthenticate client","error":"failed authenticating with credentials: error posting token request: Post \"https://auth.app.wiz.io/oauth/token\": dial tcp 75.2.83.126:443: i/o timeout"}
{"level":"info","time":"2024-03-18T20:21:21.876521311Z","msg":"Auth data is not cached, authenticating client"}
{"level":"error","time":"2024-03-18T20:21:33.549479388Z","msg":"error posting token request to url=https://auth.app.wiz.io/oauth/token, status=, resp=","error":"Post \"https://auth.app.wiz.io/oauth/token\": dial tcp 99.83.196.38:443: i/o timeout"}
{"level":"error","time":"2024-03-18T20:21:33.54972353Z","msg":"Failed to reauthenticate client","error":"failed authenticating with credentials: error posting token request: Post \"https://auth.app.wiz.io/oauth/token\": dial tcp 99.83.196.38:443: i/o timeout"}
{"level":"info","time":"2024-03-18T20:21:33.549820082Z","msg":"Auth data is not cached, authenticating client"}
{"level":"error","time":"2024-03-18T20:21:51.87763835Z","msg":"error posting token request to url=https://auth.app.wiz.io/oauth/token, status=, resp=","error":"Post \"https://auth.app.wiz.io/oauth/token\": dial tcp 99.83.196.38:443: i/o timeout"}
{"level":"error","time":"2024-03-18T20:21:51.877897953Z","msg":"Failed to reauthenticate client","error":"failed authenticating with credentials: error posting token request: Post \"https://auth.app.wiz.io/oauth/token\": dial tcp 99.83.196.38:443: i/o timeout"}
{"level":"error","time":"2024-03-18T20:22:03.550501554Z","msg":"error posting token request to url=https://auth.app.wiz.io/oauth/token, status=, resp=","error":"Post \"https://auth.app.wiz.io/oauth/token\": dial tcp 99.83.196.38:443: i/o timeout"}
{"level":"error","time":"2024-03-18T20:22:03.550656581Z","msg":"Failed to reauthenticate client","error":"failed authenticating with credentials: error posting token request: Post \"https://auth.app.wiz.io/oauth/token\": dial tcp 99.83.196.38:443: i/o timeout"}
h2_bundle.go:4527: http2: server: error reading preface from client 10.168.38.186:49516: read tcp 172.20.90.94:8000->10.168.38.186:49516: read: connection reset by peer
{"level":"info","time":"2024-03-18T21:20:51.875174756Z","msg":"Auth data is not cached, authenticating client"}

Looking into whether any additional network policy modifications are needed based on these entries.
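If the cluster enforces a default-deny egress policy, the dial timeouts to auth.app.wiz.io:443 would be consistent with blocked egress. A sketch of an allow rule, assuming the chart runs in a namespace named `wiz`; every name and selector here is an assumption, not taken from the chart:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-wiz-auth-egress
  namespace: wiz            # assumption: namespace the chart is installed into
spec:
  podSelector: {}           # all pods in the namespace; narrow with labels if possible
  policyTypes:
    - Egress
  egress:
    # HTTPS to the Wiz auth endpoint (NetworkPolicy cannot match hostnames,
    # so this allows 443 broadly; tighten the CIDR if the IPs are stable)
    - to:
        - ipBlock:
            cidr: 0.0.0.0/0
      ports:
        - protocol: TCP
          port: 443
    # DNS resolution
    - ports:
        - protocol: UDP
          port: 53
```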

dhorner71 (Author) commented:

We run an alternate pod IP scheme (Calico) and found a comment in the admission controller values template (https://github.com/wiz-sec/charts/blob/master/wiz-admission-controller/values.yaml) about the webhook and the host network flag. I've set it to true and selected a port other than 10250. I guess I need to wait for the webhookCert to renew to see whether this works.
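For reference, a values override along the lines described might look like the following; the key names here are assumptions and must be verified against the values.yaml linked above for your chart version:

```yaml
# Hypothetical values override for the wiz-admission-controller subchart
wiz-admission-controller:
  webhook:
    hostNetwork: true   # needed when the control plane cannot reach pod IPs (e.g. some Calico setups)
    port: 10260         # any node port that does not collide with kubelet's 10250
```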


Cr0n1c commented Mar 26, 2024

I am experiencing this on all of my clusters as well. Any chance we can get the Wiz team to look at this?

dhorner71 (Author) commented:

Submitted support ticket https://support.wiz.io/hc/en-us/requests/24387

dhorner71 (Author) commented:

We are still experiencing these issues on several clusters. On most of our nonprod EKS clusters we scale to 0 nodes overnight and on weekends; these specific clusters cannot successfully rehydrate pods in the morning and report an unhealthy state.
