K8SPXC-1534 | [bug] fix issue with inconsistent secret reconciliation #1945
+66
−31
CHANGE DESCRIPTION
Problem:
Creating a new PXC cluster can sometimes result in pods that go into the CreateContainerConfigError state and never recover from it.
STR:
.spec.secretsName
We expect that cluster1-secrets is updated with all other users (6 users), and that a copy of this secret named internal-<clusterName> is created. However, it is observed that while cluster1-secrets is reconciled, the internal-<clusterName> secret still only contains the monitor user from the originally created secret, and is never updated or reconciled with cluster1-secrets.
Cause:
With respect to Secret reconciliation, there are 2 steps executed in order:
1. Reconcile the secret named in .spec.secretsName (includes validating and filling out defaults) [1]
2. Create a copy named internal-[clusterName] and mount it onto pods. [2]
The result of (1) is written back to the Kube API, and in step (2) we read this secret again from the Kube API to create the copy. While this logic looks fine overall, it assumes that step (2) reads a consistent result from the Kube API, which may not be the case. This means that step (2) can create a copy based on an outdated (non-reconciled) secret, which is what leads to the CreateContainerConfigError.
Since the controller does not watch this secret, the kube client cache may not be immediately updated, resulting in an inconsistent read in step (2).
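The race described above can be illustrated with a small, self-contained Go sketch. It does not use the operator's code or any real Kubernetes client: laggingStore, reconcileUsersSecret, copyByReRead, copyFromResult and the requiredUsers list are all hypothetical names invented for this illustration, and the cache lag is simulated rather than real. The last function shows one possible mitigation (reusing the object produced by step 1), which is an assumption and not necessarily what this PR implements.

```go
package main

import "fmt"

// Secret is a simplified stand-in for a Kubernetes Secret (user -> password).
type Secret map[string]string

// requiredUsers is a hypothetical, shortened list of system users.
var requiredUsers = []string{"monitor", "root", "operator"}

// laggingStore models an API server plus a client cache that is only
// refreshed when Sync is called (hypothetical; simulates cache lag).
type laggingStore struct {
	server map[string]Secret
	cache  map[string]Secret
}

func newLaggingStore() *laggingStore {
	return &laggingStore{server: map[string]Secret{}, cache: map[string]Secret{}}
}

func copySecret(s Secret) Secret {
	out := Secret{}
	for k, v := range s {
		out[k] = v
	}
	return out
}

// Seed stores a secret in both the server and the cache.
func (st *laggingStore) Seed(name string, s Secret) {
	st.server[name] = copySecret(s)
	st.cache[name] = copySecret(s)
}

// Update writes to the server only; the cache stays stale until Sync.
func (st *laggingStore) Update(name string, s Secret) { st.server[name] = copySecret(s) }

// Get reads through the (possibly stale) cache.
func (st *laggingStore) Get(name string) Secret { return copySecret(st.cache[name]) }

// Sync refreshes the cache from the server.
func (st *laggingStore) Sync() {
	for k, v := range st.server {
		st.cache[k] = copySecret(v)
	}
}

// reconcileUsersSecret is step (1): fill in missing users and write back.
func reconcileUsersSecret(st *laggingStore, name string) Secret {
	s := st.Get(name)
	for _, u := range requiredUsers {
		if _, ok := s[u]; !ok {
			s[u] = "generated"
		}
	}
	st.Update(name, s)
	return s
}

// copyByReRead is step (2) as described: re-read the source, then copy it.
func copyByReRead(st *laggingStore, src, dst string) Secret {
	c := st.Get(src) // may still return the pre-reconcile secret
	st.Update(dst, c)
	return c
}

// copyFromResult is one possible mitigation: copy the object step (1) returned.
func copyFromResult(st *laggingStore, reconciled Secret, dst string) Secret {
	c := copySecret(reconciled)
	st.Update(dst, c)
	return c
}

func main() {
	st := newLaggingStore()
	st.Seed("cluster1-secrets", Secret{"monitor": "pw"})
	reconciled := reconcileUsersSecret(st, "cluster1-secrets")
	stale := copyByReRead(st, "cluster1-secrets", "internal-cluster1")
	fresh := copyFromResult(st, reconciled, "internal-cluster1")
	fmt.Println("reconciled:", len(reconciled), "re-read copy:", len(stale), "direct copy:", len(fresh))
}
```

In this model the re-read copy contains only the originally seeded user because Sync was never called between the two steps, which mirrors the internal secret keeping only the monitor user. Reading directly from the API server instead of the cache would be another way to avoid the stale read.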
Solution:
This PR introduces the following changes to address this issue: