Clients potentially miss etcd spec rollouts
#985
Labels: area/control-plane, kind/bug, status/accepted
How to categorize this issue?
/area control-plane
/kind bug
What happened:
When adjusting the spec of an etcd resource, Gardener waits for the change to be rolled out completely by consulting the status (see CheckEtcdObject). Since Etcd-Druid changed to a multi-stage status update (probably in v0.23.0), those checks have become racy.
Please consider the following example (only relevant steps are listed):
1. Secret reference is updated.

Steps in reconcileSpec:
2. StatefulSet
3. .status.observedGeneration
4. .status.lastOperation and status.lastErrors
5. gardener.cloud/operation annotation

Steps in reconcileStatus:
6. status.ready and similar fields based on the backing StatefulSet

Between steps 5 and 6, there is no hint in the resource that points towards an ongoing rollout of a spec change. Controllers/clients might accidentally continue their operations. This recently happened in the scope of credentials rotation for gardener/gardener, where the new Peer CA Bundle was not completely rolled out to all replicas, but Gardener already continued with the rotation, which led to a certificate mismatch in the etcd cluster.

What you expected to happen:
Clients to know when a spec rollout is successfully finished.
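
To make the window between steps 5 and 6 concrete, here is a minimal Go sketch. The EtcdSnapshot type and the looksRolledOut function are hypothetical simplifications, not the actual Gardener or Etcd-Druid code; they only model the fields named in the steps above. The sketch also assumes that status.ready still carries its value from before the spec change until reconcileStatus runs.

```go
package main

import (
	"errors"
	"fmt"
)

// EtcdSnapshot is a hypothetical, flattened view of an etcd resource.
// It only models the fields named in this issue.
type EtcdSnapshot struct {
	Generation         int64    // metadata.generation
	ObservedGeneration int64    // status.observedGeneration
	HasOperationAnno   bool     // gardener.cloud/operation annotation present
	LastErrors         []string // status.lastErrors
	Ready              bool     // status.ready
}

// looksRolledOut is an assumed status-based check (not the real
// CheckEtcdObject). It returns nil if the snapshot looks fully rolled out.
func looksRolledOut(e EtcdSnapshot) error {
	switch {
	case e.ObservedGeneration != e.Generation:
		return errors.New("observed generation is outdated")
	case e.HasOperationAnno:
		return errors.New("operation annotation is still present")
	case len(e.LastErrors) > 0:
		return fmt.Errorf("last errors are set: %v", e.LastErrors)
	case !e.Ready:
		return errors.New("etcd is not ready")
	}
	return nil
}

func main() {
	// After step 5, before step 6: the spec-related fields are up to date,
	// but Ready is assumed to be stale (still true from before the change).
	inWindow := EtcdSnapshot{Generation: 2, ObservedGeneration: 2, Ready: true}

	// After step 6 (assumption): reconcileStatus has seen the StatefulSet
	// still rolling and set Ready to false.
	afterStatus := EtcdSnapshot{Generation: 2, ObservedGeneration: 2, Ready: false}

	fmt.Println("between steps 5 and 6:", looksRolledOut(inWindow))
	fmt.Println("after step 6:", looksRolledOut(afterStatus))
}
```

In this model, the first snapshot passes the check although the StatefulSet rollout has not been reflected in the status yet, so a client polling at that moment would continue too early.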
/cc @shreyas-s-rao @LucaBernstein @dguendisch @hendrikKahl