-
Notifications
You must be signed in to change notification settings - Fork 5
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of the safety property that stateful set never scales down in rabbitmq #223
Comments
Invariant 1This invariant is exactly the high-level property. The proof of this invariant is where we talk about update Invariant 2 is to show that every stateful set update request must specify the resource version because stateful set is allowed to update unconditionally. If resource version can be none, we can't rule out invalid update request through resource version. Invariant 3 is quite obvious. |
Invariant 2 and 3Since update request message for stateful set (at least in rabbitmq controller) is derived from the response of get request (and the response should be an ok one), we rely on some properties of matched get response message. For invariant 2, we need invariant 5 and 6. The necessity of invariant 5 is self-explanatory. But invariant 5 alone is not enough, how can we know that when updating the stateful set, the response is a ok get response? We only know it's okay since it enters the update branch. We need invariant 6 to claim it's a get response; actually, an invariant that the request at For invariant 3, we need invariants 2, 4, 6, 7. Invariant 3 mainly talks about the replicas of the object inside the update stateful set request. We know that
|
Invariant 4To do the induction for this invariant, we mainly proves how the invariant holds if the stateful set in etcd is not changed (neither created not updated) from the previous state. If the state changes, we rely on the property of create/update request message which is proved in another lemma. If the stateful set in ectd is the same as previous state (they don't have it or have the same one), we consider the cr key in etcd, scheduled_reconciles and ongoing_reconciles. If cr key doesn't exist in previous state, this means that it must not be the one the stateful set object's owner reference points to (thanks to invariant 11). If the previous state doesn't schedule reconcile for cr key, it means cr key exists in etcd in previous state, and the reconciled object is the same as that cr. Similarly, we can prove the order for triggering cr. |
Invariant 5 and 7The proof of these invariants is straightforward. |
Invariant 6We first prove that the pending request at this step is get stateful set request, which means its key is the key of stateful set. Then we prove a generic lemma for all ok get response message, that, if it matches one request pending at the controller side, the key of the object inside it should be the same as the key of that pending request. |
Invariant 8This invariant says that object in every valid create or update stateful set request message should satisfy the replicas order. As mentioned before, the replicas of that object is the same as that of the cr in reconcile. So we need:
|
Invariant 9We prove three invariants (each two cr of the three) and integrate. them into one. We can't simply prove two orders, because the scheduled cr may not exist so it can't serve as an intermediary. I move this invariant to a common module and require the Invariant 10 and 11Due to our modifications to the implementation, it's straightforward to prove invariant 10. Then we can rely on it to prove invariant 11. Since etcd itself won't set the owner references of the object, we are already there. |
Prove the property that stateful set never scales down in rabbitmq controller. Details can be found in #223 . To supplement is, the concrete invariants in the code are as follows: - Invariant 1: `stateful_set_not_scaled_down` - Invariant 2: `object_in_sts_update_request_has_smaller_rv_than_etcd` - Invariant 3: `replicas_of_stateful_set_update_request_msg_is_no_smaller_than_etcd` - Invariant 4: `replicas_of_etcd_stateful_set_satisfies_order` - Invariant 5: `object_in_ok_get_response_has_smaller_rv_than_etcd` - Invariant 6: `response_at_after_get_stateful_set_step_is_sts_get_response` - Invariant 7: `object_in_ok_get_resp_is_same_as_etcd_with_same_rv` - Invariant 7: `replicas_of_stateful_set_create_or_update_request_msg_satisfies_order` - Invariant 8: `etcd_and_scheduled_and_triggering_cr_in_correct_order` - Invariant 9: `object_in_every_create_or_update_request_msg_only_has_valid_owner_references` - Invariant 10: `every_owner_ref_of_every_object_in_etcd_has_different_uid_from_uid_counter` Each corresponds to a spec function. --------- Signed-off-by: Wenjie Ma <[email protected]>
This issue serves as an explanation of #220 .
Context and Problem
Implementation
The current version of RabbitMQ controller doesn't support scaling down; directly reducing the replicas of the stateful set can lead to quorum loss and even data loss. More details in this issue. Thus, we also decide to disallow downscale for the stateful set in our implementation. We first change the rabbitmq controller implementation to wait for the garbage collector to delete the old stateful set. This, on top of the validation rule can guarantee that updating the deployment won't decrease the replicas. More details in PR #211 .
Proof
After implementing rabbitmq controller in the aforementioned way, we want to formally prove that in all possible executions in our current state machine, stateful set actually never gets updated with a lower replicas.
Formalism
To prove the safety property about stateful set, we need to first specify what the property is.
Previously, we planned to use
Message
to describe the possible update/deletion/creation actions, and also specify the relevant properties. However, it is better not to includeMessage
in the description the high-level safety property becauseMessage
is just a tool and a detail of the system. For update action, one way to circumvent usingMessage
is to talk about the previous and current state: an object being updated means that it exists in both states but changes in current state. As for the property in this issue, we can specify it in this way:The property will be
forall |rabbitmq| always(stateful_set_not_scaled_down(rabbitmq))
. We don't have to require that the stateful set objects are different in the two states, because if it stays unchanged, the property won't be harmed.Proof
Overview
The proof of this property mainly consists of reasoning about the replicas of etcd objects, create and update request messages, scheduled custom resource object and custom resource objects in reconcile. I will discuss the invariants we prove, what they are and what they are used for in the rest of this PR. I will number all of those invariants for better understanding.
Replicas Order
First, we define a replicas order for stateful set object:
obj
can be the etcd statful set object, the object in create/update stateful set object. We define this order because, the replicas in the update request is derived from the triggering cr; so, in order to show the updated replicas is no smaller than the original one, we need to show that the original one (the one stored in etcd)'s replicas is no larger than that of triggering cr.obj.metadata.owner_references_only_contains(s.triggering_cr_of(key).controller_owner_ref())
here is to ensure that thecr
is still the one that creates the stateful set object. The left two comparison is to assist the last one because when the state moves to the next state, the triggering_cr may be assigned (inserted or updated).Invariants
All Invariants
AfterGetStatefulSet
step, if ok, should have its object with stateful set key as its reference.replicas_of(s.resource_obj_of(cr_key))
<=replicas_of(s.reconcile_scheduled_obj_of(cr_key))
<=replicas_of(s.triggering_cr_of(cr_key))
. We prove the order of each two of them, if both cr exist and have the same uid.These invariants also use some previously proved invariants, which I omit in this issue.
Dependencies
The following graph demonstrates the dependencies of all the aforementioned invariants.
Generality
When proving properties (invariants in this issue), we should consider whether they are specific to the controller, or can directly apply to any controller, or can apply to other controllers if we add some restrictions to the controller.
The text was updated successfully, but these errors were encountered: