Bugs found during the process of verification #210

Closed
marshtompsxd opened this issue Aug 12, 2023 · 0 comments

ZooKeeper controller bug 1

It is a liveness bug fixed by #217.

If the controller crashes between creating the parent zk node and the child zk node, then after the restart the controller will not be able to create the child zk node. The parent node already exists at that point, so creating it again fails, and reconcile_zk_node returns on any error from creating the parent node. The reconcile therefore never reaches the step that creates the child node.
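The failure mode above can be reproduced with a small in-memory model. This is only a sketch: `ToyZk`, `NodeExistsError`, both reconcile functions, and the node paths are hypothetical names invented for illustration, not the controller's actual code, and the tolerant variant is not necessarily how #217 fixes the bug.

```python
class NodeExistsError(Exception):
    """Raised by the toy store when a node is created twice."""


class ToyZk:
    """In-memory stand-in for a ZooKeeper client (hypothetical)."""

    def __init__(self):
        self.nodes = set()

    def create(self, path):
        if path in self.nodes:
            raise NodeExistsError(path)
        self.nodes.add(path)


def reconcile_zk_node_buggy(zk, parent, child):
    # Any error on the parent, including "already exists", aborts the reconcile.
    try:
        zk.create(parent)
    except Exception:
        return False
    zk.create(child)
    return True


def reconcile_zk_node_tolerant(zk, parent, child):
    # Treat "already exists" as success so a restarted controller makes progress.
    for path in (parent, child):
        try:
            zk.create(path)
        except NodeExistsError:
            pass
    return True


# Simulate a crash after the parent was created but before the child was.
PARENT = "/example-parent"          # hypothetical path
CHILD = "/example-parent/foo"       # hypothetical path
zk = ToyZk()
zk.create(PARENT)

buggy_ok = reconcile_zk_node_buggy(zk, PARENT, CHILD)
child_after_buggy = CHILD in zk.nodes

tolerant_ok = reconcile_zk_node_tolerant(zk, PARENT, CHILD)
child_after_tolerant = CHILD in zk.nodes
```

In this model the buggy version returns early on every retry, so the child node is never created, while the tolerant version is idempotent and can be re-run safely after a crash at any point.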

A similar bug exists in the Pravega operator: if the controller crashes before creating the zk node, and meanwhile the developer updates the .spec.replicas, then after the restart the controller won't be able to update the stateful set because the zk node doesn't exist. We reported the bug; the issue id is 569.

Note that #217 also fixes two other safety bugs (as documented in the PR), though we don't have a machine-checkable proof of the safety property for now.

ZooKeeper controller bug 2

It is a liveness bug fixed by #282.

If the user changes the .spec.labels or .spec.annotations in the cr spec, the controller will also update the .spec.selector and .spec.volume_claim_templates of the stateful set. These two fields are immutable, so the whole update is rejected, and the mutable fields that are supposed to be updated are never updated.
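One way to avoid this kind of rejection is to build the update from the desired object while carrying the immutable fields over unchanged from the object already in the cluster. The sketch below models that idea with plain dictionaries; `make_update`, `toy_api_accepts`, and the `template_labels` field are hypothetical and only illustrate the approach, which may differ from what #282 does.

```python
import copy

# Fields named in the bug report as immutable on the stateful set.
IMMUTABLE_SPEC_FIELDS = ("selector", "volume_claim_templates")


def make_update(existing, desired):
    """Build an update that keeps immutable fields exactly as they are now."""
    updated = copy.deepcopy(desired)
    for field in IMMUTABLE_SPEC_FIELDS:
        updated["spec"][field] = copy.deepcopy(existing["spec"][field])
    return updated


def toy_api_accepts(existing, update):
    """Toy model of the API server: reject any change to an immutable field."""
    return all(
        existing["spec"][f] == update["spec"][f] for f in IMMUTABLE_SPEC_FIELDS
    )


existing = {
    "spec": {
        "replicas": 3,
        "selector": {"app": "zk"},
        "volume_claim_templates": [{"labels": {"app": "zk"}}],
        "template_labels": {"app": "zk"},
    }
}

# The user added a label; a naive controller propagates it everywhere.
desired = {
    "spec": {
        "replicas": 3,
        "selector": {"app": "zk", "team": "a"},
        "volume_claim_templates": [{"labels": {"app": "zk", "team": "a"}}],
        "template_labels": {"app": "zk", "team": "a"},
    }
}

naive_accepted = toy_api_accepts(existing, desired)
safe_update = make_update(existing, desired)
safe_accepted = toy_api_accepts(existing, safe_update)
```

Here the naive update is rejected as a whole, while the safe update is accepted and still carries the new `template_labels`.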

RabbitMQ controller bug 1

It is a safety bug found when developing #211 and fixed in the same PR.

Downscale could still happen if we rely only on the validation rule. The user deletes the current deployment and creates a new one with fewer .spec.replicas, which doesn't violate the validation rule. The stateful set created from the old cr may not yet have been deleted by the garbage collector when the controller tries to update it with .spec.replicas from the new cr. Since that value is smaller than the current replicas of the stateful set, the update causes a downscale.
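The race above suggests that the controller needs a check of its own before updating an existing stateful set. The following is a minimal sketch of one such guard, assuming the controller can compare the owner uid recorded on the stateful set with the uid of the cr it is reconciling. The `decide` function and the dictionary layout are hypothetical and are not taken from #211.

```python
def decide(cr, stateful_set):
    """Return the action a cautious controller could take for this cr."""
    if stateful_set is None:
        return "create"
    if stateful_set["owner_uid"] != cr["uid"]:
        # Stale object left behind by a deleted cr; wait for garbage collection.
        return "wait"
    if cr["replicas"] < stateful_set["replicas"]:
        # Never shrink, even if the validation rule was bypassed somehow.
        return "reject"
    return "update"


# The old cr (uid-1, 3 replicas) was deleted; its stateful set still exists.
stale_sts = {"owner_uid": "uid-1", "replicas": 3}
new_cr = {"uid": "uid-2", "replicas": 1}

action_on_stale = decide(new_cr, stale_sts)
action_after_gc = decide(new_cr, None)
action_on_own = decide(new_cr, {"owner_uid": "uid-2", "replicas": 1})
action_on_shrink = decide(new_cr, {"owner_uid": "uid-2", "replicas": 3})
```

With this guard the controller does not touch the stale stateful set and only creates a fresh one after the garbage collector has removed the old object.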

RabbitMQ controller bug 2

It is a liveness bug fixed by #335.

Previously, given a rabbitmq cr foo, the controller would create a client service called foo and a headless service called foo-nodes. The problem arises when the user creates another rabbitmq cr called foo-nodes, whose client service will also be named foo-nodes, colliding with the headless service of the first cr. The reconciles for the two rabbitmq crs will then compete over the same service, leading to oscillation and liveness violations.

The same bug also exists in the official rabbitmq operator. We reported it; the issue id is 1464.
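The name collision described above can be checked mechanically. The sketch below compares the old naming scheme with a hypothetical alternative that adds a distinct suffix to every service name; the `-client` suffix and all function names are assumptions made for illustration and may not match the scheme adopted in #335.

```python
def service_names_old(cr_name):
    """Old scheme: the client service reuses the bare cr name."""
    return {cr_name, cr_name + "-nodes"}


def service_names_suffixed(cr_name):
    """Hypothetical scheme: every service name gets its own suffix."""
    return {cr_name + "-client", cr_name + "-nodes"}


def collide(scheme, cr_a, cr_b):
    """True if two different crs would claim a service with the same name."""
    return bool(scheme(cr_a) & scheme(cr_b))


old_collides = collide(service_names_old, "foo", "foo-nodes")
suffixed_collides = collide(service_names_suffixed, "foo", "foo-nodes")
shared_name = service_names_old("foo") & service_names_old("foo-nodes")
```

When every generated name ends in a suffix, two names with the same suffix can only be equal if the cr names are equal, and two names with different suffixes can never be equal, so distinct crs cannot claim the same service.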
