Bug Report: Failure in first call to PRS can lead to the cluster having no primary #17710

Open
GuptaManan100 opened this issue Feb 6, 2025 · 0 comments · May be fixed by #17870
GuptaManan100 commented Feb 6, 2025

Overview of the Issue

If PRS fails during the initialisation of a shard, and the failure occurs while the new primary is being promoted, at the point where its write to the topo server succeeds but the call returns a timeout error, then the tablet does not change its internal display state to primary. VTOrc sees this failure and tries to fix it by calling UndoDemotePrimary, but that call does not change the tablet type to PRIMARY; it only fixes the MySQL-level settings. As a result, the cluster is left with no primary at all.
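
The sequence described above can be sketched with a toy model. This is not Vitess code: `ToyTopo`, `ToyTablet`, `TopoTimeout`, `simulate`, and the alias `zone1-100` are all hypothetical names invented for illustration, and the model assumes only what this report states (the topo write is persisted but reported as a timeout, and the repair step touches MySQL settings only).

```python
from dataclasses import dataclass
from typing import Optional


class TopoTimeout(Exception):
    """Hypothetical error: the caller's deadline expired during the topo write."""


@dataclass
class ToyTopo:
    """Toy stand-in for the shard record held in the topo server."""
    primary_alias: Optional[str] = None
    timeout_after_write: bool = False

    def set_primary(self, alias: str) -> None:
        # The write is persisted first ...
        self.primary_alias = alias
        # ... but the caller may still observe a timeout.
        if self.timeout_after_write:
            raise TopoTimeout("deadline exceeded")


@dataclass
class ToyTablet:
    """Toy stand-in for a tablet's in-memory state."""
    alias: str
    tablet_type: str = "REPLICA"
    mysql_read_only: bool = True

    def promote(self, topo: ToyTopo) -> None:
        # If the topo write raises, the type change below never runs.
        topo.set_primary(self.alias)
        self.tablet_type = "PRIMARY"

    def undo_demote_primary(self) -> None:
        # Assumption taken from the report: only MySQL-level settings
        # are repaired; the tablet type is left untouched.
        self.mysql_read_only = False


def simulate() -> tuple:
    """Replay the failure and return (topo primary, tablet type, read_only)."""
    topo = ToyTopo(timeout_after_write=True)
    tablet = ToyTablet(alias="zone1-100")
    try:
        tablet.promote(topo)
    except TopoTimeout:
        pass  # PRS reports a failure here
    tablet.undo_demote_primary()  # the repair attempted afterwards
    return topo.primary_alias, tablet.tablet_type, tablet.mysql_read_only


if __name__ == "__main__":
    print(simulate())
```

In this model the topo record names the tablet as primary and MySQL is writable, yet the tablet still reports its type as REPLICA, which is the mismatch this report describes.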

Reproduction Steps

  1. Run PRS and simulate a failure that happens before the new primary tablet has promoted itself.

Binary Version

main

Operating System and Environment details

-

Log Fragments
