-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
raft: don't propose config changes with regressed lead support #132150
raft: don't propose config changes with regressed lead support #132150
Conversation
2f4d137
to
e0d641f
Compare
e0d641f
to
f4b2404
Compare
@arulajmani I've rebased this on master and added unit tests + datadriven tests, so this should be good for a review. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 12 of 12 files at r5, 4 of 4 files at r6, 13 of 13 files at r7, all commit messages.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @nvanbenschoten)
pkg/raft/testdata/confchange_fortification_safety.txt
line 114 at r7 (raw file):
# Try to demote v2. This should be rejected because v1's lead support under the # current config has not caught up to the previous config.
nit: consider printing out the fortification-state
to make this easy to glean.
pkg/raft/testdata/confchange_fortification_safety.txt
line 177 at r7 (raw file):
4->1 MsgAppResp Term:1 Log:0/5 Commit:4 # Try to demote v2 again. This time, the proposal should be allowed.
nit: maybe fortification-state
here as well.
pkg/raft/tracker/fortificationtracker_test.go
line 547 at r7 (raw file):
{ afterConfigChange: func(sl *mockStoreLiveness, ft *FortificationTracker) { // r4 fortified, but support still lagging.
nit: is this meaningfully different than the previous test case?
f4b2404
to
0b6cb2a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
TFTR!
bors r=arulajmani
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @arulajmani)
pkg/raft/testdata/confchange_fortification_safety.txt
line 114 at r7 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: consider printing out the
fortification-state
to make this easy to glean.
Nice idea. Done.
pkg/raft/testdata/confchange_fortification_safety.txt
line 177 at r7 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: maybe
fortification-state
here as well.
Done.
pkg/raft/tracker/fortificationtracker_test.go
line 547 at r7 (raw file):
Previously, arulajmani (Arul Ajmani) wrote…
nit: is this meaningfully different than the previous test case?
Not meaningfully. Removed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 11 of 11 files at r8, 4 of 4 files at r9, 6 of 6 files at r10, all commit messages.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten)
Merge conflict. |
The test was only adding a single node, so the name was misleading. Epic: None Release note: None
This commit updates the confchange_v2_add_single_implicit test to start with 2 nodes so that the addition of a third voter does not regress the lead support and trip up the automatic transition out of the joint configuration. Epic: None Release note: None
Fixes cockroachdb#125355. If the lead support has not caught up from the previous configuration, we must not propose another configuration change. Doing so would compromise the lead support promise made by the previous configuration and used as an expiration of a leader lease. Instead, we wait for the lead support under the current configuration to catch up to the maximum lead support reached under the previous config. If the lead support is never able to catch up, the leader will eventually step down due to CheckQuorum. As a reminder, if we reject a config change, it will be reproposed again shortly, at which point the lead support will likely have caught up. As a second reminder, we fortify both voters and learners. We also always add new replicas as learners before promoting them to voters. The combination here means that a leader will commonly already be supported a voter when it is promoted added to the lead support computation. Release note: None
0b6cb2a
to
8105ddb
Compare
bors r+ |
Fixes #125355.
If the lead support has not caught up from the previous configuration, we must not propose another configuration change. Doing so would compromise the lead support promise made by the previous configuration and used as an expiration of a leader lease. Instead, we wait for the lead support under the current configuration to catch up to the maximum lead support reached under the previous config. If the lead support is never able to catch up, the leader will eventually step down due to CheckQuorum.
As a reminder, if we reject a config change, it will be reproposed again shortly, at which point the lead support will likely have caught up.
As a second reminder, we fortify both voters and learners. We also always add new replicas as learners before promoting them to voters. The combination here means that a leader will commonly already be supported a voter when it is promoted added to the lead support computation.
Release note: None