Skip to content

Commit

Permalink
Browse files Browse the repository at this point in the history
132150: raft: don't propose config changes with regressed lead support r=nvanbenschoten a=nvanbenschoten

Fixes #125355.

If the lead support has not caught up from the previous configuration, we must not propose another configuration change. Doing so would compromise the lead support promise made by the previous configuration and used as an expiration of a leader lease. Instead, we wait for the lead support under the current configuration to catch up to the maximum lead support reached under the previous config. If the lead support is never able to catch up, the leader will eventually step down due to CheckQuorum.

As a reminder, if we reject a config change, it will be reproposed again shortly, at which point the lead support will likely have caught up.

As a second reminder, we fortify both voters and learners. We also always add new replicas as learners before promoting them to voters. The combination here means that a leader will commonly already be supported a voter when it is promoted added to the lead support computation.

Release note: None

132580: raft: periodically broadcast MsgDeFortifyLeader  r=nvanbenschoten a=arulajmani

See individual commits.

132682: storage: integrate columnar blocks, disabled by default r=jbowens a=jbowens

This commit defines a colblk.KeySchema for use with CockroachDB keys, bumps the
format major version in 24.3 to FormatColumnarBlocks and gates use of columnar
blocks behind a new `storage.columnar_blocks.enabled` cluster setting.

Epic: none
Release note: none

Co-authored-by: Nathan VanBenschoten <[email protected]>
Co-authored-by: Arul Ajmani <[email protected]>
Co-authored-by: Jackson Owens <[email protected]>
  • Loading branch information
4 people committed Oct 16, 2024
4 parents cbead84 + 8105ddb + 806e1c5 + d56ed5c commit 88faab9
Show file tree
Hide file tree
Showing 22 changed files with 2,176 additions and 178 deletions.
1 change: 1 addition & 0 deletions docs/generated/settings/settings-for-tenants.txt
Original file line number Diff line number Diff line change
Expand Up @@ -385,6 +385,7 @@ sql.ttl.job.enabled boolean true whether the TTL job is enabled application
sql.txn.read_committed_isolation.enabled boolean true set to true to allow transactions to use the READ COMMITTED isolation level if specified by BEGIN/SET commands application
sql.txn.repeatable_read_isolation.enabled (alias: sql.txn.snapshot_isolation.enabled) boolean false set to true to allow transactions to use the REPEATABLE READ isolation level if specified by BEGIN/SET commands application
sql.txn_fingerprint_id_cache.capacity integer 100 the maximum number of txn fingerprint IDs stored application
storage.columnar_blocks.enabled boolean false set to true to enable columnar-blocks to store KVs in a columnar format system-visible
storage.ingestion.value_blocks.enabled boolean true set to true to enable writing of value blocks in ingestion sstables application
storage.max_sync_duration duration 20s maximum duration for disk operations; any operations that take longer than this setting trigger a warning log entry or process crash system-visible
storage.max_sync_duration.fatal.enabled boolean true if true, fatal the process when a disk operation exceeds storage.max_sync_duration application
Expand Down
1 change: 1 addition & 0 deletions docs/generated/settings/settings.html
Original file line number Diff line number Diff line change
Expand Up @@ -339,6 +339,7 @@
<tr><td><div id="setting-sql-txn-read-committed-isolation-enabled" class="anchored"><code>sql.txn.read_committed_isolation.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>set to true to allow transactions to use the READ COMMITTED isolation level if specified by BEGIN/SET commands</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-sql-txn-snapshot-isolation-enabled" class="anchored"><code>sql.txn.repeatable_read_isolation.enabled<br />(alias: sql.txn.snapshot_isolation.enabled)</code></div></td><td>boolean</td><td><code>false</code></td><td>set to true to allow transactions to use the REPEATABLE READ isolation level if specified by BEGIN/SET commands</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-sql-txn-fingerprint-id-cache-capacity" class="anchored"><code>sql.txn_fingerprint_id_cache.capacity</code></div></td><td>integer</td><td><code>100</code></td><td>the maximum number of txn fingerprint IDs stored</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-storage-columnar-blocks-enabled" class="anchored"><code>storage.columnar_blocks.enabled</code></div></td><td>boolean</td><td><code>false</code></td><td>set to true to enable columnar-blocks to store KVs in a columnar format</td><td>Dedicated/Self-hosted (read-write); Serverless (read-only)</td></tr>
<tr><td><div id="setting-storage-experimental-eventually-file-only-snapshots-enabled" class="anchored"><code>storage.experimental.eventually_file_only_snapshots.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>set to false to disable eventually-file-only-snapshots (kv.snapshot_receiver.excise.enabled must also be false)</td><td>Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-storage-ingest-split-enabled" class="anchored"><code>storage.ingest_split.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>set to false to disable ingest-time splitting that lowers write-amplification</td><td>Dedicated/Self-Hosted</td></tr>
<tr><td><div id="setting-storage-ingestion-value-blocks-enabled" class="anchored"><code>storage.ingestion.value_blocks.enabled</code></div></td><td>boolean</td><td><code>true</code></td><td>set to true to enable writing of value blocks in ingestion sstables</td><td>Serverless/Dedicated/Self-Hosted</td></tr>
Expand Down
2 changes: 1 addition & 1 deletion go.mod
Original file line number Diff line number Diff line change
Expand Up @@ -128,6 +128,7 @@ require (
github.com/cockroachdb/cmux v0.0.0-20170110192607-30d10be49292
github.com/cockroachdb/cockroach-go/v2 v2.3.7
github.com/cockroachdb/crlfmt v0.0.0-20221214225007-b2fc5c302548
github.com/cockroachdb/crlib v0.0.0-20241015224233-894974b3ad94
github.com/cockroachdb/datadriven v1.0.3-0.20240530155848-7682d40af056
github.com/cockroachdb/errors v1.11.3
github.com/cockroachdb/fifo v0.0.0-20240606204812-0bbfbd93a7ce
Expand Down Expand Up @@ -303,7 +304,6 @@ require (
github.com/cespare/xxhash/v2 v2.2.0 // indirect
github.com/charmbracelet/bubbletea v0.23.1 // indirect
github.com/charmbracelet/lipgloss v0.6.0 // indirect
github.com/cockroachdb/crlib v0.0.0-20241015224233-894974b3ad94 // indirect
github.com/cockroachdb/swiss v0.0.0-20240612210725-f4de07ae6964 // indirect
github.com/danieljoos/wincred v1.1.2 // indirect
github.com/decred/dcrd/dcrec/secp256k1/v4 v4.3.0 // indirect
Expand Down
12 changes: 12 additions & 0 deletions pkg/raft/confchange/validate.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ type ValidationContext struct {
CurConfig *quorum.Config
Applied uint64
PendingConfIndex uint64
LeadSupportSafe bool
DisableValidationAgainstCurConfig bool
}

Expand All @@ -35,6 +36,17 @@ func ValidateProp(ctx ValidationContext, cc pb.ConfChangeV2) error {
ctx.PendingConfIndex, ctx.Applied)
}

// If the lead support has not caught up from the previous configuration, we
// must not propose another configuration change. Doing so would compromise
// the lead support promise made by the previous configuration and used as an
// expiration of a leader lease. Instead, we wait for the lead support under
// the current configuration to catch up to the maximum lead support reached
// under the previous config. If the lead support is never able to catch up,
// the leader will eventually step down due to CheckQuorum.
if !ctx.LeadSupportSafe {
return errors.Errorf("lead support has not caught up to previous configuration")
}

// The DisableValidationAgainstCurConfig flag allows disabling config change
// constraints that are guaranteed by the upper state machine layer (incorrect
// ones will apply as no-ops). See raft.Config.DisableConfChangeValidation for
Expand Down
24 changes: 24 additions & 0 deletions pkg/raft/confchange/validate_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -40,6 +40,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
},
cc: changeNormal,
},
Expand All @@ -49,6 +50,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 10,
LeadSupportSafe: true,
},
cc: changeNormal,
},
Expand All @@ -58,16 +60,29 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 15,
LeadSupportSafe: true,
},
cc: changeNormal,
expErr: "possible unapplied conf change at index 15 \\(applied to 10\\)",
},
{
name: "invalid, unsafe lead support",
ctx: ValidationContext{
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: false,
},
cc: changeNormal,
expErr: "lead support has not caught up to previous configuration",
},
{
name: "invalid, already in joint state",
ctx: ValidationContext{
CurConfig: configJoint,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
},
cc: changeNormal,
expErr: "must transition out of joint config first",
Expand All @@ -78,6 +93,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
},
cc: changeLeaveJoint,
expErr: "not in joint state; refusing empty conf change",
Expand All @@ -88,6 +104,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configJoint,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
DisableValidationAgainstCurConfig: true,
},
cc: changeNormal,
Expand All @@ -98,6 +115,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
DisableValidationAgainstCurConfig: true,
},
cc: changeLeaveJoint,
Expand All @@ -108,6 +126,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
},
cc: pb.ConfChangeV2{Changes: []pb.ConfChangeSingle{
{Type: pb.ConfChangeRemoveNode, NodeID: 3},
Expand All @@ -121,6 +140,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
},
cc: pb.ConfChangeV2{Changes: []pb.ConfChangeSingle{
{Type: pb.ConfChangeRemoveNode, NodeID: 3},
Expand All @@ -133,6 +153,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
},
cc: pb.ConfChangeV2{Changes: []pb.ConfChangeSingle{
{Type: pb.ConfChangeRemoveNode, NodeID: 3},
Expand All @@ -146,6 +167,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
},
cc: pb.ConfChangeV2{Changes: []pb.ConfChangeSingle{
{Type: pb.ConfChangeAddLearnerNode, NodeID: 3},
Expand All @@ -159,6 +181,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
DisableValidationAgainstCurConfig: true,
},
cc: pb.ConfChangeV2{Changes: []pb.ConfChangeSingle{
Expand All @@ -172,6 +195,7 @@ func TestValidateProp(t *testing.T) {
CurConfig: configNormal,
Applied: 10,
PendingConfIndex: 5,
LeadSupportSafe: true,
DisableValidationAgainstCurConfig: true,
},
cc: pb.ConfChangeV2{Changes: []pb.ConfChangeSingle{
Expand Down
52 changes: 45 additions & 7 deletions pkg/raft/raft.go
Original file line number Diff line number Diff line change
Expand Up @@ -391,14 +391,21 @@ type raft struct {
// term changes.
uncommittedSize entryPayloadSize

// number of ticks since it reached last electionTimeout when it is leader
// or candidate.
// number of ticks since it reached last electionTimeout or received a
// valid message from current leader when it is a follower.
// electionElapsed is the number of ticks since we last reached the
// electionTimeout. Tracked by both leaders and followers alike. Additionally,
// followers also reset this field whenever they receive a valid message from
// the current leader or if the leader is fortified when ticked.
electionElapsed int

// number of ticks since it reached last heartbeatTimeout.
// only leader keeps heartbeatElapsed.
// heartbeatElapsed is the number of ticks since we last reached the
// heartbeatTimeout. Leaders use this field to keep track of when they should
// broadcast fortification attempts, and in a pre-fortification world,
// heartbeats. Followers use this field to keep track of when they should
// broadcast de-fortification messages to peers.
//
// TODO(arul): consider renaming these to "fortifyElapsed" given heartbeats
// are no longer the first class concept they used to be pre-leader
// fortification.
heartbeatElapsed int

maxInflight int
Expand Down Expand Up @@ -870,6 +877,26 @@ func (r *raft) bcastFortify() {
})
}

// bcastDeFortify attempts to de-fortify the current peer's last (post restart)
// leadership term by sending an RPC to all peers (including itself).
func (r *raft) bcastDeFortify() {
assertTrue(r.state != pb.StateLeader, "leaders should not de-fortify without stepping down")
assertTrue(r.fortificationTracker.CanDefortify(), "unsafe to de-fortify")

r.trk.Visit(func(id pb.PeerID, _ *tracker.Progress) {
r.sendDeFortify(id)
})
}

// shouldBCastDeFortify returns whether we should attempt to broadcast a
// MsgDeFortifyLeader to all peers or not.
func (r *raft) shouldBcastDeFortify() bool {
assertTrue(r.state != pb.StateLeader, "leaders should not be de-fortifying without stepping down")
// TODO(arul): expand this condition to ensure a new leader has committed an
// entry.
return r.fortificationTracker.FortificationEnabledForTerm() && r.fortificationTracker.CanDefortify()
}

// maybeUnpauseAndBcastAppend unpauses and attempts to send an MsgApp to all the
// followers that provide store liveness support. If there is no store liveness
// support, we skip unpausing and sending MsgApp because the message is likely
Expand Down Expand Up @@ -1044,6 +1071,14 @@ func (r *raft) appendEntry(es ...pb.Entry) (accepted bool) {
func (r *raft) tickElection() {
assertTrue(r.state != pb.StateLeader, "tickElection called by leader")

r.heartbeatElapsed++
if r.heartbeatElapsed >= r.heartbeatTimeout {
r.heartbeatElapsed = 0
if r.shouldBcastDeFortify() {
r.bcastDeFortify()
}
}

if r.leadEpoch != 0 {
if r.supportingFortifiedLeader() {
// There's a fortified leader and we're supporting it. Reset the
Expand Down Expand Up @@ -1634,6 +1669,7 @@ func stepLeader(r *raft, m pb.Message) error {
CurConfig: &r.config,
Applied: r.raftLog.applied,
PendingConfIndex: r.pendingConfIndex,
LeadSupportSafe: r.fortificationTracker.ConfigChangeSafe(),
DisableValidationAgainstCurConfig: r.disableConfChangeValidation,
}
if err := confchange.ValidateProp(ccCtx, cc.AsV2()); err != nil {
Expand Down Expand Up @@ -2258,7 +2294,9 @@ func (r *raft) handleFortifyResp(m pb.Message) {

func (r *raft) handleDeFortify(m pb.Message) {
assertTrue(r.state != pb.StateLeader, "leaders should locally de-fortify without sending a message")
assertTrue(r.lead == m.From, "only the leader should send de-fortification requests")
assertTrue(r.lead == None || r.lead == m.From,
"only the leader should send de-fortification requests",
)

if r.leadEpoch == 0 {
r.logger.Debugf("%d is not fortifying %d; de-fortification is a no-op", r.id, m.From)
Expand Down
2 changes: 1 addition & 1 deletion pkg/raft/testdata/async_storage_writes_append_aba_race.txt
Original file line number Diff line number Diff line change
Expand Up @@ -497,7 +497,7 @@ stabilize 1
AppendThread->1 MsgStorageAppendResp Term:0 Log:2/12
INFO mark (term,index)=(2,12) mismatched the last accepted term 3 in unstable log; ignoring
AppendThread->1 MsgStorageAppendResp Term:0 Log:3/12

raft-log 1
----
1/11 EntryNormal ""
Expand Down
Loading

0 comments on commit 88faab9

Please sign in to comment.