132129: roachtest: add slow disk perturbation test r=kvoli a=andrewbaptist

This change adds a new set of perturbation tests, perturbation/*/slowDisk, which exercise slow disks. We have seen support cases where slow disks cause cluster-level availability outages.

Epic: none

Release note: None

132254: rac2: small allocation optimizations in rangeController r=kvoli a=sumeerbhola

- Scratch for []entryFCState for the new entries being appended.
- Scratch for []tokenWaitingHandleInfo in WaitForEval.
- Accumulate the (send or eval) tokens to deduct and then make one call to the shared tokenCounter. This avoids repeated calls to PhysicalTime() and repeated acquisitions of a possibly contended mutex.
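
The last two ideas in the list above (scratch reuse and a single batched deduction) can be sketched as follows. This is a minimal illustration, not the rac2 code: the `tokenCounter` here is a hypothetical stand-in, and `appender`, `deductBatched`, and the `calls` field are names invented for this example.

```go
package main

import (
	"fmt"
	"sync"
)

// tokenCounter is a hypothetical stand-in for a shared, mutex-protected
// counter. It is not the rac2 type of the same name.
type tokenCounter struct {
	mu     sync.Mutex
	tokens int64
	calls  int // number of Deduct calls, tracked only for illustration
}

// Deduct removes n tokens while holding the mutex.
func (c *tokenCounter) Deduct(n int64) {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.tokens -= n
	c.calls++
}

// appender keeps a scratch slice so that repeated calls reuse its backing
// array instead of allocating a new slice each time.
type appender struct {
	scratch []int64
}

// deductBatched copies the entry sizes into the scratch slice, sums them,
// and makes a single Deduct call for the whole batch.
func (a *appender) deductBatched(c *tokenCounter, entrySizes []int64) int64 {
	a.scratch = append(a.scratch[:0], entrySizes...)
	var total int64
	for _, size := range a.scratch {
		total += size
	}
	if total > 0 {
		c.Deduct(total)
	}
	return total
}

func main() {
	c := &tokenCounter{tokens: 100}
	a := &appender{}
	a.deductBatched(c, []int64{10, 20, 30})
	fmt.Println(c.tokens, c.calls) // prints: 40 1
}
```

With three entries the mutex is acquired once rather than three times, which is the saving the commit message describes.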

Informs #128033

Epic: CRDB-37515

Release note: None

132313: raft,kvserver: add RawNode.WithBasicProgress and BasicProgress struct r=pav-kv,kvoli a=sumeerbhola

This is a micro-optimization that also avoids exposing Raft details that the three callers are not interested in. BasicProgress contains only {State, Match, Next}, and peer IDs are not sorted before the visitor is called.

This method consumes 0.6% in kv0 running with RACv2, of which more than half is calls from RACv2 code.

Informs #128033

Epic: CRDB-37515

Release note: None

132315: raft: don't panic in Inflights.Add r=kvoli a=pav-kv

In lazy replication mode, the inflights struct should not enforce the in-flight limit. Instead, this policy is implemented at a higher level (RACv2). The tracker nevertheless tracks all the in-flight state, so that it can correctly switch in and out of lazy replication mode.

Part of #128779

Co-authored-by: Andrew Baptist <[email protected]>
Co-authored-by: sumeerbhola <[email protected]>
Co-authored-by: Pavel Kalinnikov <[email protected]>
4 people committed Oct 10, 2024
5 parents 188d9fe + a1f9759 + 2924eeb + 107d0ca + 2ab7de9 commit a82fea1
Showing 11 changed files with 266 additions and 90 deletions.
33 changes: 29 additions & 4 deletions pkg/cmd/roachtest/roachtestutil/disk_stall.go
@@ -23,11 +23,24 @@ type DiskStaller interface {
 	Setup(ctx context.Context)
 	Cleanup(ctx context.Context)
 	Stall(ctx context.Context, nodes option.NodeListOption)
+	Slow(ctx context.Context, nodes option.NodeListOption, bytesPerSecond int)
 	Unstall(ctx context.Context, nodes option.NodeListOption)
 	DataDir() string
 	LogDir() string
 }
 
+type NoopDiskStaller struct{}
+
+var _ DiskStaller = NoopDiskStaller{}
+
+func (n NoopDiskStaller) Cleanup(ctx context.Context) {}
+func (n NoopDiskStaller) DataDir() string { return "{store-dir}" }
+func (n NoopDiskStaller) LogDir() string { return "logs" }
+func (n NoopDiskStaller) Setup(ctx context.Context) {}
+func (n NoopDiskStaller) Slow(_ context.Context, _ option.NodeListOption, _ int) {}
+func (n NoopDiskStaller) Stall(_ context.Context, _ option.NodeListOption) {}
+func (n NoopDiskStaller) Unstall(_ context.Context, _ option.NodeListOption) {}
+
 type Fataler interface {
 	Fatal(args ...interface{})
 	Fatalf(format string, args ...interface{})
@@ -68,15 +81,20 @@ func (s *cgroupDiskStaller) Setup(ctx context.Context) {
 func (s *cgroupDiskStaller) Cleanup(ctx context.Context) {}
 
 func (s *cgroupDiskStaller) Stall(ctx context.Context, nodes option.NodeListOption) {
+	// NB: I don't understand why, but attempting to set a bytesPerSecond={0,1}
+	// results in Invalid argument from the io.max cgroupv2 API.
+	s.Slow(ctx, nodes, 4)
+}
+
+func (s *cgroupDiskStaller) Slow(
+	ctx context.Context, nodes option.NodeListOption, bytesPerSecond int,
+) {
 	// Shuffle the order of read and write stall initiation.
 	rand.Shuffle(len(s.readOrWrite), func(i, j int) {
 		s.readOrWrite[i], s.readOrWrite[j] = s.readOrWrite[j], s.readOrWrite[i]
 	})
 	for _, rw := range s.readOrWrite {
-		// NB: I don't understand why, but attempting to set a
-		// bytesPerSecond={0,1} results in Invalid argument from the io.max
-		// cgroupv2 API.
-		if err := s.setThroughput(ctx, nodes, rw, throughput{limited: true, bytesPerSecond: 4}); err != nil {
+		if err := s.setThroughput(ctx, nodes, rw, throughput{limited: true, bytesPerSecond: bytesPerSecond}); err != nil {
 			s.f.Fatal(err)
 		}
 	}
@@ -225,6 +243,13 @@ func (s *dmsetupDiskStaller) Stall(ctx context.Context, nodes option.NodeListOpt
 	s.c.Run(ctx, option.WithNodes(nodes), `sudo dmsetup suspend --noflush --nolockfs data1`)
 }
 
+func (s *dmsetupDiskStaller) Slow(
+	ctx context.Context, nodes option.NodeListOption, bytesPerSecond int,
+) {
+	// TODO(baptist): Consider https://github.com/kawamuray/ddi.
+	s.f.Fatal("Slow is not supported for dmsetupDiskStaller")
+}
+
 func (s *dmsetupDiskStaller) Unstall(ctx context.Context, nodes option.NodeListOption) {
 	s.c.Run(ctx, option.WithNodes(nodes), `sudo dmsetup resume data1`)
 }
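
For readers unfamiliar with the io.max file mentioned in the cgroupDiskStaller comment, the sketch below formats the kind of line that the cgroup v2 interface accepts: a `MAJ:MIN` device number followed by a key such as `rbps` or `wbps`, with `max` meaning unlimited. The `ioMaxLine` function and the local `throughput` struct are hypothetical and do not reproduce the body of `setThroughput`, which is not shown in this diff. Rejecting values below 2 only mirrors the NB comment above and is not independently verified kernel behavior.

```go
package main

import (
	"errors"
	"fmt"
)

// throughput mirrors the two fields used in the diff above. This local
// definition is an assumption made so the sketch is self-contained.
type throughput struct {
	limited        bool
	bytesPerSecond int
}

// ioMaxLine formats one line for a cgroup v2 io.max file. key must be
// "rbps" (read bytes/s) or "wbps" (write bytes/s). The function name and
// signature are hypothetical.
func ioMaxLine(major, minor int, key string, t throughput) (string, error) {
	if key != "rbps" && key != "wbps" {
		return "", fmt.Errorf("unsupported io.max key %q", key)
	}
	if !t.limited {
		// "max" removes the limit for this key.
		return fmt.Sprintf("%d:%d %s=max", major, minor, key), nil
	}
	if t.bytesPerSecond < 2 {
		// Assumption taken from the NB comment in the diff: values 0 and 1
		// were observed to fail with "Invalid argument".
		return "", errors.New("bytesPerSecond below 2 is rejected in this sketch")
	}
	return fmt.Sprintf("%d:%d %s=%d", major, minor, key, t.bytesPerSecond), nil
}

func main() {
	line, err := ioMaxLine(259, 0, "wbps", throughput{limited: true, bytesPerSecond: 4})
	if err != nil {
		panic(err)
	}
	fmt.Println(line) // prints: 259:0 wbps=4
}
```

The device number 259:0 is an arbitrary example; a real caller would look up the major and minor numbers of the store's block device.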