Read performance degradation during compaction workload #4109

Open
rjl493456442 opened this issue Oct 25, 2024 · 3 comments
rjl493456442 commented Oct 25, 2024

We (go-ethereum) are experiencing a significant degradation in database read performance
whenever a compaction process is initiated.

Version: github.com/cockroachdb/pebble v1.1.2
Hardware: 32GB memory, Samsung 980Pro 2TB SSD, 28 Core i7-14700K

The database configuration is shown below:

	opt := &pebble.Options{
		// Pebble has a single combined cache area and the write
		// buffers are taken from this too. Assign all available
		// memory allowance for cache.
		Cache:        pebble.NewCache(int64(2 * 1024 * 1024 * 1024)),
		MaxOpenFiles: 524288,

		// The size of the memory table (as well as the write buffer).
		// Note: there may be more than two memory tables in the system.
		MemTableSize: uint64(512 * 1024 * 1024),

		MemTableStopWritesThreshold: 2,

		// The default compaction concurrency (1 thread);
		// here, use all available CPUs for faster compaction.
		MaxConcurrentCompactions: runtime.NumCPU,

		// Per-level options. Options for at least one level must be specified. The
		// options for the last level are used for all subsequent levels.
		Levels: []pebble.LevelOptions{
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
			{TargetFileSize: 2 * 1024 * 1024, FilterPolicy: bloom.FilterPolicy(10)},
		},
	}

The read performance without the compaction workload is stable. The average time to
load a single data block (~4KB) from disk (not in cache) during normal read operations
is 40µs. (This data was obtained by injecting debug code into Pebble.)
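
For illustration only, here is a minimal sketch of the kind of timing wrapper such debug code might use (the actual instrumentation lives in the debug branch linked at the end of this issue); readDataBlock here is a hypothetical stand-in, not a real Pebble API:

	// timedRead wraps a block read and reports how long the disk load took.
	// readDataBlock is a hypothetical stand-in for Pebble's internal block
	// read path; it is not a real Pebble API.
	func timedRead(readDataBlock func() ([]byte, error)) ([]byte, time.Duration, error) {
		start := time.Now()
		block, err := readDataBlock()
		return block, time.Since(start), err
	}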

However, when the compaction process starts, the average time to load a single data
block (~4KB) from disk (not in cache) increases to 80µs, roughly 2x slower.

Meanwhile, the average time to load a single data block (~4KB) during compaction is
significantly faster, around 8µs. I suspect this discrepancy may be related to the following
factors:

  • Files involved in compaction are opened with the FADV_SEQUENTIAL flag, which
    optimizes the OS’s read-ahead mechanism (see the sketch after this list).
  • Data blocks belonging to the files being compacted are likely to be found in the OS
    page cache, whereas normal reads often target data in the bottom-most level, where
    blocks are less likely to be cached (although I have no evidence to prove this).
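
For context, here is a minimal sketch (not Pebble's actual code) of how such a sequential-scan hint is issued on Linux via golang.org/x/sys/unix; the file name is hypothetical:

	package main

	import (
		"log"
		"os"

		"golang.org/x/sys/unix"
	)

	func main() {
		// Advise the kernel that this file will be read sequentially.
		// FADV_SEQUENTIAL enlarges the readahead window for the descriptor,
		// so later reads often hit data that is already in the page cache.
		f, err := os.Open("000123.sst") // hypothetical sstable name
		if err != nil {
			log.Fatal(err)
		}
		defer f.Close()
		// offset=0, length=0 applies the advice to the whole file (Linux-only).
		if err := unix.Fadvise(int(f.Fd()), 0, 0, unix.FADV_SEQUENTIAL); err != nil {
			log.Fatal(err)
		}
	}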

What I don't really understand is why loading a data block from disk becomes 2x slower
when compaction is actively running.

At first I suspected that too many concurrent reads (compaction is concurrent, so there may
be many concurrent disk reads in the system) reduce file read efficiency. However, only the
data loading in a normal Get slowed down, not the compaction reads.

Even after I changed everything to single-threaded, sequential reading, the same phenomenon
still occurred.

Do you have any insight into this strange phenomenon, and any suggestions for how to
address it?


The branch I used for debugging: https://github.com/rjl493456442/pebble/commits/gary-debug/

Jira issue: PEBBLE-286

@rjl493456442

Gentle ping @jbowens

jbowens commented Oct 28, 2024

		// The default compaction concurrency (1 thread);
		// here, use all available CPUs for faster compaction.
		MaxConcurrentCompactions: runtime.NumCPU,

One thing I'd like to clarify is that this configures the number of concurrent compactions, not the concurrency used within a single compaction. Pebble may run multiple compactions concurrently in different parts of the LSM (i.e., different levels or non-overlapping keyspaces of the same levels). Each of these compactions can by default make use of up to 2 threads: one thread reads and performs most of the CPU work of the compaction, and the other performs the write syscalls to write the output sstables.

When you're observing this interference, is there a single compaction running or are there multiple concurrent compactions? How is CPU utilization under normal circumstances versus during one of these compactions?

I agree that compactions' low read latency is probably explained by the sequential read pattern and the use of FADV_SEQUENTIAL, ensuring that often the next block has already been read from disk.

I don't have an explanation for why the foreground workload's block read latency would double. We sometimes see higher CPU utilization causing an increase in Go scheduling latency, and that Go scheduling latency is significant enough to show up in block read latencies. But that should uniformly affect compaction reads and iterator reads. Maybe the foreground workload's block reads are being queued behind the compaction's readahead reads? I.e., compaction reads using FADV_SEQUENTIAL result in reading large regions of a file, and now the length of the I/O queue doubles. The actual Linux block I/O reads triggered by compactions may be suffering the same underlying I/O latency, but since FADV_SEQUENTIAL performs those reads ahead of time, the latency observed by the Pebble process is much smaller.

Digging into Linux I/O metrics, such as block device I/O latency, might shed some light. Some form of compaction pacing (#687) would also help.
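
For example, here is a rough, non-authoritative sketch that samples average block-device read latency from /proc/diskstats; the device name nvme0n1 is an assumption and should be adjusted to your disk:

	package main

	import (
		"bufio"
		"fmt"
		"os"
		"strconv"
		"strings"
		"time"
	)

	// readStats returns reads completed and milliseconds spent reading for dev,
	// parsed from /proc/diskstats (fields 4 and 7 of the matching line).
	func readStats(dev string) (reads, readMs uint64, err error) {
		f, err := os.Open("/proc/diskstats")
		if err != nil {
			return 0, 0, err
		}
		defer f.Close()
		s := bufio.NewScanner(f)
		for s.Scan() {
			fields := strings.Fields(s.Text())
			if len(fields) >= 7 && fields[2] == dev {
				reads, _ = strconv.ParseUint(fields[3], 10, 64)  // reads completed
				readMs, _ = strconv.ParseUint(fields[6], 10, 64) // ms spent reading
				return reads, readMs, nil
			}
		}
		return 0, 0, fmt.Errorf("device %s not found", dev)
	}

	func main() {
		const dev = "nvme0n1" // assumption: adjust to your block device
		r1, ms1, err := readStats(dev)
		if err != nil {
			panic(err)
		}
		time.Sleep(5 * time.Second) // sample window
		r2, ms2, err := readStats(dev)
		if err != nil {
			panic(err)
		}
		if r2 > r1 {
			fmt.Printf("average read latency: %.2f ms\n", float64(ms2-ms1)/float64(r2-r1))
		}
	}

Comparing such a sample with and without a compaction running would indicate whether the device itself has gotten slower or whether the extra latency is queuing above the device.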

If you're scheduling many concurrent compactions, reducing the max compaction concurrency may also help by reducing the spikiness of the compaction workload.
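
For instance, a minimal sketch (assuming pebble v1.1.x, where MaxConcurrentCompactions is a func() int) that caps the number of concurrent compactions instead of using runtime.NumCPU:

	// Reusing the opt variable from the configuration above: cap concurrent
	// compactions at a small fixed number to smooth out compaction I/O.
	opt.MaxConcurrentCompactions = func() int { return 2 }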

@rjl493456442

When you're observing this interference, is there a single compaction running or are there multiple concurrent compactions? How is CPU utilization under normal circumstances versus during one of these compactions?

This phenomenon can be observed with both a single compaction and multiple concurrent compactions. In both cases, CPU utilization is quite low.
