
db: re-enable and limit concurrency of multilevel compactions #4139

Open
itsbilal opened this issue Nov 6, 2024 · 5 comments


itsbilal commented Nov 6, 2024

On a test cluster, we observed that having all compaction slots taken up by relatively long-running (but not excessively long-running) multilevel compactions can make the store very susceptible to transient increases in L0 sublevels. This is concerning because L0 sublevel count is used as a key signal of store overload, and even a temporary spike can have undesirable side effects in CockroachDB.

For instance, these three multilevel compactions all overlapped around 08:35:30-08:36:00, resulting in an L0 sublevel spike from 2 to 8 before the next (L0->LBase) compaction could be scheduled:

I241106 08:35:59.319548 693190905 3@pebble/event.go:776 ⋮ [n5,s5,pebble] 2825380  [JOB 552595] compacted(default) multilevel L3 [7131468] (14MB) Score=1.45 + L4 [7130082 7130092 7130097 7130121 7130126 7130139 7130143 7130149 7130151 7130153 7097821 7076086] (152MB) Score=3.13 + L5 [7130352 7130361 6982483 6982597 6982684 6982729 6982819 6982828 6976618 6976623 7001341 7001361 7001417 7001534 7001574 7001575 6825834 6990001 6990012 6990088 6990131 6990135 6990249] (963MB) Score=1.13 -> L5 [7133772 7133816 7133893 7133985 7133998 7134080 7134162 7134197 7134268 7134358 7134451 7134562 7134654 7134678 7134746 7134774 7134775 7134803 7134819 7134820] (931MB), in 109.0s (109.0s total), output rate 8.5MB/s
...
I241106 08:36:13.404917 693626967 3@pebble/event.go:776 ⋮ [n5,s5,pebble] 2825888  [JOB 552986] compacted(default) multilevel L3 [7134712] (11MB) Score=1.45 + L4 [6997130 7113394 7113395 7113411 7113417] (79MB) Score=3.10 + L5 [6561966 6989463 6989617 6989684 6960989 6960999 6961000 6562124 7015976 7015998] (328MB) Score=1.13 -> L5 [7134725 7134747 7134776 7134801 7134818 7134884 7134948 7134956] (378MB), in 47.7s (47.7s total), output rate 7.9MB/s
...
I241106 08:36:20.319673 693546100 3@pebble/event.go:776 ⋮ [n5,s5,pebble] 2826078  [JOB 552939] compacted(default) multilevel L3 [7134577] (8.2MB) Score=1.45 + L4 [7131212 7131215 7131248 7131249 7131254 7131258 7131262] (79MB) Score=3.11 + L5 [7126202 7126222 7126256 7126257 7126288 6541303 7090493 7090494 7090535 7090613 7090681 7090873 7090908 7090978 7091013 6986397 6986414 6986417 6986480 6986488 6943332] (733MB) Score=1.13 -> L5 [7134620 7134637 7134708 7134713 7134745 7134759 7134773 7134777 7134802 7134804 7134838 7134856 7134873 7134888 7134952 7135033] (718MB), in 67.1s (67.1s total), output rate 11MB/s

As part of this change, we should re-enable multilevel compactions (disabled in cockroachdb/cockroach#134423) with a concurrency guard and possibly a tighter size limit on multilevel compactions. An example concurrency guard could be one that only schedules a multilevel compaction if the number of allowed concurrent compactions is more than 1 and no other multilevel compaction is running. This would prevent us from getting into a state where we have too many concurrent multilevel compactions that all take longer than other compactions.
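A minimal sketch of such a guard, in Go, assuming hypothetical inputs for the current concurrency limit and the number of running multilevel compactions (none of these names exist in Pebble today):

```go
// allowMultilevelCompaction sketches the heuristic above: schedule a
// multilevel compaction only if the allowed compaction concurrency is greater
// than one and no other multilevel compaction is currently running.
func allowMultilevelCompaction(allowedConcurrency, runningMultilevel int) bool {
	if allowedConcurrency <= 1 {
		// With a single slot, one long multilevel compaction could block an
		// L0->Lbase compaction for its entire duration.
		return false
	}
	// Allow at most one multilevel compaction in flight at a time.
	return runningMultilevel == 0
}
```

With at most one multilevel compaction in flight and at least two slots allowed, at least one slot is never held by a multilevel compaction and so remains available to score-based compactions such as L0->Lbase that can bring the sublevel count back down quickly.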

Jira issue: PEBBLE-298


jbowens commented Nov 6, 2024

Eugh, these example compactions are pretty long-running and large

sumeerbhola commented Nov 6, 2024

I'm unsure this is sufficient, in that long-running compactions are generally not a good thing. We currently use expandedCompactionByteSizeLimit to limit these compactions, which IIUC is 25x the lower-level target file size. That is very generous, and I wonder where the 25x heuristic came from. Perhaps we could gather some data on the actual multiplier for most compactions and use that to tune down the number.
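One way to gather that data would be to record, for each completed compaction, the ratio of its total input bytes to the output level's target file size; the distribution of that ratio shows how the observed multiplier compares to the 25x constant. A rough sketch (this helper is hypothetical, not an existing Pebble metric):

```go
// observedExpansionMultiplier reports how large a finished compaction's total
// input was relative to the output level's target file size, i.e. the observed
// counterpart of the 25x constant used by expandedCompactionByteSizeLimit.
func observedExpansionMultiplier(totalInputBytes, outputLevelTargetFileSize uint64) float64 {
	if outputLevelTargetFileSize == 0 {
		return 0
	}
	return float64(totalInputBytes) / float64(outputLevelTargetFileSize)
}
```

Looking at, say, the p90/p99 of this ratio across multilevel compactions would suggest how far 25x could be tuned down without disqualifying most of them.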


itsbilal commented Nov 6, 2024

I'm unsure this is sufficient

@sumeerbhola I was thinking of something like, "schedule an ML compaction only if we have more than 1 compaction concurrency slot available to us, and no other ML compaction is running right now". This would always yield to other compactions that could improve the L0 shape sooner, while still getting us the benefits of reduced write amplification when ML compactions do make sense.

Are you thinking of cases where a single, longer-running ML compaction coming out of L0 could cause L0 sublevels to increase until it finishes?

sumeerbhola commented Nov 6, 2024

"schedule an ML compaction only if we have more than 1 compaction concurrency slot available to us, and no other ML compaction is running right now".

Seems ok. But I am still wary of the size of these compactions, especially for compactions out of L0 and out of Lbase. If the latter is too large, it may hinder a compaction from L0 => Lbase. So a smaller size limit on those would help.
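A rough sketch of what a level-aware cap along those lines could look like; the function and the tighter multiplier are purely illustrative, with only the 25x default taken from the discussion above:

```go
// multilevelCompactionByteSizeLimit is an illustrative, tighter alternative to
// a flat 25x limit: compactions that start in L0 or Lbase get a much smaller
// budget, since they hold L0/Lbase files for their whole duration and can
// delay the next L0->Lbase compaction. The multipliers are placeholders.
func multilevelCompactionByteSizeLimit(startLevel, baseLevel int, outputTargetFileSize uint64) uint64 {
	multiplier := uint64(25) // the generous default discussed above
	if startLevel == 0 || startLevel == baseLevel {
		multiplier = 5 // hypothetical tighter value; would need tuning
	}
	return multiplier * outputTargetFileSize
}
```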

itsbilal changed the title from "db: limit concurrency of multilevel compactions" to "db: re-enable and limit concurrency of multilevel compactions" on Nov 6, 2024

itsbilal commented Nov 6, 2024

Removing ga-blocker, as the issue tracking the disabling of multilevel compactions in 24.3 is cockroachdb/cockroach#134423.

aadityasondhi moved this from Incoming to Backlog in Storage on Nov 12, 2024