db: re-enable and limit concurrency of multilevel compactions #4139
Comments
Eugh, these example compactions are pretty long-running and large
I'm unsure this is sufficient, in that long running compactions are generally not a good thing. We use
@sumeerbhola I was thinking of something like, "schedule an ML compaction only if we have more than 1 compaction concurrency slot available to us, and no other ML compaction is running right now". This would always yield to other compactions that could reduce L0 shape sooner, while still getting us the benefits of reduced w-amp for when ML compactions do make sense. Are you thinking of cases where a singular long-ish running ML compaction could be coming out of L0 and causing L0 sublevels to increase until it finishes?
Seems ok. But I am still wary of the size of these compactions, especially for compactions out of L0 and out of Lbase. If the latter is too large, it may hinder a compaction from L0 => Lbase. So a smaller size limit on those would help.
Removing ga-blocker as the issue to track the disabling of multi-level compactions in 24.3 is cockroachdb/cockroach#134423
On a test cluster, we observed that having all compaction slots taken up by relatively long-running (but not excessively long-running) multilevel compactions can make the store very susceptible to transient increases in L0 sublevels. This is concerning because L0 sublevel count is used as a tell-all measure of store overload, and even a temporary spike can have undesirable side effects in CockroachDB.
For instance, these three ML compactions all overlapped around 08:35:30-08:36:00, resulting in an L0 sublevel spike from 2 to 8 before the next (L0->LBase) compaction could be scheduled:
As part of this change, we should re-enable multilevel compactions (disabled in cockroachdb/cockroach#134423) with a concurrency guard and possibly a tighter size limit on multilevel compactions. An example concurrency guard could be one that only schedules a multilevel compaction if the number of allowed concurrent compactions is more than 1 and no other multilevel compaction is running. This would prevent us from getting into a state where we have too many concurrent multilevel compactions that all take longer than other compactions.
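A rough sketch of what such a guard could look like, in Go. None of these names come from Pebble: `compactionKind`, `pickerState`, and `canScheduleMultilevel` are hypothetical and exist only to illustrate the two conditions described above. The free-slot check is an extra assumption for the sketch.

```go
package main

// compactionKind is a hypothetical label used only in this sketch;
// it is not a Pebble type.
type compactionKind int

const (
	kindDefault compactionKind = iota
	kindMultilevel
)

// pickerState is a hypothetical summary of what a scheduler would know
// when deciding whether to start another compaction.
type pickerState struct {
	maxConcurrent int              // allowed concurrent compactions
	running       []compactionKind // kinds of compactions currently in progress
}

// canScheduleMultilevel reports whether a new multilevel compaction may start.
// It applies the two conditions from the proposal:
//  1. the allowed compaction concurrency is more than 1, and
//  2. no other multilevel compaction is currently running.
//
// It also requires a free slot, which is an assumption of this sketch.
func (s pickerState) canScheduleMultilevel() bool {
	if s.maxConcurrent <= 1 {
		// With a single slot, always leave it for other compactions.
		return false
	}
	if len(s.running) >= s.maxConcurrent {
		// No free slot at all (sketch assumption).
		return false
	}
	for _, k := range s.running {
		if k == kindMultilevel {
			// At most one multilevel compaction at a time.
			return false
		}
	}
	return true
}
```

With a guard along these lines, at most one slot is ever held by a multilevel compaction, so at least one slot remains available for compactions such as L0->LBase whenever the allowed concurrency is 2 or more.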
Jira issue: PEBBLE-298