storage,kv: sudden leaseholder changes due to io overload shedding #134423

dt · 2024-11-06T16:46:39Z

On a test cluster I observed frequent, sudden, and drastic leaseholder movements: within the space of a couple of seconds, a node would shed all of its leases because its IO overload score touched the threshold at which it does so.

Further investigation suggests this may be due to a number of larger, concurrent multi-level compactions (which were recently enabled) briefly occupying all of the compaction slots, causing the L0 sublevel count to rise briefly and hit the threshold.

It seems like we should shed leases gradually as overload signals rise rather than all at once, and that we should avoid devoting all of our compaction capacity to long-running multi-level compactions for periods so long that they starve out the compactions needed to keep L0 sublevel counts healthy.
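To illustrate what shedding "more gradually" could mean, here is a minimal sketch of a linear ramp: nothing is shed below a start threshold, everything is shed at or above a full threshold, and the fraction grows linearly in between. The function names and the `start`/`full` values (0.3 and 0.8) are hypothetical and are not CockroachDB's actual allocator settings or logic.

```python
def lease_shed_fraction(io_overload: float, start: float, full: float) -> float:
    """Fraction of leases to shed for a given IO overload score.

    Hypothetical policy: 0 at or below `start`, 1 at or above `full`
    (the all-at-once behavior is only reached at the top of the ramp),
    and a linear ramp in between.
    """
    if not 0 <= start < full:
        raise ValueError("require 0 <= start < full")
    if io_overload <= start:
        return 0.0
    if io_overload >= full:
        return 1.0
    return (io_overload - start) / (full - start)


def leases_to_shed(num_leases: int, io_overload: float,
                   start: float = 0.3, full: float = 0.8) -> int:
    """Number of leases to transfer away this round, rounded down.

    `start` and `full` are illustrative defaults, not real cluster settings.
    """
    return int(num_leases * lease_shed_fraction(io_overload, start, full))
```

With 100 leases, a score of 0.2 sheds none, a score around 0.55 sheds roughly half, and a score of 0.9 sheds all of them, so a brief spike that barely crosses the start threshold moves only a few leases instead of every one.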

Jira issue: CRDB-44074

itsbilal · 2024-11-06T17:18:28Z

Pebble companion issue cockroachdb/pebble#4139

itsbilal · 2024-11-06T18:51:58Z

I'll turn this issue into one about just disabling multilevel compactions in 24.3, while cockroachdb/pebble#4139 is about reenabling them with concurrency limits.
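As a rough illustration of "re-enabling them with concurrency limits", the sketch below caps how many compaction slots multilevel compactions may hold at once, so they can never occupy every slot and crowd out other compactions such as those draining L0. The class and field names are hypothetical; this is not Pebble's API or the approach taken in cockroachdb/pebble#4139.

```python
from dataclasses import dataclass


@dataclass
class CompactionSlots:
    """Hypothetical slot accounting with a cap on multilevel compactions.

    `max_concurrent` is the total number of compaction slots; at most
    `max_multilevel` of them may be held by multilevel compactions, so at
    least one slot is always left for other compactions.
    """
    max_concurrent: int
    max_multilevel: int
    running: int = 0
    running_multilevel: int = 0

    def __post_init__(self) -> None:
        if not 0 <= self.max_multilevel < self.max_concurrent:
            raise ValueError("multilevel cap must leave at least one slot free")

    def try_start(self, multilevel: bool) -> bool:
        """Claim a slot if allowed; return False if the job must wait."""
        if self.running >= self.max_concurrent:
            return False
        if multilevel and self.running_multilevel >= self.max_multilevel:
            return False
        self.running += 1
        if multilevel:
            self.running_multilevel += 1
        return True

    def finish(self, multilevel: bool) -> None:
        """Release a slot previously claimed by try_start."""
        self.running -= 1
        if multilevel:
            self.running_multilevel -= 1
```

For example, with `CompactionSlots(max_concurrent=3, max_multilevel=1)`, a second multilevel compaction is refused while the first runs, but two ordinary compactions can still start alongside it.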

134346: sql: skip TestIndexBackfillMergeRetry under duress r=Dedej-Bergin a=Dedej-Bergin
This test fails under duress so we are skipping it. Fixes: #134033 Release note: None

134441: storage: disable multilevel compactions r=jbowens a=itsbilal
In their current state, multilevel compactions can cause momentary spikes in L0 sublevels, resulting in undesirable side-effects elsewhere. Fixes #134423. Epic: none Release note: None

Co-authored-by: Bergin Dedej
Co-authored-by: Bilal Akhtar

134441: storage: disable multilevel compactions r=jbowens a=itsbilal
In their current state, multilevel compactions can cause momentary spikes in L0 sublevels, resulting in undesirable side-effects elsewhere. Fixes #134423. Epic: none Release note: None

Co-authored-by: Bilal Akhtar

In their current state, multilevel compactions can cause momentary spikes in L0 sublevels, resulting in undesirable side-effects elsewhere. Fixes #134423. Epic: none Release note: None

dt · 2024-11-07T05:39:24Z

Do we need a separate issue for a more gradual ramp-up of lease shedding?

In their current state, multilevel compactions can cause momentary spikes in L0 sublevels, resulting in undesirable side-effects elsewhere. Fixes cockroachdb#134423. Epic: none Release note (bug fix): Addressed a bug with DROP CASCADE that would occasionally panic with an undropped backref message on partitioned tables.

blathers-crl bot added the T-storage Storage Team label Nov 6, 2024

github-project-automation bot added this to Storage Nov 6, 2024

github-project-automation bot moved this to Incoming in Storage Nov 6, 2024

This was referenced Nov 6, 2024

storage: disable multilevel compactions #134441

Merged

db: re-enable and limit concurrency of multilevel compactions cockroachdb/pebble#4139

Open

craig bot closed this as completed in 88a7276 Nov 6, 2024

github-project-automation bot moved this from Incoming to Done in Storage Nov 6, 2024

blathers-crl bot mentioned this issue Nov 6, 2024

release-24.3: storage: disable multilevel compactions #134471

Merged

ebembi-crdb mentioned this issue Nov 11, 2024

storage: disable multilevel compactions ebembi-crdb/cockroach#14

Merged
