storage,kv: sudden leaseholder changes due to io overload shedding #134423
Labels
- A-admission-control
- A-storage: Relating to our storage engine (Pebble) on-disk storage.
- branch-master: Failures and bugs on the master branch.
- branch-release-24.3: Used to mark GA and release blockers, technical advisories, and bugs for 24.3.
- C-bug: Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
- GA-blocker
- T-admission-control: Admission Control
- T-storage: Storage Team
Comments
dt
added
C-bug
Code not up to spec/doc, specs & docs deemed correct. Solution expected to change code/behavior.
A-storage
Relating to our storage engine (Pebble) on-disk storage.
A-admission-control
branch-master
Failures and bugs on the master branch.
GA-blocker
T-admission-control
Admission Control
branch-release-24.3
Used to mark GA and release blockers, technical advisories, and bugs for 24.3
labels
Nov 6, 2024
Pebble companion issue: cockroachdb/pebble#4139

I'll turn this issue into one about just disabling multilevel compactions in 24.3, while cockroachdb/pebble#4139 is about re-enabling them with concurrency limits.
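The "re-enable with concurrency limits" idea can be sketched as a scheduler that caps how many in-flight compactions may be multi-level, so long-running multi-level compactions cannot occupy every slot and starve the L0 compactions that keep sublevel counts healthy. This is a minimal illustrative sketch; the type and field names are hypothetical, not Pebble's actual API.

```go
package main

import "fmt"

// compactionScheduler reserves some compaction slots for single-level
// (e.g. L0) work by capping concurrent multi-level compactions.
// Hypothetical names for illustration only.
type compactionScheduler struct {
	inFlight      int // total compactions currently running
	multiInFlight int // multi-level compactions currently running
	maxSlots      int // total compaction concurrency
	maxMultiLevel int // cap on concurrent multi-level compactions
}

// tryStart reports whether a new compaction may begin, and records it if so.
func (s *compactionScheduler) tryStart(multiLevel bool) bool {
	if s.inFlight >= s.maxSlots {
		return false // all compaction slots busy
	}
	if multiLevel && s.multiInFlight >= s.maxMultiLevel {
		// Reject: remaining slots are reserved for single-level work.
		return false
	}
	s.inFlight++
	if multiLevel {
		s.multiInFlight++
	}
	return true
}

func main() {
	s := &compactionScheduler{maxSlots: 3, maxMultiLevel: 1}
	// Second multi-level compaction is rejected even though slots remain.
	fmt.Println(s.tryStart(true), s.tryStart(true), s.tryStart(false))
	// prints: true false true
}
```

With `maxMultiLevel` strictly below `maxSlots`, at least one slot always remains available for the L0 compactions whose starvation caused the sublevel spikes described in this issue.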
This was referenced Nov 6, 2024
craig bot pushed a commit that referenced this issue on Nov 6, 2024:

134346: sql: skip TestIndexBackfillMergeRetry under duress r=Dedej-Bergin a=Dedej-Bergin
This test fails under duress, so we are skipping it.
Fixes: #134033
Release note: None

134441: storage: disable multilevel compactions r=jbowens a=itsbilal
In their current state, multilevel compactions can cause momentary spikes in L0 sublevels, resulting in undesirable side-effects elsewhere.
Fixes #134423.
Epic: none
Release note: None

Co-authored-by: Bergin Dedej <[email protected]>
Co-authored-by: Bilal Akhtar <[email protected]>
craig bot pushed a commit that referenced this issue on Nov 6, 2024:

134441: storage: disable multilevel compactions r=jbowens a=itsbilal
In their current state, multilevel compactions can cause momentary spikes in L0 sublevels, resulting in undesirable side-effects elsewhere.
Fixes #134423.
Epic: none
Release note: None

Co-authored-by: Bilal Akhtar <[email protected]>
blathers-crl bot pushed a commit that referenced this issue on Nov 6, 2024:

In their current state, multilevel compactions can cause momentary spikes in L0 sublevels, resulting in undesirable side-effects elsewhere.
Fixes #134423.
Epic: none
Release note: None
Do we need a new, separate issue for the more gradual ramp-up of lease shedding?
ebembi-crdb added a commit to ebembi-crdb/cockroach that referenced this issue on Nov 11, 2024:

In their current state, multilevel compactions can cause momentary spikes in L0 sublevels, resulting in undesirable side-effects elsewhere.
Fixes cockroachdb#134423.
Epic: none
Release note (bug fix): Addressed a bug with DROP CASCADE that would occasionally panic with an undropped backref message on partitioned tables.
On a test cluster I observed frequent, sudden, and drastic leaseholder movements: within the space of a couple of seconds, a node would shed all of its leases because its IO overload score touched the threshold at which shedding kicks in.
Further investigation suggests this may be due to a number of concurrent, larger multi-level compactions (recently enabled) briefly occupying all of the compaction slots, causing L0's sublevel count to briefly rise and hit the threshold.
It seems like we should shed leases more gradually as overload signals rise, rather than all at once, and that we should avoid devoting all of our compaction capacity to long-running multi-level compactions for periods so long that they starve out the compactions required to keep L0 sublevel counts healthy.
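The "shed leases more gradually" suggestion could amount to mapping the IO overload score to a fraction of leases to move per rebalancing pass, rather than an all-or-nothing cliff at the threshold. A minimal sketch: `sheddingFraction` and its thresholds are hypothetical and only illustrate the ramp, not CockroachDB's actual allocator logic.

```go
package main

import "fmt"

// sheddingFraction maps an IO overload score to the fraction of leases a
// node sheds per rebalancing pass. Below the threshold it sheds nothing;
// above it, the fraction ramps linearly from 0 at the threshold to 1 at
// twice the threshold, instead of shedding everything the moment the
// threshold is touched. Names and thresholds are illustrative only.
func sheddingFraction(ioOverloadScore, threshold float64) float64 {
	if ioOverloadScore < threshold {
		return 0 // healthy: keep all leases
	}
	frac := (ioOverloadScore - threshold) / threshold
	if frac > 1 {
		frac = 1 // fully overloaded: shed everything
	}
	return frac
}

func main() {
	for _, score := range []float64{0.3, 0.5, 0.75, 1.0, 1.5} {
		fmt.Printf("score=%.2f -> shed %.0f%% of leases\n",
			score, 100*sheddingFraction(score, 0.5))
	}
}
```

A ramp like this spreads lease movement over several passes, so a momentary L0 sublevel spike (e.g. from a burst of multi-level compactions) moves only a small slice of leases instead of triggering a mass exodus.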
Jira issue: CRDB-44074