compact: add simple threshold-based tombstone compaction heuristic #3739

Closed

Conversation

anish-shanbhag
Contributor

No description provided.

@cockroach-teamcity
Member

This change is Reviewable

@anish-shanbhag
Contributor Author

Currently experimenting with a rough implementation (5b9740e) of option #1 from #3719. Here are the performance results from the benchmark added in de9b2e2:

                                                        │        old        │                 new                  │
                                                        │      sec/op       │    sec/op     vs base                │
PointDeletedSwath/gap=100/prefix_point_lookup-10               6.232µ ± ∞ ¹   6.281µ ± ∞ ¹         ~ (p=0.343 n=4)
PointDeletedSwath/gap=100/non-prefix_point_seek-10             7.783µ ± ∞ ¹   7.536µ ± ∞ ¹         ~ (p=0.343 n=4)
PointDeletedSwath/gap=100/full_scan-10                          1.793 ± ∞ ¹    1.793 ± ∞ ¹         ~ (p=0.886 n=4)
PointDeletedSwath/gap=1000/prefix_point_lookup-10              6.757µ ± ∞ ¹   6.757µ ± ∞ ¹         ~ (p=0.857 n=4)
PointDeletedSwath/gap=1000/non-prefix_point_seek-10            7.848µ ± ∞ ¹   7.826µ ± ∞ ¹         ~ (p=1.000 n=4)
PointDeletedSwath/gap=1000/full_scan-10                         1.773 ± ∞ ¹    1.881 ± ∞ ¹         ~ (p=0.057 n=4)
PointDeletedSwath/gap=10000/prefix_point_lookup-10             6.265µ ± ∞ ¹   6.575µ ± ∞ ¹    +4.96% (p=0.029 n=4)
PointDeletedSwath/gap=10000/non-prefix_point_seek-10           8.833µ ± ∞ ¹   8.723µ ± ∞ ¹    -1.26% (p=0.029 n=4)
PointDeletedSwath/gap=10000/full_scan-10                        1.835 ± ∞ ¹    1.873 ± ∞ ¹         ~ (p=1.000 n=4)
PointDeletedSwath/gap=100000/prefix_point_lookup-10            6.331µ ± ∞ ¹   6.313µ ± ∞ ¹         ~ (p=1.000 n=4)
PointDeletedSwath/gap=100000/non-prefix_point_seek-10         78.583µ ± ∞ ¹   8.078µ ± ∞ ¹   -89.72% (p=0.029 n=4)
PointDeletedSwath/gap=100000/full_scan-10                       1.874 ± ∞ ¹    1.875 ± ∞ ¹         ~ (p=0.886 n=4)
PointDeletedSwath/gap=200000/prefix_point_lookup-10            6.399µ ± ∞ ¹   6.482µ ± ∞ ¹         ~ (p=0.057 n=4)
PointDeletedSwath/gap=200000/non-prefix_point_seek-10        341.271µ ± ∞ ¹   9.768µ ± ∞ ¹   -97.14% (p=0.029 n=4)
PointDeletedSwath/gap=200000/full_scan-10                       1.770 ± ∞ ¹    1.902 ± ∞ ¹    +7.47% (p=0.029 n=4)
PointDeletedSwath/gap=400000/prefix_point_lookup-10            6.448µ ± ∞ ¹   6.474µ ± ∞ ¹         ~ (p=0.486 n=4)
PointDeletedSwath/gap=400000/non-prefix_point_seek-10       1296.707µ ± ∞ ¹   9.636µ ± ∞ ¹   -99.26% (p=0.029 n=4)
PointDeletedSwath/gap=400000/full_scan-10                       1.800 ± ∞ ¹    1.903 ± ∞ ¹    +5.69% (p=0.029 n=4)
PointDeletedSwath/gap=5000000/prefix_point_lookup-10           5.658µ ± ∞ ¹   3.861µ ± ∞ ¹   -31.77% (p=0.029 n=4)
PointDeletedSwath/gap=5000000/non-prefix_point_seek-10        77.130m ± ∞ ¹   4.271m ± ∞ ¹   -94.46% (p=0.029 n=4)
PointDeletedSwath/gap=5000000/full_scan-10                      1.474 ± ∞ ¹    1.017 ± ∞ ¹   -31.01% (p=0.029 n=4)
PointDeletedSwath/gap=10000000/prefix_point_lookup-10          3.355µ ± ∞ ¹   1.735µ ± ∞ ¹   -48.28% (p=0.029 n=4)
PointDeletedSwath/gap=10000000/non-prefix_point_seek-10   154092.521µ ± ∞ ¹   2.546µ ± ∞ ¹  -100.00% (p=0.029 n=4)
PointDeletedSwath/gap=10000000/full_scan-10                    799.2m ± ∞ ¹   297.7m ± ∞ ¹   -62.76% (p=0.029 n=4)

I examined the LSM state after all background compactions were finished, and it looks like the new heuristic does target all of the tombstone-dense SSTables in the middle of the key range for compaction. After compaction, there are no tombstones left in L0/L5, which is where they were building up before. This means that seek operations perform the same as they would without any deletions, and full scans are also faster because many of the keys have now been fully compacted away.

A single tombstone swath is a pretty simple case, though. For next steps, I'm planning to examine a delete-heavy KV workload and to look into adding a new Pebble benchmark for a queue-style workload.

This change adds a heuristic to compact point tombstones based on
their density across the LSM. We add a new table property called
`NumTombstoneDenseBlocks` and a corresponding field in `TableStats` that
tracks the number of data blocks in each table which are considered
tombstone-dense. This value is calculated on the fly while tables are being
written, so no extra I/O is required later on to compute it.

A data block is considered tombstone-dense if it satisfies either of the
following criteria (a short sketch of this check follows the list):
1. The block contains at least `options.Experimental.NumDeletionsThreshold`
point tombstones. The default value is `100`.
2. The ratio of the uncompressed size of point tombstones to the uncompressed
size of the block is at least `options.Experimental.DeletionSizeRatioThreshold`.
For example, with the default value of `0.5`, a data block of size 4KB
would be considered tombstone-dense if it contains at least 2KB of point
tombstones.
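
To make the two criteria concrete, here is a minimal sketch in Go of the per-block check, using the defaults described above. The constant and function names (`numDeletionsThreshold`, `deletionSizeRatioThreshold`, `isTombstoneDense`) are hypothetical stand-ins, not the PR's actual writer code; the real change folds equivalent logic into sstable writing so the per-table count is produced without extra I/O.

```go
package main

import "fmt"

// Hypothetical defaults mirroring options.Experimental.NumDeletionsThreshold
// and options.Experimental.DeletionSizeRatioThreshold described above.
const (
	numDeletionsThreshold      = 100
	deletionSizeRatioThreshold = 0.5
)

// isTombstoneDense reports whether a data block should be counted as
// tombstone-dense: either it holds too many point tombstones, or the
// tombstones make up too large a fraction of the block's uncompressed size.
func isTombstoneDense(numDeletions int, deletionSize, blockSize uint64) bool {
	if numDeletions >= numDeletionsThreshold {
		return true
	}
	return blockSize > 0 &&
		float64(deletionSize) >= deletionSizeRatioThreshold*float64(blockSize)
}

func main() {
	// A 4KB block containing 2KB of point tombstones trips the size-based check.
	fmt.Println(isTombstoneDense(30, 2<<10, 4<<10)) // true
	// A small block with 150 point tombstones trips the count-based check.
	fmt.Println(isTombstoneDense(150, 512, 4<<10)) // true
}
```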

The intuition, as described in cockroachdb#918 (comment), is that dense tombstone
clusters are bad because they a) waste CPU when skipping over tombstones,
and b) waste I/O because we end up loading more blocks per live key. The
two criteria above are meant to tackle these two issues respectively: the
count-based threshold prevents CPU waste, and the size-based threshold
prevents I/O waste.

A table is considered eligible for the new tombstone compaction type if
it contains at least `options.Experimental.MinTombstoneDenseBlocks`
tombstone-dense data blocks. The default value is `20`. We use an Annotator
in a similar way to elision-only compactions in order to prioritize compacting
the table with the most tombstone-dense blocks if there are multiple
eligible tables. The default here was chosen through experimentation on
CockroachDB KV workloads: with a lower value we were compacting too
aggressively, leading to very high write amplification, while higher values
led to very few noticeable performance improvements.
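
As an illustration of the table-level rule, the sketch below picks the eligible table with the most tombstone-dense blocks. The names here (`tableStats`, `pickTombstoneCompactionCandidate`) are hypothetical; the actual change surfaces the count through Pebble's `TableStats` and aggregates it with an Annotator rather than scanning tables linearly like this.

```go
package main

import "fmt"

// minTombstoneDenseBlocks mirrors the default of
// options.Experimental.MinTombstoneDenseBlocks described above.
const minTombstoneDenseBlocks = 20

// tableStats is a stand-in for the per-table stats that would carry the new
// NumTombstoneDenseBlocks field.
type tableStats struct {
	name                    string
	numTombstoneDenseBlocks uint64
}

// pickTombstoneCompactionCandidate returns the eligible table with the most
// tombstone-dense blocks, or nil if no table crosses the threshold.
func pickTombstoneCompactionCandidate(tables []tableStats) *tableStats {
	var best *tableStats
	for i := range tables {
		t := &tables[i]
		if t.numTombstoneDenseBlocks < minTombstoneDenseBlocks {
			continue
		}
		if best == nil || t.numTombstoneDenseBlocks > best.numTombstoneDenseBlocks {
			best = t
		}
	}
	return best
}

func main() {
	tables := []tableStats{{"a", 5}, {"b", 42}, {"c", 27}}
	if t := pickTombstoneCompactionCandidate(tables); t != nil {
		fmt.Println("compact", t.name) // compact b
	}
}
```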
anish-shanbhag changed the title from "[WIP] compact: add point tombstone density compaction heuristic" to "compact: add simple threshold-based tombstone compaction heuristic" on Jul 24, 2024
@anish-shanbhag
Contributor Author

Closing because this simple heuristic did not improve performance with the queue benchmark here: #3744 (comment)

An improved heuristic is at #3790.
