TBS: validate pebble options for performance #15568
Methodology

Ran a set of experiments using the Github actions benchmark workflow on a dedicated branch. In each run (a different commit), one variable is changed relative to the baseline (the options that are used on main now); each variable is changed to roughly half or double its baseline value (or toggled, for compression), as the result headings below show. Raw benchmark results are attached at the bottom, and their run history can be found in the Github actions benchmark workflow.

Results

Only useful comparisons are highlighted; a sketch of the baseline options follows this list.

8mb event db memtable (baseline=16mb)
32mb event db memtable (baseline=16mb)
8kb event db block size (baseline=16kb)
32kb event db block size (baseline=16kb)
1mb decision db memtable (baseline=2mb)
4mb decision db memtable (baseline=2mb)
1kb decision db block size (baseline=2kb)
4kb decision db block size (baseline=2kb)
16mb decision db cache (baseline=8mb)
decision db compression enabled (baseline=disabled)
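For orientation, here is a minimal sketch of what the baseline options above could look like as pebble configuration (pebble being the Go LSM library TBS storage uses). It only sets the knobs named in the result headings; field names follow pebble's v1 API and may differ in other versions, so treat this as illustrative rather than apm-server's actual code.

```go
package main

import "github.com/cockroachdb/pebble"

// Sketch of the baseline options under test. Values come from the
// "baseline=..." annotations above; everything not named there is
// left at pebble defaults. Not apm-server's actual code.
func baselineOptions() (event, decision *pebble.Options) {
	event = &pebble.Options{
		MemTableSize: 16 << 20, // 16mb event db memtable (variants: 8mb, 32mb)
		Levels: []pebble.LevelOptions{{
			BlockSize: 16 << 10, // 16kb event db block size (variants: 8kb, 32kb)
		}},
	}
	decision = &pebble.Options{
		MemTableSize: 2 << 20,                  // 2mb decision db memtable (variants: 1mb, 4mb)
		Cache:        pebble.NewCache(8 << 20), // 8mb decision db cache (variant: 16mb)
		Levels: []pebble.LevelOptions{{
			BlockSize:   2 << 10,              // 2kb decision db block size (variants: 1kb, 4kb)
			Compression: pebble.NoCompression, // compression disabled (variant: enabled)
		}},
	}
	return event, decision
}
```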
Conclusion

For this dataset and setup (AWS with gp3 disk on minimum throughput), the current baseline appears to perform reasonably well, as none of the options tested so far gives a significant boost to the numbers.

Some no-gos shown in this exercise
Other trade-offs
Raw benchmark results

1kb-decision-block.txt
Agree with your analysis - none of the changes suggests it would be worth changing the default.
Methodology

Tested with a local NVMe disk, using 32GB-memory c6id instances this time instead of the slowest gp3 EBS disk. Baseline = main running in this setup.

Results

scaling cache size to 16MB per GB of system memory (spread between 2 databases) (baseline=8MB fixed per DB)

As the instance has 32GB, this means 32*16MB - 16MB (baseline) = 496MB of extra memory used. The memory increase is expected; however, the throughput gain isn't significant. A rough sketch of the scaling rule follows.
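As a back-of-the-envelope check of that arithmetic, here is the tested scaling rule as a hypothetical helper (the function name is mine, not apm-server's):

```go
// cachePerDB computes the block cache size per database under the tested
// rule: 16MB of cache per GB of system memory, spread between 2 databases.
// Hypothetical helper for illustration only.
func cachePerDB(systemMemoryGB int64) int64 {
	total := systemMemoryGB * 16 << 20 // 32GB host -> 512MB total cache
	return total / 2                   // 256MB per DB, vs the 8MB fixed baseline
}
```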
force write parallelism (baseline=disabled)

No gains on the write path.
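If "force write parallelism" refers to pebble v1's experimental writer-parallelism knobs (an assumption on my part; the exact option used in the benchmark branch may differ), the variant would look roughly like:

```go
// Hedged sketch: enable parallel sstable writing via pebble v1's
// experimental options. Assumed to be the knobs behind this variant.
opts := &pebble.Options{}
opts.Experimental.MaxWriterConcurrency = 1      // allow a compression worker
opts.Experimental.ForceWriterParallelism = true // force the parallel write path
```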
event db 32kb block size (baseline=16kb block size)

Consistent with the gp3 testing: increasing from 16kb to 32kb seems to be beneficial.
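In terms of the baseline sketch above, this variant only changes the event db level options, e.g.:

```go
// 32kb event db block size, the one change that looked worthwhile.
event.Levels = []pebble.LevelOptions{{BlockSize: 32 << 10}}
```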
partitions per ttl = 2 (baseline=1)

This is a little surprising and unintuitive. I wonder if it is just variance. Not planning to change it without understanding the benchmark.
Bonus: 8.18 badger comparisons

Adding some badger comparisons to check for regressions in a fast-disk, high-spec setup: in this case, a 32GB apm-server with a local NVMe SSD.

badger 8.18 to main baseline

Great improvement in max_rss (-88%) and disk usage (107GB -> 26GB). The intake event rate in pebble is slower than in badger (-13%), presumably because disk IO is being used for reads (+8%).
badger 8.18 to main with 32kb block size

A 32kb block size means more efficient disk IO, which makes up somewhat for the write-path regression (+23% on events/s, -10% on intake event rate). Same memory gains, and an even bigger disk usage improvement.
Conclusion

The badger comparison shows a minor regression in intake event rate (write path) on a 32GB apm-server using NVMe SSDs. Changing the event db block size from 16kb to 32kb seems to be the only option worth changing at the moment. I did not look into actual bottlenecks, as the benchmarks are run on AWS; a follow-up task could study CPU / disk utilization, profiles, and pebble metrics (cache hit rate, etc.). This task focuses on shipping good-enough defaults that give us reasonable performance and ensure there is no significant perf regression compared to badger.

Raw benchmark results

16mbpergb.txt
Follow-up from #15235
When developing #15235, the pebble options were picked based on inaccurate benchmarks. Now that the benchmarks are fixed, we should revisit the pebble options to ensure that reasonable values are used and performance is good, as some options cannot be changed once the database is created.
Benchmark results under different options should be included in this issue to support the final option values selected for the 9.0 release.