Skip to content

Commit

Permalink
Use optimized folly DistributedMutex in LRUCache when available (face…
Browse files Browse the repository at this point in the history
…book#10179)

Summary:
folly DistributedMutex is faster than standard mutexes though
imposes some static obligations on usage. See
https://github.com/facebook/folly/blob/main/folly/synchronization/DistributedMutex.h
for details. Here we use this alternative for our Cache implementations
(especially LRUCache) for better locking performance, when RocksDB is
compiled with folly.

Also added information about which distributed mutex implementation is
being used to cache_bench output and to DB LOG.

Intended follow-up:
* Use DMutex in more places, perhaps improving API to support non-scoped
locking
* Fix linking with fbcode compiler (needs ROCKSDB_NO_FBCODE=1 currently)

Credit: Thanks Siying for reminding me about this line of work that was previously
left unfinished.

Pull Request resolved: facebook#10179

Test Plan:
for correctness, existing tests. CircleCI config updated.
Also Meta-internal buck build updated.

For performance, ran simultaneous before & after cache_bench. Out of three
comparison runs, the middle improvement to ops/sec was +21%:

Baseline: USE_CLANG=1 DEBUG_LEVEL=0 make -j24 cache_bench (fbcode
compiler)

```
Complete in 20.201 s; Rough parallel ops/sec = 1584062
Thread ops/sec = 107176

Operation latency (ns):
Count: 32000000 Average: 9257.9421  StdDev: 122412.04
Min: 134  Median: 3623.0493  Max: 56918500
Percentiles: P50: 3623.05 P75: 10288.02 P99: 30219.35 P99.9: 683522.04 P99.99: 7302791.63
```

New: (add USE_FOLLY=1)

```
Complete in 16.674 s; Rough parallel ops/sec = 1919135  (+21%)
Thread ops/sec = 135487

Operation latency (ns):
Count: 32000000 Average: 7304.9294  StdDev: 108530.28
Min: 132  Median: 3777.6012  Max: 91030902
Percentiles: P50: 3777.60 P75: 10169.89 P99: 24504.51 P99.9: 59721.59 P99.99: 1861151.83
```

Reviewed By: anand1976

Differential Revision: D37182983

Pulled By: pdillinger

fbshipit-source-id: a17eb05f25b832b6a2c1356f5c657e831a5af8d1
  • Loading branch information
pdillinger authored and facebook-github-bot committed Jun 17, 2022
1 parent f87adcf commit 1aac814
Show file tree
Hide file tree
Showing 17 changed files with 160 additions and 51 deletions.
16 changes: 13 additions & 3 deletions .circleci/config.yml
Original file line number Diff line number Diff line change
Expand Up @@ -168,6 +168,16 @@ commands:
command: |
echo "export PKG_CONFIG_PATH=/usr/local/OFF/:~/libprotobuf-mutator/build/external.protobuf/lib/pkgconfig/" >> $BASH_ENV
echo "export PROTOC_BIN=~/libprotobuf-mutator/build/external.protobuf/bin/protoc" >> $BASH_ENV
setup-folly:
steps:
- run:
name: Install folly dependencies
command: |
sudo apt-get install libgoogle-glog-dev
- run:
name: Checkout folly sources
command: |
make checkout_folly
build-for-benchmarks:
steps:
Expand Down Expand Up @@ -442,7 +452,7 @@ jobs:
- pre-steps
- install-gflags
- upgrade-cmake
- run: make checkout_folly
- setup-folly
- run: (mkdir build && cd build && cmake -DUSE_FOLLY=1 -DWITH_GFLAGS=1 .. && make V=1 -j20 && ctest -j20)
- post-steps

Expand Down Expand Up @@ -477,7 +487,7 @@ jobs:
steps:
- pre-steps
- run: sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test && sudo apt-get update -y && sudo apt-get install gcc-7 g++-7 libgflags-dev
- run: make checkout_folly
- setup-folly
- run: USE_FOLLY=1 CC=gcc-7 CXX=g++-7 V=1 make -j32 check
- post-steps

Expand Down Expand Up @@ -532,7 +542,7 @@ jobs:
- pre-steps
- install-clang-13
- install-gflags
- run: make checkout_folly
- setup-folly
- run: CC=clang-13 CXX=clang++-13 USE_CLANG=1 USE_FOLLY=1 COMPILE_WITH_UBSAN=1 COMPILE_WITH_ASAN=1 make -j32 check
- post-steps

Expand Down
7 changes: 6 additions & 1 deletion CMakeLists.txt
Original file line number Diff line number Diff line change
Expand Up @@ -587,6 +587,7 @@ include_directories(${PROJECT_SOURCE_DIR}/include)
if(USE_FOLLY)
include_directories(${PROJECT_SOURCE_DIR}/third-party/folly)
add_definitions(-DUSE_FOLLY -DFOLLY_NO_CONFIG)
list(APPEND THIRDPARTY_LIBS glog)
endif()
find_package(Threads REQUIRED)

Expand Down Expand Up @@ -975,9 +976,13 @@ endif()
if(USE_FOLLY)
list(APPEND SOURCES
third-party/folly/folly/container/detail/F14Table.cpp
third-party/folly/folly/detail/Futex.cpp
third-party/folly/folly/lang/SafeAssert.cpp
third-party/folly/folly/lang/ToAscii.cpp
third-party/folly/folly/ScopeGuard.cpp)
third-party/folly/folly/ScopeGuard.cpp
third-party/folly/folly/synchronization/AtomicNotification.cpp
third-party/folly/folly/synchronization/DistributedMutex.cpp
third-party/folly/folly/synchronization/ParkingLot.cpp)
endif()

set(ROCKSDB_STATIC_LIB rocksdb${ARTIFACT_SUFFIX})
Expand Down
3 changes: 3 additions & 0 deletions HISTORY.md
Original file line number Diff line number Diff line change
Expand Up @@ -42,6 +42,9 @@
* Removed support for reading Bloom filters using obsolete block-based filter format. (Support for writing such filters was dropped in 7.0.) For good read performance on old DBs using these filters, a full compaction is required.
* Per KV checksum in write batch is verified before a write batch is written to WAL to detect any corruption to the write batch (#10114).

### Performance Improvements
* When compiled with folly (Meta-internal integration; experimental in open source build), improve the locking performance (CPU efficiency) of LRUCache by using folly DistributedMutex in place of standard mutex.

## 7.3.0 (05/20/2022)
### Bug Fixes
* Fixed a bug where manual flush would block forever even though flush options had wait=false.
Expand Down
8 changes: 7 additions & 1 deletion Makefile
Original file line number Diff line number Diff line change
Expand Up @@ -461,6 +461,8 @@ ifeq ($(USE_FOLLY),1)
endif
PLATFORM_CCFLAGS += -DUSE_FOLLY -DFOLLY_NO_CONFIG
PLATFORM_CXXFLAGS += -DUSE_FOLLY -DFOLLY_NO_CONFIG
# TODO: fix linking with fbcode compiler config
PLATFORM_LDFLAGS += -lglog
endif

ifdef TEST_CACHE_LINE_SIZE
Expand Down Expand Up @@ -2354,10 +2356,14 @@ checkout_folly:
fi
@# Pin to a particular version for public CI, so that PR authors don't
@# need to worry about folly breaking our integration. Update periodically
cd third-party/folly && git reset --hard 98b9b2c1124e99f50f9085ddee74ce32afffc665
cd third-party/folly && git reset --hard beacd86d63cd71c904632262e6c36f60874d78ba
@# A hack to remove boost dependency.
@# NOTE: this hack is not needed if using FBCODE compiler config
perl -pi -e 's/^(#include <boost)/\/\/$$1/' third-party/folly/folly/functional/Invoke.h
@# NOTE: this hack is required for clang in some cases
perl -pi -e 's/int rv = syscall/int rv = (int)syscall/' third-party/folly/folly/detail/Futex.cpp
@# NOTE: this hack is required for gcc in some cases
perl -pi -e 's/(__has_include.<experimental.memory_resource>.)/__cpp_rtti && $$1/' third-party/folly/folly/memory/MemoryResource.h

# ---------------------------------------------------------------------------
# Build size testing
Expand Down
2 changes: 2 additions & 0 deletions TARGETS
Original file line number Diff line number Diff line change
Expand Up @@ -334,6 +334,7 @@ cpp_library_wrapper(name="rocksdb_lib", srcs=[
"//folly/experimental/coro:collect",
"//folly/experimental/coro:coroutine",
"//folly/experimental/coro:task",
"//folly/synchronization:distributed_mutex",
], headers=None, link_whole=False, extra_test_libs=False)

cpp_library_wrapper(name="rocksdb_whole_archive_lib", srcs=[
Expand Down Expand Up @@ -662,6 +663,7 @@ cpp_library_wrapper(name="rocksdb_whole_archive_lib", srcs=[
"//folly/experimental/coro:collect",
"//folly/experimental/coro:coroutine",
"//folly/experimental/coro:task",
"//folly/synchronization:distributed_mutex",
], headers=None, link_whole=True, extra_test_libs=False)

cpp_library_wrapper(name="rocksdb_test_lib", srcs=[
Expand Down
6 changes: 4 additions & 2 deletions buckifier/buckify_rocksdb.py
Original file line number Diff line number Diff line change
Expand Up @@ -150,7 +150,8 @@ def generate_targets(repo_path, deps_map):
"//folly/experimental/coro:blocking_wait",
"//folly/experimental/coro:collect",
"//folly/experimental/coro:coroutine",
"//folly/experimental/coro:task"])
"//folly/experimental/coro:task",
"//folly/synchronization:distributed_mutex"])
# rocksdb_whole_archive_lib
TARGETS.add_library(
"rocksdb_whole_archive_lib",
Expand All @@ -163,7 +164,8 @@ def generate_targets(repo_path, deps_map):
"//folly/experimental/coro:blocking_wait",
"//folly/experimental/coro:collect",
"//folly/experimental/coro:coroutine",
"//folly/experimental/coro:task"],
"//folly/experimental/coro:task",
"//folly/synchronization:distributed_mutex"],
headers=None,
extra_external_deps="",
link_whole=True)
Expand Down
9 changes: 9 additions & 0 deletions cache/cache_bench_tool.cc
Original file line number Diff line number Diff line change
Expand Up @@ -26,6 +26,7 @@
#include "table/block_based/block_based_table_reader.h"
#include "table/block_based/cachable_entry.h"
#include "util/coding.h"
#include "util/distributed_mutex.h"
#include "util/gflags_compat.h"
#include "util/hash.h"
#include "util/mutexlock.h"
Expand Down Expand Up @@ -587,7 +588,15 @@ class CacheBench {
}

void PrintEnv() const {
#if defined(__GNUC__) && !defined(__OPTIMIZE__)
printf(
"WARNING: Optimization is disabled: benchmarks unnecessarily slow\n");
#endif
#ifndef NDEBUG
printf("WARNING: Assertions are enabled; benchmarks unnecessarily slow\n");
#endif
printf("RocksDB version : %d.%d\n", kMajorVersion, kMinorVersion);
printf("DMutex impl name : %s\n", DMutex::kName());
printf("Number of threads : %u\n", FLAGS_threads);
printf("Ops per thread : %" PRIu64 "\n", FLAGS_ops_per_thread);
printf("Cache size : %s\n",
Expand Down
16 changes: 8 additions & 8 deletions cache/clock_cache.cc
Original file line number Diff line number Diff line change
Expand Up @@ -39,7 +39,7 @@ std::shared_ptr<Cache> NewClockCache(
#include "port/port.h"
#include "tbb/concurrent_hash_map.h"
#include "util/autovector.h"
#include "util/mutexlock.h"
#include "util/distributed_mutex.h"

namespace ROCKSDB_NAMESPACE {

Expand Down Expand Up @@ -368,7 +368,7 @@ class ClockCacheShard final : public CacheShard {

// Guards list_, head_, and recycle_. In addition, updating table_ also has
// to hold the mutex, to avoid the cache being in inconsistent state.
mutable port::Mutex mutex_;
mutable DMutex mutex_;

// The circular list of cache handles. Initially the list is empty. Once a
// handle is needed by insertion, and no more handles are available in
Expand Down Expand Up @@ -431,7 +431,7 @@ void ClockCacheShard::ApplyToSomeEntries(
DeleterFn deleter)>& callback,
uint32_t average_entries_per_lock, uint32_t* state) {
assert(average_entries_per_lock > 0);
MutexLock lock(&mutex_);
DMutexLock l(mutex_);

// Figure out the range to iterate, update `state`
size_t list_size = list_.size();
Expand Down Expand Up @@ -532,7 +532,7 @@ bool ClockCacheShard::Unref(CacheHandle* handle, bool set_usage,
pinned_usage_.fetch_sub(total_charge, std::memory_order_relaxed);
// Cleanup if it is the last reference.
if (!InCache(flags)) {
MutexLock l(&mutex_);
DMutexLock l(mutex_);
RecycleHandle(handle, context);
}
}
Expand Down Expand Up @@ -598,7 +598,7 @@ bool ClockCacheShard::EvictFromCache(size_t charge, CleanupContext* context) {
void ClockCacheShard::SetCapacity(size_t capacity) {
CleanupContext context;
{
MutexLock l(&mutex_);
DMutexLock l(mutex_);
capacity_.store(capacity, std::memory_order_relaxed);
EvictFromCache(0, &context);
}
Expand All @@ -618,7 +618,7 @@ CacheHandle* ClockCacheShard::Insert(
uint32_t meta_charge =
CacheHandle::CalcMetadataCharge(key, metadata_charge_policy_);
size_t total_charge = charge + meta_charge;
MutexLock l(&mutex_);
DMutexLock l(mutex_);
bool success = EvictFromCache(total_charge, context);
bool strict = strict_capacity_limit_.load(std::memory_order_relaxed);
if (!success && (strict || !hold_reference)) {
Expand Down Expand Up @@ -744,7 +744,7 @@ void ClockCacheShard::Erase(const Slice& key, uint32_t hash) {

bool ClockCacheShard::EraseAndConfirm(const Slice& key, uint32_t hash,
CleanupContext* context) {
MutexLock l(&mutex_);
DMutexLock l(mutex_);
HashTable::accessor accessor;
bool erased = false;
if (table_.find(accessor, ClockCacheKey(key, hash))) {
Expand All @@ -758,7 +758,7 @@ bool ClockCacheShard::EraseAndConfirm(const Slice& key, uint32_t hash,
void ClockCacheShard::EraseUnRefEntries() {
CleanupContext context;
{
MutexLock l(&mutex_);
DMutexLock l(mutex_);
table_.clear();
for (auto& handle : list_) {
UnsetInCache(&handle, &context);
Expand Down
24 changes: 12 additions & 12 deletions cache/fast_lru_cache.cc
Original file line number Diff line number Diff line change
Expand Up @@ -16,7 +16,7 @@
#include "monitoring/perf_context_imp.h"
#include "monitoring/statistics.h"
#include "port/lang.h"
#include "util/mutexlock.h"
#include "util/distributed_mutex.h"

#define KEY_LENGTH \
16 // TODO(guido) Make use of this symbol in other parts of the source code
Expand Down Expand Up @@ -93,7 +93,7 @@ LRUCacheShard::LRUCacheShard(size_t capacity, size_t estimated_value_size,
void LRUCacheShard::EraseUnRefEntries() {
autovector<LRUHandle*> last_reference_list;
{
MutexLock l(&mutex_);
DMutexLock l(mutex_);
while (lru_.next != &lru_) {
LRUHandle* old = lru_.next;
// LRU list contains only elements which can be evicted.
Expand All @@ -120,7 +120,7 @@ void LRUCacheShard::ApplyToSomeEntries(
// The state is essentially going to be the starting hash, which works
// nicely even if we resize between calls because we use upper-most
// hash bits for table indexes.
MutexLock l(&mutex_);
DMutexLock l(mutex_);
uint32_t length_bits = table_.GetLengthBits();
uint32_t length = uint32_t{1} << length_bits;

Expand Down Expand Up @@ -208,7 +208,7 @@ int LRUCacheShard::GetHashBits(
void LRUCacheShard::SetCapacity(size_t capacity) {
autovector<LRUHandle*> last_reference_list;
{
MutexLock l(&mutex_);
DMutexLock l(mutex_);
capacity_ = capacity;
EvictFromLRU(0, &last_reference_list);
}
Expand All @@ -220,7 +220,7 @@ void LRUCacheShard::SetCapacity(size_t capacity) {
}

void LRUCacheShard::SetStrictCapacityLimit(bool strict_capacity_limit) {
MutexLock l(&mutex_);
DMutexLock l(mutex_);
strict_capacity_limit_ = strict_capacity_limit;
}

Expand All @@ -229,7 +229,7 @@ Status LRUCacheShard::InsertItem(LRUHandle* e, Cache::Handle** handle,
Status s = Status::OK();
autovector<LRUHandle*> last_reference_list;
{
MutexLock l(&mutex_);
DMutexLock l(mutex_);

// Free the space following strict LRU policy until enough space
// is freed or the lru list is empty.
Expand Down Expand Up @@ -289,7 +289,7 @@ Status LRUCacheShard::InsertItem(LRUHandle* e, Cache::Handle** handle,
Cache::Handle* LRUCacheShard::Lookup(const Slice& key, uint32_t hash) {
LRUHandle* e = nullptr;
{
MutexLock l(&mutex_);
DMutexLock l(mutex_);
e = table_.Lookup(key, hash);
if (e != nullptr) {
assert(e->InCache());
Expand All @@ -305,7 +305,7 @@ Cache::Handle* LRUCacheShard::Lookup(const Slice& key, uint32_t hash) {

bool LRUCacheShard::Ref(Cache::Handle* h) {
LRUHandle* e = reinterpret_cast<LRUHandle*>(h);
MutexLock l(&mutex_);
DMutexLock l(mutex_);
// To create another reference - entry must be already externally referenced.
assert(e->HasRefs());
e->Ref();
Expand All @@ -319,7 +319,7 @@ bool LRUCacheShard::Release(Cache::Handle* handle, bool erase_if_last_ref) {
LRUHandle* e = reinterpret_cast<LRUHandle*>(handle);
bool last_reference = false;
{
MutexLock l(&mutex_);
DMutexLock l(mutex_);
last_reference = e->Unref();
if (last_reference && e->InCache()) {
// The item is still in cache, and nobody else holds a reference to it.
Expand Down Expand Up @@ -382,7 +382,7 @@ void LRUCacheShard::Erase(const Slice& key, uint32_t hash) {
LRUHandle* e;
bool last_reference = false;
{
MutexLock l(&mutex_);
DMutexLock l(mutex_);
e = table_.Remove(key, hash);
if (e != nullptr) {
assert(e->InCache());
Expand All @@ -405,12 +405,12 @@ void LRUCacheShard::Erase(const Slice& key, uint32_t hash) {
}

size_t LRUCacheShard::GetUsage() const {
MutexLock l(&mutex_);
DMutexLock l(mutex_);
return usage_;
}

size_t LRUCacheShard::GetPinnedUsage() const {
MutexLock l(&mutex_);
DMutexLock l(mutex_);
assert(usage_ >= lru_usage_);
return usage_ - lru_usage_;
}
Expand Down
3 changes: 2 additions & 1 deletion cache/fast_lru_cache.h
Original file line number Diff line number Diff line change
Expand Up @@ -17,6 +17,7 @@
#include "port/port.h"
#include "rocksdb/secondary_cache.h"
#include "util/autovector.h"
#include "util/distributed_mutex.h"

namespace ROCKSDB_NAMESPACE {
namespace fast_lru_cache {
Expand Down Expand Up @@ -273,7 +274,7 @@ class ALIGN_AS(CACHE_LINE_SIZE) LRUCacheShard final : public CacheShard {
// mutex_ protects the following state.
// We don't count mutex_ as the cache's internal state so semantically we
// don't mind mutex_ invoking the non-const actions.
mutable port::Mutex mutex_;
mutable DMutex mutex_;
};

class LRUCache
Expand Down
Loading

0 comments on commit 1aac814

Please sign in to comment.