Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

enable cf uses separete write buffer manager #343

Merged
merged 42 commits into from
Aug 29, 2023

Conversation

SpadeA-Tang
Copy link
Member

@SpadeA-Tang SpadeA-Tang commented Jul 25, 2023

This PR enables different write buffer managers to manager different cfs.

Signed-off-by: Spade A <[email protected]>
Signed-off-by: Spade A <[email protected]>
Signed-off-by: Spade A <[email protected]>
Signed-off-by: Spade A <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: Spade A <[email protected]>
Signed-off-by: Spade A <[email protected]>
Signed-off-by: Spade A <[email protected]>
Signed-off-by: Spade A <[email protected]>
Signed-off-by: Spade A <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
@SpadeA-Tang
Copy link
Member Author

/run-test

Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
@SpadeA-Tang SpadeA-Tang changed the title wip enable multiple write buffer manager Aug 16, 2023
@SpadeA-Tang
Copy link
Member Author

/run-test

Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems way too complex for such a simple functionality. Why would be the problem if we just put a manager in ColumnFamilyOptions? You will need an additional spinlock to protect the vector holding all managers so that we could access them in PreprocessWrite. But I think in total it should be done in less than 50LOC.

@SpadeA-Tang
Copy link
Member Author

SpadeA-Tang commented Aug 22, 2023

PTAL /cc @tabokie @Connor1996

Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
@@ -1589,7 +1590,7 @@ ColumnFamilySet::ColumnFamilySet(const std::string& dbname,
const ImmutableDBOptions* db_options,
const FileOptions& file_options,
Cache* table_cache,
WriteBufferManager* _write_buffer_manager,
WriteBufferManager*, /*non-used parameter*/
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To avoid changing the tests, we should make it fallback to old behavior if cf_options.wbm is null. In specific, ColumnFamilySet should still accept and store a wbm, but ColumnFamilyData ctor might take a different wbm depending on the specific cf_options.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion and done.

}
}
if (!already_exists) {
write_buffer_manager_.push_back(cf_options.write_buffer_manager);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You need a spinlock to modify a vector.

@@ -2282,7 +2281,7 @@ class DBImpl : public DB {

Directories directories_;

WriteBufferManager* write_buffer_manager_;
std::vector<std::shared_ptr<WriteBufferManager>> write_buffer_manager_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use autovector<Manager*> and put DB level manager separately from CF level managers.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion and done.

status = Status::Incomplete("Write stall");
} else {
InstrumentedMutexLock l(&mutex_);
WriteBufferManagerStallWrites(i);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, we don't need to allow stalling from cf level wbm. For default and write cf, we will still use DB level wbm and reuse the old code path.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good suggestion and done.

Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
// in DBImpl.
//
// Default: null
std::shared_ptr<WriteBufferManager> cf_write_buffer_manager = nullptr;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::shared_ptr<WriteBufferManager> cf_write_buffer_manager = nullptr;
std::shared_ptr<WriteBufferManager> write_buffer_manager = nullptr;

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Option inherited from DBOptions and ColumnFamilyOptions, so I use different name for it to avoid trouble.

const Options& options);
void CreateColumnFamilies(
const std::vector<std::string>& cfs, const Options& options,
std::unordered_map<std::string, std::shared_ptr<WriteBufferManager>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is not the intended use case of this function. If you have different options for each cf, use ReopenWithColumnFamilies(const std::vector<std::string>& cfs, const std::vector<Options>& options); instead.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fixed

@@ -2282,6 +2287,12 @@ size_t DBImpl::GetWalPreallocateBlockSize(uint64_t write_buffer_size) const {
bsize = std::min<size_t>(bsize, buffer_size);
}
}
for (auto manager : cf_based_write_buffer_manager_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be min(db_write_buffer_size, sum(cf_wbm_size) + db_wbm_size).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed

// Note: std::shared_ptr<WriteBufferManager> is store in ColumnfamilyOptions
// which is destroyed before DBimpl, so we use
// std::shared_ptr<WriteBufferManager> here in the vector.
autovector<std::shared_ptr<WriteBufferManager>>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment that this is only modified in Open, and lock is not needed.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// which is destroyed before DBimpl, so we use
// std::shared_ptr<WriteBufferManager> here in the vector.
autovector<std::shared_ptr<WriteBufferManager>>
cf_based_write_buffer_manager_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If CreateColumnFamily passed in an option not in the list, return error.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

// For simplicity, CF based write buffer manager does not support stall the
// write.
// Note: std::shared_ptr<WriteBufferManager> is store in ColumnfamilyOptions
// which is destroyed before DBimpl, so we use
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you had any actual error caused by not using shared_ptr? All code using WriteBufferManager should be destroyed before VersionSet, there shouldn't be any problem.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, UnregisterDB is used after VersionSet is destroyed.

Signed-off-by: SpadeA-Tang <[email protected]>
Signed-off-by: SpadeA-Tang <[email protected]>
Copy link
Member

@tabokie tabokie left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rest LG

Comment on lines 1677 to 1679
options.cf_write_buffer_manager != nullptr
? options.cf_write_buffer_manager.get()
: write_buffer_manager_,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Moving this block out looks better.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -685,6 +685,9 @@ Status DBImpl::CloseHelper() {
delete txn_entry.second;
}

for (auto m : cf_based_write_buffer_manager_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment the ordering.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -419,8 +419,7 @@ class DBImpl : public DB {
virtual Status GetSortedWalFiles(VectorLogPtr& files) override;
virtual Status GetCurrentWalFile(
std::unique_ptr<LogFile>* current_log_file) override;
virtual Status GetCreationTimeOfOldestFile(
uint64_t* creation_time) override;
virtual Status GetCreationTimeOfOldestFile(uint64_t* creation_time) override;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Revert this line, need to commit without running clang-format locally.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

@@ -1625,6 +1625,22 @@ Status DBImpl::Open(const DBOptions& db_options, const std::string& dbname,
}

DBImpl* impl = new DBImpl(db_options, dbname, seq_per_batch, batch_per_txn);
for (auto cf : column_families) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The logic of this block of code is quite isolated. I think it's better to wrap a private function for it.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer not, adding a member function has both mental overhead and compilation overhead. Plus this block can't be exactly reused elsewhere.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the major benefit of a private function is better readability: people can skip the detail when they don't care the very detail of the block. Other benefits include better testability, and more modular code.
Of course adding a comment for the block serves the same purpose in terms of readability, but old-school programmers typically prefer a function and even chat-gpt also recommends a wrap function.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, it is always a trade-off to extract a sub-routine. Reading function has a significant context switch overhead, and for non-pure function that modifies states it is mostly needed to read the implementation of the function. In RocksDB codebase it is generally preferred to use inlined code rather than many small functions. Some believe it helps with code quality.

@@ -2278,6 +2283,9 @@ size_t DBImpl::GetWalPreallocateBlockSize(uint64_t write_buffer_size) const {
if (immutable_db_options_.write_buffer_manager) {
size_t buffer_size =
immutable_db_options_.write_buffer_manager->flush_size();
for (auto manager : cf_based_write_buffer_manager_) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why do we need to update buffer_size for WAL here?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If the flush size of write buffer manager is small compared with write buffer size, we will switch memtable before write bytes up to write buffer size. Use min here can avoid wasty pre allocation.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

but here you're using += instead of min<> of cf_based_write_buffer_manager_

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is because all CF will write in the same WAL file. Imaging we have 1M write-buffer-limit for CF1 but 1GB write-buffer-limit for CF2.

@SpadeA-Tang SpadeA-Tang changed the title enable multiple write buffer manager enable cf uses separete write buffer manager Aug 29, 2023
@SpadeA-Tang
Copy link
Member Author

/merge

@SpadeA-Tang SpadeA-Tang merged commit 6121b2d into tikv:6.29.tikv Aug 29, 2023
2 checks passed
v01dstar pushed a commit to v01dstar/rocksdb that referenced this pull request May 29, 2024
v01dstar pushed a commit to v01dstar/rocksdb that referenced this pull request Oct 2, 2024
v01dstar pushed a commit to v01dstar/rocksdb that referenced this pull request Oct 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants