Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Enhancement] Wait apply finish out of tablet meta lock when set tablet state as SHUTDOWN #56266

Merged
merged 2 commits into from
Feb 27, 2025
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
2 changes: 1 addition & 1 deletion be/src/storage/tablet.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1622,7 +1622,7 @@ Status Tablet::rowset_commit(int64_t version, const RowsetSharedPtr& rowset, uin

void Tablet::on_shutdown() {
if (_updates) {
_updates->_stop_and_wait_apply_done();
_updates->stop_and_wait_apply_done();
}
}

Expand Down
21 changes: 21 additions & 0 deletions be/src/storage/tablet_manager.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -407,6 +407,13 @@ Status TabletManager::drop_tablet(TTabletId tablet_id, TabletDropFlag flag) {

DroppedTabletInfo drop_info{.tablet = dropped_tablet, .flag = flag};

// meta lock free shutdown for primary key table. In current impl, TabletUpdates's context
// is protected by specified lock defined in TabletUpdates itself but not tablet meta lock
// It is safe call stop_and_wait_apply_done out of tablet meta lock
if (dropped_tablet->updates() != nullptr) {
dropped_tablet->updates()->stop_and_wait_apply_done();
}

if (flag == kDeleteFiles) {
{
// NOTE: Other threads may save the tablet meta back to storage again after we
Expand Down Expand Up @@ -492,6 +499,13 @@ Status TabletManager::drop_tablets_on_error_root_path(const std::vector<TabletIn
}

for (const auto& dropped_tablet : dropped_tablets) {
// meta lock free shutdown for primary key table. In current impl, TabletUpdates's context
// is protected by specified lock defined in TabletUpdates itself but not tablet meta lock
// It is safe call stop_and_wait_apply_done out of tablet meta lock
if (dropped_tablet->updates() != nullptr) {
dropped_tablet->updates()->stop_and_wait_apply_done();
}

// make sure dropped tablet state is TABLET_SHUTDOWN IN MEMORY ONLY!
// any persistent operation is useless because the disk has failed
// and the IO operation should be always failed in this case.
Expand Down Expand Up @@ -1520,6 +1534,13 @@ Status TabletManager::_drop_tablet_unlocked(TTabletId tablet_id, TabletDropFlag

DroppedTabletInfo drop_info{.tablet = dropped_tablet, .flag = flag};

// meta lock free shutdown for primary key table. In current impl, TabletUpdates's context
// is protected by specified lock defined in TabletUpdates itself but not tablet meta lock
// It is safe call stop_and_wait_apply_done out of tablet meta lock
if (dropped_tablet->updates() != nullptr) {
dropped_tablet->updates()->stop_and_wait_apply_done();
}

if (flag == kDeleteFiles) {
{
// NOTE: Other threads may save the tablet meta back to storage again after we
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The most risky bug in this code is:
Potential memory-related issues due to not properly handling the lifecycle of dropped_tablet objects before calling on_shutdown().

You can modify the code like this:

// Ensure proper handling and safe shutdown of dropped tablets
if (dropped_tablet != nullptr && dropped_tablet->updates() != nullptr) {
    dropped_tablet->updates()->on_shutdown();
}

This modification checks that dropped_tablet itself is not a null pointer, assuming there's a possibility of such an error, thus preventing undefined behavior or crashes.

Expand Down
8 changes: 4 additions & 4 deletions be/src/storage/tablet_updates.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -84,7 +84,7 @@ std::string EditVersion::to_string() const {
TabletUpdates::TabletUpdates(Tablet& tablet) : _tablet(tablet), _unused_rowsets(UINT64_MAX) {}

TabletUpdates::~TabletUpdates() {
_stop_and_wait_apply_done();
stop_and_wait_apply_done();
}

template <class Itr1, class Itr2>
Expand Down Expand Up @@ -1028,7 +1028,7 @@ void TabletUpdates::_wait_apply_done() {
}
}

void TabletUpdates::_stop_and_wait_apply_done() {
void TabletUpdates::stop_and_wait_apply_done() {
_apply_stopped = true;
_wait_apply_done();
}
Expand Down Expand Up @@ -4912,7 +4912,7 @@ Status TabletUpdates::load_snapshot(const SnapshotMeta& snapshot_meta, bool rest
RETURN_IF_ERROR(check_rowset_files(rowset_meta_pb));
}
// Stop apply thread.
_stop_and_wait_apply_done();
stop_and_wait_apply_done();

DeferOp defer([&]() {
if (!_error.load()) {
Expand Down Expand Up @@ -5644,7 +5644,7 @@ Status TabletUpdates::recover() {
}
LOG(INFO) << "Tablet " << _tablet.tablet_id() << " begin do recover: " << _error_msg;
// Stop apply thread.
_stop_and_wait_apply_done();
stop_and_wait_apply_done();

DeferOp defer([&]() {
if (!_error.load()) {
Expand Down
3 changes: 2 additions & 1 deletion be/src/storage/tablet_updates.h
Original file line number Diff line number Diff line change
Expand Up @@ -386,6 +386,8 @@ class TabletUpdates {

void rewrite_rs_meta(bool is_fatal);

void stop_and_wait_apply_done();

private:
friend class Tablet;
friend class PrimaryIndex;
Expand Down Expand Up @@ -440,7 +442,6 @@ class TabletUpdates {
EditVersion* commit_version);

void _wait_apply_done();
void _stop_and_wait_apply_done();

Status _do_compaction(std::unique_ptr<CompactionInfo>* pinfo);

Expand Down
Loading