[Fix](cloud) Remove pending delete bitmap's lock_id check when commit txn in MS #46841

Merged
merged 2 commits into from
Jan 13, 2025


@bobhan1 bobhan1 commented Jan 12, 2025

What problem does this PR solve?

Related PR: #46039

Problem Summary:

#46039 added a defensive check to commit_txn in the meta service (MS): it verifies that the lock_id of the pending delete bitmaps on every tablet involved in the txn equals the current txn's lock_id. However, this check can fail spuriously in the following scenario:

  1. A heavy schema change begins and adds a shadow index to the table.
  2. Txn A loads data into both the base index and the shadow index.
  3. Txn A writes its pending delete bitmaps to MS, covering tablets of both the base index and the shadow index.
  4. Txn A fails to remove its pending delete bitmaps for some reason (e.g. commit_txn() fails because the value is too large).
  5. Txn B loads data into both the base index and the shadow index.
  6. The schema change fails for some reason and the shadow index is removed from the table.
  7. Txn B sends delete bitmap calculation tasks to BE. These tasks do not involve tablets under the shadow index, because those tablets have been dropped, so the pending delete bitmaps on those tablets still belong to txn A.
  8. Txn B commits on MS, finds that the lock_id of the pending delete bitmaps on tablets under the shadow index does not match its own, and fails.
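
The sequence above can be replayed with a small toy model. This is only a sketch under stated assumptions, not Doris code: `ToyMetaService`, its methods, the tablet ids, and the lock_id values 10 and 20 are all hypothetical names chosen for illustration. The model keeps one pending delete bitmap lock_id per tablet and shows why txn B sees stale entries on the shadow tablets.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyMetaService:
    """Hypothetical stand-in for MS: tablet_id -> lock_id of its pending delete bitmap."""
    pending_lock_id: Dict[int, int] = field(default_factory=dict)

    def write_pending_delete_bitmaps(self, lock_id: int, tablet_ids: List[int]) -> None:
        # A later writer overwrites the pending entry of every tablet it touches.
        for tablet_id in tablet_ids:
            self.pending_lock_id[tablet_id] = lock_id

    def mismatched_tablets(self, lock_id: int, tablet_ids: List[int]) -> List[int]:
        # Tablets whose pending entry exists but belongs to a different lock_id.
        result = []
        for tablet_id in tablet_ids:
            owner = self.pending_lock_id.get(tablet_id)
            if owner is not None and owner != lock_id:
                result.append(tablet_id)
        return result


BASE_TABLETS = [1, 2]        # tablets of the base index (made-up ids)
SHADOW_TABLETS = [101, 102]  # tablets of the shadow index (made-up ids)

ms = ToyMetaService()

# Steps 2-4: txn A (lock_id 10) writes pending delete bitmaps for both
# indexes and then fails to clean them up.
ms.write_pending_delete_bitmaps(10, BASE_TABLETS + SHADOW_TABLETS)

# Steps 5-7: txn B (lock_id 20) loaded into both indexes, but the shadow
# index was dropped before the calculation tasks were sent, so only the
# base tablets get new pending delete bitmaps.
ms.write_pending_delete_bitmaps(20, BASE_TABLETS)

# Step 8: the commit-time check still covers every tablet txn B loaded into.
stale = ms.mismatched_tablets(20, BASE_TABLETS + SHADOW_TABLETS)
print(stale)  # [101, 102]
```

Only the dropped shadow tablets are reported, which is why failing the commit on this mismatch protects nothing.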

The checks on these dropped tablets serve no purpose, so this PR removes the mandatory check to avoid the spurious failure and prints a warning log instead to help locate problems.
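
The behavior change can be illustrated with a minimal sketch. It is not the actual MS implementation: the function name `check_pending_lock_ids`, the `strict` flag, the logger name, and the sample ids are assumptions made for this example. With `strict=True` the function models a mandatory check that fails the commit; with the default it models the relaxed behavior that only logs a warning.

```python
import logging
from typing import Dict, List

logger = logging.getLogger("ms_sketch")  # hypothetical logger name


def check_pending_lock_ids(txn_lock_id: int,
                           pending_lock_ids: Dict[int, int],
                           strict: bool = False) -> List[int]:
    """Return tablet ids whose pending delete bitmap has a different lock_id.

    strict=True models a mandatory check (raises on any mismatch);
    strict=False models the relaxed check (warns and lets the commit go on).
    """
    mismatched = sorted(
        tablet_id
        for tablet_id, lock_id in pending_lock_ids.items()
        if lock_id != txn_lock_id
    )
    if mismatched:
        if strict:
            raise RuntimeError(
                f"pending delete bitmap lock_id mismatch on tablets {mismatched}"
            )
        logger.warning(
            "pending delete bitmap lock_id mismatch, txn lock_id=%s, tablets=%s",
            txn_lock_id, mismatched,
        )
    return mismatched


# Tablet 101 stands for a dropped shadow tablet that still holds an older
# txn's entry (lock_id 10); tablets 1 and 2 belong to the current txn (20).
pending = {1: 20, 2: 20, 101: 10}
mismatched = check_pending_lock_ids(20, pending)
print(mismatched)  # [101]
```

Returning the mismatched ids as well as logging them keeps the information available for diagnosis without blocking the commit.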

Test cases will be added later.

Release note

None

Check List (For Author)

  • Test

    • Regression test
    • Unit Test
    • Manual test (add detailed scripts or steps below)
    • No need to test or manual test. Explain why:
      • This is a refactor/code format and no logic has been changed.
      • Previous test can cover this change.
      • No code files have been changed.
      • Other reason
  • Behavior changed:

    • No.
    • Yes.
  • Does this need documentation?

    • No.
    • Yes.

Check List (For Reviewer who merge this PR)

  • Confirm the release note
  • Confirm test cases
  • Confirm document
  • Add branch pick label

@hello-stephen
Contributor

Thank you for your contribution to Apache Doris.
Don't know what should be done next? See How to process your PR.

Please clearly describe your PR:

  1. What problem was fixed (it's best to include specific error reporting information). How it was fixed.
  2. Which behaviors were modified. What was the previous behavior, what is it now, why was it modified, and what possible impacts might there be.
  3. What features were added. Why was this function added?
  4. Which code was refactored and why was this part of the code refactored?
  5. Which functions were optimized and what is the difference before and after the optimization?

@bobhan1 bobhan1 marked this pull request as ready for review January 12, 2025 08:40
@bobhan1
Contributor Author

bobhan1 commented Jan 12, 2025

run buildall

@bobhan1
Contributor Author

bobhan1 commented Jan 13, 2025

run buildall

@bobhan1 bobhan1 force-pushed the remove-pending-delete-bitmap-check branch from ffb41a8 to 4d0dbfd Compare January 13, 2025 01:58
@bobhan1
Contributor Author

bobhan1 commented Jan 13, 2025

run buildall

Contributor

@zhannngchen zhannngchen left a comment

LGTM

@github-actions github-actions bot added the approved Indicates a PR has been approved by one committer. label Jan 13, 2025
Contributor

PR approved by at least one committer and no changes requested.

Contributor

PR approved by anyone and no changes requested.

Contributor

@dataroaring dataroaring left a comment

LGTM

@zhannngchen zhannngchen merged commit ca6b229 into apache:master Jan 13, 2025
27 checks passed
bobhan1 added a commit to bobhan1/doris that referenced this pull request Jan 13, 2025
…it txn in MS (apache#46841)

dataroaring pushed a commit that referenced this pull request Jan 13, 2025
Labels
approved Indicates a PR has been approved by one committer. dev/3.0.4-merged reviewed

4 participants