Skip to content
This repository has been archived by the owner on Nov 24, 2023. It is now read-only.

pessimistic: record latest done ddl to handle conflict #1626

Open
wants to merge 15 commits into
base: master
Choose a base branch
from

Conversation

GMHDBJD
Copy link
Collaborator

@GMHDBJD GMHDBJD commented Apr 23, 2021

What problem does this PR solve?

issue

What is changed and how it works?

when latest DDLs is resolved, put it into etcd.
when conflict DDLs detected in master, ignore DDLs which equals to the latest DDLs.

Check List

Tests

  • Unit test
  • Integration test

@ti-chi-bot
Copy link
Member

[REVIEW NOTIFICATION]

This pull request has not been approved.

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by writing /lgtm in a comment.
Reviewer can cancel approval by writing /lgtm cancel in a comment.

@GMHDBJD GMHDBJD added status/WIP This PR is still work in progress type/bug-fix Bug fix needs-cherry-pick-release-2.0 This PR should be cherry-picked to release-2.0. Remove this label after cherry-picked to release-2.0 labels Apr 23, 2021
@GMHDBJD GMHDBJD added status/PTAL This PR is ready for review. Add this label back after committing new changes and removed status/WIP This PR is still work in progress labels Apr 23, 2021
@lance6716
Copy link
Collaborator

lance6716 commented Apr 24, 2021

could we let DM-worker put ShardDDLPessimismDDLsKeyAdapter, to handle DM-master down between "receive the done lock through etcd" and "put ShardDDLPessimismDDLsKeyAdapter"?
And putting ShardDDLPessimismDDLsKeyAdapter should be in the same transaction with marking the lock done.

And I don't know if there're some DDL that could be run twice with some DML interleaved and take some effects (like TRUNCATE but TRUNCATE is filtered, I don't know if we should support pessimistic sharding TRUNCATE someday). For this case, we might adding binlog location as a part of key of ShardDDLPessimismDDLsKeyAdapter

@lance6716 lance6716 modified the milestone: v2.0.3 Apr 25, 2021
@lance6716 lance6716 added the needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated label Apr 25, 2021
@lance6716 lance6716 added this to the v2.0.3 milestone Apr 25, 2021
@GMHDBJD
Copy link
Collaborator Author

GMHDBJD commented Apr 25, 2021

could we let DM-worker put ShardDDLPessimismDDLsKeyAdapter, to handle DM-master down between "receive the done lock through etcd" and "put ShardDDLPessimismDDLsKeyAdapter"?

So you mean we should record last done ddl for every worker? Currently only put last done ddl after lock is resolved(all source done). Here handle the situation about DM-master down.

And I don't know if there're some DDL that could be run twice with some DML interleaved and take some effects

I don't think we will support truncate in pessimistic.🤣. We only skip the DDL when conflict error happened. So the case only happened like

source1 ddl1
source2 ddl1
done
source1 ddl1 <- skip
source2 ddl2

we handle this situation.

dm/common/common.go Show resolved Hide resolved

// only used to report to the caller of the watcher, do not marsh it.
// if it's true, it means the Operation has been deleted in etcd.
IsDeleted bool `json:"-"`
}

// NewOperation creates a new Operation instance.
func NewOperation(id, task, source string, ddls []string, exec, done bool) Operation {
func NewOperation(id, task, source string, ddls []string, exec, done bool, skip bool) Operation {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func NewOperation(id, task, source string, ddls []string, exec, done bool, skip bool) Operation {
func NewOperation(id, task, source string, ddls []string, exec, done, skip bool) Operation {

@@ -64,13 +68,46 @@ func NewLock(id, task, owner string, ddls, sources []string) *Lock {
// TrySync tries to sync the lock, does decrease on remain, re-entrant.
// new upstream sources may join when the DDL lock is in syncing,
// so we need to merge these new sources.
func (l *Lock) TrySync(caller string, ddls, sources []string) (bool, int, error) {
func (l *Lock) TrySync(cli *clientv3.Client, caller string, ddls, sources []string, latestDoneDDLs []string) (bool, int, error) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
func (l *Lock) TrySync(cli *clientv3.Client, caller string, ddls, sources []string, latestDoneDDLs []string) (bool, int, error) {
func (l *Lock) TrySync(cli *clientv3.Client, caller string, ddls, sources, latestDoneDDLs []string) (bool, int, error) {

}

// current ddls idempotent, skip it.
if utils.CompareShardingDDLs(latestDoneDDLs, ddls) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We can save this function in a bool variable to save one comparison.

Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will review later

dm/master/shardddl/pessimist.go Outdated Show resolved Hide resolved
}

// PutLatestDoneDDLs puts the last done shard DDL ddls into etcd.
func PutLatestDoneDDLs(cli *clientv3.Client, lockID string, ddls []string) (int64, error) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

may use above function

if t := utils.ExtractTaskFromLockID(lockID); t == task {
lockIDs = append(lockIDs, lockID)
}
delete(lk.latestDoneDDLs, lockID)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we remove the lockID only when it matching the task 🤔

@@ -35,21 +35,23 @@ type Operation struct {
DDLs []string `json:"ddls"` // DDL statements
Exec bool `json:"exec"` // execute or skip the DDL statements
Done bool `json:"done"` // whether the `Exec` operation has done
Skip bool `json:"skip"` // Whether worker skip this operation
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could we don't add Skip, instead, use empty DDLs

syncer/syncer.go Show resolved Hide resolved
@lance6716 lance6716 modified the milestones: v2.0.3, v2.0.4 May 8, 2021
@lance6716
Copy link
Collaborator

need shfmt

Copy link
Collaborator

@lance6716 lance6716 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

will review later

pkg/utils/common.go Show resolved Hide resolved
dm/master/shardddl/pessimist.go Outdated Show resolved Hide resolved
dm/master/shardddl/pessimist.go Show resolved Hide resolved
dm/master/shardddl/pessimist.go Outdated Show resolved Hide resolved
_, _, err := PutOperations(cli, true, NewOperation(l.ID, l.Task, caller, latestDoneDDLs, false, false, true))
return l.remain <= 0, l.remain, err
}

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you give an example of the cases that enter here 😵 ?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curIdempotent: worker1, worker2 ddl1 -> worker 2 ddl2 -> worker1 resync ddl1
otherIdempotent: worker1, worker2 ddl1 -> worker2 resync ddl1 -> worker1 ddl2

pkg/shardddl/pessimism/ops.go Outdated Show resolved Hide resolved
}

// current ddls idempotent, skip it.
if curIdempotent {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I still prefer that we should check curIdempotent before utils.CompareShardingDDLs(ddls, l.DDLs)

background: the lock is resolved but one non-owner is failed before saving checkpoint and resync the DDL of the resolved lock

my prefer way:

  1. the failed DM-worker put the outdated DDL. if we check curIdempotent here, we could earlier avoid blocking that DM-worker which may caused by DDL lock

current way:

  1. the failed DM-worker (worker1) put the outdated DDL, utils.CompareShardingDDLs(ddls, l.DDLs) is true
  2. another DM-worker (worker2) put a DDL more likely a new different DDL, enter if-then branch, curIdempotent is false, utils.CompareShardingDDLs(latestDoneDDLs, l.DDLs) is true.
    (And i think in most cases curIdempotent is false. And when curIdempotent is true, worker1 will wait more time to get skip Op)
  3. in step2 we enter "other sources' ddls idempotent, skip them" branch, and here we put skip Op for worker1

Above all, worker1 may be blocked for a longer time

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

And I still prefer save a binlog position through same transaction in etcd 😂 When DM-worker starting, this position has higher priority than one in downstream checkpoint

@GMHDBJD
Copy link
Collaborator Author

GMHDBJD commented May 19, 2021

/hold

@ti-chi-bot
Copy link
Member

@GMHDBJD: PR needs rebase.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@GMHDBJD GMHDBJD removed this from the v2.0.4 milestone Jun 7, 2021
@Ehco1996 Ehco1996 removed the status/PTAL This PR is ready for review. Add this label back after committing new changes label Jun 28, 2021
@lance6716
Copy link
Collaborator

/hold

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
do-not-merge/hold needs-cherry-pick-release-2.0 This PR should be cherry-picked to release-2.0. Remove this label after cherry-picked to release-2.0 needs-rebase needs-update-release-note This PR should be added into release notes. Remove this label once the release notes are updated size/XL status/Stale type/bug-fix Bug fix
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants