br: pipeline wait tiflash synced #43726
Conversation
[REVIEW NOTIFICATION] This pull request has been approved by:
To complete the pull request process, please ask the reviewers in the list to review. The full list of commands accepted by this bot can be found here. Reviewers can indicate their review by submitting an approval review.
Skipping CI for Draft Pull Request.
fe38f37 to e84eca0
0095a64 to 777ac5a
/run-integration-br-tests
/run-integration-br-tests
rest lgtm
@@ -1601,77 +1603,83 @@ func (rc *Client) switchTiKVMode(ctx context.Context, mode import_sstpb.SwitchMo
	return nil
}

func concurrentHandleTablesCh(
Can we make it generic?
func concurrentHandleTablesCh[T any](
	ctx context.Context,
	inCh <-chan T,
	outCh chan<- T,
	errCh chan<- error,
	workers *utils.WorkerPool,
	processFun func(context.Context, T) error,
	deferFun func())
what's the benefit of making it generic?
So we don't need to change the outCh in Batcher into chan *CreatedTable.
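For illustration, a minimal sketch of how the generic version could be implemented. The signature follows the suggestion above; the body, the errgroup wiring, and the channel-closing behavior are assumptions, not the merged code:

```go
package restore

import (
	"context"

	"github.com/pingcap/tidb/br/pkg/utils"
	"golang.org/x/sync/errgroup"
)

// concurrentHandleTablesCh drains inCh, runs processFun on each item using
// the worker pool, forwards handled items to outCh, and calls deferFun once
// everything is done. Sketch only: the body is an assumption.
func concurrentHandleTablesCh[T any](
	ctx context.Context,
	inCh <-chan T,
	outCh chan<- T,
	errCh chan<- error,
	workers *utils.WorkerPool,
	processFun func(context.Context, T) error,
	deferFun func(),
) {
	eg, ectx := errgroup.WithContext(ctx)
	defer func() {
		// Wait for all in-flight workers before closing the output
		// channel, so no goroutine sends on a closed channel.
		if err := eg.Wait(); err != nil {
			errCh <- err
		}
		close(outCh)
		deferFun()
	}()

	for {
		select {
		case <-ectx.Done():
			return
		case item, ok := <-inCh:
			if !ok {
				return
			}
			item := item // rebind so each goroutine sees its own value
			workers.ApplyOnErrorGroup(eg, func() error {
				if err := processFun(ectx, item); err != nil {
					return err
				}
				outCh <- item
				return nil
			})
		}
	}
}
```

Because T is inferred at each call site, the Batcher could keep its chan *CreatedTable output with no conversion, which is the benefit raised above.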
	afterTableCheckesumedCh := client.GoValidateChecksum(
		ctx, afterTableRestoredCh, mgr.GetStorage().GetClient(), errCh, updateCh, cfg.ChecksumConcurrency)
	afterTableLoadStatsCh := client.GoUpdateMetaAndLoadStats(ctx, afterTableCheckesumedCh, errCh)
	postHandleCh = afterTableLoadStatsCh
}
does it need to add updateCh.IncBy(len(tables)) in the else statement?
nice catch. normally client.GoUpdateMetaAndLoadStats won't take too much time, so I just ignore the progress of client.GoUpdateMetaAndLoadStats, and for the others I add updateCh to trace progress.
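A sketch of the fix being discussed. The if branch mirrors the diff above; the surrounding condition name and the else body are assumptions based on the review suggestion:

```go
// Assumed surrounding condition; only the if body appears in the diff above.
if cfg.Checksum {
	afterTableCheckesumedCh := client.GoValidateChecksum(
		ctx, afterTableRestoredCh, mgr.GetStorage().GetClient(), errCh, updateCh, cfg.ChecksumConcurrency)
	afterTableLoadStatsCh := client.GoUpdateMetaAndLoadStats(ctx, afterTableCheckesumedCh, errCh)
	postHandleCh = afterTableLoadStatsCh
} else {
	// Suggested fix: the checksum stage normally advances the progress
	// bar once per table, so when it is skipped the same units must be
	// credited here or the progress bar never reaches 100%.
	// (The int64 conversion assumes updateCh.IncBy takes an int64.)
	updateCh.IncBy(int64(len(tables)))
}
```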
worker := workers.ApplyWorker()
eg.Go(func() error {
	defer workers.RecycleWorker(worker)
this is equal to workers.ApplyOnErrorGroup
It's not equal, because we need to pass an argument (cloneTable) into the goroutine.
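To make the comparison concrete, a sketch of the two patterns side by side. The first fragment mirrors the diff above; the rest is illustrative, not the merged code:

```go
// Pattern from the diff: acquire a worker, then hand the per-iteration
// value to the goroutine via closure capture.
cloneTable := cloneTable // rebind per iteration (pre-Go 1.22 loop semantics)
worker := workers.ApplyWorker()
eg.Go(func() error {
	defer workers.RecycleWorker(worker)
	return processFun(ectx, cloneTable)
})

// workers.ApplyOnErrorGroup wraps the same apply/recycle steps, but its
// callback takes no arguments, so the per-iteration value still has to be
// captured by the closure before the call:
workers.ApplyOnErrorGroup(eg, func() error {
	return processFun(ectx, cloneTable)
})
```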
progress, err := infosync.CalculateTiFlashProgress(tbl.Table.ID, tbl.Table.TiFlashReplica.Count, tiFlashStores)
if err != nil {
	log.Warn("failed to get tiflash replica progress, wait for next retry", zap.Error(err))
	continue
Do we also need to sleep here, to avoid frequent requests?
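A sketch of the retry loop with the suggested sleep. The CalculateTiFlashProgress call and the log line come from the diff above; the interval, the completion check, and the context handling are assumptions:

```go
const pollInterval = 5 * time.Second // assumed value, not from the PR

for {
	progress, err := infosync.CalculateTiFlashProgress(
		tbl.Table.ID, tbl.Table.TiFlashReplica.Count, tiFlashStores)
	if err != nil {
		log.Warn("failed to get tiflash replica progress, wait for next retry", zap.Error(err))
	} else if progress >= 1 {
		break // assumed completion check: the replica has caught up
	}
	// Sleep between polls so retries don't flood PD with requests.
	select {
	case <-ctx.Done():
		return ctx.Err()
	case <-time.After(pollInterval):
	}
}
```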
/merge
/merge
@3pointer: We have migrated to builtin. Please use
Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the ti-community-infra/tichi repository.
[APPROVALNOTIFIER] This PR is APPROVED. This pull request has been approved by: Leavrth, YuJuncen. The full list of commands accepted by this bot can be found here. The pull request process is described here.
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing
@3pointer: The following test failed, say
Full PR test history. Your PR dashboard. Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.
In response to a cherrypick label: new pull request created to branch
Signed-off-by: ti-chi-bot <[email protected]>
In response to a cherrypick label: new pull request created to branch
Signed-off-by: ti-chi-bot <[email protected]>
What problem does this PR solve?
Issue Number: close #43828
Problem Summary:
Currently, if we restore to a cluster that has TiFlash replicas, BR only sends the ingest command to the leader and doesn't guarantee that the learner (TiFlash replica) is ready to serve when the restore finishes. In the worst cases, the lag between leader and learner may take hours.
What is changed and how it works?
This PR adds a config wait-tiflash-ready that makes the restore pipeline wait for TiFlash replicas to be ready to serve before the restore is reported as finished (see the sketch after the checklist below).

Check List
Tests
Side effects
Documentation
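As referenced above, a hypothetical sketch of how the new config could gate an extra pipelined stage. Only the config name wait-tiflash-ready comes from this PR; cfg.WaitTiflashReady and GoWaitTiFlashReady are assumed names, not confirmed from the diff:

```go
// Hypothetical wiring; only the config name comes from the PR description.
if cfg.WaitTiflashReady {
	// Append one more pipelined stage that blocks each table until its
	// TiFlash replicas report a synced progress, then passes it on.
	postHandleCh = client.GoWaitTiFlashReady(ctx, postHandleCh, errCh)
}
```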
Release note
Please refer to Release Notes Language Style Guide to write a quality release note.