br: pipeline wait tiflash synced #43726

Merged 7 commits on Jun 27, 2023

Conversation

@3pointer (Contributor) commented May 11, 2023

What problem does this PR solve?

Issue Number: close #43828

Problem Summary:
Currently, when we restore to a cluster that has TiFlash replicas, BR only sends the ingest command to the leader and doesn't guarantee that the learner (the TiFlash replica) is ready to serve when the restore finishes. In the worst cases, the learner may lag behind the leader by hours.

What is changed and how it works?

This PR adds a config option, wait-tiflash-ready, that adds a pipeline step to wait until the TiFlash replicas are ready to serve before the restore finishes.
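The idea of an opt-in pipeline step can be sketched roughly as follows. This is a minimal illustration, not the PR's code: the `table` type, `mapStage`, `buildPipeline`, and the placement of the wait step after a checksum step are all assumptions made for the example.

```go
package main

import "fmt"

// table is a hypothetical stand-in for a restored table flowing through the pipeline.
type table struct {
	Name            string
	TiFlashReplicas int
}

// stage consumes tables from in and returns a channel of tables it has finished with.
type stage func(in <-chan table) <-chan table

// mapStage builds a stage that applies fn to each table and then passes it on.
func mapStage(fn func(table)) stage {
	return func(in <-chan table) <-chan table {
		out := make(chan table)
		go func() {
			defer close(out)
			for t := range in {
				fn(t)
				out <- t
			}
		}()
		return out
	}
}

// buildPipeline chains the post-restore steps. The wait step is appended only
// when waitTiFlashReady is set (assumed here to mirror the new option).
func buildPipeline(src <-chan table, waitTiFlashReady bool, checksum, waitReady func(table)) <-chan table {
	out := mapStage(checksum)(src)
	if waitTiFlashReady {
		out = mapStage(waitReady)(out)
	}
	return out
}

func main() {
	src := make(chan table)
	go func() {
		defer close(src)
		src <- table{Name: "t1", TiFlashReplicas: 1}
		src <- table{Name: "t2"}
	}()
	done := buildPipeline(src, true,
		func(t table) { fmt.Println("checksum", t.Name) },
		func(t table) {
			if t.TiFlashReplicas > 0 {
				fmt.Println("wait for TiFlash replica of", t.Name)
			}
		})
	for t := range done {
		fmt.Println("finished", t.Name)
	}
}
```

Because each step is its own channel stage, one table can be waiting on its replica while the next is still in the earlier step, which is presumably what "pipeline" refers to here.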

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No code

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

Add an option for restore to wait until TiFlash replicas are ready to serve.

ti-chi-bot bot commented May 11, 2023

[REVIEW NOTIFICATION]

This pull request has been approved by:

  • Leavrth
  • YuJuncen

To complete the pull request process, please ask the reviewers in the list to review by filling /cc @reviewer in the comment.
After your PR has acquired the required number of LGTMs, you can assign this pull request to the committer in the list by filling /assign @committer in the comment to help you merge this pull request.

The full list of commands accepted by this bot can be found here.

Reviewer can indicate their review by submitting an approval review.
Reviewer can cancel approval by submitting a request changes review.

ti-chi-bot bot commented May 11, 2023

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot added labels do-not-merge/needs-linked-issue, do-not-merge/work-in-progress, release-note-none, size/XL (May 11, 2023)
@ti-chi-bot added label release-note and removed labels do-not-merge/needs-linked-issue, release-note-none (May 15, 2023)
@3pointer marked this pull request as ready for review (May 15, 2023 09:29)
@ti-chi-bot removed label do-not-merge/work-in-progress (May 15, 2023)
@3pointer (Contributor Author)

/run-integration-br-tests

@YuJuncen (Contributor) left a comment

rest lgtm

@@ -1601,77 +1603,83 @@ func (rc *Client) switchTiKVMode(ctx context.Context, mode import_sstpb.SwitchMo
return nil
}

func concurrentHandleTablesCh(

Contributor

Can we make it generic?

Contributor

func concurrentHandleTablesCh[T any](
	ctx context.Context,
	inCh <-chan T,
	outCh chan<- T,
	errCh chan<- error,
	workers *utils.WorkerPool,
	processFun func(context.Context, T) error,
	deferFun func())

Contributor Author

What's the benefit of making it generic?

Contributor

So we don't need to change the outCh in Batcher into chan *CreatedTable

afterTableCheckesumedCh := client.GoValidateChecksum(
ctx, afterTableRestoredCh, mgr.GetStorage().GetClient(), errCh, updateCh, cfg.ChecksumConcurrency)
afterTableLoadStatsCh := client.GoUpdateMetaAndLoadStats(ctx, afterTableCheckesumedCh, errCh)
postHandleCh = afterTableLoadStatsCh
}

Contributor

Does updateCh.IncBy(len(tables)) also need to be added in the else branch?

Contributor Author

Nice catch. Normally client.GoUpdateMetaAndLoadStats doesn't take much time, so I just ignore its progress. For the other stages I added updateCh to track progress.

Comment on lines +1633 to +1635
worker := workers.ApplyWorker()
eg.Go(func() error {
defer workers.RecycleWorker(worker)

Contributor

This is equivalent to workers.ApplyOnErrorGroup.

Contributor Author

It's not equivalent, because we need to pass an argument (cloneTable) into the goroutine.

progress, err := infosync.CalculateTiFlashProgress(tbl.Table.ID, tbl.Table.TiFlashReplica.Count, tiFlashStores)
if err != nil {
log.Warn("failed to get tiflash replica progress, wait for next retry", zap.Error(err))
continue

Contributor

Do we also need to sleep here to avoid frequent requests?

@ti-chi-bot added label status/LGT1 (May 24, 2023)
@ti-chi-bot added label status/LGT2 and removed label status/LGT1 (Jun 2, 2023)
@3pointer (Contributor Author) commented Jun 9, 2023

/merge

@ti-chi-bot added labels needs-1-more-lgtm, approved (Jun 25, 2023)
@3pointer (Contributor Author)

/merge

ti-chi-bot bot commented Jun 25, 2023

@3pointer: We have migrated to builtin LGTM and approve plugins for reviewing.

Please use /approve when you want to approve this pull request.

The changes announcement: LGTM plugin changes

@3pointer removed label needs-1-more-lgtm (Jun 25, 2023)
@ti-chi-bot added label needs-1-more-lgtm (Jun 25, 2023)
@ti-chi-bot added label lgtm (Jun 27, 2023)
ti-chi-bot bot commented Jun 27, 2023

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: Leavrth, YuJuncen

The full list of commands accepted by this bot can be found here.

The pull request process is described here

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot removed label needs-1-more-lgtm (Jun 27, 2023)
ti-chi-bot bot commented Jun 27, 2023

[LGTM Timeline notifier]

Timeline:

  • 2023-06-25 05:38:00 UTC: ☑️ agreed by Leavrth.
  • 2023-06-27 02:30:41 UTC: ☑️ agreed by YuJuncen.

tiprow bot commented Jun 27, 2023

@3pointer: The following test failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name        | Commit  | Details | Required | Rerun command
tiprow_fast_test | c0a7dee | link    | true     | /test tiprow_fast_test

@ti-chi-bot ti-chi-bot bot merged commit 0f20315 into pingcap:master Jun 27, 2023
@3pointer added labels needs-cherry-pick-release-6.5, needs-cherry-pick-release-7.1 (Jun 28, 2023)
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-7.1: #45017.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jun 28, 2023
@ti-chi-bot
Member

In response to a cherrypick label: new pull request created to branch release-6.5: #45018.

ti-chi-bot pushed a commit to ti-chi-bot/tidb that referenced this pull request Jun 28, 2023
ti-chi-bot bot pushed a commit that referenced this pull request Jul 10, 2023
ti-chi-bot bot pushed a commit that referenced this pull request Aug 16, 2023
Labels
approved, lgtm, needs-cherry-pick-release-6.5, needs-cherry-pick-release-7.1, release-note, size/XL, status/LGT2
Development

Successfully merging this pull request may close these issues.

BR: wait for tiflash synced after restore.
4 participants