BulkLoad cannot work normally in case of large amount of data, all ingest failed #93

Open
mzygQAQ opened this issue Apr 30, 2022 · 2 comments
Labels
status/stale type/bug Something isn't working

Comments

mzygQAQ commented Apr 30, 2022

Bug Report

1. Describe the bug

A large number of failures occur in the later stage of task execution: every ingest is retried again and again until the task finally fails.

2. Minimal reproduce step (Required)

Data: 500,000,000 keys, 1 KB per key.
In the initial stage of the task, each partition is imported normally and takes 3-5 seconds. In the later stage, however, each task (partition) runs for 30-40 minutes, stuck in a fail-and-retry loop, and finally fails.
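
For a sense of scale, the data volume in this reproduction can be estimated as below. This is only a rough sketch: it assumes "1 KB" means 1024 bytes per key-value pair, and it assumes a region split size of 96 MiB, which is not a value taken from this report.

```python
# Rough size estimate for the reproduction data set.
KEYS = 500_000_000
BYTES_PER_KEY = 1024            # assumption: "1 KB" = 1024 bytes per key-value pair
SPLIT_SIZE = 96 * 1024 * 1024   # assumption: regions split at about 96 MiB

total_bytes = KEYS * BYTES_PER_KEY
total_gib = total_bytes / 1024**3

# Ceiling division: number of regions if each one is filled up to SPLIT_SIZE.
approx_regions = -(-total_bytes // SPLIT_SIZE)

print(f"total data: {total_gib:.1f} GiB")
print(f"approximate region count after import: {approx_regions}")
```

Under these assumptions the import has to create thousands of regions, so many splits necessarily happen while the task is running.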

3. What did you see instead (Required)

4. What did you expect to see? (Required)

5. What is your migration tool and TiKV version? (Required)

  • TiKV Online Bulk Load:
mzygQAQ added the type/bug label Apr 30, 2022

mzygQAQ commented May 7, 2022

Here is my personal analysis:
Although PD scheduling (merge/split) is paused before the import, TiKV still triggers split checks and splits regions on its own.
The importer fetches the topology of the regions that overlap the imported data only once, before ingest starts.

As data is imported, TiKV's own internal splits change the region topology. The importer does not handle this case; it simply retries.

On errors such as EpochNotMatch or NotLeader, tikv-client refreshes its own view of the region topology, but the TiRegion passed to importerclient was obtained before ingest started and has long been invalid.

So no matter how many times it retries, the ingest cannot succeed.
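
The failure mode described above can be illustrated with a small toy model. This is a sketch only, not the importer's or tikv-client's real code: `ToyCluster`, `Region`, `ingest_with_stale_region` and `ingest_with_refresh` are hypothetical names invented for the illustration, and keys are plain integers. It contrasts retrying with a region object captured before a split against looking the region up again on every attempt.

```python
from __future__ import annotations

from bisect import bisect_right
from dataclasses import dataclass


class EpochNotMatch(Exception):
    """Raised when a request carries a region that no longer exists."""


@dataclass(frozen=True)
class Region:
    start: int    # inclusive start key
    end: int      # exclusive end key
    version: int  # bumped whenever the region is split


class ToyCluster:
    """Hypothetical stand-in for the store; not a real client API."""

    def __init__(self, key_space: int) -> None:
        self.regions = [Region(0, key_space, 1)]  # kept sorted by start key
        self.ingested: set[int] = set()

    def locate(self, key: int) -> Region:
        """Return the current region that contains `key`."""
        starts = [r.start for r in self.regions]
        return self.regions[bisect_right(starts, key) - 1]

    def split(self, key: int) -> None:
        """Split the region containing `key` at `key`, bumping the version."""
        old = self.locate(key)
        if not old.start < key < old.end:
            raise ValueError("split key must lie strictly inside a region")
        i = self.regions.index(old)
        v = old.version + 1
        self.regions[i:i + 1] = [Region(old.start, key, v), Region(key, old.end, v)]

    def ingest(self, region: Region, keys: list[int]) -> None:
        """Accept keys only if `region` matches the current topology."""
        if region not in self.regions:
            raise EpochNotMatch(f"stale region {region}")
        if any(not region.start <= k < region.end for k in keys):
            raise EpochNotMatch("keys fall outside the region range")
        self.ingested.update(keys)


def ingest_with_stale_region(cluster, region, keys, max_retries=3) -> bool:
    """Retry with the region captured before ingest; it never changes."""
    for _ in range(max_retries):
        try:
            cluster.ingest(region, keys)
            return True
        except EpochNotMatch:
            continue  # the same outdated region is reused on the next attempt
    return False


def ingest_with_refresh(cluster, keys, max_retries=3) -> bool:
    """Look regions up again on every attempt and regroup the keys."""
    pending = sorted(keys)
    for _ in range(max_retries):
        groups: dict[Region, list[int]] = {}
        for k in pending:
            groups.setdefault(cluster.locate(k), []).append(k)
        failed: list[int] = []
        for region, ks in groups.items():
            try:
                cluster.ingest(region, ks)
            except EpochNotMatch:
                failed.extend(ks)
        if not failed:
            return True
        pending = failed
    return False


cluster = ToyCluster(1000)
stale = cluster.locate(10)   # topology captured once, before ingest
cluster.split(500)           # the store splits the region on its own
print(ingest_with_stale_region(cluster, stale, [10, 600]))
print(ingest_with_refresh(cluster, [10, 600]))
```

In the toy model the first call exhausts its retries because the captured region is never replaced, while the second succeeds because each attempt regroups the keys by the current regions. Whether the real importer can be fixed in the same way is an assumption, not something confirmed in this issue.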


github-actions bot commented Jun 7, 2022

This issue is stale because it has been open 30 days with no activity.
