[Bug] Doris Source When the parallelism is set to greater than 1, the read data will be lost #503
Comments
What is the Doris version? Does the TaskManager report an error?
Doris version: 3.0.1. Flink's TaskManager does not report any errors, and I don't see any error logs related to flink-doris-connector. In my previous tests, I saw some warning logs with the content "The status of open scanner result from .......". These warnings do not seem to have any impact on the code, but it looks like the read is closed immediately after the split is assigned, and very little data is actually read.
PR #502 doesn't seem to solve my problem. This is the result of running the task after merging that PR and recompiling flink-doris-connector. TaskManager log (source-related section)
This has been fixed in PR apache/doris#42421; you can try it.
Thank you very much. I will close this issue later.
Search before asking
Version
24.0.1
What's Wrong?
I use flink-doris-connector to read the full data of a table and then write it to Kafka through the Flink Kafka sink.
I found that if I set the Source parallelism to greater than 1, it occasionally loses data, and sometimes it reads only a small portion of the data.
The following are screenshots of my experiment
PS: My table has 6405008 rows.
When I set the Source parallelism to 6, it reads only a very small portion of the data.
When I set the Source parallelism to 2, the result is the same.
It seems that the complete data is read only when the parallelism is 1. Why is this?
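One way to quantify the loss in runs like these is to sum the record counts emitted by each source subtask (for example, the `numRecordsOut` metric shown in the Flink web UI) and compare the total with the table's row count. The sketch below is illustrative only: the function name and the sample counts are hypothetical and are not measurements from this issue. Only the total of 6405008 comes from the report.

```python
from typing import Sequence

EXPECTED_ROWS = 6405008  # row count stated in this report


def missing_rows(expected: int, per_subtask_counts: Sequence[int]) -> int:
    """Return how many rows are missing, given each source subtask's output count."""
    read = sum(per_subtask_counts)
    if read > expected:
        # More rows than the table holds would point to duplicates, not loss.
        raise ValueError(f"read {read} rows, more than the expected {expected}")
    return expected - read


# Hypothetical counts for a parallelism-2 run (NOT measured values).
sample_counts = [1000, 2000]
print(missing_rows(EXPECTED_ROWS, sample_counts))
```

A result of 0 means the run read the complete table; any positive value is the number of rows lost.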
What You Expected?
I expect that when the Source parallelism is greater than 1, it should be able to read the complete data.
I looked at the Doris connector code carefully. I believe the process of assigning splits to readers is fine. The problem seems to be in reading the data inside a DorisSplitRecords: it appears to close before reading the data of all the tablets in a split.
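To make this hypothesis concrete, here is a toy model of the suspected behaviour. It is not the connector's code and does not establish the root cause: `Split`, `read_first_tablet_only`, and `read_all_tablets` are hypothetical names, and the tablets are plain in-memory lists. It only shows how closing a reader after the first tablet of a multi-tablet split would produce a partial read.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Split:
    """Hypothetical stand-in for a source split that covers several tablets."""
    tablet_ids: List[int]


def read_first_tablet_only(split: Split, tablets: Dict[int, List[str]]) -> List[str]:
    """Suspected faulty behaviour: the reader closes after the first tablet."""
    rows: List[str] = []
    for tablet_id in split.tablet_ids:
        rows.extend(tablets[tablet_id])
        break  # closes early; the remaining tablets in the split are never read
    return rows


def read_all_tablets(split: Split, tablets: Dict[int, List[str]]) -> List[str]:
    """Expected behaviour: drain every tablet before the split is finished."""
    rows: List[str] = []
    for tablet_id in split.tablet_ids:
        rows.extend(tablets[tablet_id])
    return rows


# Toy data: three tablets with two rows each, grouped into one split.
tablets = {1: ["a", "b"], 2: ["c", "d"], 3: ["e", "f"]}
split = Split(tablet_ids=[1, 2, 3])

print(len(read_first_tablet_only(split, tablets)))  # 2
print(len(read_all_tablets(split, tablets)))  # 6
```

In this model, the more tablets a split contains, the larger the share of rows that the early close drops.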
How to Reproduce?
Flink version: 1.19-scala_2.12-java11
flink-doris-connector version: flink-doris-connector-1.18:24.0.1
Flink task code
My Doris table creation statement
Anything Else?
No response
Are you willing to submit PR?
Code of Conduct