[FLINK-36517][cdc-connect][paimon] Use filterAndCommit API to avoid committing the same datafile twice #3639
base: master
Conversation
@lvyanquan @leonardBang PTAL

                        "Commit succeeded for %s with %s committable",
                        checkpointId, committables.size()));
    } catch (Exception e) {
        commitRequests.forEach(CommitRequest::retryLater);
Is there a specific purpose for retrying later in this context? @lvyanquan
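For context, a minimal sketch of how Flink's sink v2 `Committer.CommitRequest` retry hooks are typically used: `retryLater()` hands the committable back to the framework for the next attempt, while `signalFailedWithKnownReason` gives up. The `MAX_RETRIES` bound, `doCommit` helper, and exception handling here are illustrative assumptions, not this PR's actual logic:

```java
import java.io.IOException;
import java.util.Collection;

import org.apache.flink.api.connector.sink2.Committer;

/** Illustrative committer; MAX_RETRIES and doCommit() are hypothetical. */
class ExampleCommitter<CommT> implements Committer<CommT> {

    // Hypothetical retry bound for illustration only.
    private static final int MAX_RETRIES = 3;

    @Override
    public void commit(Collection<CommitRequest<CommT>> commitRequests)
            throws IOException, InterruptedException {
        for (CommitRequest<CommT> request : commitRequests) {
            try {
                doCommit(request.getCommittable());
            } catch (Exception e) {
                if (request.getNumberOfRetries() < MAX_RETRIES) {
                    // Ask the framework to hand this committable back on the next attempt.
                    request.retryLater();
                } else {
                    // Stop retrying and surface the observed cause.
                    request.signalFailedWithKnownReason(e);
                }
            }
        }
    }

    private void doCommit(CommT committable) throws IOException {
        // Hypothetical commit logic goes here.
    }

    @Override
    public void close() {}
}
```

Note that blindly calling `retryLater()` on every failure will re-commit committables that may already have landed, which is exactly the duplicate-commit concern this PR addresses.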
Could you please assist in reviewing this PR? Thank you. @lvyanquan
I agree that the issue of duplicate commits still exists. Our test coverage for abnormal failover cases is relatively thin; can you try adding a corresponding test case for this?
I will try, thanks.
…commit the same datafile duplicate
    // It's possible that the Flink job will restore from a checkpoint with only step#1
    // finished and step#2 not.
    // CommitterOperator will try to re-commit recovered transactions.
    committer.commit(commitRequests);
Thanks for adding this. What about running insert and commit many times (in a for loop), to simulate more complex situations, including ones with compaction?
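For illustration, a rough sketch of the failover loop being suggested; `writeRecords`, `toCommitRequests`, `restartJobFromLatestCheckpoint`, and `assertNoDuplicateDataFiles` are hypothetical helpers standing in for this module's actual test fixtures:

```java
// Sketch of a repeated insert/commit failover test. All helper methods here
// are hypothetical placeholders, not real fixtures from this repository.
@Test
void testCommitIsIdempotentAcrossFailover() throws Exception {
    for (int round = 0; round < 10; round++) {
        // Write a batch and capture the committables a checkpoint would produce.
        List<MultiTableCommittable> committables = writeRecords(round);

        // Commit once, then replay the same committables to simulate a job
        // that restores from a checkpoint with step#1 finished but step#2 not.
        committer.commit(toCommitRequests(committables));
        restartJobFromLatestCheckpoint();
        committer.commit(toCommitRequests(committables));

        // Enough rounds should also trigger Paimon compaction, covering the
        // "situations with compaction" mentioned above.
        assertNoDuplicateDataFiles(round);
    }
}
```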
Considering there is another issue in PaimonWriter, https://issues.apache.org/jira/browse/FLINK-36541, you can skip adding this loop for now if it turns out to be a problem.
The problem described in https://issues.apache.org/jira/browse/FLINK-35938 still persists: the storeMultiCommitter.commit API may commit the same datafile twice when the job restarts after a failure.
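A minimal sketch of the change the PR title describes, assuming a `filterAndCommit` method on the committer that skips committables whose checkpoint identifier already appears in the table's snapshot history (the exact Paimon signature is an assumption here, inferred from the API name in the title):

```java
// Sketch only: filterAndCommit is assumed to drop committables that the
// snapshot history shows were already committed, so requests replayed after
// a restart become no-ops instead of duplicate datafile commits.
try {
    storeMultiCommitter.filterAndCommit(committables);
    LOG.info(
            String.format(
                    "Commit succeeded for %s with %s committable",
                    checkpointId, committables.size()));
} catch (Exception e) {
    // Only genuinely failed commits are retried; already-committed
    // datafiles were filtered out above and cannot be committed twice.
    commitRequests.forEach(CommitRequest::retryLater);
}
```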