Bug Report
Please answer these questions before submitting your issue. Thanks!
We use sync-diff-inspector to compare the data of two Aurora clusters.
We have seen the following error in sync-diff-inspector when comparing table rows for large tables, especially when there are massive mismatches.
The context deadline exceeded error can occur in the checksum comparison phase because the query takes too long.
We have tried increasing the query timeout and using larger instance types, which helped reduce the errors but does not fix the underlying problem.
We should consider improving the parallelism of the checksum and row-data comparison.
Hi, in sync-diff-inspector, we have already implemented concurrent data comparison. Here's an overview of the whole process:
Chunk Division: Each table is divided into multiple chunks. If the table has an index, we use an index column to split the chunks. If the chunk size (chunk-size) is not explicitly specified in the configuration file, we calculate the number of chunks as max(rowCount/10000, 10000). However, if the table doesn't have any index, we treat the whole table as one chunk.
Concurrent Chunk Checking: sync-diff-inspector checks all the chunks concurrently; the degree of concurrency can be set with check-thread-count, which defaults to 4.
Mismatch Data Check: As in the first step, if the table has indices, sync-diff-inspector uses them to binary-search for the mismatched rows; otherwise, it compares the data in each chunk row by row.
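The steps above can be sketched roughly as follows. This is an illustrative Go sketch, not sync-diff-inspector's actual code: `chunkCount` applies the division rule stated above, and `checkChunks` shows a bounded worker pool of `check-thread-count` goroutines; the function names and the `func(int) bool` checker are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
)

// chunkCount mirrors the division rule described above: when chunk-size
// is not set, the number of chunks is max(rowCount/10000, 10000).
// (Illustrative only; the real splitting logic is more involved.)
func chunkCount(rowCount int) int {
	n := rowCount / 10000
	if n < 10000 {
		return 10000
	}
	return n
}

// checkChunks runs a checker over every chunk with at most
// checkThreadCount concurrent workers, like the concurrent chunk
// checking step, and returns the number of mismatched chunks.
func checkChunks(chunks []int, checkThreadCount int, check func(int) bool) int {
	sem := make(chan struct{}, checkThreadCount) // bounds concurrency
	var wg sync.WaitGroup
	var mu sync.Mutex
	mismatches := 0
	for _, c := range chunks {
		wg.Add(1)
		sem <- struct{}{} // acquire a worker slot
		go func(c int) {
			defer wg.Done()
			defer func() { <-sem }() // release the slot
			if !check(c) {
				mu.Lock()
				mismatches++
				mu.Unlock()
			}
		}(c)
	}
	wg.Wait()
	return mismatches
}

func main() {
	fmt.Println(chunkCount(200_000_000)) // 200M rows -> 20000 chunks
	chunks := []int{0, 1, 2, 3, 4, 5}
	// Pretend chunk 3 mismatches; run with 4 workers.
	fmt.Println(checkChunks(chunks, 4, func(c int) bool { return c != 3 }))
}
```

Only chunks whose checksum mismatches would proceed to the per-row (or binary-search) comparison, which is why the checksum phase dominates total query load.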
Could you provide the configuration you used and table schema if possible?
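For reference, a minimal configuration sketch showing where check-thread-count lives (field names follow sync-diff-inspector v2's TOML format as we understand it; the hosts, credentials, and table name are placeholders, so please adapt them to your setup and verify against the official docs):

```toml
# Number of chunks checked concurrently (defaults to 4).
check-thread-count = 4

[data-sources.source1]
host = "127.0.0.1"
port = 3306
user = "root"
password = ""

[data-sources.target1]
host = "127.0.0.1"
port = 4000
user = "root"
password = ""

[task]
output-dir = "./output"
source-instances = ["source1"]
target-instance = "target1"
target-check-tables = ["mydb.mytable"]
```

Raising check-thread-count increases parallelism but also the concurrent query load on both clusters, so it interacts with the timeouts described in the report.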