Implement end to end checksum for TiDB and TiCDC #42747
Background
TiCDC is an important component for TiDB to synchronize data to various downstream systems. When synchronizing data to downstream systems, data integrity is especially important. However, TiCDC does not support end-to-end data integrity verification yet.
Spec
Provide a cluster-level boolean option on the TiDB side (`tidb_enable_row_level_checksum` in the tracking list).
Once the customer enables this option, every change to a row in a non-system database carries an extra invisible field that stores a checksum computed from the row's content. This field exists only for data-correctness checking and is transparent to the customer.
TiCDC and end users can use this checksum to verify data integrity.
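As a rough illustration of the Spec, the Go sketch below computes a row checksum on the writer side and verifies it on the consumer side. It assumes the approach named in the TiDB tracking list (a CRC32 computed over each column and folded into one value) and visits columns in ascending column ID. The `Column` type, `rowChecksum`, and `verifyRow` are hypothetical helpers, not TiDB's row codec or TiCDC's API, and the real encoding of column values differs.

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// Column is a hypothetical, simplified view of one column of a row: its
// column ID plus the bytes produced for its value by some row encoder.
type Column struct {
	ID   int64
	Data []byte
}

// rowChecksum folds the encoded bytes of every column into a single CRC32
// (IEEE polynomial). Columns are visited in ascending ID order so that the
// writer and any reader feed identical input.
func rowChecksum(cols []Column) uint32 {
	sorted := append([]Column(nil), cols...)
	sort.Slice(sorted, func(i, j int) bool { return sorted[i].ID < sorted[j].ID })
	var sum uint32
	for _, c := range sorted {
		// Update continues the running CRC, so the result equals the CRC32
		// of all column bytes concatenated in ID order.
		sum = crc32.Update(sum, crc32.IEEETable, c.Data)
	}
	return sum
}

// verifyRow recomputes the checksum on the consumer side and compares it
// with the value that was stored alongside the row.
func verifyRow(cols []Column, stored uint32) bool {
	return rowChecksum(cols) == stored
}

func main() {
	row := []Column{{ID: 2, Data: []byte("alice")}, {ID: 1, Data: []byte{0x01}}}
	stored := rowChecksum(row) // computed when the row is written

	fmt.Println(verifyRow(row, stored)) // true

	// Flip a single bit ('e' -> 'd'); CRC32 detects every single-bit error.
	corrupted := []Column{{ID: 2, Data: []byte("alicd")}, {ID: 1, Data: []byte{0x01}}}
	fmt.Println(verifyRow(corrupted, stored)) // false
}
```

Ordering by column ID is a choice made for this sketch: any fixed order works, as long as the writer and every reader use the same one.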
Development tracking for the TiDB part
- Add a builtin function `tidb_row_checksum` to return the checksum value of a row. (*: add tidb_row_checksum() as a builtin function #43479)
- Let TiDB be aware of the origin state (none or public) of a column if its current state is not public. Resolution: we always append two checksums if there is a column whose state is not public, so there is no need to know the direction of the state transition.
- Add the system variable `tidb_enable_row_level_checksum` to enable or disable checksum calculation when inserting new rows. While it is enabled, multi-schema change is blocked.
- Handle the `add column` schema change, and generate two checksum values if necessary.
- Handle the `drop column` schema change, and generate two checksum values if necessary.
- Handle the `modify column` schema change, and generate two checksum values if necessary.
- Update the `tablecodec` package where the `EncodeRow` function is used: calculate the CRC32 of each column while executing `encodeRowCols`, and return a single checksum at the end.
- Skip the checksum part in `ChunkDecoder` in TiDB if necessary. (util: extend row format with checksum #42859)
- Skip the checksum part in `internal_handle_request` and `PointGetter` for chunk encoding in TiKV if necessary. (storage: add checksum logic in row slice, add cop and get test cases tikv/tikv#14611)
- Skip the checksum part in TiFlash if necessary. Note: TiFlash decodes a row value in `appendRowV2ToBlockImpl`, which iterates over the columns and decodes them one by one, so the checksum part is already discarded by the current implementation.

Development tracking for the TiKV part
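Several items in the TiDB tracking list (the TiDB chunk decoder, TiKV's `internal_handle_request` and `PointGetter`, TiFlash) come down to one requirement: a reader that only needs column values must step over the checksum bytes. The Go sketch below assumes a hypothetical layout in which the checksums trail the encoded row as 4-byte little-endian values followed by a one-byte count. The real row format differs, and it also needs a way (for example, a header flag) to tell rows with and without checksums apart; this sketch sidesteps that by assuming every value carries the trailer. `appendChecksums` and `splitChecksums` are illustrative names only.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"hash/crc32"
)

// appendChecksums appends a hypothetical trailer to an encoded row:
// each checksum as 4 little-endian bytes, then one byte holding the count.
func appendChecksums(payload []byte, sums ...uint32) []byte {
	out := append([]byte(nil), payload...)
	var buf [4]byte
	for _, s := range sums {
		binary.LittleEndian.PutUint32(buf[:], s)
		out = append(out, buf[:]...)
	}
	return append(out, byte(len(sums)))
}

// splitChecksums separates the row payload from its checksum trailer.
// Readers that only need column values take the payload and ignore sums.
func splitChecksums(value []byte) (payload []byte, sums []uint32, err error) {
	if len(value) == 0 {
		return nil, nil, errors.New("empty value")
	}
	n := int(value[len(value)-1])
	trailer := 1 + 4*n
	if len(value) < trailer {
		return nil, nil, errors.New("value shorter than its checksum trailer")
	}
	payload = value[:len(value)-trailer]
	for i := 0; i < n; i++ {
		off := len(payload) + 4*i
		sums = append(sums, binary.LittleEndian.Uint32(value[off:off+4]))
	}
	return payload, sums, nil
}

func main() {
	payload := []byte("encoded columns")
	value := appendChecksums(payload, crc32.ChecksumIEEE(payload))

	got, sums, err := splitChecksums(value)
	fmt.Println(string(got), sums[0] == crc32.ChecksumIEEE(payload), err)
	// encoded columns true <nil>
}
```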
Development tracking for the TiCDC part
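A minimal sketch of the consumer-side check, assuming the behavior described in the TiDB tracking list: while some column is not public, TiDB stores two checksums. Because the consumer cannot tell which side of the schema change its own view of the row corresponds to, this sketch accepts a row when the recomputed CRC32 matches any stored checksum. `EncodedRow`, `checksumOf`, and `verify` are hypothetical names, not TiCDC's actual API.

```go
package main

import (
	"fmt"
	"hash/crc32"
)

// EncodedRow is a hypothetical shape for a change event as seen by a
// consumer: column bytes already ordered by column ID, plus one stored
// checksum or, while a column is in a non-public state, two.
type EncodedRow struct {
	Columns   [][]byte
	Checksums []uint32
}

// checksumOf computes the CRC32 (IEEE) of the column bytes in order.
func checksumOf(cols [][]byte) uint32 {
	var sum uint32
	for _, c := range cols {
		sum = crc32.Update(sum, crc32.IEEETable, c)
	}
	return sum
}

// verify recomputes the checksum from the consumer's view of the row and
// accepts the row if it matches any stored checksum.
func verify(r EncodedRow) (bool, error) {
	if len(r.Checksums) == 0 {
		return false, fmt.Errorf("row carries no checksum")
	}
	got := checksumOf(r.Columns)
	for _, want := range r.Checksums {
		if got == want {
			return true, nil
		}
	}
	return false, nil
}

func main() {
	withNew := [][]byte{[]byte("id-7"), []byte("bob"), []byte("default")}
	withoutNew := withNew[:2]

	// While the third column is not public, the writer stores a checksum
	// for each plausible column set.
	stored := []uint32{checksumOf(withNew), checksumOf(withoutNew)}

	// A consumer whose schema does not yet include the new column.
	ok, err := verify(EncodedRow{Columns: withoutNew, Checksums: stored})
	fmt.Println(ok, err) // true <nil>
}
```

Accepting either value keeps the consumer independent of the direction of the schema change, which matches the resolution noted in the tracking list; a stricter consumer could instead pick the checksum that corresponds to its known schema version.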