-
Notifications
You must be signed in to change notification settings - Fork 72
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve inline verifier mismatch errors #361
Conversation
154e234
to
2dd8c90
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
New message format:
From
[paginationKeys: 7 (source: boo, target: aaa) ]
To
[PKs: 7 (type: content difference, source: boo, target: aaa, column: name) ]
In case of missing rows we skip the source/target
inline_verifier.go
Outdated
MismatchRowMissingOnSource mismatchType = "row missing on source" | ||
MismatchRowMissingOnTarget mismatchType = "row missing on target" | ||
MismatchContentDifference mismatchType = "content difference" | ||
MismatchChecksumDifference mismatchType = "rows checksum difference" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
added checksum difference next to content one, to differentiate between whole row checksum comparison and column data one
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isn't content & checksum difference the same thing? especially now that we are reporting missing row & column separately. If all the columns are there, and content is different checksum will be different?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
there's a difference, because we check both checksum of a whole row + compare each rows columns separately, so the checksum difference
is for when the SQL query generating checksum for whole row doesn't match one in source, while the content difference is when we compare each single column of a row and encounter and issue there
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🤔 If i understand this right, then maybe we should rename these to MismatchFieldDifference
and MismatchRowChecksumDifference
?
type InlineVerifierMismatches struct { | ||
Pk uint64 | ||
SourceChecksum string | ||
TargetChecksum string | ||
MismatchColumn string | ||
MismatchType mismatchType |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
flatten the structure a bit, as we always want to have the type in
if mismatch.TargetChecksum != "" { | ||
messageBuf.WriteString(", target: ") | ||
messageBuf.WriteString(mismatch.TargetChecksum) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
source and target checksum are optional, as we don't include them in case rows are missing (source or target)
inline_verifier.go
Outdated
mismatchSet[paginationKey] = InlineVerifierMismatches{ | ||
Pk: paginationKey, | ||
MismatchType: MismatchChecksumDifference, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- I couldn't find any reason why
!exist
was after the bytes comparison, it's counter intuitive as we should first check if rows are there - That's why I've split up check for
!exists
from!bytes.Equal(sourceHash, targetHash)
so we could properly define the mismatch type
if !bytes.Equal(sourceHash, targetHash) || !exists { | ||
for paginationKey, _ := range source { | ||
_, exists := target[paginationKey] | ||
if !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if rows exist in both source & target, then they are already compared above, no need to check it again
MismatchColumn: colName, | ||
} | ||
break // no need to compare other columns | ||
} else if !bytes.Equal(sourceData, targetData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- again, split
!exists
check from bytes comparison - for bytes comparison, generate checksums for columns data (avoid storing data which might be PII)
break | ||
for colName := range sourceDecompressedColumns { | ||
_, exists := targetDecompressedColumns[colName] | ||
if !exists { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
again, there's no point of comparing the data for the second time, it should be the same as in the loop above, check for column existence only
} | ||
compressedMismatch := compareDecompressedData(sourceData, targetData) | ||
for paginationKey, mismatch := range compressedMismatch { | ||
mismatches[paginationKey] = mismatch |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
with this, even if there's a hash comparison already there, we will get more details on the column involved
for tableName, _ := range mismatches[schemaName] { | ||
sortedTables = append(sortedTables, tableName) | ||
} | ||
sort.Strings(sortedTables) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sort tables for predictability
0593a70
to
53c5c66
Compare
inline_verifier_test.go
Outdated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Proper unit tests! 🎉
No description provided.