-
Notifications
You must be signed in to change notification settings - Fork 43
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Bugfix: DiffMatchPatch Surrogate Pairs Support #581
Conversation
Boolean lineBreak2 = whitespace2 && CFCharacterSetIsCharacterMember(controlSet, char2); | ||
Boolean blankLine1 = lineBreak1 && diff_regExMatch(one, &blankLineEndRegEx); | ||
Boolean blankLine2 = lineBreak2 && diff_regExMatch(two, &blankLineStartRegEx); | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we remove the whitespace changes? It's making it hard to scan and see if any actual changes occurred here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Only two new lines there...
Boolean char1IsSurrogate = CFStringIsSurrogateLowCharacter(char1) || CFStringIsSurrogateHighCharacter(char1);
Boolean char2IsSurrogate = CFStringIsSurrogateLowCharacter(char2) || CFStringIsSurrogateHighCharacter(char1);
I've relocated all the things because it was next to impossible to understand what was going on. Lemme see if i can revert that... relatively easily!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@dmsnell there you go! The changes themselves are very low footprint. And the beauty of it? if there are no surrogate pairs anywhere... this will run the same way it ever did.
return MIN(i - 2, 0); | ||
} | ||
|
||
return i - 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you see google/diff-match-patch#69 (comment) where new bugs appeared by using this approach?
Is the way we're modifying this here going to lend itself well to pushing a patch to the upstream repository? |
@dmsnell there is no upstream repository on this one. When this was built. DMP wasn't on Github, and (I'm pretty sure) it has a couple bugfixes. Okay to patch right here! |
Closed in favor of #582 |
Details:
In this PR we're upgrading DiffMatchPatch so that surrogate pairs are not cut in half.
This avoids invalid unicode strings to be formed.
Simplenote sibling here
Closes #580
Testing:
[TBD]