Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Bugfix: DiffMatchPatch Surrogate Pairs Support #581

Closed
wants to merge 6 commits into from

Conversation

jleandroperez
Copy link
Contributor

Details:

In this PR we're upgrading DiffMatchPatch so that surrogate pairs are not cut in half.
This avoids invalid unicode strings to be formed.

Simplenote sibling here
Closes #580

Testing:

[TBD]

@jleandroperez jleandroperez added this to the 0.8.24 milestone Nov 8, 2019
@jleandroperez jleandroperez self-assigned this Nov 8, 2019
Boolean lineBreak2 = whitespace2 && CFCharacterSetIsCharacterMember(controlSet, char2);
Boolean blankLine1 = lineBreak1 && diff_regExMatch(one, &blankLineEndRegEx);
Boolean blankLine2 = lineBreak2 && diff_regExMatch(two, &blankLineStartRegEx);

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we remove the whitespace changes? It's making it hard to scan and see if any actual changes occurred here.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only two new lines there...

  Boolean char1IsSurrogate = CFStringIsSurrogateLowCharacter(char1) || CFStringIsSurrogateHighCharacter(char1);
   Boolean char2IsSurrogate = CFStringIsSurrogateLowCharacter(char2) || CFStringIsSurrogateHighCharacter(char1);

I've relocated all the things because it was next to impossible to understand what was going on. Lemme see if i can revert that... relatively easily!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmsnell there you go! The changes themselves are very low footprint. And the beauty of it? if there are no surrogate pairs anywhere... this will run the same way it ever did.

return MIN(i - 2, 0);
}

return i - 1;
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

did you see google/diff-match-patch#69 (comment) where new bugs appeared by using this approach?

@dmsnell
Copy link

dmsnell commented Nov 8, 2019

Is the way we're modifying this here going to lend itself well to pushing a patch to the upstream repository?

@jleandroperez
Copy link
Contributor Author

@dmsnell there is no upstream repository on this one. When this was built. DMP wasn't on Github, and (I'm pretty sure) it has a couple bugfixes.

Okay to patch right here!

@jleandroperez
Copy link
Contributor Author

Closed in favor of #582

@jleandroperez jleandroperez deleted the issue/580-emojis-crash branch November 11, 2019 22:11
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Crash: Emojis + DiffMatchPatch
2 participants