forked from google/diff-match-patch
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Prevent splitting surrogate pairs when diffing
See google#69.

In this patch I'm trying to follow the approach taken by @judofyr. I start at the top of the script and work down, auditing every place that splits strings, indexes into them, or otherwise operates at the character level, so that we can make sure we never split a surrogate pair. This contrasts with [attempt one], where I created a custom iterator for strings. Surprisingly, I found this more "ad-hoc" approach easier to manage, because it doesn't create a split universe of strings on one side and Unicode on the other.

As of this commit I haven't audited the cleanup functions. My own tests are passing, so I'm inclined to believe they might be safe. I still doubt that this work is sound, though; in particular, the middle-snake algorithm might find the wrong snake when presented with variable-width characters.
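As a rough illustration of the kind of audit described above (a sketch, not the code from this commit), the two hypothetical helpers below, `commonPrefixLength` and `commonSuffixLength`, play the role of diff-match-patch's `diff_commonPrefix` and `diff_commonSuffix`. JavaScript strings are indexed by UTF-16 code unit, so a match boundary can fall between the high half (0xD800 to 0xDBFF) and the low half (0xDC00 to 0xDFFF) of a surrogate pair. When that happens, the helpers back off by one code unit so the pair stays together.

```typescript
// Hypothetical helpers for illustration; not the functions changed in this commit.
function isHighSurrogate(code: number): boolean {
  return code >= 0xd800 && code <= 0xdbff;
}

function isLowSurrogate(code: number): boolean {
  return code >= 0xdc00 && code <= 0xdfff;
}

// Length, in UTF-16 code units, of the common prefix of a and b,
// shortened if necessary so it never ends in the middle of a surrogate pair.
function commonPrefixLength(a: string, b: string): number {
  const max = Math.min(a.length, b.length);
  let i = 0;
  while (i < max && a.charCodeAt(i) === b.charCodeAt(i)) {
    i++;
  }
  // A prefix ending on a high surrogate would strand its low half
  // outside the prefix, so give that code unit back.
  if (i > 0 && isHighSurrogate(a.charCodeAt(i - 1))) {
    i--;
  }
  return i;
}

// Length, in UTF-16 code units, of the common suffix of a and b,
// shortened if necessary so it never starts in the middle of a surrogate pair.
function commonSuffixLength(a: string, b: string): number {
  const max = Math.min(a.length, b.length);
  let n = 0;
  while (
    n < max &&
    a.charCodeAt(a.length - 1 - n) === b.charCodeAt(b.length - 1 - n)
  ) {
    n++;
  }
  // A suffix starting on a low surrogate would strand its high half
  // outside the suffix, so shrink the suffix by one code unit.
  if (n > 0 && isLowSurrogate(a.charCodeAt(a.length - n))) {
    n--;
  }
  return n;
}
```

For example, `commonPrefixLength("abc\uD83D\uDE00", "abc\uD83D\uDE01")` returns 3 rather than 4. The shared high surrogate of the two emoji stays attached to its differing low surrogate instead of being reported as an equality on its own.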
Showing 2 changed files with 143 additions and 2 deletions.
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.