Skip to content

Commit

Permalink
JavaScript: Stop breaking surrogate pairs in toDelta()
Browse files Browse the repository at this point in the history
Resolves google#69 for JavaScript

Sometimes we can find a common prefix that runs into the middle of a
surrogate pair and we split that pair when building our diff groups.

This is fine as long as we are operating on UTF-16 code units. It
becomes problematic when we start trying to treat those substrings as
valid Unicode (or UTF-8) sequences.

When we pass these split groups into `toDelta()` we do just that and the
library crashes. In this patch we're post-processing the diff groups
before encoding them to make sure that we un-split the surrogate pairs.

The post-processed diffs should produce the same output when applying
the diffs. The diff string itself will be different but should change
that much - only by a single character at surrogate boundaries.
  • Loading branch information
dmsnell committed Jan 30, 2024
1 parent 62f2e68 commit 143d61d
Show file tree
Hide file tree
Showing 3 changed files with 364 additions and 53 deletions.
Loading

0 comments on commit 143d61d

Please sign in to comment.