Prioritise Unicode Full encoding over Unicode BMP Only #93

MJDrgn · 2024-04-15T01:50:05Z

Currently, if both Full and BMPOnly Unicode cmaps are present, the library will pick whichever one it finds first. This means for libraries that have extended characters (for example, https://fonts.google.com/noto/specimen/Noto+Emoji ) will not render any characters beyond U+FFFF - which includes a lot of the common emoji.

This change alters the priority order to decode the Full table if present, over the BMPOnly table.
It maintains the previous behavior for the Microsoft platform tables (the last one in the file is used, if no Unicode table is present).

There is a potential change in behavior if existing code relied on the library silently not rendering characters outside the BMP, but this was inconsistent anyway as in most cases the not-found box symbol (Index 0) was rendered instead.

It's possible this may cause a regression in the case of a broken font file, where the BMPOnly table was correct but the Full table was broken. I cannot find any such fonts, and this would be a significant error in the font creation itself so is unlikely to be an issue.

google-cla · 2024-04-15T01:50:10Z

Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA).

View this failed invocation of the CLA check for more information.

For the most up to date status, view the checks section at the bottom of the pull request.

MJDrgn · 2024-04-15T03:27:44Z

This fixes the same issue as #88 but in a slightly different way (and that one is missing a CLA).
Addresses #44 as well.

@nigeltao I know this repo hasn't seen any changes in a long time (and there are other alternatives) but unfortunately it's an upstream dependency of a number of large libraries we use, so moving to image/font/opentype is problematic.

Alternatively, is this officially deprecated / unmaintained? If so, could this be added to the README and marked as so, which would help prioritize downstream repos switching to alternatives?

Prioritise Unicode Full over Unicode BMP Only

f5892a2

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Prioritise Unicode Full encoding over Unicode BMP Only #93

Prioritise Unicode Full encoding over Unicode BMP Only #93

MJDrgn commented Apr 15, 2024

google-cla bot commented Apr 15, 2024

MJDrgn commented Apr 15, 2024 •

edited

Loading

Prioritise Unicode Full encoding over Unicode BMP Only #93

Are you sure you want to change the base?

Prioritise Unicode Full encoding over Unicode BMP Only #93

Conversation

MJDrgn commented Apr 15, 2024

google-cla bot commented Apr 15, 2024

MJDrgn commented Apr 15, 2024 • edited Loading

MJDrgn commented Apr 15, 2024 •

edited

Loading