Offset off when non-BMP characters are in the document #69

Open
cabo opened this issue Feb 26, 2023 · 0 comments

I have a document with a non-BMP character in it (scalar value ≥ 0x10000), namely 🤔.
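From that character onward, all offsets that languagetool-server reports appear to be shifted one position to the right.
Possibly languagetool-server reports offsets in UTF-16 code units rather than in characters: a non-BMP character takes two UTF-16 code units, which would account for a shift of exactly one.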
I don't know if languagetool-server can be coaxed into counting characters.
If not, the document probably needs to be searched for non-BMP characters, and each reported offset corrected by the number of such characters that precede it (expensive!).
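
A minimal sketch of that correction in Python, assuming the server really does count UTF-16 code units (only a guess at this point). The function names `non_bmp_utf16_positions` and `utf16_to_char_offset` are made up for illustration and are not part of any existing package. The document is scanned once for non-BMP characters; each reported offset is then corrected with a binary search, which may keep the cost tolerable.

```python
from bisect import bisect_left


def non_bmp_utf16_positions(text):
    """Return the UTF-16 code-unit offsets at which non-BMP characters start.

    Hypothetical helper: one pass over the document, done once per check.
    """
    positions = []
    units = 0  # running offset in UTF-16 code units
    for ch in text:
        if ord(ch) >= 0x10000:
            positions.append(units)
            units += 2  # non-BMP characters need a surrogate pair
        else:
            units += 1
    return positions


def utf16_to_char_offset(positions, utf16_offset):
    """Convert a UTF-16 code-unit offset into a character (code point) offset.

    Assumes the offset does not point into the middle of a surrogate pair.
    Each non-BMP character that starts before the offset contributes one
    extra code unit, so that count is subtracted.
    """
    return utf16_offset - bisect_left(positions, utf16_offset)


if __name__ == "__main__":
    doc = "a🤔bc"
    positions = non_bmp_utf16_positions(doc)
    # "b" is reported at UTF-16 offset 3 but is character 2 of the document.
    print(utf16_to_char_offset(positions, 3))
```

If the reported match length is also in UTF-16 code units, it can be handled the same way: convert both the start offset and the end offset (start plus length), and take the difference.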
