Compile liblouis with 32 bit widechars #9544
cc: @DrSooom, you have spent a lot of work improving the display of Unicode characters. I think your thoughts would be very much appreciated here as well.
@Adriani90: Thanks for the notification. It seems that @LeonarddeR split PR #9044 into PR #9544 and #9545 and updated both. @LeonarddeR: Please also see #8702 and liblouis/liblouis#730.
This isn't fully correct due to the definition (e.g. "undefined 0") in some braille tables. Undefined Unicode characters can also be displayed just as ⠀ (dot 0). Please read the HUC Braille Tables documentation for further details. The only thing I need to know here is whether I have to change all yhhhhh definitions to zhhhhhhhh definitions in the HUC Braille Tables. These replacements can be done quite quickly compared to the whole process of creating the HUC Braille Tables. Once NVDA fully supports UTF-32 characters I will have to update the HUC Braille Tables documentation as well, because it still refers to NVDA 2019.1. Personally I really want to see UTF-32 support in NVDA, because with the HUC Braille Tables the number of braille characters needed for an undefined Unicode character between U+10000 and U+10FFFF is reduced from 16 to 3 8-dot braille characters. That would be great. PS: In less than four hours I will be on the train to SightCity 2019, where I'm going to inform some people about the existence of the HUC Braille Tables. You can find A7 handouts ((cc) by-sa in EN and DE) regarding the HUC Braille Tables here on my website.
These are out of scope for this PR. This PR aims at fixing braille issues introduced when switching to Python 3, nothing more than that.
Thanks, I will fix this entry.
Fair enough. I understand now. Leave this change in.
I changed the base branch to threshold_py3_staging. I think threshold_py3_staging is now in a state where it might even need this for braille unit tests to pass at some point.
Link to issue number:
Closes #6695
Summary of the issue:
Liblouis is currently compiled with 2-byte (16-bit) wide characters. This is pretty annoying when displaying emoji, as they are 32-bit Unicode characters that need two 16-bit units each. For example, 😉 (U+1F609) is usually printed as '\xd83d''\xde09'.
More importantly though, using a 2-byte encoding with Python 3 is liable to break things in a major way. The braille module uses brailleToRawPos and rawToBraillePos to map braille characters to real characters. In Python 2, unicode strings are internally stored with a two-byte encoding. Therefore, 32-bit Unicode characters take two indexes or offsets in a string. In Python 3, one index/offset corresponds to one code point. Liblouis' 2-byte wide characters played pretty nicely with Python 2 unicode strings, but with 16-bit wide characters on Python 3, the rawToBraillePos and brailleToRawPos mappings no longer match, as liblouis reads 😉 as two characters whereas Python 3 reads it as one.
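To make the mismatch concrete, here is a minimal Python 3 sketch that counts the same string once in code points and once in 16-bit units. It uses only the standard library and does not call liblouis; the variable names are made up for illustration.

```python
import struct

# "a", the winking face emoji (U+1F609), "b"
text = "a\U0001F609b"

# Python 3 indexes strings by code point.
code_points = len(text)

# A 16-bit widechar build sees UTF-16 code units instead.
utf16_units = len(text.encode("utf-16-le")) // 2

# The emoji alone becomes a surrogate pair of two 16-bit units.
wink_units = struct.unpack("<2H", "\U0001F609".encode("utf-16-le"))
surrogates = [hex(unit) for unit in wink_units]

print(code_points, utf16_units, surrogates)
```

The two surrogate values are the same ones that show up in the hexadecimal output quoted above.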
Description of how this pull request fixes the issue:
This compiles liblouis with 32-bit wide characters instead of 16-bit ones. This means only one replacement pattern is printed for 32-bit characters instead of two, and it should also ensure that the brailleToRawPos and rawToBraillePos mappings are correct, as both Python 3 and liblouis UCS4 assume that every character takes exactly one offset in a string.
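As a rough illustration of why the wider unit should keep the position mappings aligned, the sketch below computes, for each Python 3 string index, the offset a library would see with 2-byte or 4-byte units. The helper unit_offsets is hypothetical and only models the unit width; it is not part of NVDA or liblouis.

```python
from typing import List


def unit_offsets(text: str, unit_bytes: int) -> List[int]:
    """Return, per code point, its offset counted in units of unit_bytes.

    Hypothetical helper: 2 models a 16-bit widechar build, 4 a 32-bit one.
    """
    encoding = {2: "utf-16-le", 4: "utf-32-le"}[unit_bytes]
    offsets = []
    position = 0
    for char in text:
        offsets.append(position)
        # Characters outside the BMP take two 16-bit units but one 32-bit unit.
        position += len(char.encode(encoding)) // unit_bytes
    return offsets


text = "a\U0001F609b"
print(unit_offsets(text, 2))  # offsets drift after the emoji
print(unit_offsets(text, 4))  # offsets equal the Python 3 indexes
```

With 4-byte units the list is simply 0, 1, 2, so an offset reported by the library can be used directly as a Python 3 string index.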
Testing performed:
This PR is pretty theoretical. Testing can be performed as soon as #9543 is merged. Therefore, I will mark this as a draft until that's the case.
Known issues with pull request:
None known as of yet
Change log entry:
+Emoji and other 32 bit unicode characters now take less space on a braille display when they are shown as hexadecimal values. (Compiling Liblouis with 32-bit Unicode support #6695)