Compile liblouis with 32 bit widechars #9544

Merged
merged 7 commits into from
Jun 17, 2019

LeonarddeR
Collaborator

@LeonarddeR LeonarddeR commented May 7, 2019

Link to issue number:

Closes #6695

Summary of the issue:

Liblouis currently uses a 2-byte encoding to process braille. This is pretty annoying when displaying emoji, as they are 32-bit Unicode characters that do not fit into a single 16-bit unit. For example, 😉 is usually printed as '\xd83d''\xde09'.
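To illustrate where the two values in that example come from, here is a minimal sketch using only the Python 3 standard library. The helper name `utf16_code_units` is made up for this illustration and is not part of liblouis or NVDA.

```python
# The winking-face emoji (U+1F609), written as an escape so the snippet
# does not depend on the encoding of the source file.
emoji = "\U0001F609"


def utf16_code_units(text):
    """Return the UTF-16 code units of text as a list of integers.

    Hypothetical helper for illustration only.
    """
    raw = text.encode("utf-16-le")
    return [int.from_bytes(raw[i:i + 2], "little") for i in range(0, len(raw), 2)]


units = utf16_code_units(emoji)
print([hex(u) for u in units])  # ['0xd83d', '0xde09']
print(len(emoji))               # 1 (one code point in Python 3)
```

The two code units are the surrogate pair that a 16-bit widechar build sees as two separate characters.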

More importantly though, using a 2-byte encoding with Python 3 is likely to break things in a major way. The braille module uses brailleToRawPos and rawToBraillePos to map braille characters to real characters. In Python 2, Unicode strings are stored internally with a two-byte encoding, so 32-bit Unicode characters take two indexes or offsets in a string. In Python 3, one index/offset corresponds to one code point. Liblouis's 2-byte wide characters played pretty nicely with Python 2 Unicode strings, but with 16-bit wide characters on Python 3, the rawToBraillePos and brailleToRawPos mappings no longer match, as liblouis reads 😉 as two characters whereas Python 3 reads it as one.
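The mismatch can be sketched without liblouis at all. The function below is a hypothetical helper, not NVDA code: for each Python 3 string index it computes the offset a 16-bit widechar library would use for the same character.

```python
def utf16_offsets(text):
    """For each code point index in text, return its offset in UTF-16 code units.

    Hypothetical helper for illustration; not part of NVDA or liblouis.
    """
    offsets = []
    pos = 0
    for ch in text:
        offsets.append(pos)
        # Characters above U+FFFF need a surrogate pair, i.e. two code units.
        pos += 2 if ord(ch) > 0xFFFF else 1
    return offsets


text = "a\U0001F609b"
print(list(range(len(text))))  # [0, 1, 2]  (Python 3 indexes)
print(utf16_offsets(text))     # [0, 1, 3]  (16-bit widechar offsets)
```

Every character after the emoji is shifted by one, which is the kind of drift that would make position mappings based on 16-bit offsets point at the wrong character in a Python 3 string.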

Description of how this pull request fixes the issue:

This compiles liblouis with 32-bit wide characters instead of 16-bit ones. This means only one replacement pattern is printed for 32-bit characters instead of two, and it should also ensure that the brailleToRawPos and rawToBraillePos mappings are correct, as both Python 3 and liblouis UCS4 assume that every character takes exactly one offset in a string.
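As a rough check of that assumption, the sketch below counts code units for both widths using the standard `utf-16-le` and `utf-32-le` codecs. The function name `code_unit_count` is invented for this example and does not exist in liblouis or NVDA.

```python
def code_unit_count(text, bytes_per_unit):
    """Count the code units text occupies for a given widechar size (2 or 4 bytes).

    Hypothetical helper for illustration only.
    """
    codec = {2: "utf-16-le", 4: "utf-32-le"}[bytes_per_unit]
    return len(text.encode(codec)) // bytes_per_unit


text = "a\U0001F609b"
print(len(text))                 # 3 code points in Python 3
print(code_unit_count(text, 2))  # 4 units with 16-bit wide characters
print(code_unit_count(text, 4))  # 3 units with 32-bit wide characters
```

With 4-byte units the count equals `len(text)`, so a Python 3 index and a widechar offset refer to the same character.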

Testing performed:

This PR is pretty theoretical. Testing can be performed as soon as #9543 is merged. Therefore, I will mark this as a draft until that's the case.

Known issues with pull request:

None known as of yet

Change log entry:

@Adriani90
Collaborator

cc: @DrSooom you have put a lot of work into improving the display of Unicode characters. I think your thoughts would be very much appreciated here as well.

@DrSooom

DrSooom commented May 8, 2019

@Adriani90: Thanks for the notification. It seems that @LeonarddeR split PR #9044 into PR #9544 and #9545 and updated both.

@LeonarddeR: Please also see #8702 and liblouis/liblouis#730.

+Emoji and other 32 bit unicode characters now take less space on a braille display when they are undefined in a braille translation table. (#6695)

This isn't fully correct due to the definition (e.g. "undefined 0") in some braille tables. Undefined Unicode characters can also be displayed just as ⠀ (dot 0). Please read the HUC Braille Tables documentation for further details.

The one and only thing I need to know here is whether I have to change all yhhhhh definitions to zhhhhhhhh definitions in the HUC Braille Tables. These replacements can be done quite quickly, though, compared to the whole process of creating the HUC Braille Tables. Also, once NVDA fully supports UTF-32 characters I will have to update the HUC Braille Tables documentation as well, because it still refers to NVDA 2019.1.

Personally I really want to see UTF-32 support in NVDA, because with the HUC Braille Tables the number of braille characters needed for an undefined Unicode character between U+10000 and U+10FFFF is reduced from 16 to 3 8-dot braille characters. That would be great.
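The figure of 16 is consistent with the example from the pull request description, where 😉 is printed as '\xd83d''\xde09'. A small arithmetic sketch, assuming each character of that placeholder occupies one braille cell (the variable names are made up for illustration):

```python
# Placeholder text quoted in the pull request description for U+1F609
# when liblouis is built with 16-bit wide characters.
placeholder_16bit = "'\\xd83d''\\xde09'"

# Assumption: one braille cell per character of the placeholder.
cells_16bit = len(placeholder_16bit)

# Figure stated in the comment above for the HUC Braille Tables.
cells_huc = 3

print(cells_16bit)              # 16
print(cells_16bit - cells_huc)  # 13 cells saved per undefined character
```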

PS: In less than four hours I'll be on the train to SightCity 2019, where I'm going to tell some people about the HUC Braille Tables. You can find A7 handouts ((cc) by-sa, in EN and DE) about the HUC Braille Tables here on my website.

@LeonarddeR
Collaborator Author

@LeonarddeR: Please also see #8702 and liblouis/liblouis#730.

These are out of scope for this PR. This PR aims to fix braille issues introduced when switching to Python 3, nothing more than that.

+Emoji and other 32 bit unicode characters now take less space on a braille display when they are undefined in a braille translation table. (#6695)

Thanks, I will fix this entry.

@LeonarddeR LeonarddeR requested a review from michaelDCurran May 30, 2019 19:22
@michaelDCurran
Member

michaelDCurran commented May 31, 2019 via email

@LeonarddeR LeonarddeR marked this pull request as ready for review June 13, 2019 11:15
@LeonarddeR LeonarddeR changed the base branch from threshold to threshold_py3_staging June 13, 2019 11:16
@LeonarddeR
Collaborator Author

I changed the base branch to threshold_py3_staging. I think threshold_py3_staging is now in a state where it might even need this for braille unit tests to pass at some point.
