No range version of unicharchart does not work #29

Open
devosb opened this issue Oct 26, 2021 · 4 comments
@devosb
Contributor

devosb commented Oct 26, 2021

When using Font Proof with \unicharchart[type=all] or \unicharchart[type="all",columns="12",rows="16"], an error is generated:

/usr/share/sile/languages/unicode.lua:157: Word break parser failure: U_INVALID_CHAR_FOUND

This line looks like it was added two years ago. Using \unicharchart[type=range, ...] works. Any idea @alerque ? My group is not using type=all (well, of course not, since it does not seem to work) but might if it works. However, nothing is being blocked so this is a low priority for us. Looks like it has been broken for a while, since the test file for this has the type=all test commented out.

@alerque
Member

alerque commented Oct 29, 2021

Not off the top of my head, but it sounds like an ICU or SILE problem more than it does something in fontproof itself. Unless perchance there is invalid data being passed somewhere.

@alerque alerque self-assigned this Oct 29, 2021
@iandoug

iandoug commented Apr 4, 2022

Noobie, trying similar.

\setTestFont[family="Libertinus Math"]
\font[size=14pt,weight=700]{Range test}\par
\unicharchart[type=range,start=0000,end=FFFF]

(after multiple similar)

! Underfull frame: 236.21079561pt stretchiness required to fill but only 116pt available at ianuni.sil: in \bigskip near 4:1: in \unicharcharttable: 0x559848b595e0
[286] 
! Underfull frame: 236.21079561pt stretchiness required to fill but only 116pt available at ianuni.sil: in \bigskip near 4:1: in \unicharcharttable: 0x559848b595e0
[287] 
! Underfull frame: 236.21079561pt stretchiness required to fill but only 116pt available at ianuni.sil: in \bigskip near 4:1: in \unicharcharttable: 0x559848b595e0
[288] 
Error detected:
        /usr/local/share/sile/languages/unicode.lua:163: Word break parser failure: U_INVALID_CHAR_FOUND

Cheers, Ian

@alerque
Member

alerque commented Apr 7, 2023

Definitely a bug in here somewhere. We're getting tripped up at U+0941 or in the 8 thereafter, which should be valid inputs, but flushing them to the word breaker kills SILE.

@Omikhleia
Member

Omikhleia commented Mar 20, 2024

An issue open for 2+ years, about something "probably broken for a while"... Doh, probably no one really cares about or uses this feature...

Anyway, here is a hint of what might be going on...

See:

for cp = rangeStart,rangeEnd do
  local uni = SU.utf8charfromcodepoint(tostring(cp))
  glyphs[#glyphs+1] = { present = hasGlyph(uni), cp = cp, uni = uni }

Eventually this code will hit things such as U+D800 (start of high surrogates)...

The hasGlyph() check is not very well implemented and may return true, as Harfbuzz might shape the passed invalid codepoint as a replacement character U+FFFD...

Yet, ICU doesn't like that standalone U+D800 when it comes to typesetting it... Which indeed seems legit.

Proof of concept: Save the following snippet as "poc.lua":

SILE.call("font", { family = "Gentium Plus" })
local hasGlyph = function(g)
  local options = SILE.font.loadDefaults({})
  local newItems = SILE.shapers.harfbuzz:shapeToken(g, options)
  for i =1,#newItems do
    if newItems[i].gid > 0 then
      print("Glyph found", newItems[i].gid, newItems[i].name)
      return true
    end
  end
  return false
end

bad = luautf8.char(0xD800)
print(hasGlyph(bad))

icu = require("justenoughicu")
if hasGlyph(bad) then print(icu.breakpoints(bad, "en")) end

Then run:

$ sile -e='require("poc")'
SILE v0.14.17 (Lua 5.2)
Glyph found	2687	uniFFFD
true
/usr/bin/lua: ./poc.lua:18: Word break parser failure: U_INVALID_CHAR_FOUND
stack traceback: ....

As far as I can tell, this likely depends on the font: e.g. when replacing "Gentium Plus" with "Nimbus Sans" above, hasGlyph returns false and hence nothing errors...
And some fonts may perhaps have a different name for these replacement GIDs (e.g. .null or .notdef instead of uniFFFD), I don't know...

But in brief, just looping from 0 to FFFF and expecting Harfbuzz to report whether a glyph exists is probably a very bogus way to do it.
(I don't know Unicode enough to tell if the "invalid" surrogates D800–DFFF are the only invalid sequences).
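On the surrogate question: as far as the Unicode definition of "scalar value" goes, D800–DFFF is the only block of code points below 10FFFF that cannot be encoded as well-formed UTF-8. The noncharacters (such as U+FFFE and U+FFFF) are still scalar values, so they would not be caught by this. A minimal sketch of filtering the range before it ever reaches the shaper or ICU could look like the following. It is plain Lua with no SILE dependencies, and isScalarValue and scalarValuesInRange are hypothetical names, not existing fontproof or SILE functions:

```lua
-- Hypothetical helper (not part of fontproof or SILE): true when cp is a
-- Unicode scalar value, i.e. within U+0000..U+10FFFF and not a surrogate.
local function isScalarValue(cp)
  if cp < 0 or cp > 0x10FFFF then return false end
  return cp < 0xD800 or cp > 0xDFFF
end

-- Hypothetical helper: collect the code points of a range that can be
-- encoded as well-formed UTF-8, skipping the surrogate block entirely.
local function scalarValuesInRange(rangeStart, rangeEnd)
  local cps = {}
  for cp = rangeStart, rangeEnd do
    if isScalarValue(cp) then
      cps[#cps + 1] = cp
    end
  end
  return cps
end

-- Walk a small range that straddles the surrogate block.
for _, cp in ipairs(scalarValuesInRange(0xD7FE, 0xE001)) do
  print(string.format("U+%04X", cp))
end
```

This only avoids feeding lone surrogates to the word breaker. It does not make the glyph-presence check itself any more reliable, since a font can still answer with a replacement glyph for other inputs.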
