Code can ask ICU for the list of characters, but then the finished FST will change depending on which version of ICU (and thus Unicode) it was built with. It could encode the version in the file and do it at runtime if the version differs.
Classes typically only get wider, so that sounds fine by me. I don't see a need for fst's to be perfectly reproducible when built on differing libraries – though encoding the ICU version in the file sounds like a good idea anyway.
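To make the "encode the version in the file, re-expand at runtime if it differs" idea concrete, here is a minimal sketch. It uses Python's `unicodedata` module as a stand-in for ICU, and the record layout (`unicode_version`, `category`, `code_points`) and the function names are hypothetical; none of this reflects the actual lttoolbox binary format.

```python
import sys
import unicodedata


def expand_class(category):
    """Expand a Unicode general category (e.g. "Lu") into its code points."""
    return frozenset(
        cp for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    )


def build_record(category):
    """Hypothetical compiled record: the expansion plus the Unicode version
    of the library that produced it."""
    return {
        "unicode_version": unicodedata.unidata_version,
        "category": category,
        "code_points": expand_class(category),
    }


def load_record(record):
    """Return the code points to use at runtime.

    If the record was built against a different Unicode version than the
    one available now, re-expand the class instead of trusting the stored set.
    """
    if record["unicode_version"] != unicodedata.unidata_version:
        return expand_class(record["category"])
    return record["code_points"]


record = build_record("Lu")
upper = load_record(record)
```

Keeping the category name next to the expansion is what makes the runtime re-expansion possible; a file that stored only the expanded code points could detect a version mismatch but not repair it.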
The current binary format for alphabets makes some assumptions about alphabet symbols (see apertium/apertium-yid#3 (comment)) that I think would make having non-expanded class symbols almost certainly require a file version bump (though I suppose you'd get that from including the ICU version anyway...).
At the very least, getting Lower and Upper ranges would be nice, so we could
and whatnot.
If we do the "simple" thing and just expand like ranges in https://github.com/apertium/lttoolbox/blob/acx-spaces/lttoolbox/regexp_compiler.cc we get quite a lot of transitions: https://www.compart.com/en/unicode/category/Ll (probably unreliable source) claims 2155 lowercase letters. But maybe it's OK if we keep regexes in their own `<section>`; more research needed.

Alternatively, could/should we do something like insert a special symbol and have fst_processor treat it specially? (Any tools operating on the compiled FSTs, like lt-trim or lt-print|hfst-txt2fst|hfst-stuff, would just have to treat it opaquely.)
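As a rough way to check the size of the "simple" expansion, the sketch below counts the code points in general category `Ll` and collapses them into contiguous runs. It uses Python's `unicodedata` rather than ICU, so the numbers depend on the Unicode version bundled with the interpreter, and `Ll` is only an approximation of whatever a `Lower` class would end up meaning. The helper names are made up for illustration.

```python
import sys
import unicodedata


def category_code_points(category):
    """Return, in ascending order, every code point in the given category."""
    return [
        cp for cp in range(sys.maxunicode + 1)
        if unicodedata.category(chr(cp)) == category
    ]


def to_ranges(code_points):
    """Collapse a sorted list of code points into inclusive (start, end) runs."""
    ranges = []
    for cp in code_points:
        if ranges and cp == ranges[-1][1] + 1:
            # Extends the current run by one code point.
            ranges[-1] = (ranges[-1][0], cp)
        else:
            # Starts a new run.
            ranges.append((cp, cp))
    return ranges


lower = category_code_points("Ll")
lower_ranges = to_ranges(lower)

# Unicode version, number of code points, number of contiguous runs.
print(unicodedata.unidata_version, len(lower), len(lower_ranges))
```

The run count matters because many scripts interleave upper and lower case code points, so range-style expansion does not shrink the class as much as `a-z` might suggest.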