parsing issues with converted transducer #57
Comments
Just in case: could it be #9? (Does the path for
I mean, yes, but so do all other paths, in this transducer:
Cf.
But the latter path is in a separate section of the transducer, separated by
Hm, I think #9 might be about initial epsilons on the input side only (i.e. not aligned, as in It's correct that Is analysis of
Most of what's above the
I believe the attcompiler's I am not sure I can work on it given my GSoC project.
Paths in the FST are classified based on the first non-tag non-epsilon symbol on the input side.
In this case, both I think maybe the solution here is to allow two
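To make the classification rule above concrete, here is a minimal sketch of it in Python. It is not the attcompiler's actual code: the function names, the `@0@` epsilon spelling, the `<...>` tag test, and the use of Unicode general categories to decide what counts as punctuation are all assumptions made for illustration.

```python
import unicodedata

# Assumption: epsilon is spelled "@0@", as is common in ATT-format files.
EPSILON = "@0@"


def is_tag(symbol: str) -> bool:
    """Hypothetical tag test: multi-character symbols of the form <...>."""
    return len(symbol) > 2 and symbol.startswith("<") and symbol.endswith(">")


def classify_path(input_symbols):
    """Classify a path by its first non-tag, non-epsilon input symbol.

    Returns "punct" or "word", or None if the path has no such symbol.
    """
    for sym in input_symbols:
        if sym == "" or sym == EPSILON or is_tag(sym):
            continue  # skip epsilons and tags, keep looking
        # Assumption: "punctuation" means a Unicode P* general category.
        if unicodedata.category(sym[0]).startswith("P"):
            return "punct"
        return "word"
    return None
```

Under this rule, a path whose input side spells `с.` is classified by `с` and a path for `.` alone is classified by `.`, so the two end up in different classes even though both contain punctuation.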
Isn't the solution rather to compile into
Upon further investigation I think you're right, but I'm not sure how to do that efficiently. Checking whether the initial character is punctuation can almost be done while reading in the file, but I'm having trouble coming up with something better than On the other hand, maybe that's not so bad and really I should test this.
I feel like this should also somehow be possible to solve by first reading them all into standard and then somehow splitting, or copying those paths into inconditional. (Like take the intersect with
hfst-proc behaviour (expected):
lt-proc behaviour (second one is unexpected):
Specifically, `с.` doesn't receive an analysis above; instead the `.` alone receives an analysis. My expectation is that the parsing would be LMLR, but it seems to be something else?
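For reference, the expectation described here can be sketched as a greedy tokenizer, reading LMLR as "longest match, left to right". This is a toy illustration only: the `tokenize_lmlr` function and the set-based lexicon are hypothetical and do not reflect how lt-proc or hfst-proc are implemented.

```python
def tokenize_lmlr(text, lexicon):
    """Greedy left-to-right tokenization, preferring the longest lexicon match."""
    tokens = []
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate starting at position i first.
        for j in range(len(text), i, -1):
            if text[i:j] in lexicon:
                match = text[i:j]
                break
        if match is None:
            match = text[i]  # unknown character: emit it on its own
        tokens.append(match)
        i += len(match)
    return tokens
```

With a lexicon containing `с`, `.` and `с.`, the input `с.` comes out as the single token `с.`, which is the behaviour the report expected. Without the `с.` entry, the same input splits into `с` and `.`.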