Fix: For invalid HTML entity &neq; #2272
theapu commented Jan 25, 2024
- &neq; is not a valid HTML entity. Replaced it with the correct Unicode escape sequence \u2260.
- HTML/XML numeric character references and Unicode escape sequences are both used for various math symbols, so the XML output contains a mix of HTML or Unicode entities (like ⅈ) and UTF-8 characters. This XML does not parse with strict DTDs that allow only Unicode characters. For consistent MathML output, the HTML/XML numeric character references are replaced with Unicode escape sequences, so that the output MathML contains UTF-8 characters.
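The first point can be checked with any XML parser. The following is a minimal sketch using Python's standard library rather than MathLive itself; the helper name `parses_as_xml` and the `<mo>` fragments are illustrative assumptions, not taken from the PR. It shows that a named entity with no DTD declaration is rejected, while a numeric character reference and the literal character are both accepted.

```python
import xml.etree.ElementTree as ET


def parses_as_xml(fragment: str) -> bool:
    """Return True if the fragment is well-formed XML (hypothetical helper)."""
    try:
        ET.fromstring(fragment)
        return True
    except ET.ParseError:
        return False


# Named entity: not predefined by XML, so it fails without a DTD declaring it.
named = "<mo>&neq;</mo>"
# Numeric character reference for U+2260 (NOT EQUAL TO).
numeric = "<mo>&#x2260;</mo>"
# The character itself, written here with a Python escape.
literal = "<mo>\u2260</mo>"

for fragment in (named, numeric, literal):
    print(fragment, parses_as_xml(fragment))
```

Only the five entities predefined by XML (`&lt;`, `&gt;`, `&amp;`, `&apos;`, `&quot;`) are usable without a DTD, which is why the numeric or literal forms are the safer choices in standalone MathML.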
…ode Escape Sequence like \u2148 for consistency.
Good catch about Could you provide more info regarding the character entities, though? You're saying that there are cases where character entities are not valid MathML? That's surprising. I would have thought that non-ASCII characters would actually be more of an issue, since XML is not always encoded in UTF-8 (see https://www.w3.org/TR/xml/#charencoding). If character entities are problematic, I would suggest using character references instead
In the current MathLive, \alpha \beta generates the following MathML output
It contains a Unicode entity, an HTML entity, and a UTF-8 character. In my use case, the MathML should contain either Unicode entities or HTML entities or UTF-8 characters, not a mix of all these, i.e. (all Unicode entities)
OR (all UTF-8 characters)
OR (all HTML entities)
Will this be possible?
OK, I guess I don't understand your use case. What you call "Unicode entities" and "HTML entities" are both character entities, one using hexadecimal, the other decimal. An XML parser that can't parse them, or can only parse one and not the other, does not follow the spec. I would rather consistently use character entities in hexadecimal (i.e.
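The equivalence described in this comment can be illustrated with a short sketch, again using Python's standard library rather than MathLive's own serializer. The `<mi>` fragments and the helper `to_hex_refs` are hypothetical names introduced for illustration; the helper shows one possible way to emit hexadecimal references consistently, not the approach the project adopted.

```python
import xml.etree.ElementTree as ET

# The same character, U+2148, written three ways.
hex_text = ET.fromstring("<mi>&#x2148;</mi>").text      # hexadecimal reference
dec_text = ET.fromstring("<mi>&#8520;</mi>").text       # decimal reference
literal_text = ET.fromstring("<mi>\u2148</mi>").text    # literal character

# A conforming parser yields an identical string in all three cases.
print(hex_text == dec_text == literal_text)


def to_hex_refs(text: str) -> str:
    """Replace every non-ASCII character with a hexadecimal character
    reference (hypothetical helper, not part of MathLive)."""
    return "".join(
        ch if ord(ch) < 128 else f"&#x{ord(ch):X};"
        for ch in text
    )


print(to_hex_refs("a\u2260b"))
```

Emitting only ASCII plus hexadecimal references keeps the output independent of the document encoding, which addresses the earlier concern that XML is not always encoded in UTF-8.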
I have reverted the changes except the one for
This has been superseded by another PR.