Fix: For invalid HTML entity &neq; #2272

Closed: theapu wanted to merge 3 commits

theapu (Contributor) commented Jan 25, 2024

  1. &neq; is not a valid HTML entity. Replaced it with the Unicode escape sequence \u2260.
  2. Math symbols are currently emitted as a mix of HTML/XML numeric character references and Unicode escape sequences, so the XML output contains both character references (for example, the one for ⅈ, U+2148) and literal UTF-8 characters. This XML does not parse against strict DTDs that allow only Unicode characters. For consistent MathML output, the HTML/XML numeric character references are replaced with Unicode escape sequences, so that the output MathML contains only UTF-8 characters.
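The first point can be checked against a table of HTML5 named character references. The sketch below uses the `html5` table from Python's standard `html.entities` module purely as an illustration; it is not MathLive code, and `is_html5_entity` is a hypothetical helper name.

```python
from html.entities import html5

def is_html5_entity(name: str) -> bool:
    """Return True if `name` (without '&' and ';') is an HTML5 named character reference."""
    return f"{name};" in html5

print(is_html5_entity("neq"))        # False: &neq; is not defined
print(is_html5_entity("ne"))         # True: &ne; is the defined name
print(f"U+{ord(html5['ne;']):04X}")  # U+2260
```

The defined name for U+2260 is `&ne;`. Note that a plain XML parser without a DTD knows only the five predefined XML entities, so neither name would resolve there.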

…ode Escape Sequence like \u2148 for consistency.

arnog (Owner) commented Jan 25, 2024

Good catch about &neq;.

Could you provide more info regarding the character entities, though? You're saying that there are cases where character entities are not valid MathML? That's surprising. I would have thought that non-ASCII characters would actually be more of an issue, since XML is not always encoded in UTF-8 (see https://www.w3.org/TR/xml/#charencoding).

If character entities are problematic, I would suggest using character references (&#x2061;) rather than \u2061, since that character is invisible. This would also follow the MathML spec recommendation: https://www.w3.org/TR/MathML2/appendixa.html#parsing-charents
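To illustrate the encoding concern, here is a minimal sketch using only Python's standard library (not MathLive code). It shows that markup using the character reference `&#x2061;` is plain ASCII and so survives any ASCII-compatible XML encoding, while the literal character cannot be written in ASCII. Both forms parse to the same U+2061 character.

```python
import xml.etree.ElementTree as ET

with_reference = "<mo>&#x2061;</mo>"  # FUNCTION APPLICATION as a character reference
with_literal = "<mo>\u2061</mo>"      # the same character written literally

# The reference form is plain ASCII, so it fits any ASCII-compatible encoding.
ascii_bytes = with_reference.encode("ascii")

# The literal form cannot be represented in ASCII at all.
try:
    with_literal.encode("ascii")
    literal_is_ascii = True
except UnicodeEncodeError:
    literal_is_ascii = False

# An XML parser resolves both forms to the same text.
same_text = ET.fromstring(with_reference).text == ET.fromstring(with_literal).text
print(literal_is_ascii, same_text)  # False True
```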

theapu (Contributor, Author) commented Jan 25, 2024

In the current MathLive, \alpha \beta generates the following MathML output:

<mrow><mi>&#x03b1;</mi><mo>&#8290;</mo><mi>β</mi></mrow>

It contains a Unicode entity, an HTML entity, and a UTF-8 character. In my use case the MathML should contain either Unicode entities, HTML entities, or UTF-8 characters, not a mix of all three.

i.e. (all Unicode entities)

<mrow><mi>&#x03b1;</mi><mo>&#x2062;</mo><mi>&#x3b2;</mi></mrow>

OR (all UTF-8 characters)

<mrow><mi>α</mi><mo>⁢</mo><mi>β</mi></mrow>

OR (all HTML entities)

<mrow><mi>&#945;</mi><mo>&#8290;</mo><mi>&#946;</mi></mrow>

Will this be possible?

arnog (Owner) commented Jan 25, 2024

OK, I guess I don't understand your use case. What you call "Unicode entities" and "HTML entities" are both numeric character references, one written in hexadecimal, the other in decimal. An XML parser that can't parse them, or can parse only one and not the other, does not follow the spec.
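This can be checked with any conforming XML parser. The sketch below uses Python's standard `xml.etree.ElementTree` as an illustration (it is not MathLive code, and `parsed_text` is a hypothetical helper). It parses the hexadecimal, decimal, and literal spellings of the same expression and confirms they yield identical text.

```python
import xml.etree.ElementTree as ET

variants = {
    "hexadecimal": "<mrow><mi>&#x03b1;</mi><mo>&#x2062;</mo><mi>&#x3b2;</mi></mrow>",
    "decimal": "<mrow><mi>&#945;</mi><mo>&#8290;</mo><mi>&#946;</mi></mrow>",
    "literal": "<mrow><mi>\u03b1</mi><mo>\u2062</mo><mi>\u03b2</mi></mrow>",
}

def parsed_text(markup: str) -> list:
    """Parse the markup and return the text of each child element."""
    return [child.text for child in ET.fromstring(markup)]

results = {name: parsed_text(markup) for name, markup in variants.items()}

# All three spellings collapse to a single distinct result.
print(len({tuple(texts) for texts in results.values()}))  # 1
```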

I would rather consistently use hexadecimal character references (i.e. &#xnnnn;).
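As a rough sketch of what consistent hexadecimal output could look like, the following Python snippet rewrites decimal references and literal non-ASCII characters as `&#xnnnn;`. It is an illustration only, not how MathLive serializes MathML. `to_hex_references` is a hypothetical name, and the regex approach assumes simple markup with no CDATA sections or comments.

```python
import re

DECIMAL_REF = re.compile(r"&#([0-9]+);")

def to_hex_references(markup: str) -> str:
    """Rewrite decimal references and literal non-ASCII characters as &#xnnnn;."""
    # Decimal references such as &#8290; become &#x2062;.
    markup = DECIMAL_REF.sub(lambda m: f"&#x{int(m.group(1)):04x};", markup)
    # Literal non-ASCII characters such as β become &#x03b2;.
    return "".join(ch if ord(ch) < 128 else f"&#x{ord(ch):04x};" for ch in markup)

mixed = "<mrow><mi>&#x03b1;</mi><mo>&#8290;</mo><mi>\u03b2</mi></mrow>"
print(to_hex_references(mixed))
# <mrow><mi>&#x03b1;</mi><mo>&#x2062;</mo><mi>&#x03b2;</mi></mrow>
```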

theapu (Contributor, Author) commented Jan 25, 2024

I have reverted the changes except the one for &neq;. Even if I use &#xXXXX; entities, some characters like \beta still appear as UTF-8 strings. For my use case, it is better to post-process the MathML from MathLive at our end to convert the different entities to UTF-8 characters.
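A post-processing step of this kind could look like the sketch below. It is a minimal illustration in Python, not the code actually used, and `to_literal_characters` is a hypothetical name. It handles numeric character references only (named entities are left untouched) and keeps markup-significant characters escaped so the result stays well-formed.

```python
import re

NUMERIC_REF = re.compile(r"&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));")

def to_literal_characters(mathml: str) -> str:
    """Replace numeric character references with the characters they denote."""
    def decode(match):
        hex_digits, dec_digits = match.groups()
        code_point = int(hex_digits, 16) if hex_digits else int(dec_digits)
        char = chr(code_point)
        # Leave <, >, &, and quotes escaped so the XML stays well-formed.
        return match.group(0) if char in "<>&\"'" else char
    return NUMERIC_REF.sub(decode, mathml)

mixed = "<mrow><mi>&#x03b1;</mi><mo>&#8290;</mo><mi>\u03b2</mi></mrow>"
normalized = to_literal_characters(mixed)
print(normalized == "<mrow><mi>\u03b1</mi><mo>\u2062</mo><mi>\u03b2</mi></mrow>")  # True
```

If the output is written to a file, it would need to be saved as UTF-8 (or with a matching XML encoding declaration) for the literal characters to round-trip.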

arnog (Owner) commented Jan 25, 2024

This has been superseded by another PR.

arnog closed this Jan 25, 2024