Fix: For invalid HTML entity &neq; #2272

theapu · 2024-01-25T04:40:34Z

&neq; is not a valid HTML entity. Replaced it with the correct Unicode escape sequence \u2260.
HTML/XML numeric character references and Unicode escape sequences are both used for various math symbols, so the XML output mixes numeric character references (such as the one for ⅈ, U+2148) with UTF-8 characters. This XML does not parse with strict DTDs that allow only Unicode characters. So, for consistent MathML output, the HTML/XML numeric character references are replaced with Unicode escape sequences, so that the output MathML contains only UTF-8 characters.
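
For context, XML itself predefines only five named entities (&amp;, &lt;, &gt;, &apos;, &quot;); any other name has to be declared in a DTD before a parser will accept it. A minimal sketch for spotting such names in serialized output follows. `findUndeclaredEntities` is a hypothetical helper written for illustration, not part of MathLive, and it assumes the consumer parses without a DTD.

```typescript
// The five entities every XML parser predefines.
const XML_PREDEFINED: string[] = ["amp", "lt", "gt", "apos", "quot"];

// Hypothetical helper (not a MathLive API): list the named entity
// references in `markup` that a DTD-less XML parser would reject.
function findUndeclaredEntities(markup: string): string[] {
  const found: string[] = [];
  // Named references look like &name; numeric ones start with &# and are skipped.
  const pattern = /&([A-Za-z][A-Za-z0-9]*);/g;
  let match: RegExpExecArray | null;
  while ((match = pattern.exec(markup)) !== null) {
    const name = match[1];
    if (XML_PREDEFINED.indexOf(name) === -1 && found.indexOf(name) === -1) {
      found.push(name);
    }
  }
  return found;
}

// Illustrative input only: one undeclared name, one predefined name,
// and one numeric reference.
const sampleMarkup = "<mo>&neq;</mo><mo>&lt;</mo><mo>&#x2260;</mo>";
console.log(findUndeclaredEntities(sampleMarkup));
```

A check like this could run in a test suite over the serializer's output, so that a name such as &neq; is caught before it reaches a strict parser.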

arnog · 2024-01-25T05:16:37Z

Good catch about &neq;.

Could you provide more info regarding the character entities, though? You're saying that there are cases where character entities are not valid MathML? That's surprising. I would have thought that non-ASCII characters would actually be more of an issue, since XML is not always encoded in UTF-8 (see https://www.w3.org/TR/xml/#charencoding).

If character entities are problematic, I would suggest using character references (&#x2061;) rather than \u2061, since that character is invisible (and this would follow the MathML spec recommendation: https://www.w3.org/TR/MathML2/appendixa.html#parsing-charents).

theapu · 2024-01-25T06:09:30Z

In current MathLive, \alpha \beta generates the following MathML output:

<mrow><mi>&#x03b1;</mi><mo>&#8290;</mo><mi>β</mi></mrow>

It contains a Unicode entity, an HTML entity and a UTF-8 character. In my use case, the MathML should contain either Unicode entities, or HTML entities, or UTF-8 characters, not a mix of all these.

i.e. (all unicode entities)

<mrow><mi>&#x03b1;</mi><mo>&#x2062;</mo><mi>&#x3b2;</mi></mrow>

OR (all utf8 characters)

<mrow><mi>α</mi><mo>⁢</mo><mi>β</mi></mrow>

OR (all html entities)

<mrow><mi>&#945;</mi><mo>&#8290;</mo><mi>&#946;</mi></mrow>

Will this be possible?

arnog · 2024-01-25T07:49:29Z

OK, I guess I don't understand your use case. What you call "Unicode entities" and "HTML entities" are both character entities, one using hexadecimal, the other decimal. An XML parser that can't parse them, or can only parse one and not the other does not follow the spec.

I would rather consistently use character entities in hexadecimal (i.e. &#xnnnn;).
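
To illustrate what a single hexadecimal form would look like, here is a minimal sketch that rewrites decimal references, hexadecimal references and raw non-ASCII characters as &#xnnnn;. `hexReference` and `toHexReferences` are hypothetical names used for illustration, not MathLive functions, and the sketch assumes the input contains no unpaired surrogates.

```typescript
// Hypothetical helper, not MathLive's serializer: format a code point
// as a hexadecimal character reference with at least four digits.
function hexReference(codePoint: number): string {
  let hex = codePoint.toString(16).toUpperCase();
  while (hex.length < 4) hex = "0" + hex;
  return "&#x" + hex + ";";
}

// Assumption: `markup` is well-formed UTF-16 (no unpaired surrogates).
function toHexReferences(markup: string): string {
  return markup
    // Re-emit existing references, hex or decimal, in one canonical form.
    .replace(
      /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g,
      (_ref: string, hex: string | undefined, dec: string | undefined) =>
        hexReference(hex ? parseInt(hex, 16) : parseInt(dec as string, 10))
    )
    // Escape raw non-ASCII text: a surrogate pair first, else one BMP character.
    .replace(/[\uD800-\uDBFF][\uDC00-\uDFFF]|[^\x00-\x7F]/g, (ch: string) =>
      hexReference(
        ch.length === 2
          ? (ch.charCodeAt(0) - 0xd800) * 0x400 + (ch.charCodeAt(1) - 0xdc00) + 0x10000
          : ch.charCodeAt(0)
      )
    );
}

// The mixed output quoted earlier in this thread (β written as an escape).
const mixed = "<mrow><mi>&#x03b1;</mi><mo>&#8290;</mo><mi>\u03B2</mi></mrow>";
console.log(toHexReferences(mixed));
```

Because the result is pure ASCII, it does not depend on the encoding declared for the XML document.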

theapu · 2024-01-25T13:20:28Z

I have reverted the changes except the one for &neq;. Even if I use &#xXXXX; entities, some characters like \beta still appear as UTF-8 strings. For my use case, it is better to post-process the MathML from MathLive at our end, converting the different entities to UTF-8 characters.
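
A post-processing step of that kind might look like the sketch below. `decodeNumericReferences` and `codePointToString` are hypothetical names used for illustration, not MathLive APIs. The sketch deliberately leaves ASCII references such as &#60; untouched so the markup stays well-formed, and it assumes the result will be stored or sent as UTF-8.

```typescript
// Build a string from a code point, using a surrogate pair above U+FFFF.
function codePointToString(codePoint: number): string {
  if (codePoint <= 0xffff) return String.fromCharCode(codePoint);
  const offset = codePoint - 0x10000;
  return String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
}

// Hypothetical post-processor (not a MathLive API): replace decimal and
// hexadecimal character references with the characters they denote.
function decodeNumericReferences(markup: string): string {
  return markup.replace(
    /&#(?:[xX]([0-9a-fA-F]+)|([0-9]+));/g,
    (ref: string, hex: string | undefined, dec: string | undefined) => {
      const codePoint = hex ? parseInt(hex, 16) : parseInt(dec as string, 10);
      // Keep ASCII references (e.g. &#60; for "<") escaped, and leave
      // surrogate or out-of-range values alone rather than guess.
      const isSurrogate = codePoint >= 0xd800 && codePoint <= 0xdfff;
      if (codePoint < 0x80 || codePoint > 0x10ffff || isSurrogate) return ref;
      return codePointToString(codePoint);
    }
  );
}

// The mixed output quoted earlier in this thread (β written as an escape).
const mathml = "<mrow><mi>&#x03b1;</mi><mo>&#8290;</mo><mi>\u03B2</mi></mrow>";
console.log(decodeNumericReferences(mathml));
```

Named entities are not handled here; resolving those would need a lookup table, which this sketch does not attempt.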

arnog · 2024-01-25T18:49:34Z

This has been superseded by another PR.

Replaced HTML/XML Numeric Character Reference like ⅈ with Unicode Escape Sequence like \u2148 for consistency.

0b1ffdb

theapu added 2 commits January 25, 2024 18:21

reverted the change to utf8

b9bcbcb

&neq; is invalid entity. Changed it to ≠

d4261c2

arnog closed this Jan 25, 2024
