Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Boolean or not Boolean (attributes) #867

Open
JakeQZ opened this issue May 20, 2020 · 6 comments
Open

Boolean or not Boolean (attributes) #867

JakeQZ opened this issue May 20, 2020 · 6 comments

Comments

@JakeQZ
Copy link
Contributor

JakeQZ commented May 20, 2020

Arising from #831 is the handling of Boolean vs non-Boolean attributes by the Emogrifier package when parsing and serializing HTML, with respect to the empty attribute syntax.

The empty attribute syntax (e.g. in <input disabled> or <img alt> as opposed to <img alt="">) is permitted for all attributes in later versions of the HTML5 specification. However, in HTML 4 this syntax is technically an attribute value with an implicit attribute name, and only works for certain attribute keywords.

The current behaviour derives simply from that of PHP’s DOMDocument/libxml implementation, which is as follows:

  • Known Boolean attributes are always serialized with the empty attribute syntax;
  • The original syntax (empty attribute or otherwise) is preserved for all other attributes, through whether or not the DOMAttr node has a DOMText child after parsing.

As seen in the initial implementation for #831, this behaviour would change were Masterminds/html5-php to be used as is for HTML parsing and serializing (all attributes with an empty string value would be serialized with the empty attribute syntax).

Moving forward, we have several options:

  1. Allow the change in behaviour;
  2. Ensure that the current behaviour is preserved precisely (i.e. including preserving whether non-Boolean attributes were originally specified with the empty attribute syntax or otherwise);
  3. Ensure that the current behaviour is preserved in so far as to maintain compatibility with earlier HTML specifications (i.e. this would mean always serializing Boolean attributes with the empty attribute syntax, but never doing so for non-Boolean attributes).

I do not think option 1 is viable, for the following reasons:

  • Parsers which do not support HTML5 will parse HTML5 documents as HTML 4;
  • The empty attribute syntax is not supported for non-Boolean attributes in HTML 4;
  • The empty attribute syntax may not be supported by parsers which pre-date or are not fully conformant with the later HTML5 specification;
  • The primary purpose of Emogrifier is to achieve the correct display (as much as possible) of richly-formatted HTML emails by older and/or less standards-conformant email clients;
  • The attached email demonstrates that Outlook 2010, at least (probably also Outlook 2013 and likely others, not tested), ignores completely some attributes with the empty value syntax (<a href=""> is rendered as a link whereas <a href> is not).

With option 2 or 3, I think we should also (first) add tests to enforce the desired behaviour (even though these would effectively be testing a third-party library), then there are a couple of alternative approaches that could be taken:

  1. Extend the Masterminds/html5-php classes to achieve the desired behaviour;
  2. Submit a PR to Masterminds/html5-php to make the desired behaviour available via a simple option.

It is worth noting that logic exists in Masterminds/html5-php for whether to serialize attributes using the empty attribute syntax depending on whether they are Boolean, but it is currently only enacted if the document has an explicit, non-default namespace.

@JakeQZ
Copy link
Contributor Author

JakeQZ commented May 20, 2020

I forgot to add the RFC822 .eml attachment, so here it is (you'll have to rename the file because GitHub only supports attachments with certain file extensions): boolean-attribute-test.eml

I've now also tested with Windows 10 Mail (latest version), and found (slightly but not entirely surprisingly) it does not support the HTML5 empty attribute syntax either:

windows-10-mail-fail

So I definitly don't think Option 1 above is viable. There are too many people using Windows 10 Mail!

OTOH, at least we now know that the Windows 10 Mail app is still lagging at least 10 years behind in terms of conformance to specifications (incidentally, and quite annoyingly, it doesn't support Reply-To either).

@bytestream
Copy link

What changes do you propose upstream to achieve it? The spec seems to suggest both disabled and disabled="" are valid, the latter being backwards compatible with XHTML syntax. An option could be added on whether to specify which format you want to use but I'm not sure how willing they'd be to add that.

Need to open an issue and see what they might be open to. That will then decide what should happen next..

@JakeQZ
Copy link
Contributor Author

JakeQZ commented May 23, 2020

See #831 (comment)

Really, an option to short-circuit the namespace tests and simply return true for anything in that list in OutputRules::nonBooleanAttributes.

However, I think that list is back to front. It should be listing attributes known as Boolean, rather than the other way round. Any unknown attribute should be classified as non-Boolean.

So that's at least two PRs...

@bytestream
Copy link

I opened an issue for this upstream to see what they say before submitting a PR. Please comment on it if you have anything to add!

@oliverklee
Copy link
Contributor

Thanks! Would you be so kind to post the link?

@bytestream
Copy link

Masterminds/html5-php#184

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants