Ignore invalid HTML (or self closed tags) #251

Open
titospeakap opened this issue Jul 26, 2024 · 0 comments

Up to version 2.2.0, the following HTML code is fully parsed:

<H1>Heading 1</h1>
<p>Paragraph
<b>Second</b> line.</p>
<ul><li>List item 1</li><li>List item 2<ul><li>List item 2.1</li><li>List item 2.2</li></ul></li><li>List item 3</ul>
<p>Paragraph 2</p>
<h2>Heading 2</h2>
<p>Paragraph 3</p>
<p><img alt="image" width="100" height="20"></p>
<audio />
<video />
<p><a data-rel="attachment">attachment</a></p>
<p>Another paragraph. <a href="http://url.to.link">Hyperlink</a>.</p>
<ol><li>List item 1</li><li>List item 2<ol><li>List item 2.1</li><li>List item 2.2</li></ol></li><li>List item 3</ol>

In more recent versions, parsing stops at the <audio /> tag, yet no errors are reported (->hasErrors() returns false). If I change the tag to <audio></audio>, parsing works.

Is this behaviour intentional? And is there a way, in more recent versions, to replicate the behaviour of version 2.2.0 and below?

For the HTML shared above, here is the code I'm running:

$html5 = new HTML5();
// Keep the fragment returned by loadHTMLFragment() so the loop below can walk it.
$fragment = $html5->loadHTMLFragment($html);
foreach ($fragment->childNodes as $child) {
    echo $child->nodeName . "\n";
}

And the respective output in version 2.9.0:

h1
#text
p
#text
ul
#text
p
#text
h2
#text
p
#text
p
#text
audio

But with version 2.2.0, I get:

h1
#text
p
#text
ul
#text
p
#text
h2
#text
p
#text
p
#text
audio
#text
video
#text
p
#text
p
#text
ol
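
Since <audio></audio> parses where <audio /> does not, one possible workaround is to expand self-closed non-void tags before handing the HTML to the parser. Below is a minimal sketch, not part of the library: expandSelfClosedTags is a hypothetical helper name, and the regex approach is an assumption. It will misfire on attribute values that contain "/>" or on unquoted values ending in "/", and it has not been verified against any particular library version.

```php
<?php
// Hypothetical helper (not a library function): rewrites self-closed
// non-void tags such as <audio /> into explicit pairs such as <audio></audio>.
function expandSelfClosedTags(string $html): string
{
    // Void elements may legitimately be written as <br />, so leave them alone.
    $void = ['area', 'base', 'br', 'col', 'embed', 'hr', 'img', 'input',
             'link', 'meta', 'source', 'track', 'wbr'];

    return preg_replace_callback(
        // Group 1: tag name. Group 2: optional attribute text before "/>".
        '~<([a-zA-Z][a-zA-Z0-9]*)((?:\s+[^<>]*?)?)\s*/>~',
        function (array $m) use ($void): string {
            if (in_array(strtolower($m[1]), $void, true)) {
                return $m[0]; // keep void elements exactly as written
            }
            // Drop the trailing slash and append an explicit closing tag.
            return '<' . $m[1] . rtrim($m[2]) . '></' . $m[1] . '>';
        },
        $html
    );
}

echo expandSelfClosedTags('<p>x</p><audio /><video /><br />') . "\n";
// prints: <p>x</p><audio></audio><video></video><br />
```

The result could then be passed to loadHTMLFragment() in place of the raw $html. Whether that restores the 2.2.0 node list in newer versions would still need to be checked.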
@titospeakap titospeakap changed the title Ignore invalid HTML Ignore invalid HTML tags Jul 26, 2024
@titospeakap titospeakap changed the title Ignore invalid HTML tags Ignore invalid HTML (or self closed tags) Jul 26, 2024