Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

html being decoded and triggering as side effect. #25

Open
mbret opened this issue Dec 2, 2024 · 2 comments
Open

html being decoded and triggering as side effect. #25

mbret opened this issue Dec 2, 2024 · 2 comments

Comments

@mbret
Copy link

mbret commented Dec 2, 2024

I have an epub with this content:

<section class="sect2" title="Figures">
					<h3 class="title" id="_figures">Figures</h3>
					<p>Coming up for a quick breath of fresh air before descending into another
						accessibility attribute pain point, HTML5 introduces the handy new <code
							class="literal">figure</code> element for encapsulating content
						associated with an image, table, or code example. Grouping related content
						elements together, as is becoming an old theme now, makes it simpler for a
						reader to navigate and understand your content:</p>
					<pre class="screen">&lt;figure&gt;
    &lt;img src="images/blob.jpeg" alt="the blob"/&gt;
    &lt;figcaption&gt;
        Figure 3.7 &amp;#x2014; The blob is digesting Steve McQueen in this
        unreleased ending to the classic movie.
    &lt;/figcaption&gt;
&lt;/figure&gt;</pre>
					<p>Unfortunately, there is little support for these two new elements at this
						time, so they get treated as no better than <code class="literal">div</code>
						elements. That said, it’s still preferable to future-proof your data and do
						the right thing, as support will catch up, especially since the only other
						alternative is semantically meaningless <code class="literal">div</code>
						elements.</p>
				</section>

The problematic is the figure section. When the method decodeEntities is called with the text node, the el.innerHTML = str instruction will actually load and run the html code that was in the figure. Unintentionally trigger http requests and other wort things.

Is it an actual bug or this side effect is expected and we should not work with live document ?

In the meantime I was trying to find a safer alternative and here what I came up with:

decodeEntities(dom: Document, str: string) {
    try {
      const parser = new DOMParser()
      const doc = parser.parseFromString(
        `<!DOCTYPE html><text>${str}</text>`,
        "text/html",
      )
      return doc.querySelector("text")?.textContent || str
    } catch (err) {
      // TODO fall back to simpler decode?
      // e.g. regex match for stuff like &#160; and &nbsp;
      return str
    }
  }

It does not crash and seems to be returning the correct node and CFI. I am not super confident regarding performance of using DOMParser vs the same document where we create an new temporary element

@mbret
Copy link
Author

mbret commented Dec 2, 2024

Okay so another alternative I just though which could be faster is to create a temporary dom like this:

const newDom = document.implementation.createHTMLDocument()
      const el = newDom.createElement(`textarea`)
      el.innerHTML = str
      return el.value || ``

Since it's not the current dom, it does not trigger html resources. However I believe we could initialize it on the constructor and destroy it once the CFI is resolved, that could increase performances.

Additionally, why is this decoding needed to begin with ? Why isn't the string just enough ?

@Juul
Copy link
Member

Juul commented Dec 3, 2024

The problematic is the figure section. When the method decodeEntities is called with the text node, the el.innerHTML = str instruction will actually load and run the html code that was in the figure. Unintentionally trigger http requests and other wort things.

Wow nice find!

Is it an actual bug or this side effect is expected and we should not work with live document ?

Yeah this is definitely a bug.

I don't have time to look at this right now but I'll try to squeeze it in some time this week. Do ping me again if I forget to get back to you. Feel free to make a pull request with your proposed solution and thanks for reporting this!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants