Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add end tag handler for elements #97

Merged
merged 1 commit into from
Jul 3, 2021

Conversation

jongiddy
Copy link
Collaborator

@jongiddy jongiddy commented Jul 3, 2021

Provide the ability to run a handler for the end tag. Rather then provide a selector for the end tag, this uses the existing selector on the start tag, and adds an on_end_tag method to Element to provide a handler that gets triggered when the end tag is reached.

This should be adequate to resolve the issues described in #85, #93, and #96. I'll add an example to #93 of that code using this PR.

This PR also updates the TextChunk docs to emphasize that last_in_text_node can be true multiple times inside an element when there are sub-elements, and point to the new Element::on_end_tag method as an alternative.

@jongiddy
Copy link
Collaborator Author

jongiddy commented Jul 3, 2021

#85 and #96 appear to request this feature for Javascript, so this PR may not resolve them. This PR only changes the Rust implementation.

@kornelski kornelski merged commit 7c2f694 into cloudflare:master Jul 3, 2021
@ZebraFlesh
Copy link

What's the process for making this feature available via the JS API? Bug my account manager? 😂

@xtuc
Copy link
Member

xtuc commented Jul 6, 2021

@ZebraFlesh Yes, please ask your account manager to open a feature request and the Cloudflare worker's team will take it from there.

@ZebraFlesh
Copy link

@xtuc Now I'm mildly confused: I would have expected this PR to modify the functionality for the JS API in this repo: https://github.com/cloudflare/lol-html/blob/master/js-api/src/element.rs Otherwise, why does that code exist in this repo? (Seems like this PR creates a diverging set of functionality.)

@xtuc
Copy link
Member

xtuc commented Jul 6, 2021

The JS api is meant to generate a Wasm module and be used directly from JS. However, Cloudflare workers uses the C API and uses v8 to expose to JS.

@ZebraFlesh
Copy link

Now I'm more confused: there's base functionality which is exposed via 2 different APIs. Doesn't this PR add base functionality which is not exposed to those APIs? Are the APIs not required to be kept in sync with the base Rust functionality?

@jongiddy
Copy link
Collaborator Author

jongiddy commented Jul 7, 2021

Hi @ZebraFlesh. I'm the author of this change. Note that I'm not affiliated with Cloudflare. I'm a user of the lol-html library. It's a great library, but, like others, I found the lack of an end tag handler made my code more complicated.

Looking into the code I was able to work out how to add this capability to the core library. This solved my specific problem, and it's great if other people can benefit from my change, so I submitted it as a PR. I don't use the API layers, and, although they appear to be quite slim wrappers around the core library, I'm not so familiar with the translation mechanisms they use. If my change had been rejected due to lacking this code, rather than add them, I would likely just have kept using my personal fork of lol-html to make my code better.

This is why there is a split in supported capabilities between the layers. Hopefully someone who does understand the API layers better than me can bring them up to the same capability.

@jyn514
Copy link
Contributor

jyn514 commented Dec 23, 2021

@ZebraFlesh I've opened #121 to expose this in the C API; there's a little more work required in the Workers runtime to expose this, but I've started it already and I hope to have it available by the new year.

@jyn514
Copy link
Contributor

jyn514 commented Dec 27, 2021

@jongiddy I realized while writing some tests for this that it doesn't allow access to Element in the closure; is that intentional?

error[E0477]: the type `[closure@examples/defer_scripts/main.rs:25:35: 31:22]` does not fulfill the required lifetime
  --> examples/defer_scripts/main.rs:25:24
   |
25 |                     el.on_end_tag(|end_tag| {
   |                        ^^^^^^^^^^
   |
   = note: type must satisfy the static lifetime
error[E0495]: cannot infer an appropriate lifetime due to conflicting requirements
  --> examples/defer_scripts/main.rs:25:35
   |
25 |                       el.on_end_tag(|end_tag| {
   |  ___________________________________^
26 | |                         if el.has_attribute("first-div") {
27 | |                             eprintln!("saw first div");
28 | |                         }
29 | |                         eprintln!("saw end tag {}", end_tag.name());
30 | |                         Ok(())
31 | |                     })?;
   | |_____________________^
   |
note: first, the lifetime cannot outlive the anonymous lifetime #3 defined here...
  --> examples/defer_scripts/main.rs:23:33
   |
23 |                   element!("div", |el| {
   |  _________________________________^
24 | |                     eprintln!("saw element {}", el.tag_name());
25 | |                     el.on_end_tag(|end_tag| {
26 | |                         if el.has_attribute("first-div") {
...  |
32 | |                     Ok(())
33 | |                 }),
   | |_________________^
note: ...so that the reference type `&&mut Element<'_, '_>` does not outlive the data it points at
  --> examples/defer_scripts/main.rs:25:21
   |
25 | /                     el.on_end_tag(|end_tag| {
26 | |                         if el.has_attribute("first-div") {
27 | |                             eprintln!("saw first div");
28 | |                         }
29 | |                         eprintln!("saw end tag {}", end_tag.name());
30 | |                         Ok(())
31 | |                     })?;
   | |______________________^
   = note: but, the lifetime must be valid for the static lifetime...
note: ...so that the type `[closure@examples/defer_scripts/main.rs:25:35: 31:22]` will meet its required lifetime bounds
  --> examples/defer_scripts/main.rs:25:24
   |
25 |                     el.on_end_tag(|end_tag| {
   |                        ^^^^^^^^^^

The fix is to calculate the info ahead of time, but that's non-obvious and seems kinda unnecessary ... maybe it would be better to pass &Element into the closure instead?

element!("div", |el| {
   eprintln!("saw element {}", el.tag_name());
   let is_first_div = el.has_attribute("first-div");
   el.on_end_tag(move |end_tag| {
       if is_first_div {
           eprintln!("saw first div");
       }
       eprintln!("saw end tag {}", end_tag.name());
       Ok(())
   })?;
   Ok(())
})

@jyn514
Copy link
Contributor

jyn514 commented Dec 28, 2021

Hmm, actually I realized that would prevent streaming the element contents, since before() would have to keep the whole text in memory. So there's probably not a good solution here other than documenting this as a footgun.

@jongiddy
Copy link
Collaborator Author

As you've realized, at the time of the on_end_tag callback being called, the contents of Element have been passed to the OutputHandler so the methods of Element are unusable. Hence we pass an EndTag, whose methods can affect the output just before and just after the end tag, as well as the end tag itself. (The Element.after handler could be modified to work here, but EndTag.before does the same thing).

I wouldn't call this a footgun. We pass an EndTag with methods that can be used to sensibly change some aspects of the overall element. We don't pass the Element to the end tag callback because that would be a footgun, allowing you to call methods that have no effect.

@methyl
Copy link

methyl commented Apr 26, 2022

@jyn514 any luck with exposing this on Workers runtime?

@jyn514
Copy link
Contributor

jyn514 commented Apr 26, 2022

@methyl yes, it's implemented today. It looks like the documentation is missing on https://developers.cloudflare.com/workers/runtime-apis/html-rewriter/#element, I'll try and get that fixed.

@methyl
Copy link

methyl commented May 3, 2022

@jyn514 in the meantime, what should be the method name? end doesn't seem to be called for Element Handlers.

EDIT: this is the API (found it quite surprising and inconsistent with the Document Handler, where you specify end method at the root):

class ElementHandler {
  element(el) {
    el.onEndTag(() => console.log("end tag"))
  }
}

new HTMLRewriter().on("a", new ElementHandler())

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants