Text chunk handlers are deceptively difficult to use correctly #255
Some ideas how to mitigate this:
Text chunks can be subdivided into smaller pieces by input boundaries in `rewriter.write()` and by the buffer in `TextDecoder`. Our own tests incorrectly assumed this never happens (#256).

This arbitrary splitting makes the text chunk handlers much more complicated to use than they seem, because the handlers don't get an equivalent of a single DOM text node. They may be invoked many times on arbitrarily small pieces of text, which could be as small as a single codepoint.

Mutations like `.before()` and `.after()` are performed for each arbitrary fragment the handler has been invoked on, not before/after the full run of text between tags. Similarly, `.replace()` replaces each individual bit of text, not the whole run of text, so simply calling `chunk.replace("new text")` is insufficient and incorrect. You need a stateful handler that calls `chunk.replace("")` on all other pieces.

Splits make text search very tricky. You can't use `chunk.as_str().contains("needle")`, because the handler could be invoked on `"n", "ee", "dle"`. Search can't be done efficiently with just a state machine, because by the time you find the needle, you may have already "handled" the earlier chunks. So text search requires buffering the text and proactively removing all text chunks until the match.

This behavior makes text chunk handlers quite different from comment and element handlers.
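
To make the buffering requirement concrete, here is a minimal, self-contained sketch of the kind of stateful handler described above. It does not use the library: `Action`, `RunBuffer`, and `rewrite` are hypothetical names, and the chunks are modeled as plain string pieces plus a flag marking the end of the text run (assumed to be available to a real handler, for example via `last_in_text_node()` on the Rust `TextChunk`). For simplicity it buffers the whole run rather than stopping at the first match.

```rust
/// What the handler decides for the chunk it was just given.
/// In a real text handler these would correspond to removing the chunk
/// (e.g. `chunk.replace("")`) or replacing it; here they are plain data
/// so the sketch runs on its own.
#[derive(Debug, PartialEq)]
enum Action {
    /// Drop this piece from the output.
    Remove,
    /// Emit this text in place of the piece.
    Replace(String),
}

/// Hypothetical helper that buffers one run of text across chunks.
struct RunBuffer {
    buffered: String,
    needle: String,
    replacement: String,
}

impl RunBuffer {
    fn new(needle: &str, replacement: &str) -> Self {
        RunBuffer {
            buffered: String::new(),
            needle: needle.to_string(),
            replacement: replacement.to_string(),
        }
    }

    /// Call once per handler invocation.
    /// `piece` is the chunk's text; `last` marks the end of the text run.
    fn on_chunk(&mut self, piece: &str, last: bool) -> Action {
        self.buffered.push_str(piece);
        if !last {
            // The needle may straddle a later piece, so withhold this one.
            return Action::Remove;
        }
        // End of the run: emit the whole text once, with the needle replaced.
        let text = std::mem::take(&mut self.buffered);
        Action::Replace(text.replace(self.needle.as_str(), self.replacement.as_str()))
    }
}

/// Drives the helper over the pieces the way a rewriter would invoke a
/// handler, and collects the resulting output text.
fn rewrite(pieces: &[&str], needle: &str, replacement: &str) -> String {
    let mut buf = RunBuffer::new(needle, replacement);
    let mut out = String::new();
    for (i, piece) in pieces.iter().enumerate() {
        let last = i + 1 == pieces.len();
        match buf.on_chunk(*piece, last) {
            Action::Remove => {}
            Action::Replace(s) => out.push_str(&s),
        }
    }
    out
}
```

With this shape, `rewrite(&["n", "ee", "dle"], "needle", "pin")` and `rewrite(&["needle"], "needle", "pin")` produce the same output regardless of where the splits fall. The cost is that nothing is emitted until the run ends, which is exactly the buffering overhead that makes text handlers harder to use than comment or element handlers.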