Fix splitCssText again #1640

Merged: 18 commits, Feb 6, 2025


eoghanmurray
Contributor

See also PostHog/posthog-js#1668 for the downstream bug report, and further discussion in Slack (thanks to Paul D'Ambra for the report and pointers).

This is a further improvement after the performance fixes in #1615.

This covers new scenarios, as outlined in the tests. The test cases were recreated from a very large inline style node on https://hiring.workfully.com/signin, which looks like it was in a shadow root (`:host(.productfruits--container)`), although I can't find it there now. I pulled the example files from a breakpoint and have them locally, but the test cases here incorporate the important part, including the split in the middle of a statement.
Ultimately, the problem with the content that triggered this case was that `margin-top: 0;`, as authored, gets serialized to `margin-top: 0px;`, which was preventing us from finding the right split point between the normalized and unnormalized versions.
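To make that mismatch concrete, here is a small sketch that uses the same stripping regex as `normalizeCssString` in this PR. The helper names `stripWs` and `stripWsAndPx` are illustrative only. Without the `0px` to `0` rewrite, the authored and serialized declarations never compare equal, so no split point can be matched.

```typescript
// Same stripping regex as the PR's normalizeCssString: removes block
// comments (with no `*` inside), all whitespace and all semicolons.
const stripWs = (css: string): string =>
  css.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '');

// The PR's additional step: treat `0px` and `0` as equivalent.
const stripWsAndPx = (css: string): string =>
  stripWs(css).replace(/0px/g, '0');

const authored = 'margin-top: 0;'; // text as written in the <style> node
const serialized = 'margin-top: 0px;'; // text as serialized via cssRules

console.log(stripWs(authored) === stripWs(serialized)); // false
console.log(stripWsAndPx(authored) === stripWsAndPx(serialized)); // true
```

Because the rewrite is applied to both sides, it also turns `10px` into `10` on both sides; that is harmless here, since only equality of the normalized forms matters.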

…n with `isAttachIframe`' test

 - It was working for me when the test was run in isolation (the `-t` option), but when the entire cross-origin-iframes test was run, the change of iframe contents didn't seem to happen in time.
… and we end up not finding a unique one; we should just go with the first one (note: this is still not a binary search, so it could exhibit pathological behaviour).

changeset-bot bot commented Jan 29, 2025

🦋 Changeset detected

Latest commit: 5743c7e

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 19 packages
| Name | Type |
| --- | --- |
| rrweb-snapshot | Patch |
| rrweb | Patch |
| rrdom | Patch |
| rrdom-nodejs | Patch |
| rrweb-player | Patch |
| @rrweb/all | Patch |
| @rrweb/replay | Patch |
| @rrweb/record | Patch |
| @rrweb/types | Patch |
| @rrweb/packer | Patch |
| @rrweb/utils | Patch |
| @rrweb/web-extension | Patch |
| rrvideo | Patch |
| @rrweb/rrweb-plugin-console-record | Patch |
| @rrweb/rrweb-plugin-console-replay | Patch |
| @rrweb/rrweb-plugin-sequential-id-record | Patch |
| @rrweb/rrweb-plugin-sequential-id-replay | Patch |
| @rrweb/rrweb-plugin-canvas-webrtc-record | Patch |
| @rrweb/rrweb-plugin-canvas-webrtc-replay | Patch |


eoghanmurray requested a review from Juice10 on January 29, 2025 16:06
```ts
const prevTextContent = childNodes[i - 1].textContent;
if (prevTextContent && typeof prevTextContent === 'string') {
  // pick the first matching point which respects the previous chunk's approx size
  const prevMinLength = normalizeCssString(prevTextContent).length;
```

Contributor

I guess it's somewhere around here that makes running a test twice (or several times) give different output

(at least in my experience of testing whether this was deterministic when trying to figure out what was happening)

Contributor Author

I don't quite understand this comment. I would say the algorithm is indeed deterministic; maybe you mean it will behave differently for different-sized inputs because of the `jLimit` bit?

Contributor

Oh, I wrote a test that ran the split multiple times and compared the outputs, and they didn't match.
It was generally whitespace ending up on different sides of a split.

Contributor Author

Ah okay, yes, this is possible... basically it stops as soon as the normalised versions match (although I still think it would be deterministic given the same input twice). I can't imagine it being a problem if one side has more whitespace than it should.

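```ts
if (_testNoPxNorm) {
  // strip comments (containing no `*`), whitespace and semicolons
  return cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '');
} else {
  // as above, then also treat `0px` and `0` as equivalent
  return cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '').replace(/0px/g, '0');
}
```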

Contributor

I briefly tested a very naive loop parser and it was slower than the regex replace; I guess because browsers/V8 are already doing some magic to optimise this.

But I didn't test it over a range of inputs.

From my testing, this is at best O(n) in the amount of whitespace. To be clear, since I'm not much of a comp-sci person: the more whitespace you insert into the input, the slower this gets.
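For comparison, a hand-written single pass that reproduces the regex's comment, whitespace and semicolon stripping might look like the following. This is a hypothetical sketch (`stripWithRegex` and `stripWithLoop` are illustrative names), not the loop parser benchmarked above, and it makes no performance claim. It deliberately mirrors the regex's quirk that a comment containing `*` is not treated as a comment.

```typescript
// Regex version: the stripping step of the PR's normalizeCssString
// (without the later `0px` -> `0` rewrite).
function stripWithRegex(css: string): string {
  return css.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '');
}

const WS = /\s/; // no `g` flag, so .test() keeps no lastIndex state

// Loop version with the same semantics as the regex above.
function stripWithLoop(css: string): string {
  const out: string[] = [];
  let i = 0;
  while (i < css.length) {
    // A comment is `/*`, a run of non-`*` characters, then `*/`. As with the
    // regex, if the first `*` after the opener is not followed by `/`, the
    // opener is kept as ordinary text.
    if (css[i] === '/' && css[i + 1] === '*') {
      const star = css.indexOf('*', i + 2);
      if (star !== -1 && css[star + 1] === '/') {
        i = star + 2; // skip the whole comment
        continue;
      }
    }
    const ch = css[i];
    if (ch !== ';' && !WS.test(ch)) out.push(ch);
    i++;
  }
  return out.join('');
}
```

Because both functions produce identical output, either can be benchmarked against the other over a range of inputs.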

Contributor Author

The performance of the normalization function could be improved. I've focused more on ensuring it isn't called repeatedly on the same piece of CSS (with the 'binary search' style changes in #1615).

Contributor

pauldambra left a comment

I am really not sure how I could roll this out in prod.
The last version basically doubled our support load, and I'm ending this week exhausted as a result.

I have no idea how to test whether this fixes all cases or just a specific case.

We seem to be writing a CSS parser, and I'm wondering if adopting an existing CSS parser would be safer.

Contributor Author

eoghanmurray commented Feb 4, 2025

We're not attempting to write a CSS parser at record time; we are using `cssRules` to parse the CSS, which relies on the browser's native capabilities.

In most cases, `splitCssText` passes through the input for `<style>` elements that have only one child without further processing, so these fixes are very much for edge cases. I know that is cold comfort to you, as you're dealing with the fallout from the exceptions being encountered in the wild.

The mutation issues that this splitting solves are also mostly theoretical, so it's possible to patch or short-circuit the `splitCssText` function (directly return `[cssText]`) for guaranteed performance, at the expense of correctness in the edge cases documented in the 'css splitter' suite. I introduced this splitting approach in #1437 and never dreamed it would cause performance issues, so I didn't write it with performance in mind, and I have been patching it up since.
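A minimal sketch of that short-circuit, under stated assumptions: `CssSplitter` and `withPassthrough` are hypothetical names, and the splitter here takes the child text contents as a `string[]` rather than the `HTMLStyleElement` the real `splitCssText` receives. It also folds in the single-child pass-through described above.

```typescript
// Hypothetical simplified signature: rrweb's real splitCssText receives the
// HTMLStyleElement itself, not an array of child text contents.
type CssSplitter = (cssText: string, childTexts: string[]) => string[];

// Wrap a splitter so that split-point matching can be switched off entirely.
function withPassthrough(split: CssSplitter, passthrough: boolean): CssSplitter {
  return (cssText, childTexts) => {
    // Passthrough mode, or a single child text node (the common case):
    // return the whole serialized text as one piece, with no matching work.
    if (passthrough || childTexts.length <= 1) {
      return [cssText];
    }
    return split(cssText, childTexts);
  };
}
```

With `passthrough` enabled the cost is constant, but the mutation edge cases that splitting exists to handle are given up.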

> we seem to be writing a css parser

I appreciate that the algorithm implemented here is not simple. I've thought about abstracting it out into a third-party library ("split a string according to an array of related substrings which can be matched via a normalization function"), but I haven't looked into whether such a thing already exists.
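The generic problem in that description can be sketched as follows. This is a hypothetical, simplified illustration and not rrweb's implementation: `splitByNormalizedChunks` and `normalize` are made-up names, the authored child texts are passed as a `string[]` instead of being read from the style element's child nodes, and `maxSteps` is only loosely analogous to the PR's `jLimit`. For each authored chunk it takes the first end position whose normalized prefix matches, which is deterministic for a given input and explains why trailing whitespace lands at the start of the next piece.

```typescript
// Same normalization as the PR's normalizeCssString: strip comments,
// whitespace and semicolons, then rewrite `0px` as `0`.
function normalize(css: string): string {
  return css.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '').replace(/0px/g, '0');
}

// Split `cssText` (browser-serialized) into pieces whose normalized forms
// match the normalized `childTexts` (as authored), in order.
function splitByNormalizedChunks(
  cssText: string,
  childTexts: string[],
  maxSteps = 100_000, // hypothetical per-chunk scan cap, loosely like jLimit
): string[] {
  // Single child: nothing to split.
  if (childTexts.length <= 1) return [cssText];

  const pieces: string[] = [];
  let start = 0;
  // The last piece takes whatever remains, so only
  // childTexts.length - 1 split points need to be found.
  for (let c = 0; c < childTexts.length - 1; c++) {
    const target = normalize(childTexts[c]);
    let splitAt = -1;
    // Take the FIRST end position whose normalized prefix matches. Each step
    // re-normalizes the candidate, so one chunk can cost O(n^2) in the worst
    // case; the cap bounds that cost.
    const limit = Math.min(cssText.length, start + maxSteps);
    for (let end = start; end <= limit; end++) {
      if (normalize(cssText.slice(start, end)) === target) {
        splitAt = end;
        break;
      }
    }
    // No match within the cap: fall back to a single unsplit piece.
    if (splitAt === -1) return [cssText];
    pieces.push(cssText.slice(start, splitAt));
    start = splitAt;
  }
  pieces.push(cssText.slice(start));
  return pieces;
}
```

For example, with authored chunks `'.a { margin-top: 0; }'` and `'.b { color: red; }'` and serialized text `'.a {margin-top: 0px;} .b {color: red;}'`, the pieces are `'.a {margin-top: 0px;}'` and `' .b {color: red;}'`. One weakness of this first-match rule: if an authored chunk ends immediately after a `0` that gets serialized as `0px`, the match happens before the `px`, which then ends up at the start of the next piece.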

This PR is definitely an improvement, and I believe it catches the last pathological case, particularly as there is now an additional `jLimit` iteration limit.

There is another large PR to move CSS parsing off the main thread at record time; however, that would likely have hidden this problem rather than bringing it to the fore so painfully.

I also have another plan to ditch the whole `cssRules` approach, and hence any need to match split points: we'd just use the `textContent` directly when we can detect that the style element hasn't been modified programmatically. However, I'm waiting for #1475 to get merged before we can look at that.

```
@@ -463,19 +470,24 @@ export function normalizeCssString(cssText: string): string {
export function splitCssText(
  cssText: string,
  style: HTMLStyleElement,
  _testNoPxNorm = false,
```

Contributor

It would be great to get some TSDoc documentation on what this does, especially since `_variableName` normally signals an unused variable in JS/TS land.
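One possible shape for that documentation, as a TSDoc sketch on a standalone copy of the normalization logic from the diff. The two-parameter signature is an assumption (the hunk header shows `normalizeCssString(cssText: string)`), and the description of `_testNoPxNorm` is a suggested wording, not the maintainers' text; the same `@param` note could sit on `splitCssText`.

```typescript
/**
 * Normalize CSS text so that an as-authored string and its serialized form
 * can be compared for equality.
 *
 * Strips block comments that contain no `*`, all whitespace and all
 * semicolons, then rewrites `0px` as `0` (an authored `margin-top: 0;`
 * gets serialized as `margin-top: 0px;`).
 *
 * @param cssText - The CSS text to normalize.
 * @param _testNoPxNorm - Test-only flag; the leading underscore does not mean
 *   "unused". When `true`, the `0px` to `0` rewrite is skipped.
 * @returns A string meant only for equality comparisons, not for use as CSS.
 * @internal
 */
function normalizeCssString(cssText: string, _testNoPxNorm = false): string {
  const stripped = cssText.replace(/(\/\*[^*]*\*\/)|[\s;]/g, '');
  return _testNoPxNorm ? stripped : stripped.replace(/0px/g, '0');
}
```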

eoghanmurray merged commit 3e9e42f into rrweb-io:master on Feb 6, 2025 (6 checks passed)
3 participants