Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(perf): improve perf of mutation and snapshot #1271

Closed
wants to merge 23 commits into from

Conversation

JonasBa
Copy link
Contributor

@JonasBa JonasBa commented Jul 30, 2023

This PR builds on what @Juice10 had found - I basically identified the same thing using different tools.

Motivation:
We started collecting browser profiles on some Sentry sites and noticed rrweb being the bottleneck when diffing large trees (common for loading screens which populate a large DOM tree).

Example profile collected by the SDK from unknown arch (platform is macos):
image

Profile for matching route collected locally (arm64 macos)
image

By looking at the code, I managed to identify a couple of bottlenecks:

  • isBlocking has a runtime of On^2 when checking for ancestors - I tried swapping that out for el.matches(.block *), I think it works faster, but I need to run benchmarks. The other approach could be to keep a cache of blocked elements and use compareDocumentPosition, but that should be benchmarked before.
  • There is some inefficiencies around Array.from(nodelist) when iterating over nodes and their children. This is just unnecessary as we are creating and discarding arrays where we could be using NodeList.forEach
  • genAdds is a recursive fn with possible deep recursion - I converted this to an iterative procedure. It causes the snapshot tests to fail as the procedure now runs in breadth first iteration (the data is there, just in different places and id's are different)
  • some functions are just inefficient, they can either return early or avoid extra calls/allocations with simple modifications (example of refactoring to for loop vs doing .map().filter()). There are small changes like hoisting static regular expressions, calling .test instead of .match when discarding match groups and avoid creating a lot of new strings via lowerIfExists). Similar case with calling while(queue.length) queue.shift(). Since shift is On, it means the runtime for that is actually O(n^2)

Benchmark results:

master:#c6600e7
'append 70 x 70 x 70 elements' │   3   │ 1691.3333333333333 │ '1701, 1683, 1690'
'append 1000 elements and reverse their order' │   3   │   350    │ '342, 358, 350'
'real events recorded on bugs.chromium.org' │   3   │   3116   │ '3090, 3140, 3118'
'create 1000x10 DOM nodes' │ 'benchmark-dom-mutation.html' │ 'window.workload()' │  10   │  116.5   │ '114, 144, 119, 120, 111, 110, 127, 102, 112, 106'
'create 1000x10x2 DOM nodes and remove a bunch of them' │ 'benchmark-dom-mutation-add-and-remove.html' │ 'window.workload()' │  10   │  470.9   │ '526, 658, 455, 462, 415, 461, 415, 465, 441, 411'
'create 1000 DOM nodes and append into its previous looped node' │ 'benchmark-dom-mutation-multiple-descendant-add.html' │ 'window.workload()' │   5   │   69.2   │ '67, 62, 64, 89, 64'
'create 10000 DOM nodes and move it to new container' │ 'benchmark-dom-mutation-add-and-move.html' │ 'window.workload()' │   5   │  205.4   │ '208, 196, 214, 216, 193'
#bad990b
'append 70 x 70 x 70 elements' │   3   │ 1722.3333333333333 │ '1753, 1716, 1698'
'append 1000 elements and reverse their order' │   3   │ 358.6666666666667 │ '357, 341, 378'
'real events recorded on bugs.chromium.org' │   3   │ 3128.3333333333335 │ '3219, 3061, 3105
'create 1000x10 DOM nodes' │ 'benchmark-dom-mutation.html' │ 'window.workload()' │  10   │  121.2   │ '124, 168, 110, 130, 112, 107, 131, 115, 106, 109'
'create 1000x10x2 DOM nodes and remove a bunch of them' │ 'benchmark-dom-mutation-add-and-remove.html' │ 'window.workload()' │  10   │  196.6   │ '182, 201, 207, 167, 166, 210, 166, 170, 318, 179'
'create 1000 DOM nodes and append into its previous looped node' │ 'benchmark-dom-mutation-multiple-descendant-add.html' │ 'window.workload()' │   5   │   43.2   │ '44, 46, 48, 40, 38'
'create 10000 DOM nodes and move it to new container' │ 'benchmark-dom-mutation-add-and-move.html' │ 'window.workload()' │   5   │  223.6   │ '214, 225, 247, 220, 212'

Main performance improvement seems to be for the following cases (I would like to re-run these after review in case there are mistakes in my PR):
create 1000x10x2 DOM nodes and remove a bunch of them which decreased from ~500ms to ~200ms
create 1000 DOM nodes and append into its previous looped node which decreased from ~70ms to ~40ms

I suspect that by being a bit smarter about operations should also decrease memory pressure and avoid GC sweeps, but that is hard to quantify (basing this off my experience)

Feedback/critics on the PR are much appreciated. For what it's worth, I am not very familiar with rrweb codebase so take this with a grain of salt, I may have invalidated some assumptions the lib relies on and gotten invalid benchmark results. If I did, please point it out and I will address it. If someone is interested, feel free to cherry pick my changes as this PR has gotten quite large and risky. Hopefully it inspires someone to further improve the performance of operations (I am also happy to split it myself)

While this does address performance of the current implementation, I think the largest win in this case would be to process genAdds in an async loop so that we avoid blocking the main thread (I lack context on everything that might break?). Besides that, removing try/catch statements around hot code paths like isBlocking should be reevaluated as it is a common optimization bailout reason in v8 (are those really required?)

@changeset-bot
Copy link

changeset-bot bot commented Jul 30, 2023

⚠️ No Changeset found

Latest commit: bc2c8f2

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types

Click here to learn what changesets are, and how to add one.

Click here if you're a maintainer who wants to add a changeset to this PR

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 1, 2023

I re-ran the benchmark after 5d59621

append 70 x 70 x 70 elements' │   3   │ 1669.3333333333333 │ '1688, 1666, 1654
append 1000 elements and reverse their order' │   3   │   343    │ '347, 343, 339
real events recorded on bugs.chromium.org' │   3   │ 3020.6666666666665 │ '3128, 2970, 2964
create 1000x10 DOM nodes' │ 'benchmark-dom-mutation.html' │ 'window.workload()' │  10   │  117.6   │ '115, 120, 133, 112, 119, 112, 121, 114, 109, 121
create 1000x10x2 DOM nodes and remove a bunch of them' │ 'benchmark-dom-mutation-add-and-remove.html' │ 'window.workload()' │  10   │   179    │ '187, 209, 163, 165, 201, 163, 163, 211, 164, 164
create 1000 DOM nodes and append into its previous looped node' │ 'benchmark-dom-mutation-multiple-descendant-add.html' │ 'window.workload()' │   5   │   40.6   │ '41, 42, 44, 38, 38
create 10000 DOM nodes and move it to new container' │ 'benchmark-dom-mutation-add-and-move.html' │ 'window.workload()' │   5   │  218.8   │ '210, 217, 236, 219, 212

The timings are consistent and we can still see a boost for create + remove and create + append into previous looped node

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 1, 2023

Actually @eoghanmurray, running the benchmark here and testing matches with the .tagName combination and it seems to be even slower. I've looked at the blink implementation of matches (just because I was curious) and it seems that each individual selector is parsed and tested in a left to right order. I suspect the dip here is because the engine can short circuit the search if the selector it first checks after .block is a * selector as anything will match

@eoghanmurray
Copy link
Contributor

@Juice10 is gonna get the performance/benchmark tests in main shortly so will hold off on committing #1272 until we can do a nice demonstration of that in action, and hopefully apply the same to this PR

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 3, 2023

@eoghanmurray makes sense, just keep me updated and let me know how I can help further.

On a small side note, I looked into the blink implementation of matches and it confirms the difference between closest and matches implementations - closest applies the same check as ::matches, but it does so for each parent element. I suspect that * selector could be faster as the engine has to only compare the parent -> child relationship between nodes and not match against the actual selector as * will match anything.

bool SelectorDataList::matches(Element& targetElement) const
{
    if (m_needsUpdatedDistribution)
        targetElement.updateDistribution();

    unsigned selectorCount = m_selectors.size();
    for (unsigned i = 0; i < selectorCount; ++i) {
        if (selectorMatches(*m_selectors[i], targetElement, targetElement))
            return true;
    }

    return false;
}

Element* SelectorDataList::closest(Element& targetElement) const
{
    if (m_needsUpdatedDistribution)
        targetElement.updateDistribution();

    unsigned selectorCount = m_selectors.size();
+    for (Element* currentElement = &targetElement; currentElement; currentElement = currentElement->parentElement()) {
        for (unsigned i = 0; i < selectorCount; ++i) {
            if (selectorMatches(*m_selectors[i], *currentElement, targetElement))
                return currentElement;
        }
+    }
    return nullptr;
}

Correction: I was looking at the wrong source, however the implementation in chromium (third_party/blink/renderer/core/css/selector_query.cc) has the same difference

@eoghanmurray
Copy link
Contributor

closest applies the same check as ::matches, but it does so for each parent element

Ya but in this case you are also modifying the selector so it's not a straight comparison... but closest does do an unnecessary check on the element itself (we've already checked that via the classList check, so I wonder if an optimisation to remove the classList check would work just as well)

I suspect that * selector could be faster as the engine has to only compare the parent -> child relationship between nodes and not match against the actual selector as * will match anything.

Performance guestimation is always counter-intuitive to me!

@eoghanmurray
Copy link
Contributor

I've merged a recent PR that is now on main

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 3, 2023

@eoghanmurray would you like me to rerun the benchmarks?

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 3, 2023

Ya but in this case you are also modifying the selector so it's not a straight comparison... but closest does do an unnecessary check on the element itself (we've already checked that via the classList check, so I wonder if an optimisation to remove the classList check would work just as well)

I updated the benchmark https://jsbench.me/7ilkpmi7rn/1 - it seems that matches is still faster (benchmark is pushing to an array in order to escape optimizations)

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 3, 2023

It seems that the changes from master broke lint rules

@eoghanmurray
Copy link
Contributor

It seems that the changes from master broke lint rules

cannot find name addedIds

I did the merge conflict resolution in the github editor and messed it up!

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 3, 2023

I'll reset and rebase this correctly, no worries

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 7, 2023

@eoghanmurray I reverted the genAdds change and updated snapshots, the changes are now in the new PR I opened. lmk if you need me to change something else

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 10, 2023

@eoghanmurray something I just noticed is that the SDK has a default blockClass option set to "rr-block". I'm not sure why that is, maybe you have an idea? Imo unless there are external users outside of the SDK relying on this behavior, then I think this should be opt-in with no default value. The consequence of the default value is that we are performing a lot of isBlocked calls on each mutation when the page might not even contain any rr-block elements... :/

@eoghanmurray
Copy link
Contributor

Ya this is so we can tell people to 'add rr-block class to anything you want blocked' without having to also have to explain how to modify their config (or redeploy to a customer). I don't think this can be changed now unless in a major version. We could provide a recommended installation which configures blockClass: false, // performance: omit default check class of 'rr-block' but this would need to be a recipe for backwards compatibility (IMO)

@JonasBa
Copy link
Contributor Author

JonasBa commented Aug 11, 2023

Ya this is so we can tell people to 'add rr-block class to anything you want blocked' without having to also have to explain how to modify their config (or redeploy to a customer). I don't think this can be changed now unless in a major version. We could provide a recommended installation which configures blockClass: false, // performance: omit default check class of 'rr-block' but this would need to be a recipe for backwards compatibility (IMO)

@eoghanmurray yes, that makes sense. Imo a recommendation is fine since this is definitely in the hot path

@JonasBa
Copy link
Contributor Author

JonasBa commented Sep 25, 2023

@eoghanmurray friendly ping :)

@JonasBa
Copy link
Contributor Author

JonasBa commented Oct 5, 2023

@eoghanmurray any chance we could get a pair of eyes on this PR?

@marandaneto
Copy link
Contributor

@eoghanmurray any chance we could get a pair of eyes on this PR?

Maybe @Juice10 can help here as I've seen their activity on other PRs recently.

Copy link
Contributor

@pauldambra pauldambra left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

fly-by approval fwiw since I'd love to see this land 🚀 ❤️

Comment on lines +265 to +267
if (!blockClass && !blockSelector) {
return false;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🙌

(lowerIfExists(sn.attributes.property).match(/^(og|twitter|fb):/) || // og = opengraph (facebook)
lowerIfExists(sn.attributes.name).match(/^(og|twitter):/) ||
lowerIfExists(sn.attributes.name) === 'pinterest')
(OG_TWITTER_OR_FB_REGEXP.test(snAttributeProperty) || // og = opengraph (facebook)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

very tiny nitpick: that the comment makes more sense up at the declaration now 🙈

@Yuyz0112
Copy link
Member

Yuyz0112 commented Jan 26, 2024

Is there any chance @JonasBa can split these cool optimizations into multiple PRs? So some small tweaks should be merged more quickly.

Since I found there are already some conflicts in this PR(sorry for the late review).

@JonasBa
Copy link
Contributor Author

JonasBa commented Jul 18, 2024

Is there any chance @JonasBa can split these cool optimizations into multiple PRs? So some small tweaks should be merged more quickly.

Since I found there are already some conflicts in this PR(sorry for the late review).

Late to this, but I'm starting to do that yes.

@JonasBa
Copy link
Contributor Author

JonasBa commented Sep 19, 2024

Closing this in favor of the smaller PRs that I opened

@JonasBa JonasBa closed this Sep 19, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants