-
-
Notifications
You must be signed in to change notification settings - Fork 1.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(perf): improve perf of mutation and snapshot #1271
Conversation
|
I re-ran the benchmark after 5d59621
The timings are consistent and we can still see a boost for create + remove and create + append into previous looped node |
Actually @eoghanmurray, running the benchmark here and testing matches with the .tagName combination and it seems to be even slower. I've looked at the blink implementation of matches (just because I was curious) and it seems that each individual selector is parsed and tested in a left to right order. I suspect the dip here is because the engine can short circuit the search if the selector it first checks after .block is a * selector as anything will match |
@eoghanmurray makes sense, just keep me updated and let me know how I can help further. On a small side note, I looked into the blink implementation of matches and it confirms the difference between closest and matches implementations - closest applies the same check as ::matches, but it does so for each parent element. I suspect that bool SelectorDataList::matches(Element& targetElement) const
{
if (m_needsUpdatedDistribution)
targetElement.updateDistribution();
unsigned selectorCount = m_selectors.size();
for (unsigned i = 0; i < selectorCount; ++i) {
if (selectorMatches(*m_selectors[i], targetElement, targetElement))
return true;
}
return false;
}
Element* SelectorDataList::closest(Element& targetElement) const
{
if (m_needsUpdatedDistribution)
targetElement.updateDistribution();
unsigned selectorCount = m_selectors.size();
+ for (Element* currentElement = &targetElement; currentElement; currentElement = currentElement->parentElement()) {
for (unsigned i = 0; i < selectorCount; ++i) {
if (selectorMatches(*m_selectors[i], *currentElement, targetElement))
return currentElement;
}
+ }
return nullptr;
} Correction: I was looking at the wrong source, however the implementation in chromium (third_party/blink/renderer/core/css/selector_query.cc) has the same difference |
Ya but in this case you are also modifying the selector so it's not a straight comparison... but
Performance guestimation is always counter-intuitive to me! |
I've merged a recent PR that is now on main |
@eoghanmurray would you like me to rerun the benchmarks? |
I updated the benchmark https://jsbench.me/7ilkpmi7rn/1 - it seems that matches is still faster (benchmark is pushing to an array in order to escape optimizations) |
It seems that the changes from master broke lint rules |
cannot find name I did the merge conflict resolution in the github editor and messed it up! |
I'll reset and rebase this correctly, no worries |
@eoghanmurray I reverted the genAdds change and updated snapshots, the changes are now in the new PR I opened. lmk if you need me to change something else |
@eoghanmurray something I just noticed is that the SDK has a default blockClass option set to "rr-block". I'm not sure why that is, maybe you have an idea? Imo unless there are external users outside of the SDK relying on this behavior, then I think this should be opt-in with no default value. The consequence of the default value is that we are performing a lot of isBlocked calls on each mutation when the page might not even contain any rr-block elements... :/ |
Ya this is so we can tell people to 'add |
@eoghanmurray yes, that makes sense. Imo a recommendation is fine since this is definitely in the hot path |
@eoghanmurray friendly ping :) |
@eoghanmurray any chance we could get a pair of eyes on this PR? |
Maybe @Juice10 can help here as I've seen their activity on other PRs recently. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
fly-by approval fwiw since I'd love to see this land 🚀 ❤️
if (!blockClass && !blockSelector) { | ||
return false; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙌
(lowerIfExists(sn.attributes.property).match(/^(og|twitter|fb):/) || // og = opengraph (facebook) | ||
lowerIfExists(sn.attributes.name).match(/^(og|twitter):/) || | ||
lowerIfExists(sn.attributes.name) === 'pinterest') | ||
(OG_TWITTER_OR_FB_REGEXP.test(snAttributeProperty) || // og = opengraph (facebook) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
very tiny nitpick: that the comment makes more sense up at the declaration now 🙈
Is there any chance @JonasBa can split these cool optimizations into multiple PRs? So some small tweaks should be merged more quickly. Since I found there are already some conflicts in this PR(sorry for the late review). |
Late to this, but I'm starting to do that yes. |
Closing this in favor of the smaller PRs that I opened |
This PR builds on what @Juice10 had found - I basically identified the same thing using different tools.
Motivation:
We started collecting browser profiles on some Sentry sites and noticed rrweb being the bottleneck when diffing large trees (common for loading screens which populate a large DOM tree).
Example profile collected by the SDK from unknown arch (platform is macos):
Profile for matching route collected locally (arm64 macos)
By looking at the code, I managed to identify a couple of bottlenecks:
Benchmark results:
Main performance improvement seems to be for the following cases (I would like to re-run these after review in case there are mistakes in my PR):
create 1000x10x2 DOM nodes and remove a bunch of them
which decreased from ~500ms to ~200mscreate 1000 DOM nodes and append into its previous looped node
which decreased from ~70ms to ~40msI suspect that by being a bit smarter about operations should also decrease memory pressure and avoid GC sweeps, but that is hard to quantify (basing this off my experience)
Feedback/critics on the PR are much appreciated. For what it's worth, I am not very familiar with rrweb codebase so take this with a grain of salt, I may have invalidated some assumptions the lib relies on and gotten invalid benchmark results. If I did, please point it out and I will address it. If someone is interested, feel free to cherry pick my changes as this PR has gotten quite large and risky. Hopefully it inspires someone to further improve the performance of operations (I am also happy to split it myself)
While this does address performance of the current implementation, I think the largest win in this case would be to process genAdds in an async loop so that we avoid blocking the main thread (I lack context on everything that might break?). Besides that, removing try/catch statements around hot code paths like isBlocking should be reevaluated as it is a common optimization bailout reason in v8 (are those really required?)