Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix: use getHTML in the shadow root expansion #2587

Merged
merged 2 commits into from
Jul 24, 2024
Merged

Conversation

barjin
Copy link
Contributor

@barjin barjin commented Jul 23, 2024

expandShadowRoots now does not replace the original content but creates a separate (non-shadow) subtree as a sibling of the original one.

Note that this doesn't duplicate content $^{[1]}$ , as the original shadow roots are inaccessible from JS by design. document.documentElement.outerHTML returns only the contents of (new) regular DOM elements. The page also still looks the same, as a shadow DOM tree masks any "light" DOM sibling.

Closes #2583


$^{[1]}$ With the notable exception of accessgroup.my.site.com pages, where the use of custom DOM elements somehow creates elements with shadowRoot property, the content of which is still accessible from JS(???). This is likely coming from the ancient web framework they are using.

Either way, double the content is probably better than none (so far we have only noticed this issue on this page).

@barjin barjin added the bug Something isn't working. label Jul 23, 2024
@barjin barjin requested review from janbuchar and B4nan July 23, 2024 08:36
@barjin barjin self-assigned this Jul 23, 2024
@github-actions github-actions bot added this to the 94th sprint - Tooling team milestone Jul 23, 2024
@github-actions github-actions bot added t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics. labels Jul 23, 2024
@B4nan
Copy link
Member

B4nan commented Jul 23, 2024

I would still go with the getHTML method if it's available (which it should be with the latest playwright already), I guess that would solve the duplicity for that one weird site, right?

@barjin
Copy link
Contributor Author

barjin commented Jul 23, 2024

Well, the getHTML methods require the serializable property, which is apparently false by default. Combined with the fact that it only works in the latest browser versions, I cannot help but be a bit sceptical here.

I'll play with the problematic page a tiny bit more and try to replicate the issue locally, so we can debug and test better.

@B4nan
Copy link
Member

B4nan commented Jul 23, 2024

Well, the getHTML methods require the serializable property, which is apparently false by default.

And this solution will work regardless of that?

Combined with the fact that it only works in the latest browser versions, I cannot help but be a bit sceptical here.

I don't see this as a problem really (outside of the fact that latest playwright breaks that one test with firefox, that's a different story).

Anyway, I don't want to block this at all, once the CI is green we can merge and release, its just that it feels more future proof to use the new method, which does exactly what we need, instead of baking in our own implementation.

@barjin
Copy link
Contributor Author

barjin commented Jul 24, 2024

And this solution will work regardless of that?

Yep, e.g. the new tests from this PR are not passing with getHTML(), as it doesn't find the [GOOD] parts - it would if the template elements had shadowrootserializable argument.

I guess we can have a bunch of fallbacks, both for the existence of the method (getHTML?.()) and the non-emptiness of the content (getHTML?.().trim().length > 0 ? getHTML() : getShadowDomHtml()).

I still feel like W3C could have handled this somehow better, but well, here we are. 🤷🏽‍♂️

@barjin barjin force-pushed the fix/shadow-root-no-replace branch from 277f1c8 to 3a1d17b Compare July 24, 2024 06:48
@barjin barjin changed the title fix: append shadow root content in a separate sibling element fix: use getHTML in the shadow root expansion Jul 24, 2024
@B4nan
Copy link
Member

B4nan commented Jul 24, 2024

What about this option, the description sounds like you can use that to disregard the serializable property. Or did I misunderstand what it's about?

https://developer.mozilla.org/en-US/docs/Web/API/ShadowRoot/getHTML#shadowroots

Btw are we sure this is not "a good thing"? I mean in the cases we know about, were there some that disabled the serialization on the actual content we care about?


Now I am just curious, the implementation is fine by me, feel free to merge and go ahead with the release if it works correctly in WCC.

@barjin
Copy link
Contributor Author

barjin commented Jul 24, 2024

image

I think both of the options are about child shadow roots of the shadow root instance you are calling the method on. Looking into Firefox source code (9694-9695 here), it indeed seems that the shadowRoots array option does disregard the serializable option, but we first need to collect the references to all the children.

image

You're right that we haven't noticed any issues with the shadow DOMs whatsoever (except for the current issue in WCC). I say we merge and release this (the WCC website works and it's not breaking). As a bonus, we still have a better understanding of what's happening in case some other issue pops up.

@B4nan B4nan merged commit a244d62 into master Jul 24, 2024
10 checks passed
@B4nan B4nan deleted the fix/shadow-root-no-replace branch July 24, 2024 08:47
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working. t-tooling Issues with this label are in the ownership of the tooling team. tested Temporary label used only programatically for some analytics.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

bug: shadow root expansion in parseWithCheerio removes content
2 participants