DOMParser within ServiceWorkers #846
Comments
DOM implementations in browsers are not thread-safe. And don't use a regexp to parse HTML. You need to write an HTML parser if you want to do that. |
Ok, that's useful to know. Is that the technical reason why |
@markuskobler the reason |
So in my case, I want the service worker to have a better understanding of the HTML it's caching so it can make a call on which of its dependencies should also be cached. I would have assumed this to be a common use case? I only mentioned the |
If you want this done on the client I think you should just use read-through-caching. You can then just use Alternatively, you can have your server pre-compute the list of resources and cache them in your service worker install event. I think trying to parse html on the fly would be the least efficient way to go here. Would either of those solutions work for you? |
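As an illustration of the read-through caching suggested above, here is a minimal sketch. The `CacheLike` interface, the `readThrough` helper, and the `"pages-v1"` cache name are hypothetical and only for this example; the real Cache API exposes `match` and `put` with a similar shape. The cache and fetch function are passed in so the logic can be exercised outside a service worker.

```typescript
// Minimal structural type so the sketch does not depend on service worker typings.
// (Hypothetical name; the real Cache API has match/put with a similar shape.)
interface CacheLike<Req, Res> {
  match(request: Req): Promise<Res | undefined>;
  put(request: Req, response: Res): Promise<void>;
}

// Read-through caching: answer from the cache when possible; otherwise fetch,
// store a copy, and return the network response.
async function readThrough<Req, Res>(
  request: Req,
  cache: CacheLike<Req, Res>,
  fetchFn: (request: Req) => Promise<Res>,
  clone: (response: Res) => Res,
): Promise<Res> {
  const cached = await cache.match(request);
  if (cached !== undefined) {
    return cached;
  }
  const response = await fetchFn(request);
  // Store a copy, because a real Response body can only be consumed once.
  await cache.put(request, clone(response));
  return response;
}

// Hypothetical wiring inside a service worker (not executed here):
//
// self.addEventListener("fetch", (event) => {
//   event.respondWith(
//     caches.open("pages-v1").then((cache) =>
//       readThrough(event.request, cache, fetch, (r) => r.clone()),
//     ),
//   );
// });
```

With this approach the page's dependencies end up in the cache as the browser requests them, so the service worker never has to parse the HTML itself. A production handler would probably also check the response status before caching.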
I am looking to parse HTML received by the SW to replace links to individual JS files with a single link that combines all of them, for performance reasons, and then send it to the browser for parsing. I am thinking the performance gain from fewer RTTs might offset the extra overhead of running an HTML parser within it. Having a DOMParser in this context would be useful. Is this a reasonable thing to do within a SW? |
A better solution here would be to use HTTP/2, where the overhead of multiple requests is low. |
Hey Jake, |
You cannot use it right now if we need to make DOM thread-safe first… |
If you're wanting to trigger preload earlier, I think the options discussed in #920 are better. If you have to download the whole doc to inject a preload, I think you'll lose performance rather than win |
@delapuente The thing is that DOMParser creates real DOM nodes (hence the name), so it's not that easy to split functionality of one from the other. Much easier would be to use some external HTML5 parser (like parse5 as per @domenic's suggestion). Moreover, if you want to use HTML parser for the content you're getting from the server, you will likely want a streaming parser (which DOMParser is not) that would play well with Streaming API where / when it's available, as otherwise Service Worker will become a bottleneck and not a source of optimizations. |
Ah yes, it looks like using an (external) streaming HTML parser along with the Streams API would be better for things like injecting the preload tag and so on. That way, I don't need to wait for the entire document to download. |
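To make the streaming idea concrete, here is a rough sketch of injecting a tag into a document as it streams through, without waiting for the full download. It is deliberately NOT an HTML parser: it is a plain string scan for a literal marker, so it would miss `<HEAD>` or a head tag with attributes and could be fooled by the marker appearing inside a comment. A real streaming parser (such as the parse5 streaming mode mentioned in this thread) would be the robust choice. The `MarkerInjector` class is a hypothetical name for this example.

```typescript
// Injects `injection` right after the first occurrence of `marker`,
// even when the marker is split across two chunks.
class MarkerInjector {
  private marker: string;
  private injection: string;
  private pending = "";
  private done = false;

  constructor(marker: string, injection: string) {
    this.marker = marker;
    this.injection = injection;
  }

  // Takes the next text chunk and returns the text that is safe to emit now.
  push(chunk: string): string {
    if (this.done) {
      return chunk;
    }
    const text = this.pending + chunk;
    const index = text.indexOf(this.marker);
    if (index !== -1) {
      this.done = true;
      this.pending = "";
      const end = index + this.marker.length;
      return text.slice(0, end) + this.injection + text.slice(end);
    }
    // Hold back a tail that could be the start of a marker split across chunks.
    const keep = Math.min(this.marker.length - 1, text.length);
    this.pending = text.slice(text.length - keep);
    return text.slice(0, text.length - keep);
  }

  // Returns whatever was held back once the input has ended.
  flush(): string {
    const rest = this.pending;
    this.pending = "";
    return rest;
  }
}

// Hypothetical wiring with the Streams API (not executed here):
//
// const injector = new MarkerInjector("<head>", '<link rel="preload" href="/app.js" as="script">');
// const transform = new TransformStream<string, string>({
//   transform(chunk, controller) { controller.enqueue(injector.push(chunk)); },
//   flush(controller) { controller.enqueue(injector.flush()); },
// });
// const body = response.body
//   .pipeThrough(new TextDecoderStream())
//   .pipeThrough(transform)
//   .pipeThrough(new TextEncoderStream());
```

Only the last few characters of each chunk are held back, so the browser keeps receiving bytes while the document is still downloading.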
@inian Yup (and parse5 has streaming mode). As for preload specifically, you don't even need HTML parser, as instead of tags you can use |
Thanks for clarifying that @RReverser. Some of the people we are working with find it easier to add a script tag to their page than messing with their servers, so if we could do all these optimisations just using a SW, it would be cool. So we are working from that angle now. Thinking about it, we could just add the Link header via the SW too. |
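A small sketch of the "add the Link header via the SW" idea follows. The `PreloadHint` type and `buildPreloadLinkHeader` helper are hypothetical names for this example, and the resource URLs are placeholders. Whether a given browser acts on a `Link: rel=preload` header attached by a service worker is an assumption that should be verified before relying on it.

```typescript
// Hypothetical description of one resource to preload.
interface PreloadHint {
  href: string; // URL of the resource to preload
  as: string; // request destination, e.g. "script" or "style"
}

// Builds a single Link header value such as:
//   </app.js>; rel=preload; as=script, </site.css>; rel=preload; as=style
function buildPreloadLinkHeader(hints: PreloadHint[]): string {
  return hints
    .map((hint) => `<${hint.href}>; rel=preload; as=${hint.as}`)
    .join(", ");
}

// Hypothetical use in a fetch handler (not executed here):
//
// const upstream = await fetch(event.request);
// const headers = new Headers(upstream.headers);
// headers.append("Link", buildPreloadLinkHeader([{ href: "/app.js", as: "script" }]));
// return new Response(upstream.body, {
//   status: upstream.status,
//   statusText: upstream.statusText,
//   headers,
// });
```

Because only headers are changed, the response body can be passed through untouched and no HTML parsing is needed in the worker.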
I just bumped into this issue where, upon using THREE.js ColladaLoader2 (which uses DOMParser) to break open Collada files (3d models), I tried speeding up the load process by putting it through a web worker. To my surprise, this didn't work simply because DOMParser is not available from within web workers. It's possible to hack ColladaLoader2 to use a DOMLoader alternative but that's just crazy. It's entirely reasonable for a web worker to be parsing things like DOM. |
I think for most purposes, any XML parser would suffice. It would not have to be DOMParser specifically, but depending on a pure JS XML parser seems like too much trouble. |
@joeyparrish No, no, please never use an XML parser to parse HTML. Despite visual similarity, they have very different semantics (unless you specifically target XHTML and not HTML5). |
True, for HTML. I was thinking of a use case of my own where we use DOMParser to parse XML and would like to be able to do so from a service worker. Since the OP wants to parse HTML, please disregard my comment. |
IMO there are A LOT of possible uses for domparser in service workers. |
Right now the DOM is closely coupled with rendering, with things such as However, we shouldn't do this just for service workers, it would need to be a feature across all (or at least many) worker types. Further discussion of this proposal should happen in https://github.com/whatwg/dom. |
Another use-case: Chrome bans background pages in manifest v3 extensions, making some set of use-cases not possible (specifically here, I've been using DOMParser in background page) |
Same here, how to use DOMParser in Service Worker for the Manifest v3 migration now 😢 |
the hack I ended up doing is having an iframe that I would use as a fake background page. this does not work for all use-cases, but worked for me to run the DOM parsing on a separate thread from the rendering. see an example here https://github.com/transclude-me/extension/tree/main/source/content/background-simulation |
I agree this isn't a ServiceWorker specific need, but it seems like the conversation around this died with this ticket being closed. It does a lot more than just DOM, it's a full page, but in part I think we have special-purpose one-off Offscreen Documents (w3c/webextensions#170) capability built specifically for Web Extensions, because the good/necessary ideas here failed to get spoken to. Did anyone ever actually take any steps to getting the DOM spec engaged on this topic, after Jake closed this issue? I'd also point to great articles like https://paul.kinlan.me/we-need-dom-apis-in-workers/ which again validate the general ask here. People really want a programmatic DOM. We don't really care about the rendering for a lot of our work. Shipping the JSDOM library again and again is a tell that we need help here. |
#846 (comment) is still relevant. This is the wrong venue for the discussion. Service workers haven't chosen to block DOM APIs, it's that DOM APIs haven't been spec'd to work in workers. If you want that, the discussion needs to happen in the repos relating to the DOM APIs, as it's the maintainers of those features that would need to make the change. |
If folks still aren't convinced, it's like observing that helicopters don't work underwater, and complaining to the sea. |
It's more like :
Users : Can we have the keys of the boat ?
W3c: Sure
U : How is the weather at sea today ?
W: I don't check. I think there's some wind.
U: Is it safe to use the boat ?
W: at sea ? No it is not.
U: Why ?
W: Because of the fan !
U : The Fan ?
W: Yeah it's a fan boat.
U: Why a fan ?
W : to go in the bayou
U: why would anyone want to go in the bayou ?
W: I do.
U: Ok can't we just remove the fan ?
W: No we can't.
U: Why ?
W: Because it's more a fan than a boat.
U: why do you call it a boat then ?
W: because that's how we call a board with a fan on it that can go in the bayou.
|
No, it's not like that whatwg/dom#1217 (comment) |
I'm trying to figure out the best way to parse HTML in response to a fetch request. In my case, I'm attempting to figure out any dependencies the HTML might have so I can also cache those assets as well. Is the only way to do this currently to use a horrendously complicated regex or am I missing something obvious?
Or put it another way, what would be the downside of exposing the DOMParser to the ServiceWorker scope?