DOMParser within ServiceWorkers #846

Closed
markuskobler opened this issue Mar 15, 2016 · 30 comments

@markuskobler

I'm trying to figure out the best way to parse HTML in response to a fetch request. In my case, I'm attempting to find any dependencies the HTML might have so that I can cache those assets as well.

Is the only way to do this currently to use a horrendously complicated regex, or am I missing something obvious?

Or, to put it another way: what would be the downside of exposing DOMParser to the ServiceWorker scope?

@annevk
Member

annevk commented Mar 16, 2016

DOM implementations in browsers are not thread-safe. And don't use a regexp to parse HTML. You need to write an HTML parser if you want to do that.

@markuskobler
Author

OK, that's useful to know. Is that the technical reason why the "document" response type is supported on XHR but not on fetch's Response?

@domenic
Contributor

domenic commented Mar 16, 2016

They (document responses on XHR) aren't supported in web workers.

parse5 works in web workers, as do, in fact, large parts of the jsdom project (although that is a much larger dependency).
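A rough TypeScript sketch of the parse5 route: once a parser has produced a tree, finding the dependencies the original question asks about is a plain tree walk. The `HtmlNode` shape below is an assumption modelled loosely on parse5's default tree (`nodeName`, `attrs` as `{ name, value }` pairs, `childNodes`); check it against the parse5 version you actually install. `collectDependencies` and the element/attribute table are hypothetical and deliberately incomplete (no `srcset`, `<source>`, inline CSS `url()`, and so on).

```typescript
// Assumed node shape, modelled loosely on parse5's default tree adapter.
interface HtmlNode {
  nodeName: string;
  attrs?: { name: string; value: string }[];
  childNodes?: HtmlNode[];
}

// Hypothetical, incomplete table: element name -> attribute holding a URL.
const URL_ATTRS = new Map<string, string>([
  ["script", "src"],
  ["img", "src"],
  ["link", "href"],
]);

function getAttr(node: HtmlNode, name: string): string | undefined {
  return node.attrs?.find((a) => a.name === name)?.value;
}

// Walks the tree in document order and returns absolute, de-duplicated URLs.
function collectDependencies(root: HtmlNode, baseUrl: string): string[] {
  const found = new Set<string>();
  const stack: HtmlNode[] = [root];
  while (stack.length > 0) {
    const node = stack.pop()!;
    const attrName = URL_ATTRS.get(node.nodeName);
    const value = attrName ? getAttr(node, attrName) : undefined;
    // For <link>, only rel="stylesheet" is treated as a dependency here.
    const relevant =
      node.nodeName !== "link" ||
      (getAttr(node, "rel") ?? "").toLowerCase().split(/\s+/).includes("stylesheet");
    if (value && relevant) {
      try {
        found.add(new URL(value, baseUrl).href);
      } catch {
        // Skip attribute values that do not parse as URLs.
      }
    }
    // Push children in reverse so they are popped in document order.
    if (node.childNodes) stack.push(...[...node.childNodes].reverse());
  }
  return [...found];
}
```

In a worker, the tree would come from parse5 (e.g. its `parse` function applied to `await response.text()`), and the returned list could then be handed to `cache.addAll()`. An explicit stack is used instead of recursion so deeply nested documents cannot overflow the call stack.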

@annevk
Member

annevk commented Mar 16, 2016

@markuskobler the reason fetch() doesn't support nodes is mostly because we didn't think a potential fetch module requiring all of DOM is a reasonable proposition. It's not really related to the DOM not being available in workers.

@markuskobler
Author

So in my case, I want the service worker to have a better understanding of the HTML it's caching so it can make a call on which of its dependencies should also be cached. I would have assumed this to be a common use case?

I only mentioned the DOMParser, not because I'm attached to that API, but because this feels like a core responsibility of the browser.

@wanderview
Member

If you want this done on the client, I think you should just use read-through caching. You can then use <link rel="prefetch"> for resources you want to load and cache aggressively.

Alternatively, you can have your server pre-compute the list of resources and cache them in your service worker install event.
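A minimal TypeScript sketch of the read-through caching suggested above, assuming cache-first semantics: serve from the cache if an entry exists, otherwise fetch, store a copy, and return the network response. `ResponseCache` and `readThrough` are hypothetical names; a real `Cache` from `caches.open()` and the global `fetch` should fit these signatures structurally, which also lets the logic be exercised outside a service worker.

```typescript
// Minimal cache abstraction; a real Cache object satisfies it structurally.
interface ResponseCache {
  match(key: string): Promise<Response | undefined>;
  put(key: string, response: Response): Promise<void>;
}

// Cache-first read-through: only successful network responses are stored.
async function readThrough(
  url: string,
  cache: ResponseCache,
  fetchFn: (url: string) => Promise<Response>,
): Promise<Response> {
  const cached = await cache.match(url);
  if (cached) return cached;

  const response = await fetchFn(url);
  if (response.ok) {
    // A body can only be consumed once, so store a clone and return the original.
    await cache.put(url, response.clone());
  }
  return response;
}
```

In a `fetch` handler this might be called as `event.respondWith(caches.open("v1").then((c) => readThrough(event.request.url, c, fetch)))`, where the cache name is an arbitrary example. The install-time alternative would instead pass the server-computed URL list to `cache.addAll()` inside `event.waitUntil(...)` in the `install` handler.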

I think trying to parse html on the fly would be the least efficient way to go here.

Would either of those solutions work for you?

@inian

inian commented Jul 8, 2016

I am looking to parse the HTML received by the SW so that I can replace links to individual JS files with a single link to a file that combines all of them, for performance reasons, and then send the result to the browser for parsing. I think the performance gain from fewer RTTs might offset the extra overhead of running an HTML parser inside the SW. Having a DOMParser in this context would be useful.

Is this a reasonable thing to do within a SW?

@jakearchibald
Contributor

A better solution here would be to use HTTP/2, where the overhead of multiple requests is low.

@inian

inian commented Jul 9, 2016

Hey Jake,
I understand that, but I was thinking of other optimisations too, say injecting preload tags for fonts, and other optimisations done purely on the client-side HTML.
Plus, this would be a solution we can use right now, before CDNs adopt HTTP/2 more widely.

@annevk
Member

annevk commented Jul 9, 2016

You cannot use it right now if we need to make DOM thread-safe first…

@jakearchibald
Contributor

If you're wanting to trigger preload earlier, I think the options discussed in #920 are better.

If you have to download the whole document to inject a preload, I think you'll lose performance rather than gain it.

@delapuente

@annevk I think @inian does not want to access the DOM; he is simply suggesting using some kind of parser to get the body of the response as structured data instead of dealing with strings.

@RReverser
Copy link
Member

@delapuente The thing is that DOMParser creates real DOM nodes (hence the name), so it's not that easy to split functionality of one from the other. Much easier would be to use some external HTML5 parser (like parse5 as per @domenic's suggestion).

Moreover, if you want to use an HTML parser for the content you're getting from the server, you will likely want a streaming parser (which DOMParser is not) that plays well with the Streams API where and when it's available; otherwise the Service Worker will become a bottleneck rather than a source of optimizations.

@inian

inian commented Jul 15, 2016

Ah yes, it looks like using an external streaming HTML parser along with the Streams API would be better for things like injecting the preload tag. That way, I don't need to wait for the entire document to download.
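To make the streaming idea concrete, here is a TypeScript sketch that pipes a response body through a `TransformStream` and inserts a snippet right after the literal `<head>` start tag, without waiting for the whole document. Note that it swaps in a much cruder technique than the streaming HTML parser discussed above: it is a plain substring search, so it misses `<HEAD>` or `<head class="x">`, could match text inside a comment or script, and assumes a UTF-8 body. `HeadInjector` and `injectIntoHead` are hypothetical names.

```typescript
// Hypothetical helper: inserts `snippet` after the first literal "<head>".
// A substring search, not an HTML parser: it ignores case variants,
// attributes on <head>, and occurrences inside comments or scripts.
class HeadInjector {
  private buffer = "";
  private done = false;
  private readonly snippet: string;
  private readonly marker = "<head>";

  constructor(snippet: string) {
    this.snippet = snippet;
  }

  // Returns the text that is safe to emit now.
  push(chunk: string): string {
    if (this.done) return chunk;
    this.buffer += chunk;
    const index = this.buffer.indexOf(this.marker);
    if (index !== -1) {
      const end = index + this.marker.length;
      const out = this.buffer.slice(0, end) + this.snippet + this.buffer.slice(end);
      this.buffer = "";
      this.done = true;
      return out;
    }
    // Hold back enough characters to catch a marker split across chunks.
    const keep = this.marker.length - 1;
    const emitLength = Math.max(0, this.buffer.length - keep);
    const out = this.buffer.slice(0, emitLength);
    this.buffer = this.buffer.slice(emitLength);
    return out;
  }

  // Emits whatever is still held back once the input has ended.
  flush(): string {
    const rest = this.buffer;
    this.buffer = "";
    return rest;
  }
}

// Wraps a UTF-8 response body so the snippet is injected while streaming.
function injectIntoHead(body: ReadableStream<Uint8Array>, snippet: string): ReadableStream<Uint8Array> {
  const injector = new HeadInjector(snippet);
  const decoder = new TextDecoder();
  const encoder = new TextEncoder();
  return body.pipeThrough(
    new TransformStream<Uint8Array, Uint8Array>({
      transform(chunk, controller) {
        const text = injector.push(decoder.decode(chunk, { stream: true }));
        if (text) controller.enqueue(encoder.encode(text));
      },
      flush(controller) {
        const text = injector.push(decoder.decode()) + injector.flush();
        if (text) controller.enqueue(encoder.encode(text));
      },
    }),
  );
}
```

In a `fetch` handler the result would be wrapped in a new `Response` built from the upstream status and headers. Because the body length changes, any `Content-Length` header from the upstream response should be dropped, and opaque responses (whose `body` is `null`) cannot be rewritten at all.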

@RReverser
Member

@inian Yup (and parse5 has streaming mode).

As for preload specifically, you don't even need an HTML parser: instead of tags you can use the Link header to express the same intent, and a header is easy to add without even retrieving the response body.

@inian

inian commented Jul 15, 2016

Thanks for clarifying that, @RReverser. Some of the people we are working with find it easier to add a script tag to their page than to mess with their servers, so if we could do all these optimisations using just a SW, it would be cool. We are working from that angle now.

Thinking about it, we could just add the Link header via the SW too..
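A small TypeScript sketch of adding the `Link` header from the service worker, as suggested above. Headers on a response returned by `fetch()` are immutable, so the helper copies the response with a fresh `Headers` object. `withPreloadLink` is a hypothetical name, and whether a given browser acts on a `Link: rel=preload` header on a service-worker-constructed response is worth verifying rather than assuming. Opaque (cross-origin no-cors) responses have status 0 and cannot be copied this way.

```typescript
// Hypothetical helper: returns a copy of `response` with an extra preload Link header.
function withPreloadLink(response: Response, href: string, as: string): Response {
  // Copy the headers, since a fetched Response's headers cannot be modified in place.
  const headers = new Headers(response.headers);
  headers.append("Link", `<${href}>; rel=preload; as=${as}`);
  // Reuse the body stream; status and statusText are carried over unchanged.
  return new Response(response.body, {
    status: response.status,
    statusText: response.statusText,
    headers,
  });
}
```

Inside a `fetch` handler it might be used as `event.respondWith(fetch(event.request).then((r) => withPreloadLink(r, "/css/main.css", "style")))`, with an illustrative path. Font preloads additionally need a `crossorigin` attribute in the header value, which this sketch does not add.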

@jakearchibald jakearchibald added this to the Future ideas milestone Jul 25, 2016
@mflux

mflux commented Jan 19, 2017

I just ran into this issue: I was using THREE.js's ColladaLoader2 (which uses DOMParser) to open Collada files (3D models) and tried to speed up loading by moving it into a web worker. To my surprise, this didn't work, simply because DOMParser is not available inside web workers.

It's possible to hack ColladaLoader2 to use a DOMParser alternative, but that's just crazy. It's entirely reasonable for a web worker to be parsing things like DOM.

@joeyparrish
Member

I think for most purposes, any XML parser would suffice. It would not have to be DOMParser specifically, but depending on a pure JS XML parser seems like too much trouble.

@RReverser
Member

@joeyparrish No, no, please never use XML parser to parse HTML. Despite visual similarity, they have very different semantics (unless you specifically target XHTML and not HTML5).

@joeyparrish
Member

True, for HTML. I was thinking of a use case of my own where we use DOMParser to parse XML and would like to be able to do so from a service worker. Since the OP wants to parse HTML, please disregard my comment.

@v1nce

v1nce commented Jul 22, 2021

IMO there are A LOT of possible uses for DOMParser in service workers.
My SW does (or tries to do) a lot of things in the fetch handler ((un)zipping, OTF conversion of files not supported by the browser, text manipulation in HTML or XML), and this would be so much easier with DOMParser (and canvas).
So I don't see the point of arguing against them and telling people they should go for this or that workaround.
The only valid point is that the current DOM parser is not thread-safe.

@jakearchibald
Contributor

Right now the DOM is closely coupled with rendering, through things such as offsetWidth, CSS styles, getBoundingClientRect, etc. Maybe in the future we could have a different representation of the DOM that's lighter and not coupled to rendering, which could be used in a service worker.

However, we shouldn't do this just for service workers; it would need to be a feature across all (or at least many) worker types.

Further discussion of this proposal should happen in https://github.com/whatwg/dom.

@Stvad

Stvad commented Apr 29, 2022

Another use case: Chrome bans background pages in Manifest V3 extensions, which makes a set of use cases impossible (specifically, I've been using DOMParser in a background page).

@tuhuynh27

Quoting @Stvad: "Another use-case: Chrome bans background pages in manifest v3 extensions, making some set of use-cases not possible (specifically here, I've been using DOMParser in background page)"

Same here. How can we use DOMParser in a service worker for the Manifest V3 migration now? 😢

@Stvad

Stvad commented Aug 31, 2022

The hack I ended up using is an iframe that acts as a fake background page. This does not work for all use cases, but it worked for me to run the DOM parsing on a separate thread from the rendering. See an example here: https://github.com/transclude-me/extension/tree/main/source/content/background-simulation
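The iframe workaround boils down to shipping HTML to a context that does have `DOMParser` and getting plain data back; DOM nodes themselves cannot be structured-cloned across that boundary, so the parsing side has to reply with serializable results. A hedged TypeScript sketch of the bookkeeping on the sending side: each request carries an id so replies can be matched even when several parses are in flight. `ParseClient`, the message shapes, and the injected `send` function are all hypothetical; the actual transport (`postMessage`, extension messaging, or whatever the linked repository uses) is deliberately left out.

```typescript
// Hypothetical message shapes for a "parse this HTML" round trip.
interface ParseRequest {
  id: number;
  html: string;
}
interface ParseReply {
  id: number;
  result: unknown; // must be plain, structured-cloneable data, not DOM nodes
}

class ParseClient {
  private nextId = 1;
  private readonly pending = new Map<number, (result: unknown) => void>();
  private readonly send: (message: ParseRequest) => void;

  constructor(send: (message: ParseRequest) => void) {
    this.send = send;
  }

  // Sends the HTML and returns a promise settled when the matching reply arrives.
  parse(html: string): Promise<unknown> {
    const id = this.nextId++;
    return new Promise((resolve) => {
      this.pending.set(id, resolve);
      this.send({ id, html });
    });
  }

  // Call from the message listener; returns false for replies it did not ask for.
  handleReply(reply: ParseReply): boolean {
    const resolve = this.pending.get(reply.id);
    if (!resolve) return false;
    this.pending.delete(reply.id);
    resolve(reply.result);
    return true;
  }

  get inFlight(): number {
    return this.pending.size;
  }
}
```

On the other side, the page that owns `DOMParser` would run `new DOMParser().parseFromString(html, "text/html")`, extract what it needs into plain objects, and post back `{ id, result }`. A production version would also want timeouts so a lost reply does not leave a promise pending forever.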

@rektide

rektide commented Jul 29, 2023

I agree this isn't a ServiceWorker-specific need, but it seems like the conversation around this died when this ticket was closed.

It does a lot more than just DOM (it's a full page), but I think we have the special-purpose, one-off Offscreen Documents capability (w3c/webextensions#170), built specifically for Web Extensions, in part because the good and necessary ideas here never got addressed. Did anyone ever actually take steps to engage the DOM spec on this topic after Jake closed this issue?

I'd also point to great articles like https://paul.kinlan.me/we-need-dom-apis-in-workers/ which again validate the general ask here. People really want a programmatic DOM. We don't really care about the rendering for a lot of our work. Shipping the JSDOM library again and again is a tell that we need help here.

@jakearchibald
Contributor

#846 (comment) is still relevant. This is the wrong venue for the discussion. Service workers haven't chosen to block DOM APIs, it's that DOM APIs haven't been spec'd to work in workers. If you want that, the discussion needs to happen in the repos relating to the DOM APIs, as it's the maintainers of those features that would need to make the change.

@jakearchibald
Contributor

If folks still aren't convinced, it's like observing that helicopters don't work underwater, and complaining to the sea.

@v1nce

v1nce commented Jul 31, 2023 via email

@jakearchibald
Contributor

No, it's not like that whatwg/dom#1217 (comment)
