
Make it possible to use docs.rs offline for pages that have been visited at least once #845

Open
jyn514 opened this issue Jun 21, 2020 · 31 comments
Labels
A-frontend Area: Web frontend E-medium Effort: This requires a fair amount of work help-wanted P-low Low priority issues

Comments

@jyn514
Member

jyn514 commented Jun 21, 2020

It'd be great to turn docs.rs into an offline-first PWA (Progressive Web App). So the user would still be able to browse the docs they have already visited before even when offline, without having to use a separate website or app.

The same could be done for doc.rust-lang.org.

Originally posted by @teohhanhui in #174 (comment)

@jyn514
Member Author

jyn514 commented Jun 21, 2020

The same could be done for doc.rust-lang.org.

You can open an issue on https://github.com/rust-lang/www.rust-lang.org for that site, it's managed by a different team. I imagine they would be very receptive since it's a completely static site.

Another alternative is a browser extension to redirect online version -> offline version, similar to what the IPFS Companion extension does. For example:
https://doc.rust-lang.org/std/sync/struct.RwLock.html -> file:///home/teohhanhui/.rustup/toolchains/stable-x86_64-unknown-linux-gnu/share/doc/rust/html/std/sync/struct.RwLock.html

Hmm, this is an interesting idea. I don't think it would work with relative links though; .. would send you to doc/rust/html/std/sync/index.html, which might not exist yet. Also I'm not sure this would work correctly if we used trailing slashes on any page instead of /index.html.

That can be achieved with cargo doc to build local crates and rustup doc for the book, std, and everything else on doc.rust-lang.org

The whole point of docs.rs is that you don't have to build the docs yourself, so while it's a useful tip, I don't think it should replace being able to use docs.rs offline.

I don't know very much about PWAs. If we set pages to be cached for a longer time, would that meet this use case? That way you could visit the cached page even when you lost internet.

@Kixiron
Member

Kixiron commented Jun 21, 2020

In regards to relative links, I wouldn't be sad if they went away, as they're not really a great thing in the first place. Replacing relative links would probably help simplify a good portion of the code while also being less finicky and harder to mess up.

@jyn514
Member Author

jyn514 commented Jun 21, 2020

In regards to relative links, I wouldn't be sad if they went away, as they're not really a great thing in the first place. Replacing relative links would probably help simplify a good portion of the code while also being less finicky and harder to mess up.

I strongly disagree. Without relative links we'd have to hardcode https://docs.rs at the start of every url, which would break this anyway.

@jyn514
Member Author

jyn514 commented Jun 21, 2020

Also, rustdoc heavily uses relative links for documentation; I don't see a good way to change that, since it doesn't know the absolute URL it will be used with.

@Kixiron
Member

Kixiron commented Jun 21, 2020

Oh, I thought you meant relative links in reference to how we do our own source browser, with .. being "up", but it seems we already use a canonical link for that

@teohhanhui

I don't know very much about PWAs. If we set pages to be cached for a longer time, would that meet this use case? That way you could visit the cached page even when you lost internet.

See https://developer.mozilla.org/en-US/docs/Web/Progressive_web_apps/Offline_Service_workers#Offline_First

Changing the cache expiry would help, however that requires the user to manually toggle offline mode in their browser (which is a very hidden thing nowadays, if not impossible altogether...)

@jyn514
Member Author

jyn514 commented Jun 22, 2020

Changing the cache expiry would help, however that requires the user to manually toggle offline mode in their browser

That seems to defeat the point of caching :(

Glancing through the page you linked, it seems like the main idea is to have some JavaScript that checks if the page is cached before making a network request. I agree that should be the behavior, but I'm not comfortable enough with JavaScript to implement it / don't have the time. If someone is interested in working on this I'd be happy to mentor, though :) Almost all of the site can be cached except the home page, /releases, and redirects.
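
For anyone picking this up, the core of that check is only a few lines with the standard Cache API. A rough sketch only (the cache name is made up, and none of this exists in docs.rs today):

```js
// Sketch only: look in the Cache API before going to the network.
// "docs-rs-pages" is a hypothetical cache name, not anything docs.rs uses today.
async function loadPage(url) {
  const cache = await caches.open('docs-rs-pages');
  const cached = await cache.match(url);
  if (cached) {
    return cached; // already cached: no network request needed
  }
  const fresh = await fetch(url);       // otherwise fetch it...
  await cache.put(url, fresh.clone());  // ...and keep a copy for next time
  return fresh;
}
```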

@jyn514 jyn514 added A-frontend Area: Web frontend wishlist P-low Low priority issues and removed wishlist labels Jun 22, 2020
@wanderrful

wanderrful commented Sep 20, 2020

Angular apps have service workers built into them implicitly, so if you guys are willing to upgrade this from a Python Jinja-like Tera front-end (https://crates.io/crates/tera) to an Angular front-end then you can get the Service Worker caching for free.

Here's some more info: https://angular.io/guide/service-worker-intro

As for the rust-lang website itself, it has a Handlebars front-end (https://github.com/rust-lang/www.rust-lang.org/blob/master/templates/index.hbs), which could also be replaced with an Angular front-end.

However, I think it'd probably be more on-brand for these Rust websites to have a Rust-based front-end that compiles to WebAssembly rather than be Javascript-based. The only such crate I'm aware of that might do this is Yew, but it doesn't have Service Workers built into it as far as I know. It's not "production-ready", but since these websites are just static pages I don't think that's a concern.

Angular could potentially be overkill since these sites are just static pages, but just because it has a bunch of bells and whistles doesn't mean you have to use them.

@jyn514
Member Author

jyn514 commented Sep 20, 2020

I'd strongly prefer for docs.rs to remain a static site first and foremost, and especially remain usable with JavaScript disabled. I'm fine with JS adding features on top, but the JS shouldn't be necessary just to use the site.

That said I don't know much about frontend, so maybe Angular can do that?

@wanderrful

wanderrful commented Sep 20, 2020

Service workers themselves are implemented on the front-end via Javascript, so I'm not sure that we can have our cake and eat it, too, in this situation.

With that design constraint, I'm not sure we can make this website offline-first. All we could do is just ask users to use their browser's "make available offline" feature if they want to use the site while offline.

Edit: Even WebAssembly requires Javascript to be enabled, so I'm not sure that any Rust-based WASM solution would work either.

@jyn514
Member Author

jyn514 commented Sep 20, 2020

Let me approach this from a different angle (I really like the framing in https://internals.rust-lang.org/t/pre-rfc-user-namespaces-on-crates-io/12851/96 to discuss things as problems to solve and not solutions to implement).

docs.rs currently is a dynamic site which serves static HTML. It does not have caching for rustdoc pages, which means the site is not available when you're offline. The goal of this issue is to be able to use docs.rs offline if you've already visited the relevant pages at least once.

If I'd never heard of PWAs, the way I'd imagine implementing this is something like the following:

  • When you visit a page for the first time it loads as HTML. The HTML is cached indefinitely and never expires.
  • The HTML has a link to a .js file. Every time you refresh the page, the JS reaches out to the server to ask for a new version; if there's a newer version it replaces the HTML on the page (preferably with lazy-loading so this doesn't block)

What this gets docs.rs is three things:

  1. JS is completely optional. If you have it disabled, then you just have to force-refresh the page once in a while to get newer versions, which I'm ok with (the rustdoc pages are very rarely updated, only when there's a docs.rs bug).
  2. Pages are viewable offline. If you are offline, the HTML is cached and the JS just doesn't run, so you can view the page fine.
  3. If JS is enabled, the page will always be up-to-date.
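
As a very rough sketch of the second bullet above (the script that asks the server for a newer version and swaps in the HTML), assuming the HTML itself is served with a long cache lifetime; the DOM swap here is deliberately naive:

```js
// A naive sketch of the "JS asks the server for a newer version" step.
async function revalidate() {
  try {
    const response = await fetch(location.href, { cache: 'no-cache' });
    if (!response.ok) return;  // server error: keep the page we already have
    const html = await response.text();
    const fresh = new DOMParser().parseFromString(html, 'text/html');
    if (fresh.body.innerHTML !== document.body.innerHTML) {
      document.body.replaceWith(fresh.body);  // newer build: swap it in
    }
  } catch (_) {
    // offline: the cached HTML stays up, which is exactly what we want
  }
}
window.addEventListener('load', revalidate);
```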

Regardless of the technologies or frameworks used, does that basic idea sound feasible?

@jyn514 jyn514 changed the title Turn docs.rs into an offline-first PWA (Progressive Web App) Make it possible to use docs.rs offline for pages that have been visited at least once Nov 13, 2021
@GuillaumeGomez
Member

Won't this be an issue for pages like doc.rust-lang.org/nightly/std/whatever.html? I don't think we have an equivalent on docs.rs except when arriving on the crate page (but then it redirects to the latest version).

@jyn514
Member Author

jyn514 commented Nov 22, 2021

@GuillaumeGomez are you saying that this breaks once latest no longer redirects to another page (#1527)? I think we can avoid that by just having a much shorter cache expiration date on those pages.

@GuillaumeGomez
Member

Yes, that's what I meant.

@jsha
Contributor

jsha commented Nov 25, 2021

I think this is probably feasible. Some questions to figure out: should all of docs.rs be one big PWA, which manages a cache of all the various docs you've visited? Or should each crate's docs be a separate PWA? Ideally we'd like the same behavior on doc.rust-lang.org, which means the functionality should be in rustdoc, which argues for a PWA per crate.

Also, it looks like Service Workers allow us to actually prefetch resources that the user hasn't visited yet. So for instance if you visit one page of a crate's docs, it could download all the pages of that crate's docs. The storage could add up fast, though, so we'd need heuristics about when or if to do that.
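
For reference, the prefetch piece is mostly Cache API calls from the Service Worker. A sketch, where the message shape, cache name, and URL list are invented for illustration:

```js
// Sketch: prefetching pages from a Service Worker with the Cache API.
// The message shape, cache name, and URL list are made up for illustration.
self.addEventListener('message', (event) => {
  if (event.data && event.data.type === 'prefetch-crate') {
    event.waitUntil(
      caches.open('docs-rs-prefetch')
        .then((cache) => cache.addAll(event.data.urls))
    );
  }
});
```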

@jsha
Contributor

jsha commented Nov 26, 2021

I have a local prototype of this that's kinda neat, and plan to work on it some more and will share results when they're good enough. I had high hopes of precaching a whole crate / the whole stdlib, but fetching that many files individually (30,847 for the stdlib) was prohibitively slow. And users probably wouldn't thank us for using that much data without a more explicit opt-in anyhow.

Here's my current thinking:

  • On Service Worker install, preload the static assets (fonts, JS, CSS, images), and always serve those from the Cache API
  • Whenever a user navigates to an HTML page:
    • If it's not in the Cache API, fetch it, store it, and serve it.
    • If it's in the Cache API
      • load from the local copy
      • fire off a background fetch to see if there's an updated version (new crate version, or rebuild with different rustdoc version).
      • if there is an updated version:
        • store it in the Cache API
        • flush all local storage of the outdated version
        • start preloading the latest version of pages that were locally cached before?
        • update the current page to tell the user there's a newer version available, with a button/link that reloads the page.

Note that in this scenario, nothing changes for users without JS; they never load the Service Worker.
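
A rough sketch of that fetch handler (cache first, revalidate in the background); the cache name is illustrative, and the "tell the user there's a newer version" part is omitted:

```js
// Sketch of the cache-first flow above; cache name and update handling are illustrative.
self.addEventListener('fetch', (event) => {
  event.respondWith((async () => {
    const cache = await caches.open('docs-rs-html');
    const cached = await cache.match(event.request);
    const network = fetch(event.request).then((response) => {
      if (response.ok) cache.put(event.request, response.clone());
      return response;
    });
    if (cached) {
      event.waitUntil(network.catch(() => {}));  // revalidate quietly; ignore offline
      return cached;                             // serve the local copy immediately
    }
    return network;  // first visit: fetch, store, and serve
  })());
});
```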

Alternately, we could prefer freshness:

  • Whenever a user navigates to an HTML page:
  • Attempt to fetch it from the network
  • If the network is offline, or after a 2-second timeout, serve from cache (if available).
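
A sketch of that freshness-first variant, again with illustrative names:

```js
// Sketch of the network-first alternative: try the network, fall back to
// the cache when offline or after a ~2-second timeout.
self.addEventListener('fetch', (event) => {
  event.respondWith((async () => {
    const cache = await caches.open('docs-rs-html');
    const timeout = new Promise((resolve) => setTimeout(resolve, 2000));
    try {
      const response = await Promise.race([fetch(event.request), timeout]);
      if (response) {
        if (response.ok) cache.put(event.request, response.clone());
        return response;
      }
    } catch (_) {
      // network error: fall through to the cache
    }
    return (await cache.match(event.request)) || Response.error();
  })());
});
```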

The first approach is quite similar to the Cache-Control stale-while-revalidate directive. As a simpler approach, we could try changing the headers on HTML pages. Right now they have no Cache-Control header. We could add max-age=0, stale-while-revalidate=5260000. I think that would make the page available offline for up to 2 months, and if there is a newer version available it would get fetched in the background and be ready on the user's next page load. I need to do some testing on this - none of the docs for Cache-Control stale-while-revalidate explicitly mention offline.

Advantage for the Cache-Control approach: much easier to deploy and reason about.

Advantages of the Service Worker approach:

  • deeply customizable. We can provide an interface for people to preload whole crates for offline use. When a new version is available we can purge all URLs from the old version. That avoids a potentially frustrating experience under Cache-Control stale-while-revalidate where each page you load shows the outdated version at first. For docs.rs we could even have an origin-wide background JS task that looks for newer versions of all crates where you have a local cache of some pages, and proactively purges / refreshes.
  • we can provide a custom offline page for things we don't have in cache, offering information on what we do have cached.
  • it can work regardless of Cache-Control headers. So when someone deploys docs on their own server, they don't need to get all the headers just right - we can still offer offline via Service Workers.
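
Regarding the custom offline page in the second point: that part is also only a few lines in the Service Worker. A sketch, where /offline is a hypothetical route that docs.rs does not serve today:

```js
// Sketch: when neither the network nor the cache has the page, answer with a
// pre-cached offline page instead of the browser's error screen.
// "/offline" is a hypothetical route, not something docs.rs serves today.
async function respondOrFallback(request) {
  try {
    return await fetch(request);
  } catch (_) {
    const cache = await caches.open('docs-rs-html');
    return (await cache.match(request))
        || (await cache.match('/offline'))  // hypothetical fallback page
        || Response.error();
  }
}
```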

One of the exciting things about both approaches is that they have the potential to dramatically speed up repeat visits even when online.

@jsha
Contributor

jsha commented Nov 26, 2021

By the way, to be able to readily experiment with this without the possibility of breaking docs.rs, it should be possible to run some totally third-party site that has a Service Worker and fetches / serves pages from docs.rs as if those pages were on its own origin. But that would require setting Access-Control-Allow-Origin on all/most docs.rs pages. Is that reasonable to do?

@jyn514
Member Author

jyn514 commented Nov 26, 2021

I had high hopes of precaching a whole crate / the whole stdlib, but fetching that many files individually (30,847 for the stdlib) was prohibitively slow.

This should be possible once we finally implement downloadable docs :) which would serve the docs as one big zipfile for the whole crate.

By the way, to be able to readily experiment with this without the possibility of breaking docs.rs, it should be possible to run some totally third-party site that has a Service Worker and fetches / serves pages from docs.rs as if those pages were on its own origin. But that would require setting Access-Control-Allow-Origin on all/most docs.rs pages. Is that reasonable to do?

I would be worried about doing this on docs.rs in prod, but it shouldn't be terribly difficult to run a fork of docs.rs somewhere and add Access-Control-Allow-Origin there.

Hmm, I guess that doesn't let you test how it interacts with cloudfront though.

@jyn514
Member Author

jyn514 commented Nov 26, 2021

Advantage for the Cache-Control approach: much easier to deploy and reason about.

This is very tempting 😆 it sounds like you're volunteering to do much of the work, which I really appreciate ❤️ but simpler to write also means simpler to review.

How hard would it be to switch between the two ideas at a later time? It sounds like a lot of the work is hooking the service worker up to the Cache API and actually changing the page, which is the same between both, right?

@jsha
Contributor

jsha commented Nov 26, 2021

Switching at any point would be the same work as doing either change from scratch. If we use the Cache-Control: max-age=0, stale-while-revalidate=N approach, it's a one-liner. We don't touch Service Worker or Cache API at all. If we do the Service Worker approach, it's a decent amount of work - and as you say, involves at least one other person learning enough about Service Worker to adequately review. :-)

The thing I worry about with stale-while-revalidate is this:

  • You load /regex/latest/regex on Nov 24. It's serving version 1.0.
  • You load /regex/latest/regex on Nov 26. You know the crate was updated to 2.0 yesterday, renaming a bunch of structs. Because of stale-while-revalidate, your browser shows you version 1.0. That's confusing! Of course, if you reload, you'll get 2.0.
  • If you don't reload (for instance, you don't know about the revision, or don't care, or actively want to look at 1.0 docs), when you click a link to one of the renamed structs, you will get a 404 (because that struct doesn't exist in 2.0).

Of course, now that I write these out I see these are also a problem for the /latest/ change in general. For instance, you could have /latest/ (version 1.0) loaded in your browser when 2.0 is released, and click a link to one of the now-renamed structs.

The problem also exists for versioned URLs. For instance, visit https://docs.rs/rustls/0.19.0/rustls/trait.Session.html and click "Go to latest version" (Session was renamed to Connection in 0.20). I see somebody has already thought of the problem, and that link takes you to a search page across 0.20. That's pretty neat! Maybe that's adequate?

The other problem with stale-while-revalidate is: say you load the root page, see it's outdated, and reload. Then you click to another page you've visited before. That's also outdated. You have to reload that too. It would get frustrating pretty fast.

@jyn514
Member Author

jyn514 commented Nov 26, 2021

I see somebody has already thought of the problem, and that link takes you to a search page across 0.20. That's pretty neat! Maybe that's adequate?

Haha, yeah I spent a while on that :)

Of course, now that I write these out I see these are also a problem for the /latest/ change in general. For instance, you could have /latest/ (version 1.0) loaded in your browser when 2.0 is released, and click a link to one of the now-renamed structs.

Hmm, this should only be a problem if you have the page open for a long time, right? Because (with caching as current, but with #1527) the second you reload the page you'll get the newer version. I think the combination of a page open for a long time + an intervening release + a renamed struct is rare enough that just having search is fine.

You load /regex/latest/regex on Nov 26. You know the crate was updated to 2.0 yesterday, renaming a bunch of structs. Because of stale-while-revalidate, your browser shows you version 1.0. That's confusing! Of course, if you reload, you'll get 2.0.

Yeah, that seems confusing. I'm not sure that "if you reload you'll get 2.0" is true though - don't you need to do a hard refresh to ignore the cache directive? I don't think we should do that for the /latest/ page. It seems ok for pages other than /latest/ though, they should only change if a bug in rustdoc itself was fixed and the crate was rebuilt.

@jyn514
Member Author

jyn514 commented Nov 26, 2021

That said, I'm fairly familiar with service workers from working at Cloudflare so if that sounds fun I say go for it 😁

@jsha
Contributor

jsha commented Nov 26, 2021

I'm not sure that "if you reload you'll get 2.0" is true though - don't you need to do a hard refresh to ignore the cache directive?

With max-age=0, stale-while-revalidate, I think it's true. The first load will serve from cache. During the ~dozen seconds you spend looking at the page, the browser will refresh the cache from origin, so by the time you reload there should be a fresh copy in cache.

it shouldn't be terribly difficult to run a fork of docs.rs somewhere and add Access-Control-Allow-Origin there.

Wouldn't it require a lot of CPU and storage to store all the crates? I'm thinking of something that would exist for a period of months, where we'd invite testers to try using it as their daily driver version of docs.rs, to see what weird cases would come out of real-life browsing patterns.

@jyn514
Member Author

jyn514 commented Nov 26, 2021

During the ~dozen seconds you spend looking at the page, the browser will refresh the cache from origin, so by the time you reload there should be a fresh copy in cache.

Ahh, that makes sense, I didn't realize that's what the directive did.

Wouldn't it require a lot of CPU and storage to store all the crates? I'm thinking of something that would exist for a period of months, where we'd invite testers to try using it as their daily driver version of docs.rs, to see what weird cases would come out of real-life browsing patterns.

I don't see a realistic way to do this. Either we experiment with it in prod (maybe with a feature flag?) or we can write more tests; it's just not feasible to replicate docs.rs at scale.

@syphar
Member

syphar commented Nov 27, 2021

I admit I never worked with this kind of frontend caching, but I'm excited to see it if it works.

Since caching is hard, this feels like there might be edge cases with confusing mixtures of cached and uncached pages (and assets), so IMHO having an (even user-visible) feature flag / testing phase would be a great idea.

Or building a second setup. I mean, having a staging platform is not a terrible idea :)

@jyn514
Member Author

jyn514 commented Nov 27, 2021

Yes, I definitely want to set up a staging server at some point where people can try things out interactively. I just want to set reasonable expectations for it; it's going to end up like staging.crates.io where maybe 5 people a week visit, it won't let us see problems that only appear at scale.

@jsha
Contributor

jsha commented Nov 28, 2021

I just tested stale-while-revalidate, and it does make the page nicely available when the network is offline, at least in Chrome.

Proposal: Let's add Cache-Control: max-age=0, stale-while-revalidate=N for all versioned URLs, but not yet for /latest/ URLs (#1527) since things are a little trickier there. I propose N = 2 months to start.

@jyn514
Member Author

jyn514 commented Nov 28, 2021

Sounds like a plan! :)

@jsha
Contributor

jsha commented Dec 1, 2021

A little hiccup: Iron doesn't seem to support stale-while-revalidate, and doesn't allow setting custom strings for the cache-control header: https://docs.rs/iron/0.6.1/iron/headers/enum.CacheDirective.html

@jyn514
Member Author

jyn514 commented Dec 1, 2021

@jsha does Extension not support custom headers?

Anyway, Iron hasn't had a release in 3 years, so I wouldn't get your hopes up too high. @syphar has been working on and off on switching to Axum.

@jyn514 jyn514 added the S-blocked Status: marked as blocked ❌ on something else such as an RFC or other implementation work. label Dec 1, 2021
@syphar syphar added E-medium Effort: This requires a fair amount of work and removed S-blocked Status: marked as blocked ❌ on something else such as an RFC or other implementation work. labels Oct 24, 2023
@syphar
Member

syphar commented Oct 24, 2023

Note that the Axum migration has been done for some time now.
