Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate 404.html page (#539) #1221

Merged
merged 6 commits into from
Jun 23, 2020

Conversation

manuel-woelker
Copy link
Contributor

Prototype spike for a 404 page (cf #539).

This PR generates a 404.html file with sidebar navigation and search intact, to better handle broken or stale links.
Additionally, the builtin server serves this file on 404 errors.

Comments and feedback appreciated!

@ehuss
Copy link
Contributor

ehuss commented May 17, 2020

Thanks, looks useful!

Would it be possible to allow the user to customize the page? Like if src/404.md is found, use that? And maybe in the config have output.html.404 key that lets you set the filename if you don't want to use the default of src/404.md, or set that value to false to disable it altogether?

@manuel-woelker
Copy link
Contributor Author

Thanks for the feedback and suggestions, I'll try to incorporate them over the next days.

@manuel-woelker manuel-woelker force-pushed the fb-539-not-found-page branch from 22390f4 to 6396552 Compare May 21, 2020 16:32
@manuel-woelker
Copy link
Contributor Author

Added the suggestions and rebased on the current master, which had become warped ;-)

HtmlConfig in [output.html] now has "input-404" (default: "404.md") and "output-404" keys (default: "404.html") to customize 404 generation.
404 generation can be disabled by setting output-404 = ""

@manuel-woelker manuel-woelker changed the title WIP: Generate 404.html page (#539) Generate 404.html page (#539) May 21, 2020
@ehuss
Copy link
Contributor

ehuss commented May 26, 2020

Thanks! Yea, sorry about swapping the webserver out, hopefully it wasn't too hard to figure out.

I noticed that it doesn't work for bad links in subdirectories. For example, visit http://localhost:3000/foo/bar and none of the css loads properly. I suspect it will need some HTML magic to get all the links to work properly.

Is there a particular reason to have separate configs for the input and output name? I suspect nearly everyone will just use the default, and the small number that don't can probably just set the input name and have that mirrored to the output. Also, if the output path has a different directory structure, the links won't work. I think I would prefer to keep the config simple, unless there is a particular use case for that.

@manuel-woelker
Copy link
Contributor Author

Regarding subdirectories: I initially hoped this would be easy, alas the Content-Base and Content-Location headers that would have allowed these base URI to be set to fix this is no longer supported.

The other option is to set the URI in the html, which would either mean editing the index.html.hbs template, or selectively "fixing up" the 404 page to add it. The template approach would make relative URL resolution simpler, probably removing a lot of {{ path_to_root }} calls in the process. Any ideas about which approach is better here?

As for separate configs for the input and output name: I felt that it would be good for separation of concerns. The input path is a matter of repo layout - projects might want this file "out of the way" e.g. put it in a subdirectory or name it "_notfound.md". The output file is a "web serving" concern, i.e. where the web server expects the file. But I am open to changing that to make it simpler.

@ehuss
Copy link
Contributor

ehuss commented May 28, 2020

Hm, that's a tough one. It seems like the only option would be to define a site-url in the config, and use absolute instead of relative paths. I took a look at another static site generator, and that's what it does. Links for normal pages are relative, but the 404 page uses absolute links relative to the site-url. For example, https://example.com/foo/bar/ would result in a link like /foo/bar/css/general.css in 404.html, and then the development server would override the site-url. That seems like a major change, and I'm uncertain what kind of fallout that would have, or how many places that needs to touch (maybe it is just path_to_root?).

@manuel-woelker
Copy link
Contributor Author

Well, one option would be to only set the base url for all generated files to e.g. "/" using / in the head. Then all relative urls to css/js and other links are resolved relative to that base url, instead of the actual url path (AFAICT). This would simplify links for all pages, as well as fix the 404 in subdirectory issue. The only downside is that the site-url would have to be configured as an absolute path.

@ehuss
Copy link
Contributor

ehuss commented May 29, 2020

I would prefer not to change the behavior for existing files. That would require knowing the site-url, and might break various deployment scenarios.

@manuel-woelker
Copy link
Contributor Author

Hmm, I don't see any neat solution that allows path resolution from any URL without knowing the site url.

A hack would be to embed some javascript, which successively tries to "walk" the url upwards until it successfully downloads the css file, and with that information rewrites all the css and script includes in the DOM. This should work, but it's a bit of a kludge.

@ehuss
Copy link
Contributor

ehuss commented Jun 3, 2020

I don't think it would be a problem for mdbook to require a site-url for the 404 page to work properly. If the site-url isn't set, the 404 page could generate absolute paths (/css/general.css), and emit a warning that site-url should be set for it to work properly.

@manuel-woelker
Copy link
Contributor Author

Great, that sounds like a viable plan.

@manuel-woelker manuel-woelker force-pushed the fb-539-not-found-page branch from 6396552 to cda8ad4 Compare June 7, 2020 12:15
@manuel-woelker
Copy link
Contributor Author

Added the config parameter output.html.site-url, and used that via the tag in the 404 page.

This also makes the navigation links work correctly.

Copy link
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

serve will need to modify the config to remove site-url (or maybe set it to /).

The documentation will need to be updated.

I would also still prefer to remove the output-404 and only have 1 option to configure it.

I'm also having doubts about whether or not it makes sense to have this part of the book, or part of the theme. I'm not really sure, just thought I'd toss it out there as an idea.

@@ -9,6 +9,7 @@ edition = "2018"

[output.html]
mathjax-support = true
site-url = "/"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should probably be https://rust-lang.github.io/mdBook/, otherwise it would break on deployment.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to "/mdBook/", the server should not be needed

let base_url = if let Some(site_url) = &html_config.site_url {
site_url
} else {
warn!("HTML 'site-url' parameter not set, defaulting to '/'. Please configure this to ensure the 404 page work correctly, especially if your site is hosted in a subdirectory on the HTTP server.");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd like to avoid spamming all users to set the site-url. I'm not sure which is best, though. The 404 page could be disabled by default (with the assumption that sites usually have their own 404 mechanism), or it could just silently set site-url to / (or just lower this message to debug).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to debug! now

@@ -437,6 +437,54 @@ impl Renderer for HtmlHandlebars {
is_index = false;
}

// Render 404 page
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can all this new code be moved to a separate function, so this function isn't so long?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

refactored into its own function in impl HtmlHandlebars

src/cmd/serve.rs Outdated
let routes = livereload.or(book_route);
let book_route = warp::fs::dir(build_dir.clone());
// The fallback route for 404 errors
let fallback_file = "404.html";
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this will need to check the config for the name of the 404 page.

// Render 404 page
if html_config.output_404 != Some("".to_string()) {
let default_404_location = src_dir.join("404.md");
let content_404 = if let Some(ref filename) = html_config.input_404 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure why there are two different code blocks here. I think it can be done with a single one by using something like let filename = html_config.input_404.as_deref().unwrap_or("404.md");.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's a case separation there: if the 404 input file is explicitly configured, but not found, that should a hard error. If it's not explicitly configured, a missing 404.md file should not error, but fall back to the default 404 message.

We could also remove the default behaviour, and make it an explicit opt-in.

let default_404_location = src_dir.join("404.md");
let content_404 = if let Some(ref filename) = html_config.input_404 {
let path = src_dir.join(filename);
std::fs::read_to_string(&path).map_err(|failure| {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of map_err, use with_context, something like this:

                std::fs::read_to_string(&path)
                    .with_context(|| format!("unable to open 404 input file {:?}", path))?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That reads much better, changed.

)
})?
} else {
"# 404 - Document not found\n\nUnfortunately, this URL is no longer valid, please use the navigation bar or search to continue.".to_string()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Try to keep lines under 100 characters long. Break this up with backslashes.

Also, I'm not sure about the wording here. Maybe combine it with the example you wrote, maybe something like this:

# Document not found (404)

This URL is invalid, sorry. Please use the navigation bar or search to continue.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated default 404 wording and formatting.

"/"
};
data_404.insert("base_url".to_owned(), json!(base_url));
data_404.insert("path".to_owned(), json!("404.md"));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure what "path" is used for, but I would guess this should be the path from the config?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

AFAICT, that path is only used to create the path to root for navigation links and script urls, i.e. it should not be the config path in order for the links to work correctly.

…404 page, making links and relative script/css loads behave correctly even in subdirectory paths
 - removed config output_404
 - ensure serve overrides the site url, and hosts the correct 404 file
 - refactor 404 rendering into separate fn
 - formatting
@manuel-woelker manuel-woelker force-pushed the fb-539-not-found-page branch from 67a4d58 to 6d6e540 Compare June 10, 2020 10:46
@manuel-woelker
Copy link
Contributor Author

Thanks for the thorough review and feedback!

I rebased and added the changes, as well as the documentation.

I removed the output-404 option as well.

Let me know if you have any further suggestions!

Copy link
Contributor

@ehuss ehuss left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks!

@ehuss ehuss merged commit 9268884 into rust-lang:master Jun 23, 2020
@ehuss ehuss mentioned this pull request Jun 23, 2020
Ruin0x11 pushed a commit to Ruin0x11/mdBook that referenced this pull request Aug 30, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants