-
Notifications
You must be signed in to change notification settings - Fork 1.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Generate 404.html page (#539) #1221
Conversation
Thanks, looks useful! Would it be possible to allow the user to customize the page? Like if |
Thanks for the feedback and suggestions, I'll try to incorporate them over the next days. |
22390f4
to
6396552
Compare
Added the suggestions and rebased on the current master, which had become warped ;-) HtmlConfig in [output.html] now has "input-404" (default: "404.md") and "output-404" keys (default: "404.html") to customize 404 generation. |
Thanks! Yea, sorry about swapping the webserver out, hopefully it wasn't too hard to figure out. I noticed that it doesn't work for bad links in subdirectories. For example, visit http://localhost:3000/foo/bar and none of the css loads properly. I suspect it will need some HTML magic to get all the links to work properly. Is there a particular reason to have separate configs for the input and output name? I suspect nearly everyone will just use the default, and the small number that don't can probably just set the input name and have that mirrored to the output. Also, if the output path has a different directory structure, the links won't work. I think I would prefer to keep the config simple, unless there is a particular use case for that. |
Regarding subdirectories: I initially hoped this would be easy, alas the Content-Base and Content-Location headers that would have allowed these base URI to be set to fix this is no longer supported. The other option is to set the URI in the html, which would either mean editing the index.html.hbs template, or selectively "fixing up" the 404 page to add it. The template approach would make relative URL resolution simpler, probably removing a lot of {{ path_to_root }} calls in the process. Any ideas about which approach is better here? As for separate configs for the input and output name: I felt that it would be good for separation of concerns. The input path is a matter of repo layout - projects might want this file "out of the way" e.g. put it in a subdirectory or name it "_notfound.md". The output file is a "web serving" concern, i.e. where the web server expects the file. But I am open to changing that to make it simpler. |
Hm, that's a tough one. It seems like the only option would be to define a |
Well, one option would be to only set the base url for all generated files to e.g. "/" using / in the head. Then all relative urls to css/js and other links are resolved relative to that base url, instead of the actual url path (AFAICT). This would simplify links for all pages, as well as fix the 404 in subdirectory issue. The only downside is that the site-url would have to be configured as an absolute path. |
I would prefer not to change the behavior for existing files. That would require knowing the site-url, and might break various deployment scenarios. |
Hmm, I don't see any neat solution that allows path resolution from any URL without knowing the site url. A hack would be to embed some javascript, which successively tries to "walk" the url upwards until it successfully downloads the css file, and with that information rewrites all the css and script includes in the DOM. This should work, but it's a bit of a kludge. |
I don't think it would be a problem for mdbook to require a site-url for the 404 page to work properly. If the site-url isn't set, the 404 page could generate absolute paths ( |
Great, that sounds like a viable plan. |
6396552
to
cda8ad4
Compare
Added the config parameter output.html.site-url, and used that via the tag in the 404 page. This also makes the navigation links work correctly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
serve
will need to modify the config to remove site-url
(or maybe set it to /
).
The documentation will need to be updated.
I would also still prefer to remove the output-404 and only have 1 option to configure it.
I'm also having doubts about whether or not it makes sense to have this part of the book, or part of the theme. I'm not really sure, just thought I'd toss it out there as an idea.
book-example/book.toml
Outdated
@@ -9,6 +9,7 @@ edition = "2018" | |||
|
|||
[output.html] | |||
mathjax-support = true | |||
site-url = "/" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This should probably be https://rust-lang.github.io/mdBook/
, otherwise it would break on deployment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to "/mdBook/", the server should not be needed
let base_url = if let Some(site_url) = &html_config.site_url { | ||
site_url | ||
} else { | ||
warn!("HTML 'site-url' parameter not set, defaulting to '/'. Please configure this to ensure the 404 page work correctly, especially if your site is hosted in a subdirectory on the HTTP server."); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd like to avoid spamming all users to set the site-url. I'm not sure which is best, though. The 404 page could be disabled by default (with the assumption that sites usually have their own 404 mechanism), or it could just silently set site-url to /
(or just lower this message to debug
).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changed to debug! now
@@ -437,6 +437,54 @@ impl Renderer for HtmlHandlebars { | |||
is_index = false; | |||
} | |||
|
|||
// Render 404 page |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can all this new code be moved to a separate function, so this function isn't so long?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
refactored into its own function in impl HtmlHandlebars
src/cmd/serve.rs
Outdated
let routes = livereload.or(book_route); | ||
let book_route = warp::fs::dir(build_dir.clone()); | ||
// The fallback route for 404 errors | ||
let fallback_file = "404.html"; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this will need to check the config for the name of the 404 page.
// Render 404 page | ||
if html_config.output_404 != Some("".to_string()) { | ||
let default_404_location = src_dir.join("404.md"); | ||
let content_404 = if let Some(ref filename) = html_config.input_404 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure why there are two different code blocks here. I think it can be done with a single one by using something like let filename = html_config.input_404.as_deref().unwrap_or("404.md");
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There's a case separation there: if the 404 input file is explicitly configured, but not found, that should a hard error. If it's not explicitly configured, a missing 404.md file should not error, but fall back to the default 404 message.
We could also remove the default behaviour, and make it an explicit opt-in.
let default_404_location = src_dir.join("404.md"); | ||
let content_404 = if let Some(ref filename) = html_config.input_404 { | ||
let path = src_dir.join(filename); | ||
std::fs::read_to_string(&path).map_err(|failure| { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of map_err
, use with_context
, something like this:
std::fs::read_to_string(&path)
.with_context(|| format!("unable to open 404 input file {:?}", path))?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That reads much better, changed.
) | ||
})? | ||
} else { | ||
"# 404 - Document not found\n\nUnfortunately, this URL is no longer valid, please use the navigation bar or search to continue.".to_string() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Try to keep lines under 100 characters long. Break this up with backslashes.
Also, I'm not sure about the wording here. Maybe combine it with the example you wrote, maybe something like this:
# Document not found (404)
This URL is invalid, sorry. Please use the navigation bar or search to continue.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated default 404 wording and formatting.
"/" | ||
}; | ||
data_404.insert("base_url".to_owned(), json!(base_url)); | ||
data_404.insert("path".to_owned(), json!("404.md")); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure what "path" is used for, but I would guess this should be the path from the config?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
AFAICT, that path is only used to create the path to root for navigation links and script urls, i.e. it should not be the config path in order for the links to work correctly.
…404 page, making links and relative script/css loads behave correctly even in subdirectory paths
- removed config output_404 - ensure serve overrides the site url, and hosts the correct 404 file - refactor 404 rendering into separate fn - formatting
…tput.html.site-url`
67a4d58
to
6d6e540
Compare
Thanks for the thorough review and feedback! I rebased and added the changes, as well as the documentation. I removed the output-404 option as well. Let me know if you have any further suggestions! |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
…d-page Generate 404.html page (rust-lang#539)
Prototype spike for a 404 page (cf #539).
This PR generates a 404.html file with sidebar navigation and search intact, to better handle broken or stale links.
Additionally, the builtin server serves this file on 404 errors.
Comments and feedback appreciated!