Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Translate relative links to other .md files during rendering (fixes #588) #589

Closed
wants to merge 1 commit into from
Closed

Translate relative links to other .md files during rendering (fixes #588) #589

wants to merge 1 commit into from

Conversation

udoprog
Copy link

@udoprog udoprog commented Jan 26, 2018

I'm trying to avoid as many is_file checks as possible by analysing the destination when possible.

Adds two dependencies:

  • url, to conveniently parse URLs to determine if they are relative or not.
  • relative-path (A crate of mine), parses only forward slash paths and permits convenient decomposition into components that can be translated.
    This also makes sure that there are no platform-dependent path manipulations.

I can try to avoid using relative-path if wanted, but url is a bit harder.

@Michael-F-Bryan
Copy link
Contributor

I think this really belongs as a Preprocessor (introduced in #532), not in the render_markdown() function. That way we'll be able to use relative links for every backend instead of just the HTML one, and it makes the entire build process a lot cleaner (markdown to html conversion doesn't need to know location on disk, etc).

@udoprog
Copy link
Author

udoprog commented Jan 27, 2018

I'll have a look implementing it as a preprocessor. I suspected this might be an issue with other backends as well.

Does the preprocessor infrastructure have access to a collection of md files which are valid link destinations?

@udoprog
Copy link
Author

udoprog commented Jan 27, 2018

So looking at the existing pre-processor support it seems like we have to do the following:

  • Parse that content as markdown.
  • Replace all links.
  • Dump the parsed content to the content as a string.

Is this correct?

Another thought I had is that different backends might require different handling of cross linking. So this feature might be better suited to be implemented closer to the backends themselves. PDFs as an example, (I think) would want to generate anchors, for which it would be easier if the link destination remained intact. Or similarly if you'd want to build a single-page document.

@Michael-F-Bryan
Copy link
Contributor

Does the preprocessor infrastructure have access to a collection of md files which are valid link destinations?

When we invoke a preprocessor we give it a representation of the entire book loaded into memory, this gives them an idea of what objects they need to update. I believe they also have access to the book's root directory to enable things like the {{#include ...}} helpers. A Book is essentially a collection of BookItems where a Chapter may contain nested book items, forming a tree structure.

Is this correct?

That's pretty much exactly what I had in mind!

So this feature might be better suited to be implemented closer to the backends themselves.

I guess there's no reason why a Renderer can't invoke a Preprocessor on its own copy of the Book. The Preprocessor trait is just a general tool which allows you to apply a set of operations to the items in a book, so there's no reason why you couldn't let the link rewriter Preprocessor take in a function which lets you specify how to rewrite a link... If that makes sense? Another idea is the link rewriter preprocessor could be told what backend will use its output, allowing it to change its logic appropriately.

The main purpose for Preprocessors was to formalize a lot of this and allow us to create reusable tools instead of ad-hoc function calls buried in the middle of a backend.

@udoprog
Copy link
Author

udoprog commented Jan 27, 2018

@Michael-F-Bryan so I just pushed something which is work in progress.

Instead of tying the link translation into the render_markdown function, I've only extended it so that you can provide a LinkFilter hook. This hook is currently hard-coded to do the same thing as before in hbs_renderer.

This hook could be provided from anywhere, including pre-processors if you want it to. I'm honestly not sure if it makes sense to be part of a Preprocessor as it looks here, because it kind of undermines the pre part of the name. Which implies something happening before parsing/rendering, not during.

So the hook has a generic interface (Fn(&RelativePath) -> Option<String>), which can be implemented either as part of the renderer as shared code (as it is now). Or provided through some other modular approach - even just returning a Option<&LinkFilter> from a Preprocessor if that is what you want.

I'm investigating this because I'm not too keen about the whole markdown roundtripping that would have to happen if this were to be implemented as a pre-processor. So I'm hoping this could be a more desirable alternative. It sounded like your larger objection was tying render-specific code into the markdown rendering.

What do you think?

@udoprog
Copy link
Author

udoprog commented Jan 27, 2018

Also: If you're interested in this approach I'll figure out how to change the is_dest implementation to look in the book instead of the filesystem. But it would require some relative path resolution which is currently not present. E.g. figuring out that ../format/summary.md is the same as format/summary.md, but relative to a different location.

@udoprog
Copy link
Author

udoprog commented Jan 28, 2018

So final experiment (promise). I've converted most internal path representations to RelativePath, since they were being treated as such anyways.

This has the following benefits:

  • Most paths are treated as relative paths, so there is less juggling back and forth.
  • Relative paths are portable (when encoded into JSON).

This also permitted me to easily implement relative path comparisons, so warnings are now printed based on the existing chapters. I've added a test page to the example book, which contains bad links, these now print the following warnings:

[WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existent destination: "format"                                      
[WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existent destination: "format/bad.md" 

@Michael-F-Bryan
Copy link
Contributor

I just had another look at this and I think there may have been a misunderstanding. I originally thought this would just be a case of changing any *.md links to *.html, making sure we only update links within the same document (presumably because they don't start with something like http:// and are just a raw path). That's a fairly minimal change in how HTML gets rendered, indeed pulldown-cmark's push_html() function takes an iterator of Events so it'd just be a case of chaining the original pulldown_cmark::Parser with a map() call which will update all links and pass through everything else.

At the moment, mdbook uses the <base> tag so that all links are relative to the book's src/ directory, not the source file's directory. It looks like this PR changes link and path behaviour everywhere from being relative to the book's src/ directory to the source file's directory... Which is going to subtly break a lot of existing documents (effectively all books which have nested chapters).

Additionally, we shouldn't prevent a book from linking outside itself (e.g. links like ../../../foo/bar.html) because a book isn't necessarily served from the root of a website, and you use those types of links to link to pages outside the book. I actually encountered this when running mdbook-linkcheck over The Book (rust-lang/book#1092 (comment)) because I found a bunch of false positives like "../../std/prelude/index.html" doesn't exist..

I'm really sorry about the miscommunication @udoprog! It may be beneficial to break this out into two PRs, one which just changes *.md links into *.html, and then another which makes the change from absolute to relative paths. That second one will probably need more discussion though, seeing as it's a fairly major breaking change.

@udoprog
Copy link
Author

udoprog commented Jan 28, 2018

@Michael-F-Bryan The goal is to be backwards compatible. With the last change I pushed there is no difference in how the rust book second edition is rendered, this claims that the resulting html is identical:

cargo run --manifest-path=$HOME/repo/mdBook-master/Cargo.toml -- build
mv book book-before
cargo run --manifest-path=$HOME/repo/mdBook/Cargo.toml -- build
diff -r book-before book

We do emit the following warning which we can choose to more selectively ignore, but 3 of them are actually links that should be fixed after this change:

2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/prelude/index.html"
2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/string/struct.String.html"
2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/io/struct.Stdin.html"
2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/io/struct.Stdin.html#method.read_line"
2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/io/type.Result.html"
2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/result/enum.Result.html"
2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "ch06-00-enums.html"
2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/result/enum.Result.html#method.expect"
2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "ch06-02-match.html"
2018-01-28 12:03:47 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/primitive.str.html#method.parse"
2018-01-28 12:03:48 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/net/enum.IpAddr.html"
2018-01-28 12:03:48 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/option/enum.Option.html"
2018-01-28 12:03:48 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/option/enum.Option.html"
2018-01-28 12:03:48 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../std/collections/index.html"
2018-01-28 12:03:48 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "ch02-00-guessing-game-tutorial.html#handling-potential-failure-with-the-result-type"
2018-01-28 12:03:52 [WARN] (mdbook::renderer::html_handlebars::hbs_renderer): link to non-existant destination: "../../reference/macros.html"

EDIT: this removes these warnings: 5bc7c0d

At the moment, mdbook uses the tag so that all links are relative to the book's src/ directory, not the source file's directory. It looks like this PR changes link and path behaviour everywhere from being relative to the book's src/ directory to the source file's directory... Which is going to subtly break a lot of existing documents (effectively all books which have nested chapters).

This still works. I think. Do you have a test that you can try it on?

Additionally, we shouldn't prevent a book from linking outside itself (e.g. links like ../../../foo/bar.html) because a book isn't necessarily served from the root of a website, and you use those types of links to link to pages outside the book. I actually encountered this when running mdbook-linkcheck over The Book (rust-lang/book#1092 (comment)) because I found a bunch of false positives like "../../std/prelude/index.html" doesn't exist..

This still works for two reasons, only relative links are filtered, and only links which point to an existing document will be translated, see:
https://github.com/rust-lang-nursery/mdBook/pull/589/files?diff=unified#diff-22028e040c17d23dbccf8fc78c428d53R63

Note that RelativePath is not used to change all internal chapter paths. Only the links in the SUMMARY.md are fed into RelativePath after they have been translated to chapters.

I'd be happy to split them up though, in fact, the commits already are split up :). But the is_dest check is harder to implement without translating chapter paths to relative paths (third commit).

@udoprog
Copy link
Author

udoprog commented Jan 28, 2018

@Michael-F-Bryan I've cleaned up and opened #597 separately which contains the conversion to RelativePath. If that looks good, I'd rather have that merged before this to help out with validating link destinations without doing a filesystem check.

@udoprog
Copy link
Author

udoprog commented Jan 30, 2018

@Michael-F-Bryan if you'd like, I can remove the need to convert the project to use RelativePath for this PR. The last commit is almost standalone. The code might look a bit interesting and not catch all corner cases, but that will remove the dependency. Alternatively we can live with the Path::is_file check until it has been converted.

@udoprog
Copy link
Author

udoprog commented Jan 30, 2018

I've now modified this to no longer depend on #597.

@Michael-F-Bryan Michael-F-Bryan self-requested a review February 2, 2018 12:16
@Dushistov
Copy link

Any updates on this? Or project is dead?

@Dylan-DPC-zz
Copy link

yes the project is active but this PR would have diverged a lot since it has been inactive for a while.

@udoprog
Copy link
Author

udoprog commented Jul 29, 2019

@Dylan-DPC I'd be happy to fix it in case there's still interest.

@udoprog
Copy link
Author

udoprog commented Jul 29, 2019

After some digging I realized a fix for this was already merged in #603

@udoprog udoprog closed this Jul 29, 2019
@udoprog
Copy link
Author

udoprog commented Jul 29, 2019

On second look, I'm not sure links to other resources (images, static files) are made currently made relative. Has someone tested this?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants