-
-
Notifications
You must be signed in to change notification settings - Fork 3.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Markdown reader: support GitHub wiki's internal links (#2923) #6458
Conversation
779df1e
to
1909c89
Compare
I have tried to take care of comments:
Sadly, my tests are failing and I do not see what I am doing wrong, if someone can have a look and see something obvious, I would be grateful. |
Any blocking points? |
, test markdownGH "bad link with title" $ | ||
"[[title|random string]]" =?> | ||
para (link "random string" "wikilink" (text "title")) | ||
] | ||
, testGroup "Headers" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does GitHub (or the markdown parser they use in wikis, which I think may be kramdown) have a suite of tests for this syntax? If so, it may be worth finding it and seeing if any relevant tests can be added.
src/Text/Pandoc/Readers/Markdown.hs
Outdated
if T.any (== '|') contents | ||
then do | ||
st <- getState | ||
setState $ st{ stateAllowLinks = False } | ||
label <- mconcat <$> manyTill inline (char '|') | ||
setState $ st{ stateAllowLinks = True } | ||
url <- manyTillChar anyChar $ try $ string "]]" | ||
return $ B.link url title <$> label |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This approach seems potentially unreliable. Example:
[[hello `|` there]]
IN this case contents
will be hello `|` there
, but when we do manyTill inline (char '|')
we will keep parsing past the ]]
, because the |
character occurs within an inline (a code span).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is tricky, and I'm not sure how it should behave, because we don't know whether to treat the content as plain text or inlines until we know whether there's a |
character, but that |
character might be part of regular inline content. I'm really not sure what to do here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point, I have actually borrowed it from the VimWiki parser, we may have the same issue.
Worth figuring out whether the GitHub implementation of this allows inline formatting in the description part. If it's treated as a plain string, maybe we could do the same, and it would solve the problem noted above. Finding their test suite for this would be nice! |
If there is a test suite, it is well hidden. I have looked for hours, tried babelmark2, no other implementation supports it, no documentation... I begin to doubt of it. I have tried to make it recursive but it causes issues with autolinks. |
Great news! wikilinks are actually way simpler than we expected: https://github.com/blackheaven/pandoc/wiki |
it should be good now |
e0a8a08
to
e191747
Compare
src/Text/Pandoc/Readers/Markdown.hs
Outdated
githubWikiLink = try $ do | ||
guardEnabled Ext_wikilinks | ||
string "[[" | ||
inlinedContents <- option (return $ B.fromList []) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note that B.fromList []
is just mempty
which would be clearer and faster.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you want to get both the inline contents and the raw contents, you can use the withRaw
combinator (T.P.Parsing).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I do not know if I did correctly, but it broke the tests
src/Text/Pandoc/Readers/Markdown.hs
Outdated
let allowedInline x = case x of | ||
B.Str _ -> True | ||
B.Space -> True | ||
B.Link {} -> True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are all markdown links allowed? Or just bare URIs?
If just bare URIs, then maybe a better approach would be to use a parser that just targets what we need.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
GitHub's documentation says: [[Link Text|nameofwikipage]]
So there are two questions. 1) What can go in "Link Text"? Your experiments appear to reveal that only plain text is allowed, no special formatting. 2) What can go in "nameofwikipage"? Your experiments show that a fully qualified URL is allowed, but clearly that's not all -- names of wiki pages are also allowed. I imagine that just about any plain text could be the name of a wiki page, so presumably we should allow e.g. [[Link Text|This is my page]]
. For this we should emit a link with URL This is my page
but also a special attribute indicating that it's a wikilink, so that users can do transformations on the path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(OK, looks like that's what you're doing, from the tests!)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is not pertinent to GitHub-flavored Markdown, but it may be informative to note that wikilinks have been growing in popularity due to the explosion of note-taking and (personal) knowledge management tools, in addition to static-site generators and digital gardens, which are often based on Markdown or Markdown-ish syntaxes. Some (nonstandard) implementations allow link text to contain inline formatting, and some even allow page titles (and thus link targets) to contain both formatting and nested links.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyway, seems to me we should try to simplify the parser. Instead of parsing inlines and raw, and then trying to piece things together, why not something like
wikilink = try $ do
string "[["
firstPart <- wikiText
(char '|' *> complexWikilink firstPart) <|> (string "]]" *> return (wikilink with firstPart as link text and (stringify firstPart) as url)
complexWikilink firstPart = do
url <- stringify <$> wikiText
string "]]"
return $ wikilink with firstPart as link text and url as URL
wikiText = many ( whitespace <|> bareURL <|> str <|> endline <|> escapedChar )
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have tried you suggestion, but it seems that I miss the point
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's okay, let me merge your commit, and then I'll see if there's a way to simplify it.
Sorry I left this sit for so long! I've made some comments in the code. |
Changes overview: * Add a `Ext_markdown_github_wikilink` extension * Added in the `githubMarkdownExtensions` list * Added in the `getAllExtensions` list * Add the parser `githubWikiLink` in `Text.Pandoc.Readers.Markdown` * Added it high in the `inline` parsers list * Add two tests, one for each form
e191747
to
8be2b86
Compare
No worries, my sparse time has drastically decreased in the mean time. |
Thanks for this contribution! |
Sorry! After merging this I realized something that has caused me to revert the commit. Since this is designed to be used with I'm sorry I didn't think of this earlier, before you put the time into this PR. Writing a commonmark-hs extension is an entirely different thing, and I need to think more about how it would be done with this syntax. |
The work you've done in constructing the tests and in the extensions module can be re-used once we have the new commonmark-hs module in place. |
I've started to add a |
Sure, no problem |
This is a really exciting extension. I was already about to submit a feature request when I came across this PR. I just want to undermine that this extension isn't only useful for Github Wiki pages. Using Wikilinks in Markdown is getting fairly popular recently and presumably even more in the future with the recent raise of local-first, Markdown-based knowledge management tools like Obsidian, Logseq, Dendron to name just a few. |
I've run into a problem implementing this. |
There are now many incompatible implementations and interpretations of "wikilinks" in existence, which is partly due to the recent growth of note-taking tools that I and others have mentioned above. I foresee that many users of pandoc are going to want to operate on documents containing wikilinks of all kinds. This is a challenge. Perhaps an encompassing solution would be "DIY." Instead of favoring one implementation out of many for a single extension called |
Sorry @hftf, didn't notice your similar remarks. I guess you're right, it's a challenging problem which ideally would be solved in the most flexible way you've outlined, but I can image that this can lead to a maintainability nightmare. From what I can see two alternative extensions should be sufficient if the extensions are limited on capturing the pure links: One for classical Wikilinks ( One remark regarding the understandably frustrated comment
by @jgm in the commonmark-hs issue. It should be noted that GitHub were the ones breaking with the established way. The other tools are using the classical Wikilink-style introduced by UseModWiki and used in almost all Wiki systems [1] and in all the new systems from the new generation of knowledge management tools (as far as I can see). [1]: https://en.wikipedia.org/wiki/History_of_wikis#CamelCase_and_internal_links |
Fair point: GitHub was never great about looking at prior art in making markdown extensions. |
closes #2923
Changes overview:
Ext_markdown_github_wikilink
extensiongithubMarkdownExtensions
listgetAllExtensions
listgithubWikiLink
inText.Pandoc.Readers.Markdown
inline
parsers listNotes:
Thanks by advance for your feedback.