Weird parsing for markdown checkboxes and emphasis #3690

Closed
LarsEKrueger opened this issue May 21, 2017 · 1 comment

This touches #3051 and #863.

I'm working on a filter that handles part of what the two issues above request: checkboxes and vimwiki syntax. During testing, I noticed some weird parsing behaviour of pandoc itself.

Version:
pandoc 1.19.2.1
Compiled with pandoc-types 1.17.0.4, texmath 0.9.4, skylighting 0.3.3

This has been tested on Linux 64 bit.

If you parse a 'half-checked' checkbox, the emphasis is not detected.

$ echo "[o] _mark_" | pandoc -f markdown -t native
[Para [Str "[o]",Space,Str "_mark_"]]

If you parse an empty checkbox, the emphasis is parsed correctly.

$ echo "[ ] _mark_" | pandoc -f markdown -t native
[Para [Str "[",Space,Str "]",Space,Emph [Str "mark"]]]

Dots also work correctly.

$ echo "[.] _mark_" | pandoc -f markdown -t native
[Para [Str "[.]",Space,Emph [Str "mark"]]]

It seems that a single letter or digit between the brackets produces the wrong behaviour, while other characters, e.g. *, do not. If the string between the brackets is longer than one character, emphasis is parsed correctly again.

$ echo "[12] _mark_" | pandoc -f markdown -t native
[Para [Str "[12]",Space,Emph [Str "mark"]]]

Other inline styles are parsed correctly even after [o], e.g. strikeout:

$ echo "[o] ~~mark~~" | pandoc -f markdown -t native
[Para [Str "[o]",Space,Strikeout [Str "mark"]]]
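
To re-run all of the cases above in one go, a small helper script along these lines may be convenient. This is only a sketch: the names CASES, make_input and to_native are made up for illustration, it assumes a pandoc binary on the PATH, and the exact native output will differ between pandoc versions.

```python
import shutil
import subprocess

# Bracket contents taken from the cases in this report ("*" is mentioned in the text).
CASES = ["o", " ", ".", "*", "12"]


def make_input(inner):
    """Build the Markdown test line for a given bracket content."""
    return "[{}] _mark_".format(inner)


def to_native(markdown):
    """Run pandoc on `markdown`; return the native output, or None if pandoc is missing."""
    if shutil.which("pandoc") is None:
        return None
    result = subprocess.run(
        ["pandoc", "-f", "markdown", "-t", "native"],
        input=markdown,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    for inner in CASES:
        src = make_input(inner)
        out = to_native(src)
        print(repr(src), "->", out if out is not None else "(pandoc not found)")
```

A case is affected if its output contains Str "_mark_" instead of Emph [Str "mark"].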

jgm (Owner) commented May 21, 2017

I see what's happening here, I think.

Remember, _ emphasis is disabled "intraword." Pandoc's parser doesn't have a "lookback," so we store the last position of a word character in state, so we can determine if _ comes right after one. The problem is that this can break in some cases where we use parseFromString to get a fallback to avoid backtracking. One such case is reference links, and that's what's happening here, I suspect.

[This should be fixed, obviously. I'm just recording this here so I can remember what I learned in looking at the code just now.]
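
To make the mechanism described above concrete, here is a toy model in Python. It is not pandoc's code, and every name in it is hypothetical. It rests on two assumptions that the comment above does not confirm: that the nested parse of the bracket label keeps counting columns from just after the closing bracket, and that a whitespace-only label is never tried as a reference link. Under those assumptions, a one-character word label leaves a stored "last word position" that happens to equal the column of the following _, so the intraword check misfires.

```python
def is_word_char(ch):
    """Toy notion of a word character: letters and digits."""
    return ch.isalnum()


def last_word_pos_after_label(label, start_col):
    """Simulate re-parsing `label` as a nested input whose column counter
    starts at `start_col` (assumption: the counter is not reset).
    Returns the column just after the last word character, or None."""
    col = start_col
    last = None
    for ch in label:
        col += 1
        if is_word_char(ch):
            last = col
    return last


def emphasis_allowed(text):
    """Return True if the `_` after '[label] ' would open emphasis in this model."""
    close = text.index("]")
    label = text[1:close]
    # Columns are 1-based; this is where the parser sits after "[label]".
    col_after_bracket = close + 2
    # Assumption: a whitespace-only label is not tried as a reference link.
    last = None
    if label.strip():
        last = last_word_pos_after_label(label, col_after_bracket)
    underscore_col = text.index("_") + 1
    # Intraword rule: `_` directly after a word character stays literal.
    return last != underscore_col


for s in ["[o] _mark_", "[ ] _mark_", "[.] _mark_", "[12] _mark_"]:
    print(s, emphasis_allowed(s))
```

In this model only [o] blocks the emphasis: the nested parse of "o" records column 5, which is exactly where the _ sits. For [12] the recorded column is 7 while the _ is at column 6, and [.] and [ ] record nothing at all, which matches the outputs reported in this issue.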
