Markdown.parse doesn't parse raw HTML properly. #17837
Comments
I fear that we're going to end up having a full HTML parser in Base :|
Nice! Which Javascript library are you going to use?
Yeah, that's not something I'd like to end up happening. Most markdown parsers seem to just use some regex monstrosities to catch raw HTML, which appears to work alright.
Reminds me of this: http://stackoverflow.com/a/1732454
How about using CommonMark? It already has the libcmark library that supports HTML tags.
Does CommonMark support some form of table syntax yet @bicycle1885? From the last time I looked through the spec I didn't come across anything. I think it would probably be a good idea to wrap libcmark anyway (at some point), even if it's just to make it easier to check how much of CommonMark we actually adhere to, which is most likely not much at the moment.
No, CommonMark seems to be very conservative about adding extensions like table syntax. I'm not sure, but I think we can do some preprocessing to convert the table syntax extension into HTML tables before passing a string to libcmark.
In CommonMark, the intention is that it should be possible to recognise HTML using simple rules, i.e. without a full parser. You don't have to do any processing on it, so it's not too tricky to match up.
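To make the "simple rules" idea concrete, here is a rough sketch in Python (not Julia, and not how Base.Markdown or libcmark actually do it). It assumes a simplified version of one CommonMark HTML block rule: a block starts on a line that opens with a known block-level tag and ends at the next blank line. The tag list is a small, hypothetical subset of the spec's list, and the names `BLOCK_TAGS`, `HTML_BLOCK_START` and `split_html_blocks` are made up for illustration.

```python
import re

# Hypothetical subset of the block-level tag names listed in the CommonMark spec.
BLOCK_TAGS = ("div", "p", "table", "ul", "ol", "blockquote")

# Up to three spaces of indentation, "<" or "</", a known tag name,
# then whitespace, ">", "/>", or end of line.
HTML_BLOCK_START = re.compile(
    r"^ {0,3}</?(?:%s)(?:\s|/?>|$)" % "|".join(BLOCK_TAGS),
    re.IGNORECASE,
)


def split_html_blocks(text):
    """Split text into (kind, chunk) pairs, kind being "html" or "markdown".

    An HTML chunk starts at a line matching HTML_BLOCK_START and ends at the
    next blank line (the blank line itself is dropped). No tag matching or
    nesting is attempted, so this is a line classifier, not an HTML parser.
    """
    chunks = []
    current = []
    kind = "markdown"
    for line in text.splitlines():
        if kind == "markdown" and HTML_BLOCK_START.match(line):
            # Flush any pending markdown and start collecting raw HTML.
            if current:
                chunks.append((kind, "\n".join(current)))
            current, kind = [line], "html"
        elif kind == "html" and not line.strip():
            # A blank line closes the HTML block.
            chunks.append((kind, "\n".join(current)))
            current, kind = [], "markdown"
        else:
            current.append(line)
    if current:
        chunks.append((kind, "\n".join(current)))
    return chunks
```

With something like this, only the "markdown" chunks would go through inline processing, while the "html" chunks would be emitted verbatim, so a `--` inside a `<div>` would be left alone.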
I currently use this NodeJS hack as a workaround for generating AWS documentation: https://github.com/samoconnor/AWSCore.jl/blob/master/src/HTML2MD.jl. It would be nice if HTML in markdown just worked.
I don't know if it helps, but there is a lexer I ported from mbostock's htl project. It's at https://github.com/MechanicalRabbit/HypertextLiteral.jl/blob/master/src/lexer.jl. Anyway, I don't like this idea, even if Markdown supports it. I'd prefer we use Julia interpolation.
I noticed Markdown.parse doesn't parse HTML tags properly while writing docs with Documenter.jl (xref: JuliaDocs/Documenter.jl#176). Most Markdown parsers support this feature, so I think Base.Markdown should do as well.
For example, two consecutive hyphens are recognized as an em dash as follows:
CC: @MichaelHatherly