Markdown.parse doesn't parse raw HTML properly. #17837

Open
bicycle1885 opened this issue Aug 5, 2016 · 10 comments
Labels
docsystem (The documentation building system), markdown, stdlib (Julia's standard library)

Comments

@bicycle1885
Member

While writing docs with Documenter.jl, I noticed that Markdown.parse doesn't handle raw HTML tags properly (xref: JuliaDocs/Documenter.jl#176).

Most Markdown parsers support raw HTML, so I think Base.Markdown should as well.

For example, the two consecutive hyphens inside an HTML comment are converted to a dash character instead of being left untouched:

julia> Markdown.parse("<!-- comment -->")
  <!– comment –>

CC: @MichaelHatherly


julia> versioninfo()
Julia Version 0.5.0-rc1+0
Commit cede539* (2016-08-04 08:48 UTC)
Platform Info:
  System: Darwin (x86_64-apple-darwin14.5.0)
  CPU: Intel(R) Core(TM) i5-4288U CPU @ 2.60GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Haswell)
  LAPACK: libopenblas64_
  LIBM: libopenlibm
  LLVM: libLLVM-3.7.1 (ORCJIT, haswell)

MichaelHatherly added the docsystem label Aug 5, 2016
@StefanKarpinski
Member

I fear that we're going to end up having a full HTML parser in Base :|

@eschnett
Contributor

eschnett commented Aug 8, 2016

Nice! Which JavaScript library are you going to use?

@MichaelHatherly
Member

> I fear that we're going to end up having a full HTML parser in Base :|

Yeah, that's not something I'd like to end up happening. Most markdown parsers seem to just use some regex monstrosities to catch raw HTML, which appears to work alright.
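
To make the regex approach concrete, here is a minimal sketch in Julia. The pattern and the name `raw_html_spans` are hypothetical and written only for illustration; this is not the regex that Base.Markdown or any particular Markdown parser uses, and it deliberately ignores harder cases such as attribute values that contain `>`.

```julia
# Hypothetical pattern for illustration only; not taken from Base.Markdown
# or from any existing Markdown parser.
# First alternative: an HTML comment. Second alternative: an opening,
# closing, or self-closing tag with optional attributes.
const RAW_HTML = r"<!--.*?-->|</?[A-Za-z][A-Za-z0-9-]*(?:\s[^<>]*)?/?>"s

# Collect every span of `text` that looks like an HTML comment or tag.
raw_html_spans(text::AbstractString) = [m.match for m in eachmatch(RAW_HTML, text)]

raw_html_spans("a <!-- c --> b <em>x</em>")
```

A parser could emit spans like these verbatim and run its usual inline rules only on the text between them.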

@KristofferC
Member

Reminds me of this: http://stackoverflow.com/a/1732454

@bicycle1885
Member Author

How about using CommonMark? It already has the libcmark library that supports HTML tags.

@MichaelHatherly
Member

Does CommonMark support some form of table syntax yet @bicycle1885? From the last time I looked through the spec I didn't come across anything.

I think it would probably be a good idea to wrap libcmark anyway (at some point) even if it's just to make it easier to check how much of CommonMark we actually adhere to, which is most likely not much at the moment.

@bicycle1885
Member Author

No, CommonMark seems to be very conservative about adding extensions like table syntax. I'm not sure, but I think we could do some preprocessing to convert the table syntax extension to HTML tables before passing the string to libcmark.
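
As a rough illustration of that preprocessing idea, the sketch below turns one simple pipe table into an HTML table. The helpers `cells` and `pipe_table_to_html` are hypothetical names, the input is assumed to be exactly a header line, a separator line, and data rows, and cell contents are not HTML-escaped. Nothing here calls libcmark.

```julia
# Hypothetical helpers for illustration; they assume a well-formed pipe table
# whose second line is the |---|---| separator.

# Split one table line into trimmed cell strings.
cells(line) = strip.(split(strip(strip(line), '|'), '|'))

# Convert the lines of a single pipe table to an HTML table string.
function pipe_table_to_html(lines::Vector{<:AbstractString})
    header, rows = cells(lines[1]), cells.(lines[3:end])
    io = IOBuffer()
    println(io, "<table>")
    println(io, "<tr>", join("<th>" .* header .* "</th>"), "</tr>")
    for row in rows
        println(io, "<tr>", join("<td>" .* row .* "</td>"), "</tr>")
    end
    print(io, "</table>")
    return String(take!(io))
end

println(pipe_table_to_html(["| a | b |", "|---|---|", "| 1 | 2 |"]))
```

The resulting string could then be spliced back into the document in place of the table lines before the text is handed to a CommonMark parser.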

@MikeInnes
Member

> I fear that we're going to end up having a full HTML parser in Base :|

In CommonMark, the intention is that it should be possible to recognise HTML using simple rules, i.e. without a full parser. You don't have to do any processing on it, so it's not too tricky to match up < and > characters and avoid escaping that section.
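
A minimal sketch of that idea, assuming a hypothetical helper `transform_outside_html` and a stand-in `dashes` function for the dash substitution reported at the top of this issue: text between matched `<` and `>` is copied through unchanged, and the substitution runs only on the text outside those spans. This is not how Base.Markdown is implemented.

```julia
# Hypothetical helper for illustration: apply `f` only to the parts of `text`
# that lie outside <...> spans (including <!-- ... --> comments).
function transform_outside_html(f, text::AbstractString)
    out = IOBuffer()
    pos = firstindex(text)
    for m in eachmatch(r"<!--.*?-->|<[^<>]*>"s, text)
        # m.offset is the index at which the matched HTML span starts.
        print(out, f(SubString(text, pos, prevind(text, m.offset))))
        print(out, m.match)  # copy the HTML span verbatim
        pos = m.offset + ncodeunits(m.match)
    end
    print(out, f(SubString(text, pos)))  # trailing text after the last span
    return String(take!(out))
end

# Stand-in for the substitution that currently mangles HTML comments.
dashes(s) = replace(s, "--" => "\u2013")

println(transform_outside_html(dashes, "a -- b <!-- c -->"))
```

With this split, the hyphens inside the comment survive while the ones in ordinary text are still converted.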

@samoconnor
Contributor

I currently use this Node.js hack as a workaround for generating AWS documentation: https://github.com/samoconnor/AWSCore.jl/blob/master/src/HTML2MD.jl

It would be nice if HTML in markdown just worked.

@clarkevans
Member

clarkevans commented Jun 22, 2021

I don't know if it helps, but there is a lexer I ported from mbostock's htl project: https://github.com/MechanicalRabbit/HypertextLiteral.jl/blob/master/src/lexer.jl

That said, I don't like the idea of raw HTML in Markdown, even if Markdown supports it. I'd prefer we use Julia interpolation.

9 participants