Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Find a better markdown processor #2314

Closed
vladikoff opened this issue Nov 18, 2017 · 20 comments
Closed

Find a better markdown processor #2314

vladikoff opened this issue Nov 18, 2017 · 20 comments
Labels
package:markdown-gfm status:discussion type:improvement This issue reports a possible enhancement of an existing feature.

Comments

@vladikoff
Copy link

marked with over 300 issues and 166 pull requests the library is not maintained anymore. The last release was 10 months ago :(

Ref: markedjs/marked#938
Ref: compodoc/compodoc#349
Ref: https://github.com/showdownjs/showdown

@Reinmar
Copy link
Member

Reinmar commented Nov 19, 2017

I'll be boring here (for those who know me), but – why not using a more stable data format than Markdown? Markdown is broken – there are many flavours, no spec and buggy implementations which are incompatible with each other.

Of course, it doesn't make your ticket invalid. We actually implemented this package and people will want to use it. But whenever they try they should think twice whether they really need to. Markdown is like regexp – you have a problem so you pick Markdown to solve it and then you have two problems.

@benedyktdryl
Copy link

@Reinmar I agree in 100%, but still it's widely used by editors and CMSes. Maybe such big company like CKSource can takeover this project or at least try to encourage OS community to do it?

@vladikoff
Copy link
Author

@Reinmar yeah agreed. The reason I filed this ticket for instance was the missing support for RTL in marked. showdown seems to handle RTL in some way, but due to the missing markdown spec there is no clear answer how RTL should be formatted.

@Reinmar
Copy link
Member

Reinmar commented Nov 20, 2017

Maybe such big company like CKSource can takeover this project or at least try to encourage OS community to do it?

If we really were a big company we could consider that, but we're not ;) Besides, we have little business in maintaining a library which we don't use and, what's even more important, which we don't recommend to others (due to the Markdown itself).

Anyway, right now we have no resources to put work in this. That's regardless of the option that we'd choose – which is switching to another library or releasing marked2 (inside joke: which could be called CKM, short of CKMarked :D).

@vladikoff
Copy link
Author

Another Markdown parser suggested by people on the Internet: https://github.com/markdown-it/markdown-it

@szymonkups
Copy link
Contributor

szymonkups commented Nov 23, 2017

Some time ago GitHub moved to CommonMark (with their own extensions): https://github.com/blog/2333-a-formal-spec-for-github-flavored-markdown. This data processor was crated before that and it uses two separate libraries to support conversion to and from GitHubFlavored markdown. I was thinking about creating a CommonMark data processor. It has an official parser that is compatible with the standard. It could be used to convert markdown to our internal format. Thanks to the specification it should be quite easy to write conversion in separate direction. @vladikoff do you think it would be useful for you?

@vladikoff
Copy link
Author

@vladikoff do you think it would be useful for you?

In our project we are okay for now. We will be looking into improving the autoformat plugin rather than the Markdown gfm for now. However, for the sake of this project and 1.0.0 it would be great to release with something that has CommonMark spec support, rather than the outdated marked library.

@oliverguenther
Copy link

oliverguenther commented Nov 25, 2017

why not using a more stable data format than Markdown? Markdown is broken – there are many flavours, no spec and buggy implementations which are incompatible with each other.

This is exactly why the CommonMark specification exists. It has a forum for open discussion on how to move the spec forward so that we can expect a fully compatible MD syntax. Additional flavors will always exist like in GitHub/GitLab Flavored Markdown [1][2] which are basically the same. Both share the CommonMark spec as a basis, so I would place my bets there moving forward.

With the internal AST structure of the CommonMark reference JS implementation, it should be straight forward to extend the library with a CK5 'renderer' that transforms the AST into the internal format. Basically, any parser that has an internal format will do. I'm also looking into markdown-it due to its extensibility and (among many other plugins), GFM table support.

This will get rid of the additional overhead of the intermediate HTML format only to be parsed
again in CKEditor.

I am currently in the process of playing with the existing GFM plugin for CKEditor and will start working on such a renderer to transform the CM AST to CKEditors internal format starting next week for a GitHub-flavored Markdown-backed CKEditor at OpenProject. If anyone wants to discuss the bootstrapping of this move, drop me a mail or reach out via Gitter.

/edit After discussions with Szymon who started the commonmark experiment in t/16, I've adapted that in my current fork the plugin.

@winzig
Copy link

winzig commented Mar 23, 2018

I am getting ready to integrate CKEditor 5 and am interested in exporting markdown, so I was wondering what the status is on some of these forks/branches that are looking to improve things before I dive in.

Our use of CKE will be limited to basic formatting (bold, italics, links, lists, possibly some font sizing). Anyone have any recommendations?

@Reinmar
Copy link
Member

Reinmar commented Mar 23, 2018

some font sizing

And how will font size be represented in your output?

@winzig
Copy link

winzig commented Mar 23, 2018

Essentially, just for headers. Even just having the capability for one header size would be enough.

@Reinmar
Copy link
Member

Reinmar commented Mar 23, 2018

Oh, ok. I thought about font-size :D

@Reinmar
Copy link
Member

Reinmar commented Mar 23, 2018

Anyway, what I wrote in https://github.com/ckeditor/ckeditor5-markdown-gfm/issues/16#issuecomment-345672541 is still true from our side.

@winzig
Copy link

winzig commented Mar 23, 2018

Primary interest in transforming to markdown before sending to our server is for safety. Sanitizing HTML input from users can be ... difficult. ;-) Perhaps there are some ckeditor features related to sanitizing that I'm not yet aware of.

@Reinmar
Copy link
Member

Reinmar commented Mar 23, 2018

CKEditor is a client-side application so it shouldn't be used for anything related to security. After all, someone can simply bypass it and send you a POST request.

As for Markdown being safer... You need a secure Markdown to HTML converter. Please remember that Markdown can contain HTML too. In such a case, sanitizing Markdown... means sanitizing HTML. Of course, perhaps in some converters, you can completely disable support for HTML, but then you should worry about the security of the converter too – I can easily think about potential bugs in Markdown parser that might lead to plain Markdown becoming a dangerous HTML.

So, I think that you'd be comparably safe regardless of whether you'll choose some battle-tested HTML sanitiser or a battle-tested Markdown converter.

I'd recommend checking https://www.npmjs.com/package/dompurify or https://www.npmjs.com/package/sanitize-html and staying on the good side :D

@wwalc
Copy link
Member

wwalc commented Apr 12, 2018

@Reinmar is right regarding the security as introducing a different format of a data, which is later transformed into HTML does not add any level of security, if in fact may make things even harder.

Reading the following resources may help in understanding what we mean:

In short: if you convert to HTML at some point, you should sanitise the final HTML output anyway.

@Reinmar
Copy link
Member

Reinmar commented Apr 12, 2018

I love this quote:

So, is it all lost? Not really. The answer is not to filter the input, but rather the output. After the input text is converted into full fledged HTML, you can then reliably apply the correct XSS filters to remove any dangerous or malicious content.

@Reinmar
Copy link
Member

Reinmar commented Apr 25, 2018

I've just stumbled upon this post: https://talk.commonmark.org/t/beyond-markdown/2787. Great read about complexities inherited from Markdown.

@mlewand mlewand transferred this issue from ckeditor/ckeditor5-markdown-gfm Oct 8, 2019
@mlewand mlewand added status:confirmed status:discussion type:improvement This issue reports a possible enhancement of an existing feature. package:markdown-gfm labels Oct 8, 2019
@mlewand mlewand added this to the unknown milestone Oct 8, 2019
@phanirithvij
Copy link

@vladikoff, @mlewand I think this issue can be closed because marked is under active development again?

@jodator
Copy link
Contributor

jodator commented Aug 18, 2020

As per #7307 comments - It looks that not doing it was a good move as marked was brought back to life ;).

Let's close this as we decided to stay with marked.

@jodator jodator closed this as completed Aug 18, 2020
@Reinmar Reinmar removed this from the unknown milestone Aug 18, 2020
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
package:markdown-gfm status:discussion type:improvement This issue reports a possible enhancement of an existing feature.
Projects
None yet
Development

No branches or pull requests

10 participants