Sanitize and sanitizer #1232

joshbruce · 2018-04-19T00:24:57Z

joshbruce
Apr 19, 2018
Maintainer

Marked version: 0.3.19

Markdown flavor: n/a

Proposal type: other

What pain point are you perceiving?

Marked is taking on too much; or, at least more than it needs to.

What solution are you suggesting?

It appears at some point someone had the idea of adding sanitization to marked. Later, someone else had the idea of letting people plug in their own sanitizer.

I suggest we go for the latter and deprecate the former. Turning on sanitize doesn't really seem to give us much in all honesty.

UziTech · 2018-04-19T04:06:21Z

UziTech
Apr 19, 2018
Maintainer

I say we remove sanitizing altogether and recommend DOMPurify.sanitize(marked(...))

0 replies

Martii · 2018-04-19T06:20:28Z

Martii
Apr 19, 2018

I'm +1 for using external sanitizers only, as we've never really used the internal one here. Even in the DOM we don't utilize it; to my knowledge we rely on the false default (technically undefined on our end, but currently documented as false). The one we use is highly configurable and the wheel there is well polished imho.

Be aware that this change could affect other packages so strong announcements and CHANGELOGs are highly appreciated (and of course semver announcements too). :)

0 replies

joshbruce · 2018-04-19T14:02:05Z

joshbruce
Apr 19, 2018
Maintainer Author

@Martii: Thanks for chiming in as well. Right now we don't have a direct way to contact users. I was hoping more people would have added themselves to the "users" portion of the Authors page, to be honest. So, we're counting on the tickets, milestones, and projects to give people the warning.

We'll probably spend most of 1.x preparing people to lose things.

@UziTech: Agreed.

Tagging #1114

0 replies

alystair · 2018-05-27T06:37:23Z

alystair
May 27, 2018

If you're discussing escaping HTML... here's a solution that utilizes a browser's own filtering. If it breaks, it means there's a browser vulnerability and you've made yourself some nice pocket change via their vulnerability rewards program.

var escape = document.createElement('textarea');
function escapeHTML(html) { escape.textContent=html; return escape.innerHTML; }
function unescapeHTML(html) { escape.innerHTML=html; return escape.textContent; }

This doesn't solve the XSS vuln in the comment above but does for plenty of other things, assuming one doesn't want HTML code passthrough :)
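
For environments without a DOM (Node.js, for instance), the same round trip can be sketched with a plain lookup table. This is only a minimal sketch under stated assumptions: the names ENTITIES, escapeHtml and unescapeHtml are made up for illustration, it is not marked's own implementation, and it covers only the five characters listed.

```javascript
// Hypothetical DOM-free helpers (illustrative names, not part of marked or any library).
const ENTITIES = { '&': '&amp;', '<': '&lt;', '>': '&gt;', '"': '&quot;', "'": '&#39;' };

// Reverse table: entity -> original character.
const REVERSE = Object.fromEntries(
  Object.entries(ENTITIES).map(([ch, entity]) => [entity, ch])
);

// Replace each special character with its entity in a single pass.
function escapeHtml(text) {
  return String(text).replace(/[&<>"']/g, ch => ENTITIES[ch]);
}

// Decode exactly one level of escaping, again in a single pass.
function unescapeHtml(html) {
  return String(html).replace(/&(?:amp|lt|gt|quot|#39);/g, entity => REVERSE[entity]);
}

const raw = '<img src="x" onerror="alert(1)">';
const escaped = escapeHtml(raw);
console.log(escaped);
console.log(unescapeHtml(escaped) === raw); // true
```

Unlike the textarea trick, this version also escapes quotes, which matters if the result ends up inside an attribute value.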

0 replies

styfle · 2018-08-20T13:00:50Z

styfle
Aug 20, 2018
Maintainer

Another 3rd party sanitizer I just found: https://github.com/rhysd/marked-sanitizer-github

I can't say how secure it is but the point remains that marked doesn't need to ship with a sanitizer.

Update: let's make a list of popular sanitizers so our users don't need to search the millions of packages and then they can make a decision:

dompurify: https://www.npmjs.com/package/dompurify
sanitize-html: https://www.npmjs.com/package/sanitize-html
insane: https://www.npmjs.com/package/insane
marked-sanitizer-github: https://www.npmjs.com/package/marked-sanitizer-github

➡️ See my response in #1388 for example code.

0 replies

OrKoN · 2019-02-23T23:34:15Z

OrKoN
Feb 23, 2019

I think it's nice if marked has sanitization built-in.

I tried to use dompurify but I find it too heavy and slow for Lambda environment/nodejs. I have tried sanitize-html but it seems to be simply removing everything that is not allowed and strips class names produced by highlight.js so requires a lot of tuning/testing.

2 replies

tipiirai Feb 10, 2023

Agreed! dompurify is really not a good solution to the problem. It added 12M of extra clutter to node_modules and my application start time is now significantly slower.

UziTech Feb 10, 2023
Maintainer

That is the reason we don't have sanitization built in. In order to do it right we would need to maintain a lot more code and things would go much slower. I'm pretty sure we wouldn't be able to do a better job on either of those fronts than dompurify or any other package focused on sanitizing HTML.

UziTech · 2019-02-24T01:20:08Z

UziTech
Feb 24, 2019
Maintainer

We will most likely have a plug-in to do sanitization with a third party library in the future to make it easy for people to use, but it doesn't make sense for core marked to have sanitization built in because it is not part of the markdown specs.

0 replies

UziTech · 2019-02-24T01:38:22Z

UziTech
Feb 24, 2019
Maintainer

As a matter of fact, the CommonMark spec explicitly allows unsanitized HTML
Example #138

0 replies

OrKoN · 2019-02-24T08:53:46Z

OrKoN
Feb 24, 2019

A plugin would be very nice. I am not an expert in XSS prevention and HTML sanitization (and I feel many other users aren't either), so it would be awesome if there were a standard/recommended way to do it. I found some articles online with examples, like this one https://shuheikagawa.com/blog/2015/09/21/using-highlight-js-with-marked/, which seem to open up a possibility for XSS. Perhaps the plugin can be included by default (or in the getting started guide), because I am afraid many users don't realize the risk and apply the same marked configuration to untrusted data.

0 replies

OrKoN · 2019-02-24T09:25:34Z

OrKoN
Feb 24, 2019

I wrote a benchmark to evaluate different sanitizers https://github.com/OrKoN/marked-benchmark I got some interesting results:

  builtInSanitizer   x 221 ops/sec ±1.48% (85 runs sampled)
  domPurifySanitizer x 148 ops/sec ±2.85% (78 runs sampled)
  noSanitizer        x 216 ops/sec ±0.62% (83 runs sampled)

So with DOMPurify it seems to be about a third slower for me compared to the built-in sanitizer (148 vs. 221 ops/sec).

P.S. updated the results. The previous message contained the wrong ones.
P.P.S. it'd be nice to create instances of marked
P.P.P.S. the code tag renderer seems to be always getting undefined as escaped param https://github.com/markedjs/marked/blob/master/lib/marked.js#L1200 no matter what the value of sanitize flag is
P.P.P.P.S. it looks like DOMPurify does not escape but simply removes disallowed tags
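
As a quick check of the figures quoted above, the relative cost can be worked out directly from the ops/sec numbers. This is just arithmetic on the posted results; the variable and function names are illustrative.

```javascript
// Throughput figures copied from the benchmark output above (ops/sec).
const opsPerSec = { builtInSanitizer: 221, domPurifySanitizer: 148, noSanitizer: 216 };

// Ratio of a candidate's throughput to a baseline's throughput.
function relativeThroughput(candidate, baseline) {
  return candidate / baseline;
}

const ratio = relativeThroughput(opsPerSec.domPurifySanitizer, opsPerSec.builtInSanitizer);
const timePerOpFactor = 1 / ratio; // how much longer one operation takes

console.log(ratio.toFixed(2));           // 0.67
console.log(timePerOpFactor.toFixed(2)); // 1.49
```

On these numbers DOMPurify reaches about two thirds of the built-in sanitizer's throughput, i.e. each call takes roughly 1.5 times as long.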

0 replies

OrKoN · 2019-02-27T19:05:22Z

OrKoN
Feb 27, 2019

Another 3rd party sanitizer which looks suitable https://github.com/bevacqua/insane

0 replies

lionel-rowe · 2019-03-02T17:47:50Z

lionel-rowe
Mar 2, 2019

FWIW, I just tried plugging in a third-party sanitizer (insane) for use with the html method of the renderer, and I would strongly recommend against this approach. There are a lot of edge cases for which user-supplied HTML isn't recognized as such, especially if it's inline with some markdown or the last line of the input. Example:

const renderer = new marked.Renderer();
renderer.html = html => insane(html);

marked.setOptions({ renderer });

marked('<img src="sdfg" onerror="alert(1)">');
// expected: '<p><img src="sdfg"></p>'
// actual:   '<p><img src="sdfg" onerror="alert(1)"></p>'

It seems like the only safe way of doing this is passing the whole rendered output through the chosen third-party sanitizer, i.e. first set whitelist options that play nicely with your markdown setup (example for insane + marked + hljs), then use as a wrapper like this:

insane(marked(input), insaneOptions);

0 replies

OrKoN · 2019-03-03T09:06:42Z

OrKoN
Mar 3, 2019

@lionel-rowe I am wondering what your use case is. For me, the expected result for the img case would be <p>&lt;img src="sdfg" onerror="alert(1)" /&gt;</p> and not <p><img src="sdfg"></p>. I am looking for a solution that keeps what is not allowed in the output but escapes it instead of removing it. So far I see this behavior only with marked's sanitize=true option.

0 replies

OrKoN · 2019-03-03T09:10:47Z

OrKoN
Mar 3, 2019

@lionel-rowe I have added your config for insane to my benchmark repo

Does anyone have a good config for DOMPurify for the marked + hljs use case?

0 replies

styfle · 2019-03-03T11:48:27Z

styfle
Mar 3, 2019
Maintainer

@OrKoN See this DOMPurify comment for example usage.

0 replies

lunaru · 2019-08-13T21:03:00Z

lunaru
Aug 13, 2019

@UziTech I just wanted to chime in here to say that I agree with @OrKoN that there are 2 different concepts of "sanitize" that are being conflated. I fear that the move to remove the sanitize from the core of this library is missing out on one of the concepts.

sanitize=true today does not actually sanitize -- it actually escapes. E.g.:

marked("<div>foo</div>", { sanitize: true })

outputs

<p>&lt;div&gt;foo&lt;/div&gt;</p>

This is desired when the user entered content should be escaped HTML. One nice thing about keeping this functionality in the core of marked is that marked is the only functional layer that understands what's output and what's input. Consider this:

marked("\\<div>foo</div>", { sanitize: true })

Backslash-escaping "<" with "\" is actually part of the Markdown spec; it marks the "<" as literal text rather than the start of an HTML element. This should also output the same thing as the example from above.

None of the proposed solutions (DOMPurify, et al) solve this problem at all. I actually think this parameter should be renamed escapeInput and its functionality should be kept since it's actually core to recognizing and parsing markdown syntax.

TL;DR: sanitize=true operates on the input and is markdown dependent. Proposed solutions operate on the output and are html operations. These are not the same. Please reconsider removing sanitize functionality from core or just rename sanitize to reflect what it actually does.

0 replies

lunaru · 2019-08-13T21:18:11Z

lunaru
Aug 13, 2019

@koczkatamas I'd like your thoughts on the above comment as well, since it looks like the current deprecation warnings were in your commit

0 replies

koczkatamas · 2019-08-13T21:36:57Z

koczkatamas
Aug 13, 2019

@lunaru well, if you can solve escaping securely, then it's okay on my part, but I am not sure that the current (hardened) sanitize cannot be bypassed and used for XSS; that's why I recommended deprecation.

0 replies

lunaru · 2019-08-13T21:40:08Z

lunaru
Aug 13, 2019

@koczkatamas I think the sanitize vs escape concept might be conflated. Why does escape need to be 100% secure when sanitize will ultimately "guarantee" proper sanitization? A proper secure pipeline looks like this: rawText -> escape -> marked -> sanitize. If a user cares about sanitization, they can use a sanitizer. But that's not what sanitize=true actually does. It does escaping, which doesn't need to be fully secure, but it does need to be core to marked, since escaping properly requires proper parsing of markdown syntax.

As an example, this is perfectly valid:

DOMPurify.sanitize(marked(..., { sanitize: true }))

And it's not the same as this:

DOMPurify.sanitize(marked(..., { sanitize: false }))

That's because the sanitize parameter should actually be called escape
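
The difference between the two calls can be illustrated with a toy version of the rawText -> escape -> marked -> sanitize pipeline. Everything here is a simplified stand-in and an assumption made for illustration: escapeInput only approximates the escaping that sanitize=true performs, render stands in for marked, and sanitize stands in for a real sanitizer such as DOMPurify. The regex-based filter is not secure and must not be used on untrusted input.

```javascript
// Stand-in for the escaping step (what sanitize=true roughly does to raw HTML).
const escapeInput = text =>
  text.replace(/&/g, '&amp;').replace(/</g, '&lt;').replace(/>/g, '&gt;');

// Stand-in for marked: wraps the input in a paragraph, nothing more.
const render = markdown => `<p>${markdown}</p>`;

// Stand-in for an output sanitizer: drops tags that are not on the allowlist.
// NOT secure (attributes on allowed tags are kept); a real sanitizer parses the HTML.
const ALLOWED_TAGS = new Set(['p', 'em', 'strong']);
const sanitize = html =>
  html.replace(/<\/?([a-zA-Z][\w-]*)[^>]*>/g, (tag, name) =>
    ALLOWED_TAGS.has(name.toLowerCase()) ? tag : '');

const input = '<div onclick="alert(1)">foo</div>';

// escape -> render -> sanitize: the user's markup survives as visible text.
const escapedThenSanitized = sanitize(render(escapeInput(input)));
// render -> sanitize: the disallowed markup is removed.
const sanitizedOnly = sanitize(render(input));

console.log(escapedThenSanitized); // <p>&lt;div onclick="alert(1)"&gt;foo&lt;/div&gt;</p>
console.log(sanitizedOnly);        // <p>foo</p>
```

Both results are harmless to render, but they are different documents: the first shows the reader what was typed, the second silently drops it.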

0 replies

UziTech · 2019-08-14T01:20:35Z

UziTech
Aug 14, 2019
Maintainer

@lunaru escaping input and sanitizing output are separate tasks from converting markdown to html, which is why I believe it should be done separately.

marked is the only functional layer that understands what's output and what's input

I agree that marked should allow extensions to access its tokenized data and alter it like #1232 (comment)

I would like marked to be more extensible so we can focus on the task of converting markdown to html according to the specs, and other developers can focus on other tasks (sanitizing, escaping, etc).

0 replies

lunaru · 2019-08-14T01:59:18Z

lunaru
Aug 14, 2019

@UziTech I don't disagree with the intention. However, I do think before removing the sanitize param, a functional equivalent to escape (aka sanitize=true) should be discussed first. If that means altering tokenized data, that seems reasonable if documented. I'm just cautioning against putting the cart before the horse, where sanitize is removed without a suitable replacement. If you look at all of the existing suggestions (e.g. DOMPurify) they're not even addressing the right functionality, which concerns me, because sanitize=true is not a sanitizer, it's an escaper.

Also, I don't think the problem is that trivial. Consider the following, taken as an adaptation of your comment above:

var entityMap = {
  '&': '&amp;',
  '<': '&lt;',
  '>': '&gt;',
  '"': '&quot;',
  "'": '&#39;',
  '/': '&#x2F;'
};
var dirty = "\\<div>";
var lexed = marked.lexer(dirty);
var cleanLex = lexed.map(token => {
  token.text = token.text.replace(/[&<>"'\/]/g, key => entityMap[key]);
  return token;
});
cleanLex.links = lexed.links;
var cleanHtml = marked.parser(cleanLex);

Note that the output of cleanHtml is incorrect (it outputs <p>&amp;lt;div&gt;</p>, which incorrectly double-encodes the ampersand) because it doesn't handle the escaped \<div> properly, but this works fine:

marked("\\<div>", { sanitize: true })

This outputs <p>&lt;div&gt;</p>, which is the correct output since \<div> and <div> are equivalent when escaped.

0 replies

UziTech · 2019-08-14T02:23:29Z

UziTech
Aug 14, 2019
Maintainer

before removing the sanitize param, a functional equivalent to escape (aka sanitize=true) should be discussed first.

I agree. The game plan right now is to get to 100% commonmark and gfm compliance then release v1.0 with (almost) all options deprecated, then make marked more extensible and remove options in v2.0

We will try to maintain a few extensions that function like each option we remove.

0 replies

A1rPun · 2019-09-06T09:18:54Z

A1rPun
Sep 6, 2019

I agree sanitizing HTML is not the responsibility of marked. However, since the deprecation notice I'm using this combination everywhere I previously used only marked (browser apps):

import marked from 'marked';
import DOMPurify from 'dompurify';

DOMPurify.sanitize(marked(myVar))

0 replies

azu · 2020-04-20T12:42:52Z

azu
Apr 20, 2020

A pluggable feature is good.
But it is a bit difficult to use DOMPurify as a universal library. (It requires splitting the entry point into browser and Node.js versions.)
That is why I've created safe-marked.

0 replies

UziTech · 2020-04-20T15:04:48Z

UziTech
Apr 20, 2020
Maintainer

That is exactly what we want to see, the community making extensions 💯. Ideally marked will handle converting markdown to html according to the spec and extensions will allow users to change the functionality.

We will hopefully be releasing v1.0 soon with a new extension system so extending marked will be easier in the future.

0 replies

azu · 2020-04-21T13:16:18Z

azu
Apr 21, 2020

@UziTech Nice!

0 replies

azu · 2020-04-21T14:36:49Z

azu
Apr 21, 2020

I've just created marked-plugin-sanitizer as PoC.
This new extension system looks like a custom renderer system.
I found some problems with marked.use:

It is difficult to write a post-processing step like a sanitizer.
- I've avoided this problem by wrapping default Renderer
- https://github.com/azu/marked-plugin-sanitizer/blob/07018001396051795723387e0bcdef78841c57ef/src/SanitizeRender.ts#L10-L43
A plugin cannot get marked's options, so the user has to pass the marked options to the plugin.
- But I've noticed that the Renderer respects marked.setOptions by default.
- https://github.com/azu/marked-plugin-sanitizer/blob/fcc7b586a0a6ad77ee0bd92a6b26e481ff1eb292/test/index.test.ts#L55-L64

So marked-plugin-sanitizer works, but it is tricky code.

0 replies

UziTech · 2020-04-21T20:44:11Z

UziTech
Apr 21, 2020
Maintainer

marked.use is only for modifying the marked options so multiple extensions can override the renderer and tokenizer without overriding each other.

If you are trying to do any pre-processing of the markdown or post-processing of the html, you're better off just having them send your package the markdown and running it through marked inside your package.

0 replies

Zemnmez · 2020-11-24T01:34:04Z

Zemnmez
Nov 24, 2020

A sanitizer is not really a panacea for these kinds of problems. It would be good to see marked use safe DOM operations (.textContent) to generate DOM nodes instead of raw, potentially unsafe HTML code.

0 replies

UziTech · 2020-11-24T01:39:12Z

UziTech
Nov 24, 2020
Maintainer

@Zemnmez marked converts markdown to HTML, not to DOM nodes, although you could create a renderer that does that.

0 replies

Replies: 49 comments · 2 replies