-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Detect HTML/JS injection attacks and warn users pro-actively #2886
Comments
@RunDevelopment Would love if you have any thought on this from a Prism.js perspective. |
Adding a console waring is a good idea IMO. Honestly, I don't understand this merging but that's probably the HTML pass-thru you mentioned. (Is this behavior documented anywhere?) For Prism, we have Keep Markup to implement an HTML pass-thru but this can actually cause some issues (PrismJS/prism#2384). |
This only matters on the client-side when retrieving content from an HTML element... it doesn't have any relevance to SSR at all unless I'm missing something. The only case we're trying to prevent here is client-side:
Now that code block is VERY dangerous because that code will execute. It should be escaped... and if this was a more complex example then all sorts of harm could result... so I'm saying with a major version release a breaking change that simply DISALLOWS HTML inside the code block... it must be raw text or an error... the idea being that in development people would catch this problem (because of the visible error) and it would never make it out into the wild (where someone is going to exploit it).
We could add |
Not really, but the idea is simple. We pass HTML thru without touching it in the same position in the content:
We will highlighting this as such (on the browser):
We added keyword wrappers but the HTML passes thru unmolested... making it useful for things like highlighting a line or a piece of code. It's definitely an edge case but one some people really want - but it's a ton of complexity. Occasionally we get an issue here "where did all my HTML go" and this issue was meant to perhaps stave off such issues by building that "education" into the library iself. |
The code someone would use to do Highlight.js SSR is very different than the code they'd use on the client. Most people probably do one or the other not both. |
My concern is that the server might pass already highlighted code to the client, the client rehydrates the pages, and that causes re-highlighting. But maybe that's not a problem? |
That would be user error. And that already doesn't work. IF someone was using it on both sides then you'd just add |
Hi. I'm not sure if I should post a message to a closed issue or open a new issue, but a link to here came into my JS console. I suspect the warning is incorrectly triggered in some cases, where the HTML is correctly escaped. See for example: documenting PHP code that includes HTML strings, all correctly escaped and without a danger to the person reading the page: https://www.pmwiki.org/wiki/PmWiki/WikiStyles#highlight We are using a custom function that finds the elements that need to be highlighted, sets their classes and then calls highlightBlock -- and now highlightElement -- to highlight them. Thank you for looking into it -- please let me know if I can give any relevant information. |
Your very first snippet has embedded unescaped HTML - two hyperlinks. We have no way to know if that might be malicious code or not - code blocks are expected to contain only text, not HTML. If you know for sure you can disable the warning (see documentation), but those hyperlinks will still be removed - we do not process HTML by default any longer. If you need the HTML to be preserved you should see the issue on this subject and may need to use a plugin now: #2889 |
Aha, thank you very much!! Indeed they are automated links to our internal variable documentation. I'll review it and decide what to do. Many thanks! |
@joshgoebel I wanted to use This is the simple code thats triggering it, am I doing something wrong or is this a false alarm? |
Please make a JS fiddle or a concrete real example I can view in the web browser and I'll take a look.  |
I created a simple example on JSFiddle and it throws the same error "One of your code blocks includes unescaped HTML. This is a potentially serious security risk." |
In this example |
All you did (AFAICT) is avoid calling our code that would show the warning - so now you could still be injecting HTML, but you won't get any warning. If you're using See my example and try something similar with your own code... do you see the alert? If two then you're vulnerable. |
@joshgoebel I'm very intrigued, here's how my HTML looks like after the rendering Rendered raw HTML <div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code class="hljs language-bash">$ <span class="hljs-built_in">cat</span> /etc/passwd | grep <span class="hljs-string">"<span class="hljs-subst">$(whoami)</span>"</span>
myuser:x:1000:1000:,,,:/home/myuser:/bin/bash
</code></pre></div></div> This is how it appears on the page Does this mean I didn't really fix the issue? Btw, I changed the code from <script defer>
document.addEventListener("DOMContentLoaded", (event) => {
document.querySelectorAll("pre code").forEach((el) => {
el.innerHTML = el.textContent;
});
hljs.highlightAll();
});
</script> |
I find this problematic in that it's far too vague... what matters is the RAW HTML coming from the web server... if "after render" includes after JS has been run then it's almost meaningless to tell you anything useful since the entire page could potentially now be different (modified by JS). There is no HTML in your example (escaped or not) so it's hard to know... You need to add some HTML to your "pre-render" files:
If it makes it thru (to the browser), then you have a potential problem. Highlight.js isn't necessary for do any of this testing. |
@joshgoebel This was very helpful! Thank you, you're right :) |
I understand that the problem is, that this data we hand over to the highlightElement function, which you mentioned above, is supposed to be an element with just text content. I wonder, is this Stack Overflow answer a solution for this problem? For example ... ... When I replace on that site, in the developer tools (local overrides enabled), in the index.js, the line ...
... with
... according to that SO answer, and reload the page, then the script is not executed anymore. What is your opinion on this? Does this solve the problem? And if so, couldn't you devs just do this in the location where you log the warning message? Greetings, Nils |
How? The page is rendered long before we have control. At that point it's too late. the HTML has already been rendered, any injected code has already done any damage it intended to do. Again, see the sample above. If you can show me some way to fix it that with PURE JS I'll be very impressed indeed.
I tend to dislike 'cut and paste' solutions where implementors [those doing the cutting and pasting] don't have a full understanding of the actual problem. Please see: https://stackoverflow.com/questions/64772302/is-parsing-html-with-domparser-safe-from-xss with some thoughts on DOMParser in this context. It seems perhaps adequate if one uses it properly and entirely inadequate if not... and I don't think it does anything (in an of itself) to make it clear which is which... unlike say a If one doesn't understand these problems fully it's probably best they read until they learn, or use a platform/library that already solves these problems for them. |
I just see, your library already reads the node's children as I am not sure if the additional warning message helps. It made me think your library does not read the children as textContent, for whatever reason, so I did Yes, you are absolutely right, the server should properly filter and escape. And testing is always good. Regarding the SO question linked by you, the idea is of course to parse the Html string with the DOMParser (which does not execute script tags while parsing), and then get the desired part as .textContent, not as .firstChild (Or, if the elements are needed, to |
I'm not sure why, nothing it says implies that as far as I can see.
It purposely doesn't say "don't worry we use textContent" because that is irrelevant - as already pointed out - and could imply a false sense of security. We only use textContent (and show a warning) to purposely BREAK the HTML content to make them aware of the issue. This does not solve the issue, but hopefully shines some light on it in development rather than production. If you have better verbiage feel free to suggest. In version 12 this will be a hard error and we will not render the content at all (unless someone sets a special flag). This feature can also be enabled in version 11 today to allow errors to be caught in development easier.
I understand the idea myself. |
Ok, I would suggest replacing console.warn("One of your code blocks includes unescaped HTML. This is a potentially serious security risk.");
console.warn("https://github.com/highlightjs/highlight.js/issues/2886");
console.warn(element); with console.warn("One of your code blocks contains unescaped HTML elements.");
console.warn("This allows attackers to execute malicious Javascript code (XSS Attack).");
console.warn("The library will remove all inner HTML elements by reading them as textContent.");
console.warn("This does NOT protect you from these XSS attacks, as this HTML (including Javascript and CSS)");
console.warn("gets executed by your browser when it loads the page.");
console.warn("In version 12, this library will refuse to highlight such elements, unless you set a special flag.");
console.warn("Currently you can disable this warning by doing `hljs.configure({ignoreUnescapedHTML: true});`");
console.warn("See also https://github.com/highlightjs/highlight.js/issues/2886");
console.warn("The element in question is:");
console.warn(element); I would also suggest renaming the function from
That's nice, but you are not the only one reading this thread. A lot of people will, as you point them here. In that context, I may once again point to Preventing XSS Attacks, which is a readable introduction into the topic. |
I think it was implied I meant "in the same amount of words, give or take". :-) Console warnings isn't the place to write a book IMHO... I think I may just replace the link to this issue with a new link to (just created): https://github.com/highlightjs/highlight.js/wiki/security I don't feel it's our job to explain the details of XSS/HTML injection in detail (though perhaps I'm wrong on this), but we can certainly link to those resources. This is something that IMHE requires research and understanding or someone is going to get it wrong. We can be the beginning of someone's learning journey, but probably not the end. |
@joshgoebel Man, thank you, for investing your time in this. This is a good wiki entry. It explains short and clearly what the issue is, including a code example. It states your position on this. It links to helpful locations. One could not wish more. I also like that you included the link to the Acunetix article, which is written in a simple language, and will help beginners to grasp the basic principles of how to defend against XSS. People will have their questions answered with this wiki entry. |
You are a real big boss (in Chinese, big boss means very powerful); Before repair:After repair:update repairjekyll |
sigh This makes me wonder if further changes aren't needed here... It's one thing to prefix advice with "make sure you know what you're doing [first]" but I highly doubt most people are actually doing that step since just disabling the warning does not fix the problem or the broken HTML. The HTML shown above (posted by @y377) with naked The correct solution here would be for Jekyll to sanitize/encode HTML entities properly inside code blocks (or have a mode/setting to do this). |
@y377 Can you expand the console log or include the raw HTML generated so we can see what HTML tags are actually getting generated? The error is only generated when their are children tags inside a The unescaped |
So far Jeykll seems to do the "right thing" of the box. See jekyll/jekyll#8903 Those who are seeing this with Jeykll I'd love to have more information to actually track down the problem... here is looking at you @y377 , etc... |
|
So as concluded on the other thread, don't use TWO different syntax highlighters. Trying to run Highlight.js on top of |
@joshgoebel I can't upvote your post, but I can thank you here. This was my issue. Specifically I'm using highlightjs via grunt-md2html and the docs weren't clear that if I linked to the script AND set options in the config that it would run highlight twice. |
hello @joshgoebel and all, const snippet = pre it works but I still have some issues with the unescaped HTML. |
You need to properly escape You may need to do it on your ClipboardCopy line also, I'm not sure if your framework handles that auto-magically for you or not (it may)... |
@joshgoebel thanks so much for your help. If I remove the function in the snippet, and replace it by "hello word" it works properly without error : but if I add the escaped snippet, it still display the error and and does not handle the espaced code : |
Use the web inspector to expand the |
Thanks again for your feedback @joshgoebel. I did expand it to see were can be the unescaped text. But the whole function appear as you can see. Every {} brakets and () looks to be generating the issue. Should I write the snippet in a different way ? |
Oh you'd need to temporarily hack your HLJS to STOP highlighting so you can see the actual issue... sometimes I forget that JS objects live-update in the log as they continue to change... https://highlightjs.readthedocs.io/en/latest/api.html#configure Turn on |
Just want to add that in TWIG you can just use this before/after your code
|
Is your request related to a specific problem you're having?
#2884
ccampbell/rainbow#249
This comes up again and again (though thankfully not TOO often). Beginners are VERY confused about this whole HTML escaping thing.
The solution you'd prefer / feature you'd like to see added...
While we can't do anything smart about this yet (because we unfortunately allow HTML inside code blocks for "clever" users) I'd like this to change with v11. With v11 we should drop this HTML pass-thru behavior and move it to a plugin (making it very much opt-in). The default behavior should be that HTML is silently dropped and I'd even consider adding some sort of error:
This would of course be a breaking change so we'd need to wait until v11. For 95% of users I can't see the downside to this and it seems we could potentially educate and prevent a lot of harm. Someone wanting the HTML to pass thru would install a plug-in and thus change the behavior, get the old behavior back, etc.
Any alternative solutions you considered...
Silent dropping but no error... but that just leads to support issues... I suppose we could log the error to the console vs actually showing it on the webpage.
The text was updated successfully, but these errors were encountered: