-
Notifications
You must be signed in to change notification settings - Fork 3.4k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
security: finish fixing unsafe heading regex #1226
Conversation
Looks like the tests are failing because the new test takes more than a second to pass. Is the test supposed to be that slow? |
You successfully demonstrated the blow-up 😃 |
It was "worse" before the patch. I didn't measure carefully. I know I say this a lot, but I can't see a good solution for these without (1) weakening the spec, or (2) writing a parser. |
We can do some parsing in Lexer.prototype.token Lines 235 to 244 in 4711f6b
|
Ya similar to #1227. changing |
OK. I will give this a try. Thanks for the pointer. |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
901b222 makes the regex more generous and applies parsing during tokenizing. The test case still fails but due to another regex now. Progress! |
lib/marked.js
Outdated
@@ -16,7 +16,8 @@ var block = { | |||
code: /^( {4}[^\n]+\n*)+/, | |||
fences: noop, | |||
hr: /^ {0,3}((?:- *){3,}|(?:_ *){3,}|(?:\* *){3,})(?:\n+|$)/, | |||
heading: /^ *(#{1,6}) *([^\n]+?) *(?:#+ *)?(?:\n+|$)/, | |||
// cap[2] might be ' HEADING # ' and must be trimmed appropriately. | |||
heading: /^ *(#{1,6})([^\n]*)(?:\n+|$)/, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could we replace [^\n]
with .
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes haha
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Worth noting that this regex is substituted into block.paragraph
and the follow-up parsing will not be performed there. I think we're OK spec-wise but might want to double-check.
FWIW Here's a Node-only way to deal with all REDOS issues:
Any interest in offering an API along these lines (with WebWorker equivalent in browser)? I can run some performance numbers if interested. |
@davisjam I would say a new module. In fact, this is small enough to just be in the docs as BEST_PRACTICES.md |
I still think security should trump other visions for the product. We could bench against a standard use case for a solution. Then, for the security (mostly DOS) side of things, let it run longer because the chances are slim that someone is going to hit that. This gives us an A-B test. A for the most likely case and B for the edge security-vulnerable case. Would like to put this in 0.4.0 as it solves a security issue. We can iterate on it later as we focus on the other missions:
We've demonstrated multiple times that Marked isn't as fast as it could be and no longer faster than other compilers. |
@davisjam I submitted a PR to this PR that fixes all of the CM ATX header examples except one: davisjam#1 the example that still fails is because of whitespace so I'm not sure exactly how to fix that one yet:
|
I'll take a look soon. |
@UziTech In your PR you said one case was still failing. When I run |
Apply similar patch to a similar heading regex. Follow-on to f052a2c. Test: Add a test case to demonstrate the slower blow-up.
0ceca90
to
15ac630
Compare
Problem: replace(/X+$/, '') is vulnerable to REDOS Solution: Replace all instances I could find with a custom rtrim
15ac630
to
0cfe39e
Compare
lib/marked.js
Outdated
@@ -4,6 +4,38 @@ | |||
* https://github.com/markedjs/marked | |||
*/ | |||
|
|||
// Return str with all trailing {c | all but c} removed | |||
// allButC: Default false | |||
function rtrim(str, c, allButC) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should there be unit tests for this function?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We would need to export the function. I think that would constitute testing the implementation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree with @UziTech.
Spec: No leading whitespace within a paragraph Fix: I strip leading whitespace on each line while rendering paragraphs Unanticipated problem: Causes the (previously failing but hidden) CommonMark 318 to fail. I added that to the list of expected-to-fail tests. This commit addresses a failing header test case noted in 943d995 whose root cause was paragraph rendering not up to spec.
b55871f
to
dd26af8
Compare
Sketch implementing text regex as a linear-time RegExp imitator. - A few nits here and there - I haven't tested all of the offsetOfX routines, so 'npm run test' hangs on some bug
Looking for feedback on the approach in 24d4a5e. It's a WIP (an off-by-one error somewhere) but I want to know whether you guys like the direction. |
One of the benefits of having the regexes do most of the work is that it makes marked easy to extend. If I want to use marked but change the way text is parsed I can with: marked.InlineLexer.rules.text = /some other regex/; If we are going to start moving away from just regexes we need to figure out how to keep marked easily extendable. |
On a security note. I also think we need to distinguish ReDoS attacks between "Can be triggered by a malicious actor" and "Can be triggered accidentally in a normal context". I don't think it is our job to prevent a malicious actor from causing marked from taking a long time to parse the input. We should educate our dependents on the safe way to deal with parsing user input. (i.e. web worker/vm.runInNewContext)
|
@UziTech Solid points. Extendability Agreed. Since I'm retaining the Security Pursuing the direction you advocate would mean documenting a threat model for marked. This would tell users and security researchers what we consider a threat and what we consider "not our problem". But it's difficult to say what might make a "normal person" add 1000 spaces here or there, and it's also difficult to estimate the number of spaces necessary for a noticeable lag --- mobile vs. microservices vs. VMs vs. bare metal vs. ... But personally I think saying "use module at your own risk, see threat model" is just another way of saying "don't use this module". The input will be untrusted in many cases, since many markdown use cases involve user-specified input being rendered on someone else's machine (e.g. GitHub and StackOverflow both let you enter Markdown in comment sections). It would make more sense to me to expose a safe API (perhaps with an unsafe faster version). Such an API would solve all of these REDOS problems in one shot and let us focus our energies elsewhere. |
See, I think marked is that unsafe faster version. There are many other (safer) markdown parsing libraries. marked's "claim to fame" is it's speed and compliance with the original (pedantic) test suite |
Do we know how true this is anymore? There are many markdown libraries, but this one has 3K dependents on npm. I feel reasonable security guarantees trump performance for any software, especially software with a large userbase. |
Leaning toward @davisjam on this one. Back to the priorities:
If I had to add extensibility it would be 6.
Agreed. (Maybe creating a threat model would be beneficial anyway. Give white hats a block of wood to widdle on - new challenges + new badges.) It seems there are two major topics of concern keeping us from merging this: speed and extensibility, neither of which supersedes the priority of security. As long as it's not a "serious" performance hit I say security wins and it seems like @davisjam is doing his best to strike a balance; especially if the performance hit doesn't show up until someone puts in 500+ blank spaces. Marked isn't the fastest kid on the block anymore (if it ever was) from the stats we've seen: #963. The primary reason @UziTech and I showed up was security, seems like we might be wavering a bit. (Marked appears to be twice as slow as Showdown in #963.) Maybe we should put in a more proper and consistent benchmarking methodology and add it to Travis. Then actually capture the benchmark somewhere (right now we have the 1s, which doesn't seem accurate given the size of the tests). Give us the ability to say "we won't go slower than" - unless X, Y, Z. With the JSON we can even put a target Marked is going to change as we progress. I even mentioned this to @chjj at the beginning of the transition just before getting the org...it may not resemble the code he wrote...it's claim to fame when we showed up was that it was dead but no one had the heart to put it down humanely. Now it's maybe 60% spec compliant with CommonMark and GFM. Uses GFM by default (not sure how many people actually use Pedantic - with or without Marked - most of us didn't even know how to get our hands on the original Perl...think a couple of spike contributors didn't know about Gruber and Daring Fireball). It's trying to come back to life. What if Marked's new claim to fame is that it is the safest, most spec-compliant, and fastest. (I've definitely appreciated our security solutions since @davisjam showed up.) |
@davisjam @joshbruce I agree with that list of priorities and security should come first. I'm just saying there is such a thing as too much security and we need to draw that line somewhere. It would be really nice to get some sort of benchmarking so we can tell if these types of changes do in fact affect performance rather than just guessing. P.S. I'm not against the changes in this PR or in 24d4a5e I just think we need to watch where that line is. |
@UziTech: I think it will be easier to draw the lines once we have a way to measure impacts on performance until then it's mainly conjecture. Therefore, I think drawing the line toward security is the better way to go. To @davisjam's point, I think we can also build the publicly safe API during 0.x then scale back a bit through options once we have a way to better measure performance. We can do that work in #963 for something after 0.4. We're starting to rub up against the 2-week target for security things. What, if anything, can I do to help? |
|
The benchmarks were updated in #1019 |
@davisjam is this PR still being worked on? |
I'm going to close this PR. The changes will need to be moved to the |
Apply similar patch to a similar heading regex.
Follow-on to f052a2c (#1224).
Test: Add a test case to demonstrate the slower blow-up.