Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

security: replace vulnerable regex with parser #1223

Merged
merged 5 commits into from
Apr 17, 2018

Conversation

davisjam
Copy link
Contributor

Problem: link regex was vulnerable
Solution: dedicated parser

Fixes: #1222

Problem: link regex was vulnerable
Solution: dedicated parser

Fixes: markedjs#1222
@davisjam
Copy link
Contributor Author

If this is merged, a new CI on #1220 should come back clean.

@UziTech
Copy link
Member

UziTech commented Apr 16, 2018

I feel like this complicates the code a lot. Is there no way we can do this with a constructed regex?

@davisjam
Copy link
Contributor Author

Is there no way we can do this with a constructed regex?

I spent about 30 minutes tinkering with regex variations but couldn't find one. The difficulty is that the "destination" can contain characters overlapping with the "title".

The relatively straightforward regex I replaced captured the specification (reasonably) appropriately, though it actually cheated in a few aspects (e.g. that a title can be opened by ' and closed by "). However, because the regex engine is backtracking-based, untrusted input cannot be evaluated safely by such a regex.

@styfle styfle mentioned this pull request Apr 16, 2018
lib/marked.js Outdated
if (m = destinationRe.exec(destination)) {
// <destination> -> destination
var dest2 = m[1].trim();
destination = unwrapCarats(dest2);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

destination is assigned but never used. Is this intentional?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nope. Will fix.

lib/marked.js Outdated
.replace('label', inline._label)
.getRegex();

function unwrapCarats (str) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Carets?? Carat and carrot are different. :)

Angle brackets might be most appropriate if I'm reading the regex correctly.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Haha I was wondering if I was spelling that right. I'll switch to AngleBrackets anyway.

lib/marked.js Outdated
.replace('label', inline._label)
.getRegex(),
link: {
exec: function (s) {
Copy link
Member

@joshbruce joshbruce Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Might be worth adding some doc blocks to introduce the why behind some of this...nothing too major, just to help those new to the code.

@davisjam
Copy link
Contributor Author

@styfle @joshbruce fbf93a8 addresses your comments.

lib/marked.js Outdated
}

var destinationRe = /^(<?[\s\S]*>?)/;
if (m = destinationRe.exec(destination)) {
Copy link
Member

@styfle styfle Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The two if blocks are nearly identical. Can you make a common function for those? Something like this:

function getMatch(r, fullMatch) {
  var m = r.exec(fullMatch[2]);
  if (m) {
    var dest = unwrapAngleBrackets(m[1].trim());
    var title = m[2];
    return [fullMatch[0], fullMatch[1], dest, title];
  }
}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree, but not a deal breaker for my review.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will do.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lib/marked.js Outdated
}
}

if (match) {
Copy link
Member

@styfle styfle Apr 16, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you flip this so that the if statement is smaller:

if (!match) {
  return null;
}
// ... get dest and title here
return [dest, title];

lib/marked.js Outdated
}

var fullMatch = generalLinkRe.exec(s);
if (fullMatch) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you flip this so that the if statement is smaller:

if (!fullMatch) {
  return null;
}
// ... split and such here
return [fullMatch[0], text, destinationAndTitle[0], destinationAndTitle[1]];

lib/marked.js Outdated
match = parsingRegexes[i].exec(destination);
if (match) {
dest = match[1];
title = match[2];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You don't need to assign dest and title here. Simply use match below.

@davisjam
Copy link
Contributor Author

@styfle Addressed your comments in 47f4388, thank you.

Copy link
Member

@styfle styfle left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM 👍

@styfle
Copy link
Member

styfle commented Apr 16, 2018

@UziTech Can you take a look at the code now? Any other reservations?

Copy link
Member

@UziTech UziTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason marked is so fast is because of the speed of regexes. I just don't want this type of thing to become the norm. There is no reason to have another slow markdown parser.

@styfle
Copy link
Member

styfle commented Apr 16, 2018

@UziTech Good point.

Why is CI not running on this PR?

@UziTech
Copy link
Member

UziTech commented Apr 16, 2018

Not sure. That seems to be happening a lot lately. Maybe a travis bug?

@joshbruce
Copy link
Member

@UziTech: Good point. I think parsers like this should be done as a last resort unless we can demonstrate through benchmarking that there isn't a big performance difference...I don't think we should get in the position of asserting something is more performant in all cases.

The CI thing is weird. Is it possible that the key thing is causing issues?

@styfle
Copy link
Member

styfle commented Apr 16, 2018

Looks like this failed lint: https://travis-ci.org/markedjs/marked/builds/367190715

@davisjam Can you try running lint and tests on your machine and fix any issues?

@davisjam
Copy link
Contributor Author

Lint passes now, that CI was on a previous version.

@styfle styfle merged commit 5ab4ae3 into markedjs:master Apr 17, 2018
zhenalexfan pushed a commit to zhenalexfan/MarkdownHan that referenced this pull request Nov 8, 2021
* security: replace vulnerable regex with parser

Problem: link regex was vulnerable
Solution: dedicated parser

Fixes: markedjs#1222
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants