Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Gfm tables #1245

Merged
merged 2 commits into from
May 1, 2018
Merged

Gfm tables #1245

merged 2 commits into from
May 1, 2018

Conversation

tomtheisen
Copy link

@tomtheisen tomtheisen commented May 1, 2018

Marked version: 0.3.19

Markdown flavor: GitHub Flavored Markdown

Description

This PR introduces full compliance with GFM tables. (I think) I'm new here, and this is a somewhat bigger code change. Despite my best efforts, I may have messed something up, so feedback is welcome.

Contributor

  • Test(s) exist to ensure functionality and minimize regression (if no tests added, list tests covering this PR); or,
  • no tests required for this PR.
  • If submitting new feature, it has been documented in the appropriate places.

The gfm_tables tests cover this functionality.

Committer

In most cases, this should be a different person than the contributor.

  • Draft GitHub release notes have been updated.
  • CI is green (no forced merge required).
  • Merge PR

@tomtheisen
Copy link
Author

It seems there's some catastrophic backtracking in some of the regexes. For some reason, my local tests did not find these, but I can see a list in the travis-ci link.

@UziTech
Copy link
Member

UziTech commented May 1, 2018

That is strange. No errors show up on my computer either when I run npm run test:redos on this PR.

/cc @davisjam

It looks like the failing tests aren't ones you changed in this PR anyway.

Copy link
Member

@UziTech UziTech left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is awesome 💯 Thanks

Copy link
Contributor

@Martii Martii left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HTML5 compliance?

@@ -944,7 +949,7 @@ Renderer.prototype.tablerow = function(content) {
Renderer.prototype.tablecell = function(content, flags) {
var type = flags.header ? 'th' : 'td';
var tag = flags.align
? '<' + type + ' style="text-align:' + flags.align + '">'
? '<' + type + ' align="' + flags.align + '">'
Copy link
Contributor

@Martii Martii May 1, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Re:

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is GFM compliant, not HTML compliant. Compliance with GFM requires that we break HTML compliance.

https://github.github.com/gfm/#example-192

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tomtheisen

This is GFM compliant, not HTML compliant. Compliance with GFM requires that we break HTML compliance.

That's why it's in the form of a question. Good to know of a major breaking change as @styfle put it in the issue. If we continue to use this package we'll need to undo this for our sanitizer (which we do for HTML5 anyhow) as we are always striving for HTML5 compliance. Anyone using this change that has a HTML5 compliant badge will need to do this as well since it's non-standard or they'll lose their logo/rating.

Since I've been on GH just about as long as they've been around I can make an educated guess that they did this spec for older browser compatibility and also their sanitizer. So when the HTML4.x specifications drop out (presumably around 2020ish when everything hits the fan) the backward compatibility reintroduced here will not render tables correctly and it may need to be reversed. Time will tell.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe extensions can be made based on flavor in the future?? https://rawgit.com/fletcher/human-markdown-reference/master/index.html - Multimarkdown uses the inline style solution, which is not the solution used in GFM - check out the tables example and inspect element. We are definitely focused on two or three flavors to get us there.

If this gets merged now, it would be part of the 0.4, which would indicate breaking change under the zero major. If it gets merged following 0.4 then that release would be 0.5.

@styfle, @UziTech, and @davisjam Maybe a "breaking" label?? to serve as an indicator for everyone...think @styfle adding the name of 0.4.0 to the next release was a good move.

See also #1225 and #746 (We've also talked somewhere else about not making Markdown flavors but the extension.)

@Martii
Copy link
Contributor

Martii commented May 1, 2018

Be interesting to compare this PR on Android as my GFM tables are way messed up there with Android 7/8 and latest Chrome/Firefox too. Desktops have been acceptable for years but portables really look weird. Example rendered page where a GFM table is made and generated with this issue:

desktop

portable

Copy link
Member

@joshbruce joshbruce left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For someone who says they're not sure what they're doing, you did well on this one.

@joshbruce
Copy link
Member

Would like to get a review from @davisjam due to the failing security scan.

@davisjam
Copy link
Contributor

davisjam commented May 1, 2018

Here are the explanations.

  1. As @styfle noted in the associated issue, the REDOS detector has a lag. On new regexes it returns UNKNOWN (translated to SAFE by the eslint plugin) and analyzes them in the background; a subsequent query may return VULNERABLE. This is probably what affected @tomtheisen.
  2. I recently updated the REDOS detector logic to identify new classes of vulnerable regexes. See this commit. For example, /\n+$/ would not have been flagged before but was vulnerable. That particular regex is fixed in the rtrim commit of security: finish fixing unsafe heading regex #1226, though some of the others may still be relevant.
  3. The eslint plugin used in npm run test:redos relies on the vuln-regex-detector module, which maintains a local cache. I have not yet published a version that expires older entries in the event of updates to the REDOS detector (like the one described in the previous bullet). This may be what's affecting @UziTech's runs. The REDOS server previously concluded that these regexes were SAFE and that result would have been cached on developer machines.

Travis is run repeatedly so it overcomes limitation 1 eventually. Travis does not have persistent storage across runs, so it has no cache and thus is not affected by limitations 2 and 3.

If @tomtheisen did not introduce these regexes, the PR is fine from a security standpoint, although a separate issue should be opened to track them. This is relevant to the discussion about a safe API in #1226.

@styfle
Copy link
Member

styfle commented May 1, 2018

I still stand by my solution for a versioned API to avoid this problem in the future. Then we could disable the caching.

@styfle
Copy link
Member

styfle commented May 1, 2018

@davisjam In the meantime, can we merge this PR?

@davisjam
Copy link
Contributor

davisjam commented May 1, 2018

If @tomtheisen did not introduce these regexes, the PR is fine from a security standpoint. I didn't check whether or not he did so.

@UziTech
Copy link
Member

UziTech commented May 1, 2018

@styfle yes, the travis log lists the line numbers of the vulnerable regexes and none of them were touched by this PR

image

@davisjam Just curious, how is the regex / *\| */ vulnerable to catastrophic backtracking?

@davisjam
Copy link
Contributor

davisjam commented May 1, 2018

@UziTech / *\| */ lacks a leading anchor, and a mismatch will be triggered on whitespace followed by no pipe. Each mismatch costs O(n). Meanwhile the V8 regex engine has an outer for all starting offsets loop, thus O(n) * O(n) = O(n^2) complexity.

(11:27:14) jamie@woody ~ $ cat /tmp/dos.js
/ *\| */.exec(' '.repeat(100000))
(11:27:16) jamie@woody ~ $ time node /tmp/dos.js 

real	0m5.742s
user	0m5.734s
sys	0m0.009s

@UziTech
Copy link
Member

UziTech commented May 1, 2018

I see, so any regex that doesn't have a character limit or lead anchor will be subject to catastrophic backtracking?

@davisjam
Copy link
Contributor

davisjam commented May 1, 2018

@UziTech Effectively I think it means that a regex /REGEX/ without a left-anchor (^) is equivalent to a regex /^.*REGEX/. Then decide whether or not the equivalent regex is vulnerable.

zhenalexfan pushed a commit to zhenalexfan/MarkdownHan that referenced this pull request Nov 8, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants