Follow GFM spec on EM and STRONG delimiters #1686
This pull request is being automatically deployed with Vercel (learn more). 🔍 Inspect: https://vercel.com/markedjs/markedjs/qv2lelel1 |
I don't have a lot of time to look at this right now but it could be a conflict with lists since |
Added a check for the previous character to the *em* Tokenizer. Needed to pass any tests where the em block starts with a punctuation character (e.g. commonmark example 368)
OK, so I figured out what the issue was. I now have it working quite well except for two test cases I could use some help on:
Note, I removed a line from the Also note, I haven't applied these changes to the |
The Basically making marked use the original spec instead of CommonMark. |
OK, gotcha. My change to the em tokenizer confused it. I can fix this pretty easily. Any insight on detecting whether a set of square brackets is a reflink?
Can anyone help me out with detecting whether square brackets are part of a reflink, for CommonMark example 519? @UziTech, would you have any suggestions?
You could try a lookahead or you might need to do some parsing in the |
Modifies the em rule after the block tokens are generated to detect known reflinks and skip over them so they don't get mistakenly italicized.
Tada!! My solution was to inject known reflink labels into the em rules right after the block sequence in the lexer is finished. This way the em rule can properly skip over any links that might contain I'm hoping this tweaking of the lexer is alright. I'm not sure if there's some way people could inject malicious regex this way by giving their link labels some weird names, but I assume it would just require some further character escaping on the label names before injection. If this looks good, I can move forward with the |
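For the escaping concern raised above, here is a minimal sketch of how label names could be neutralized before being placed into a regex. The helper names `escapeRegExp` and `buildReflinkPattern` are hypothetical and not taken from marked's source, and masking the match with `x` characters is only a way to make the effect visible, not a claim about how the lexer skips links.

```javascript
// Hypothetical helpers; names are illustrative, not marked's API.

// Escape every character that has special meaning in a JavaScript regex.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Build a pattern matching `[label]` for any known reflink label.
// Returns null when there are no labels, so callers can skip the step.
function buildReflinkPattern(labels) {
  if (labels.length === 0) return null;
  const alternatives = labels.map(escapeRegExp).join('|');
  return new RegExp('\\[(?:' + alternatives + ')\\]', 'g');
}

const pattern = buildReflinkPattern(['foo*', 'a.b']);
// Only the literal labels match; `[axb]` is untouched because `.` was escaped.
console.log('see [foo*] and [axb]'.replace(pattern, m => 'x'.repeat(m.length)));
// → see xxxxxx and [axb]
```

With the labels escaped, a label such as `foo*` is matched literally and cannot change the meaning of the surrounding expression.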
Now fixes three more cases
Underscore em rules added. Fixes 3 more examples (371, 372, 406).
I would love some feedback on this PR!
This is great work!
What do you guys think? Is this baby ready to go?
LGTM for ReDoS
Looks great. Thanks for all your hard work. 💯
So 424 and 425 are considered out of scope of this change set? Or more simply:
(Should be |
@brainchild0 if you want to fix it, I would be happy to review a PR 😁
For me the starting point is understanding what makes this particular case more challenging than the ones that have been resolved in this merge. Intuitively they all seem roughly equally complex. Obviously, I make the remark abstractly, having no familiarity with the design. Or another way to ask the question, starting from the top and moving down: The general rule is to collect a stack of emphasis delimiters, which may be any of |
The current implementation does not use a stack at all. It simply checks for the existence of a left delimiter and, if one is found, finds the first available matching right delimiter. Finally, it ensures the text between the two is valid: delimiters found inside links, code spans, etc. are ignored, and any other delimiters inside must occur in even pairs. If the middle is not valid, it gets the next possible end delimiter and checks the middle again, until it either runs out of matching delimiters or finds a valid middle. We already have regexes for the left and right delimiters, so it would just be the extra effort of building up a stack in the tokenizer.
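The stack-free approach described above can be sketched roughly as follows. This is a deliberately simplified illustration, not marked's tokenizer: it handles only single `*` delimiters, treats only code spans as regions to ignore, and the function names (`maskCodeSpans`, `isLeft`, `isRight`, `isValidMiddle`, `findEmphasis`) are hypothetical stand-ins for the real regexes.

```javascript
// Simplified sketch, not marked's tokenizer: single `*` delimiters only,
// and only code spans are treated as regions to ignore.

// Blank out code spans with same-length padding so indices stay aligned.
function maskCodeSpans(text) {
  return text.replace(/`[^`]*`/g, m => ' '.repeat(m.length));
}

// Stand-ins for the left/right delimiter regexes: a left delimiter is
// followed by non-whitespace, a right delimiter is preceded by it.
function isLeft(text, i) {
  return text[i] === '*' && i + 1 < text.length && !/\s/.test(text[i + 1]);
}
function isRight(text, i) {
  return text[i] === '*' && i > 0 && !/\s/.test(text[i - 1]);
}

// The middle is accepted when it is non-empty and any `*` left inside
// it occurs an even number of times.
function isValidMiddle(middle) {
  const stars = (middle.match(/\*/g) || []).length;
  return middle.length > 0 && stars % 2 === 0;
}

// Return { start, end, text } for the first emphasis span, or null.
function findEmphasis(src) {
  const masked = maskCodeSpans(src);
  let start = 0;
  while (start < masked.length && !isLeft(masked, start)) start++;
  if (start >= masked.length) return null;

  // Try each candidate right delimiter in order until the middle is valid.
  for (let end = start + 1; end < masked.length; end++) {
    if (!isRight(masked, end)) continue;
    if (isValidMiddle(masked.slice(start + 1, end))) {
      return { start, end, text: src.slice(start + 1, end) };
    }
  }
  return null;
}

// The `*` inside the code span is ignored when looking for the closer.
console.log(findEmphasis('*foo `*` bar*').text);
// → foo `*` bar
```

The retry loop is the part that a delimiter stack would replace: each rejected candidate forces the middle to be rescanned, whereas a stack would pair openers and closers in a single pass.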
There is a good chance that this PR introduced #1754.
Starting toward better adherence to the GFM spec on Emphasis, specifically Left-flanking-delimiter-runs.
Marked version: 1.1.0
Markdown flavor: CommonMark|GitHub Flavored Markdown
Description
(em) 341, 367, 368, 371, 372, 379, 390, 406, 417, 441, 444
(strong) 391, 397, 399, 400, 401, 431, 443, 471, 475, 476, 479, 480
What was attempted
Applying the GFM spec's rules for left-flanking and right-flanking delimiter runs more accurately for EM and STRONG tags.
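For reference, the flanking definitions from the spec can be expressed as a small classifier. This is a sketch under stated assumptions rather than the code in this PR: `classifyRun` and `PUNCT` are hypothetical names, and only ASCII punctuation is checked, whereas the spec also counts Unicode punctuation.

```javascript
// Sketch of the CommonMark/GFM flanking definitions.
// Assumption: ASCII punctuation only; the spec also includes Unicode punctuation.
const PUNCT = /[!-\/:-@\[-`{-~]/;

// Classify the delimiter run occupying text[start, end).
function classifyRun(text, start, end) {
  // The beginning and end of the line count as whitespace.
  const before = start > 0 ? text[start - 1] : ' ';
  const after = end < text.length ? text[end] : ' ';

  const wsBefore = /\s/.test(before);
  const wsAfter = /\s/.test(after);
  const pBefore = PUNCT.test(before);
  const pAfter = PUNCT.test(after);

  // Left-flanking: not followed by whitespace, and either not followed by
  // punctuation, or followed by punctuation and preceded by whitespace/punctuation.
  const left = !wsAfter && (!pAfter || wsBefore || pBefore);
  // Right-flanking is the mirror image.
  const right = !wsBefore && (!pBefore || wsAfter || pAfter);

  return { left, right };
}

console.log(classifyRun('*"foo"*', 0, 1));  // → { left: true, right: false }
console.log(classifyRun('a*"foo"*', 1, 2)); // → { left: false, right: true }
```

The second call shows why a check on the previous character matters: the same `*"` sequence opens emphasis at the start of a line but not directly after a letter.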