Core: Fixed bug with greedy matching #2632

Merged

RunDevelopment

I stumbled across this when trying to implement #2115. The problem is that greedy matching is disabled for the last element of the token stream. This is an optimization, but it is only correct if the pattern doesn't use lookbehinds (either native lookbehinds or assertions; Prism lookbehind groups are fine).

The fix is to simply remove the optimization. This shouldn't affect performance. Without the optimization in place, all the loops that find the position to insert the greedy token at will short-circuit because there are no tokens after the last token. I honestly don't understand why this optimization was there in the first place.


The test case illustrates the bug in action.

In the first round for the test string bab, all substrings matching /a/ will be tokenized. The resulting token stream will be:

[
	"b",
	["a", "a"],
	"b"
]
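
As a rough illustration of this first round, here is a minimal sketch of non-greedy tokenization. Note that tokenizeRound is a hypothetical helper written for this example and is not Prism's actual matcher; it only assumes that a non-greedy pattern is run against each plain-string piece of the stream on its own.

```javascript
// Hypothetical helper (not part of Prism): runs `pattern` against every
// plain-string piece of the stream independently, as a non-greedy round would.
function tokenizeRound(stream, name, pattern) {
	var result = [];
	stream.forEach(function (piece) {
		if (typeof piece !== "string") {
			// Already a token from an earlier round; leave it untouched.
			result.push(piece);
			return;
		}
		var regex = new RegExp(pattern.source, "g");
		var last = 0;
		var match;
		while ((match = regex.exec(piece))) {
			if (match[0].length === 0) {
				// Guard against zero-length matches looping forever.
				regex.lastIndex++;
				continue;
			}
			if (match.index > last) {
				result.push(piece.slice(last, match.index));
			}
			result.push([name, match[0]]);
			last = match.index + match[0].length;
		}
		if (last < piece.length) {
			result.push(piece.slice(last));
		}
	});
	return result;
}

var roundOne = tokenizeRound(["bab"], "a", /a/);
console.log(JSON.stringify(roundOne)); // ["b",["a","a"],"b"]
```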

In the second round, all greedy matches of /^b/ are tokenized. Obviously, the b at the start of the string matches, so we will get the token stream:

[
	["b", "b"],
	["a", "a"],
	"b"
]

But matching isn't done yet. After the a token is skipped, we reach the last b. Since it's the last item in the token stream, currentNode != tokenList.tail.prev will be false, so we will match it as if the pattern weren't greedy. This is a problem because the regex will then be executed like a non-greedy pattern, against the substring alone, with the following settings:

var pattern = /^b/g;
pattern.lastIndex = 0;
var match = pattern.exec("b");

A match will be found because the ^ assertion matches the start of the substring.

This is incorrect behavior. Greedy patterns always have to be matched against the whole string.
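
To make the difference concrete, here is a minimal sketch comparing the two ways of running /^b/ on the last b. It assumes that the greedy path sets lastIndex to the offset of that piece in the full text (2 for bab) and executes against the whole string; the variable names are illustrative only and are not taken from Prism's source.

```javascript
// Non-greedy style: the pattern only sees the isolated substring "b",
// so ^ matches at the start of that substring.
var substringPattern = /^b/g;
substringPattern.lastIndex = 0;
var substringMatch = substringPattern.exec("b");

// Greedy style (assumed): the pattern sees the whole string and starts
// searching at the offset of the last "b". Without the m flag, ^ can only
// match at index 0, so nothing is found from index 2 onwards.
var wholeStringPattern = /^b/g;
wholeStringPattern.lastIndex = 2;
var wholeStringMatch = wholeStringPattern.exec("bab");

console.log(substringMatch !== null); // true: the incorrect extra match
console.log(wholeStringMatch); // null: the last b is correctly left alone
```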

@RunDevelopment RunDevelopment merged commit 8fa8dd2 into PrismJS:master Nov 25, 2020
@RunDevelopment RunDevelopment deleted the core-greedy-tail-string-bug branch November 25, 2020 21:59