Core: Fixed bug with greedy matching #2632
Merged
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
I stumble across this when trying to implement #2115. The problem is that greedy matching is disabled for the last element of the token stream. This is an optimization but it will only correct if the pattern doesn't use lookbehinds (either native lookbehinds or assertions; Prism lookbehind groups are fine).
The fix is to simply remove the optimization. This shouldn't affect performance. Without the optimization in place, all the loops that find the position to insert the greedy token at will short-circuit because there are no tokens after the last token. I honestly don't understand why this optimization was there in the first place.
The test case illustrates the bug in action.
In the first round for the test
bab
, all substrings matching/a/
will be tokenized. The resulting token stream will be:In the second round, all greedy matches of
/^b/
. Obviously, theb
at the start of the string matches, so we will get the token stream:But matching isn't done yet. After the
a
token is skipped, we reach the lastb
. Since it's the last item in the token streamcurrentNode != tokenList.tail.prev
will be false, so we will match it as if it wasn't greedy. This is a problem because it means that the regex will be executed like a non-greedy pattern with the following settings:A match will be found because the
^
assertion matches the start of the substring.This is incorrect behavior. Greedy patterns always have to be matched against the whole string.