Add tests for nullable quantifiers in RegExps #4185

Merged: 2 commits merged into tc39:main on Aug 5, 2024

Conversation

Aurele-Barriere

The JavaScript regex quantifiers (star, plus, counted repetition) have unique semantics when it comes to matching the empty string.
Once a quantifier's minimum number of iterations has been met, any further (optional) iteration that matches the empty string is rejected.
This is specified in ECMAScript (RepeatMatcher, point 2.b)
and described in the PLDI 2024 paper Linear Matching of JavaScript Regular Expressions.
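
As a small illustration of this rule, the sketch below uses the pattern `/(a*)*/` on the input `"b"`. The variable name `starMatch` is only illustrative. The single candidate iteration of the outer star matches the empty string, so it is discarded and the capture group is left unset.

```javascript
// The outer star has a minimum of 0 iterations, so every iteration is optional.
// The only possible iteration here matches "" (a* finds no "a" in "b"),
// and an optional iteration that matches the empty string is rejected.
const starMatch = /(a*)*/.exec("b");

console.log(JSON.stringify(starMatch[0])); // "" (the overall match is empty)
console.log(starMatch[1]);                 // undefined (the rejected iteration leaves group 1 unset)
```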

In many other regex languages, however, the quantifier semantics differ.
Rather than checking only at the end of an iteration, they forbid visiting any part of the regex twice without consuming a character in between.

In some rare cases, the two semantics give different results.
For instance, matching /(a?b??)*/ against the string "ab" yields "ab" in JavaScript (two iterations of the star, each matching one character), but only "a" in other languages.
See for instance https://regex101.com/r/iuVSat/1
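
The JavaScript side of this example can be checked directly in any conforming engine. This is a minimal sketch, not the test added by this PR, and the variable name `lazyMatch` is only illustrative.

```javascript
// First iteration: a? consumes "a", and the lazy b?? matches nothing.
// Second iteration: a? matches nothing and b?? first tries the empty string,
// but an empty optional iteration is rejected at the end of the iteration,
// so the engine backtracks and b?? consumes "b".
const lazyMatch = /(a?b??)*/.exec("ab");

console.log(lazyMatch[0]); // "ab" (the whole input is matched)
console.log(lazyMatch[1]); // "b" (group 1 holds the last iteration)
```

An engine that forbids revisiting any part of the regex without consuming a character would stop after the first iteration and report "a" instead.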

I don't believe this JavaScript-specific behavior has been documented in Test262. I suggest adding a test case for this.
Many classical algorithms for regex matching (like NFA simulation or lazy DFA) typically do not implement the JavaScript quantifier semantics.
This led to a bug in the linear engine of V8 (which uses NFA simulation):
Bug Report, Relevant test case in V8, V8 fix, Explanation of the fix (section 4.1).

@Aurele-Barriere Aurele-Barriere requested a review from a team as a code owner August 1, 2024 13:13

@ptomato ptomato left a comment

Thanks for the test and for the comprehensive explanation.

Aurele-Barriere and others added 2 commits August 5, 2024 12:58
The JavaScript semantics for a quantifier matching the empty
string are different from other regex languages.
This adds a test that documents this JavaScript-specific
behavior.

This is part of my work at the SYSTEMF lab at EPFL.
@ptomato ptomato merged commit ea37a19 into tc39:main Aug 5, 2024
7 of 8 checks passed