Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maximum call stack size exceeded error happens with a long string literal #42

Closed
sapphi-red opened this issue Jan 24, 2024 · 6 comments · Fixed by #43
Closed

Maximum call stack size exceeded error happens with a long string literal #42

sapphi-red opened this issue Jan 24, 2024 · 6 comments · Fixed by #43

Comments

@sapphi-red
Copy link

When a long string literal is contained in the input code, js-tokens throws Maximum call stack size exceeded error.

Reproduction: https://github.com/sapphi-red-repros/js-tokens-maximum-call-stack-size-exceeded-repro

It happens with

import jsTokens from 'js-tokens'

const code = `const foo = "${'f'.repeat(1000 * 1000 * 10)}"`
const iterable = jsTokens(code, { jsx: false })
for (const token of iterable) {
  //
}

as well.

Original issue: vitejs/vite#15703

@lydell
Copy link
Owner

lydell commented Jan 28, 2024

First finding: It’s the Identifier regex that throws this errors when there’s a very long identifier.

To reproduce (8388575 is the smallest number of “f”s that triggers it, both in Node.js, Chrome and Firefox):

/(\x23?)(?=[$_\p{ID_Start}\\])(?:[$_\u200C\u200D\p{ID_Continue}]|\\u[\da-fA-F]{4}|\\u\{[\da-fA-F]+\})+/yu.test('f'.repeat(8388575))

Interestingly, Safari does not throw an error. It just returns false instead (silently gives up). And earlier: 999999 “fs” is the smallest number triggering the behavior.

Next up: Trying to reduce that regex to make an even smaller reproduction and narrow down exactly when it happens.

@lydell
Copy link
Owner

lydell commented Jan 28, 2024

Ok, so this seems to be enough to make Chrome’s and Firefox’s regex engine sad:

/(?:a|b)+/.test('a'.repeat(8388575))

However, Safari manages that one. Maybe it has some optimization that kicks in for that simple case, but not the real, more complicated one.

This works though:

/[ab]+/.test('a'.repeat(8388575))

(Maybe Safari automatically optimizes to that.)

And adding one plus sign to the original regex makes that pass:

/(\x23?)(?=[$_\p{ID_Start}\\])(?:[$_\u200C\u200D\p{ID_Continue}]+|\\u[\da-fA-F]{4}|\\u\{[\da-fA-F]+\})+/yu.test('f'.repeat(8388575))
//                                                              ^ added!

That works because it’s optimizing for the common case. A very long identifier of alternating f and escapes still crashes (and with a 10x shorter string):

> /(\x23?)(?=[$_\p{ID_Start}\\])(?:[$_\u200C\u200D\p{ID_Continue}]+|\\u[\da-fA-F]{4}|\\u\{[\da-fA-F]+\})+/yu.test('f\\u1234'.repeat(838857))
true

Luckily, having escapes in identifiers is very rare.

Next up: Seeing what other regexes might need a similar optimization, and how to document regex engine limitations.

@lydell
Copy link
Owner

lydell commented Jan 28, 2024

@sapphi-red 'f'.repeat(1000 * 1000 * 10) is just the smallest way to reproduce, right? Can I see the original code that caused this error? Because I can’t imagine there’s a 10 million chars long identifier in it. Maybe a very long string though? 🤔

(Edit: I now realize the original post literally mentions ”long string literal”, and the provided example code does create a string literal. Reading more carefully helps 🤦 )

@sapphi-red
Copy link
Author

sapphi-red commented Jan 28, 2024

Yes, that's just the smallest way.
The reproduction repo uses the orginal code. It contains a long string that is a data uri of a wasm file.
https://github.com/sapphi-red-repros/js-tokens-maximum-call-stack-size-exceeded-repro/blob/main/input.js

@lydell
Copy link
Owner

lydell commented Feb 3, 2024

Released in v8.0.3.

@sapphi-red
Copy link
Author

Thanks 💚

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants