Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Maglev on x64 causes segmentation fault while running TypeScript #52797

Closed
remcohaszing opened this issue May 2, 2024 · 45 comments
Closed

Maglev on x64 causes segmentation fault while running TypeScript #52797

remcohaszing opened this issue May 2, 2024 · 45 comments
Labels
confirmed-bug Issues with confirmed bugs. linux Issues and PRs related to the Linux platform. regression Issues related to regressions. v8 engine Issues and PRs related to the V8 dependency. v22.x v22.x Issues that can be reproduced on v22.x or PRs targeting the v22.x-staging branch.

Comments

@remcohaszing
Copy link

Version

v22.0.0

Platform

Linux vali 6.8.0-76060800daily20240311-generic #202403110203171407766522.04~4c8e9a0 SMP PREEMPT_DYNAMIC Thu A x86_64 x86_64 x86_64 GNU/Linux

Subsystem

No response

What steps will reproduce the bug?

On Linux using Node.js 22:

git clone git@github.com:remcohaszing/typescript-bug-58369.git
cd typescript-bug-58369
npm ci
tsc

See also this failed GitHub action: https://github.com/remcohaszing/typescript-bug-58369/actions/runs/8899456400/job/24438867767

How often does it reproduce? Is there a required condition?

For this reproduction it’s reproduced consistently on Linux on both my machine and GitHub actions.

While troubleshooting by trimming down the content of node_modules/@types/mdast/index.d.ts, I got into a state where it seemed to happen randomly. The major factor is the 👉 emoji in a comment.

The error did not occur on macOS in the GitHub action, but it did happen consistently for @wooorm on their macbook.

The problem was not reproducible on Windows.

What is the expected behavior? Why is that the expected behavior?

No segmentation fault

What do you see instead?

Segmentation fault (core dumped)

Additional information

This was originally reported to TypeScript: microsoft/TypeScript#58369. This issue contains more information.

This has coincidentally already been fixed for the upcoming TypeScript 5.5. Still, a segfault should not occur.

We were unable to make a smaller reproduction.

@RedYetiDev
Copy link
Member

Hi! Could you possibly provide some example code to reproduce this? Preferably code that has been compiled into plain JS.

@remcohaszing
Copy link
Author

I would absolutely love to. Unfortunately the bug is reproduced by running tsc. Neither I nor the TypeScript team have been able to make a small reproduction.

It was caused by microsoft/TypeScript#53081 and fixed by microsoft/TypeScript#58339 (unreleased yet).

@RedYetiDev
Copy link
Member

AFAICT this doesn't seem like an issue with Node.js itself, but rather a compiler (such as tsc), but I'm no expert, and I'd love to get a second opinion.

@RedYetiDev RedYetiDev added the help wanted Issues that need assistance from volunteers or PRs that need help to proceed. label May 2, 2024
@remcohaszing
Copy link
Author

Sorry, I now see I forgot to provide a critical piece of information. This is a regression in Node.js 22.0.0. It wasn’t a problem before.

@RedYetiDev
Copy link
Member

Ahh, okay, thank you.

@RedYetiDev RedYetiDev added regression Issues related to regressions. v22.x v22.x Issues that can be reproduced on v22.x or PRs targeting the v22.x-staging branch. labels May 2, 2024
@mcollina
Copy link
Member

mcollina commented May 2, 2024

@targos I have a feel we rushed the V8 upgrades.

cc @nodejs/v8 @RafaelGSS

@RedYetiDev RedYetiDev added v8 engine Issues and PRs related to the V8 dependency. and removed help wanted Issues that need assistance from volunteers or PRs that need help to proceed. labels May 2, 2024
@targos
Copy link
Member

targos commented May 2, 2024

It's probably related to V8, but I'm not sure waiting would have changed anything? We released v22.0.0 with the version of V8 that's in current Chrome.

@targos
Copy link
Member

targos commented May 2, 2024

Seems specific to Linux or x64 as I cannot reproduce on ARM64 macOS.

@targos
Copy link
Member

targos commented May 2, 2024

We also don't know which version of V8 introduced the bug (assuming it's in V8).

@targos
Copy link
Member

targos commented May 2, 2024

So, it's specific to x64. I can reproduce with node-v22.0.0-darwin-x64 on Rosetta.

@targos
Copy link
Member

targos commented May 2, 2024

I'm going to compile a debug build on one of the Hetzner machines to get a meaningful stack trace.

@wooorm
Copy link

wooorm commented May 2, 2024

I’m on macOS and repro it consistently btw

@targos
Copy link
Member

targos commented May 2, 2024

@woorm ARM or Intel?

@wooorm
Copy link

wooorm commented May 2, 2024

It's probably related to V8,

The code that started/stopped crashing in TS had do to with indexing into strings.
The string that TS was looking into starts/stops crashing when there is/isn’t an emoji.

One TS maintainer potentially saw the crash appear/disappear when adding a console.log somewhere. So it may be related to some optimization routine

@wooorm
Copy link

wooorm commented May 2, 2024

I’m on an 2.6 GHz 6-Core Intel Core i7, Sonoma 14.4.1 (23E224).
I have a bunch of the GNU utils tho. So perhaps there could be something that doesn’t happen on mac normally, but my machine looks more like Linux.

@RedYetiDev RedYetiDev added confirmed-bug Issues with confirmed bugs. repro-exists Issues with reproductions. and removed repro-exists Issues with reproductions. labels May 2, 2024
@RedYetiDev
Copy link
Member

Ignore the repro-exists tag, I didn't mean to add it, and it won't effect anything.

@RedYetiDev RedYetiDev added the linux Issues and PRs related to the Linux platform. label May 2, 2024
@jakebailey
Copy link

jakebailey commented May 2, 2024

Just to give a side by side using https://github.com/remcohaszing/typescript-bug-58369:

$ grep -A8 'function scanJSDocCommentTextToken' ./node_modules/typescript/lib/tsc.js
  function scanJSDocCommentTextToken(inBackticks) {
    fullStartPos = tokenStart = pos;
    tokenFlags = 0 /* None */;
    if (pos >= end) {
      return token = 1 /* EndOfFileToken */;
    }
    for (let ch = text.charCodeAt(pos); pos < end && (!isLineBreak(ch) && ch !== 96 /* backtick */); ch = codePointAt(text, ++pos)) {
      if (!inBackticks) {
        if (ch === 123 /* openBrace */) {
$ node ./node_modules/typescript/lib/tsc.js
[1]    1090384 segmentation fault  node ./node_modules/typescript/lib/tsc.js

Now, add debugger to the loop (console.log works too but is loud):

$ grep -A8 'function scanJSDocCommentTextToken' ./node_modules/typescript/lib/tsc.js
  function scanJSDocCommentTextToken(inBackticks) {
    fullStartPos = tokenStart = pos;
    tokenFlags = 0 /* None */;
    if (pos >= end) {
      return token = 1 /* EndOfFileToken */;
    }
    for (let ch = text.charCodeAt(pos); pos < end && (!isLineBreak(ch) && ch !== 96 /* backtick */); ch = codePointAt(text, ++pos)) {
      debugger; // ADDED
      if (!inBackticks) {
$ node ./node_modules/typescript/lib/tsc.js

I have not been able to extract out a test which just calls the parser via the public API, nor by extracting this code and giving it the same inputs.

@RedYetiDev
Copy link
Member

Weird, thanks for the information!

@targos
Copy link
Member

targos commented May 2, 2024

(gdb) run node_modules/typescript/lib/tsc.js
Starting program: /home/iojs/tmp-targos/node/out/Debug/node node_modules/typescript/lib/tsc.js
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/x86_64-linux-gnu/libthread_db.so.1".
[New Thread 0x7ffff7a4f640 (LWP 18826)]
[New Thread 0x7ffff724e640 (LWP 18827)]
[New Thread 0x7ffff6a4d640 (LWP 18828)]
[New Thread 0x7ffff624c640 (LWP 18829)]
[New Thread 0x7ffff5a4b640 (LWP 18830)]
[New Thread 0x7ffff51a9640 (LWP 18831)]

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x00005554d878345d in ?? ()
(gdb) bt
#0  0x00005554d878345d in ?? ()
#1  0x00001967bd580c69 in ?? ()
#2  0x000000000000200e in ?? ()
#3  0x0000200e00000000 in ?? ()
#4  0x0000000000000000 in ?? ()

Perfect 🙃

@jakebailey
Copy link

Just to note it, you can also add noop(); instead of debugger; if you want a debugger-statement free crasher.

@targos
Copy link
Member

targos commented May 2, 2024

This is due to the Maglev compiler. I confirm that ./configure --v8-disable-maglev fixes it.

Maglev was enabled in #51360

/cc @kvakil

@remcohaszing
Copy link
Author

remcohaszing commented May 12, 2024

The original issue is TypeScript processing @types/mdast, which is used by remark plugins such as remark-emoji. So this is the exact same issue.

remcohaszing added a commit to remcohaszing/DefinitelyTyped that referenced this issue May 13, 2024
lpsinger added a commit to lpsinger/afm that referenced this issue May 14, 2024
lpsinger added a commit to lpsinger/afm that referenced this issue May 14, 2024
lpsinger added a commit to nasa-gcn/afm that referenced this issue May 14, 2024
wooorm added a commit to unifiedjs/unified that referenced this issue Jun 6, 2024
@monsanto
Copy link
Contributor

monsanto commented Jun 7, 2024

Given that there hasn't been any activity here in a while or on the V8 tracker at all--is there any appetite for reverting Maglev being on by default? Maybe this is an overreaction to Node 22 constantly segfaulting on my desktop, but as-is I wouldn't feel comfortable putting it in production.

It would be different if I could just put --no-maglev-inlining in NODE_OPTIONS, but that doesn't work either.

43081j added a commit to 43081j/shiki-element that referenced this issue Jun 17, 2024
There's some nasty segfault being dealt with upstream in node itself.

Until that gets fixed, lets use 20.x.

Issue can be tracked here:
nodejs/node#52797
43081j added a commit to 43081j/shiki-element that referenced this issue Jun 17, 2024
There's some nasty segfault being dealt with upstream in node itself.

Until that gets fixed, lets use 20.x.

Issue can be tracked here:
nodejs/node#52797
@dharesign
Copy link
Contributor

FYI this issue with maglev inlining also affects Node 22. With this issue there's no segfault, but the JS doesn't do what it's supposed to.

$ cat foo.js
function issue() {
  let problems = [];

  let i = 0;
  function increment() {
    i = i + 1;
  }
  while (i < 100000000) {
    let prevI = i;
    increment();
    if (i == prevI) {
      problems.push(`${prevI} ${i}`);
    }
  }

  return problems;
}
let didItBreak = issue();
console.log(didItBreak.length);

$ ~/Downloads/node-v22.6.0-darwin-x64/bin/node foo.js
21314

@mcollina
Copy link
Member

mcollina commented Aug 8, 2024

Apparently there seem to be backport available for the latter.

@nodejs/v8 are we planning an update of v8 before we hit LTS?

@targos
Copy link
Member

targos commented Aug 8, 2024

A full V8 update in v22 is very unlikely (too many breaking changes in the C++ API)

@joyeecheung
Copy link
Member

joyeecheung commented Aug 14, 2024

Looking at the related bugs I think we should disable maglev by default for v22, considering it will get stuck with V8 12.4 for ~2 years, and it will probably be increasingly difficult to backport any maglev patches to it even if they get fixed in the upstream. We could continue enabling it on the main branch and maybe consider enabling it in v23 to test it out, but it doesn't look great to ship the crashes to LTS.

@mcollina
Copy link
Member

It seems the safest approach, considering that there isn't really much we can do.

RafaelGSS pushed a commit that referenced this issue Sep 6, 2024
It needs a handle scope for the context handle. Since the
FastApiCallbackOptions struct doesn't have isolate on it
in V8 12.4 on Node.js 22, use Isolate::TryGetCurrent() to
get to the isolate needed for the handle scope creation and
fallback to the slow callback if no isolate is entered.

PR-URL: #54384
Refs: #52797
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
RafaelGSS pushed a commit that referenced this issue Sep 6, 2024
This reverts commit 1a5acd0.

Reason to revert: we have seen several crashes/unexpected JS behaviors
with maglev on v22 (which ships V8 v12.4). The bugs lie in the codegen
so it would be difficult for users to work around them or even figure
out where the bugs are coming from. Some bugs are fixed in the upstream
while some others probably remain. As v22 will get stuck with V8 v12.4
as LTS, it will be increasingly difficult to backport patches for them
even if the bugs are fixed. So disable it by default on v22 to reduce
the churn and troubles for users.

PR-URL: #54384
Refs: #52797
Reviewed-By: Richard Lau <rlau@redhat.com>
Reviewed-By: Rafael Gonzaga <rafael.nunu@hotmail.com>
Reviewed-By: Chengzhong Wu <legendecas@gmail.com>
Reviewed-By: Yagiz Nizipli <yagiz@nizipli.com>
Reviewed-By: Michaël Zasso <targos@protonmail.com>
Reviewed-By: Luigi Pinca <luigipinca@gmail.com>
Reviewed-By: Matteo Collina <matteo.collina@gmail.com>
Reviewed-By: James M Snell <jasnell@gmail.com>
@targos
Copy link
Member

targos commented Sep 19, 2024

Maglev was disabled on v22.x with #54384

@targos targos closed this as completed Sep 19, 2024
@targos targos added v22.x v22.x Issues that can be reproduced on v22.x or PRs targeting the v22.x-staging branch. and removed v22.x v22.x Issues that can be reproduced on v22.x or PRs targeting the v22.x-staging branch. labels Sep 19, 2024
@jakebailey
Copy link

It's still enabled on main, right? So it's still not fixed until V8 is fixed?

@targos
Copy link
Member

targos commented Sep 19, 2024

Do we know that this issue also affects main?

@joyeecheung
Copy link
Member

BTW on the V8 slack @deepak1556 mentioned that they were able to reproduce it in Electron with v8 12.4 and 12.6 but not 12.8 so it's likely fixed.

@deepak1556
Copy link
Contributor

deepak1556 commented Sep 19, 2024

Yup the repro mentioned in this post can be run with ELECTRON_RUN_AS_NODE=1 <electron x64 binary> node_modules/.bin/tsc

Simple bisect of our prebuilt binaries showed, first good version is 128.0.6571.0 and last known bad version is 127.0.6521.0. So I can confirm that it is definitely addressed in V8 12.8 but haven't bisected to which commit addressed it.

Here is the chromium changelist url https://chromium.googlesource.com/chromium/src/+log/127.0.6521.0..128.0.6571.0?n=10000&pretty=fuller and V8 changelist https://chromium.googlesource.com/v8/v8/+log/12.7.189..12.8.185?n=10000&pretty=fuller

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
confirmed-bug Issues with confirmed bugs. linux Issues and PRs related to the Linux platform. regression Issues related to regressions. v8 engine Issues and PRs related to the V8 dependency. v22.x v22.x Issues that can be reproduced on v22.x or PRs targeting the v22.x-staging branch.
Projects
None yet
Development

No branches or pull requests