Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Calling worker.terminate() while http/2 reads on a stream crashes the whole process with FATAL ERROR / SIGABRT #38418

Closed
Prinzhorn opened this issue Apr 26, 2021 · 13 comments · Fixed by #42874
Labels
worker Issues and PRs related to Worker support.

Comments

@Prinzhorn
Copy link

Prinzhorn commented Apr 26, 2021

  • Version: 14.16.0
  • Platform: Ubuntu 20.04
  • Subsystem: Stream?
/node_modules/electron/dist/electron[56003]: ../../third_party/electron_node/src/stream_base.cc:351:MaybeLocal<v8::Value> node::StreamBase::CallJSOnreadMethod(ssize_t, Local<v8::ArrayBuffer>, size_t, node::StreamBase::StreamBaseJSChecks): Assertion `onread->IsFunction()' failed.
/node_modules/electron/dist/electron exited with signal SIGABRT

I'm sorry for not following the issue template. This just happened locally during development of an Electron 12.0.1 app which comes with Node 14.16.0. As far as I can tell Electron does not patch stream_base.cc so I assume this was a Node thing. Maybe this has already been fixed in a more recent Node release (edit: looks like 14.16.1 was only a security release), but I couldn't find anything.

Unfortunately I haven't been able to reproduce it again. I assume I ran into a race condition. I will try to track this down, but it's hard.

Here's what I do know:

  1. I'm running a worker thread
  2. Inside the worker thread I'm starting fastify and fastify-websocket
  3. The SIGABRT happened exactly the moment I hit enter in the UI, which triggered both an HTTP POST and a new WebSocket to said local server, but I don't know if any of these cause the issue with the stream. But I also don't think this was a coincidence.

I will try to put my race-condition helmet on and get on my Fuzzer and see if I can trigger this again. But maybe someone can point in a direction if this assertion should ever be able to fail due to a user error (e.g. a bug in http or fastify)? I'm not even sure what the onread is in this case but I think forgetting to set it in user land should not take down the whole process, should it?

@jasnell
Copy link
Member

jasnell commented Apr 26, 2021

It's unlikely that Node.js itself is at fault on this. What's more likely is a dependency module somewhere that is bypassing the public supported API and going directly to Node.js' internal API surface. Are you able to provide a reproducable example?

@Prinzhorn
Copy link
Author

Prinzhorn commented Apr 26, 2021

Thanks for the quick reply. I'm still trying to make this reproducible. Is there any way to get more info out of SIGABRT? There are 0 search results on npm and I can't really find anything useful. I'm trying to make this happen again within the Electron app and getting some sort of hint were the cause lies would make stripping it down way easier.

Edit: I'm still not able to trigger it again. One thing I didn't mention above is that if the SIGABRT would not have happened the worker would have exited because of an uncaughtException (I throw inside the uncaughtException handler which then exits the thread). So this could very well be a race condition with cleaning up the thread, but I don't know how that is implemented.

@addaleax
Copy link
Member

Is there any way to get more info out of SIGABRT?

You can look into using a tool like llnode to get a proper stack trace in cases like this.

@Prinzhorn
Copy link
Author

I was not able to trigger this again. Unfortunately when it happened I did not gather enough information. For the future I'm now launching Electron in dev with ulimit -c unlimited so let's hope it will happen again at some point. If I understand correctly I need this to get a core dump I can use with llnode.

@Prinzhorn
Copy link
Author

Prinzhorn commented Apr 28, 2021

So it happened again yesterday. This time during regular closing of the Electron app. Regular closing in my case means doing some clean up and then app.quit() which will terminate my worker as well (I don't fully clean up the worker, because things like fastify.close() would take way too long. I only do critical things like closing SQLite.). So the problem seems to be with terminating/cleaning up a worker in Electron.

Thanks @addaleax for pointing to llnode. I was able to get something out of the core dump. I also just came across electron/electron#28468 and related issues (electron/electron#28486). Given that this is happening pre-patch (Electron 12.0.1), could you (with some confidence) say that it's the same problem looking at the following? Does this even have any value?

(llnode) v8 bt
 * thread #1: tid = 23759, 0x00007f82400d118b, name = 'electron', stop reason = signal SIGABRT
  * frame #0: 0x00007f82400d118b
    frame #1: 0x00005637a9598b8b node`node::blob_data + 81470
    frame #2: 0x00005637a964f7c2 node`ucase_props_trieIndex + 22645
    frame #3: 0x00005637a95c169e node`huff_decode_table + 9041
    frame #4: 0x00005637a95bc999 node
    frame #5: 0x00005637a96b9db4 node`icudt67_dat + 169559
    frame #6: 0x00005637a95bf9bf node`huff_decode_table + 1650
    frame #7: 0x00005637a95c2b9c node`huff_decode_table + 14415
    frame #8: 0x00005637a9656a1f node`propsVectors + 26546
    frame #9: 0x00005637a7b5ffef node`node::AsyncWrap::Initialize(v8::Local<v8::Object>, v8::Local<v8::Value>, v8::Local<v8::Context>, void*) + 8546
    frame #10: 0x00005637a7b652ec node`node::FullPromiseHook(v8::PromiseHookType, v8::Local<v8::Promise>, v8::Local<v8::Value>) + 799
    frame #11: 0x00005637a7b56fae node`std::_Function_handler<v8::MaybeLocal<v8::Value> (node::StartExecutionCallbackInfo const&), node::LoadEnvironment(node::Environment*, char const*, std::unique_ptr<node::InspectorParentHandle, std::default_delete<node::InspectorParentHandle> >)::'lambda'(node::StartExecutionCallbackInfo const&)>::_M_invoke(std::_Any_data const&, node::StartExecutionCallbackInfo const&) + 145
    frame #12: 0x00005637a9639563 node`norm2_nfc_data_trieData + 2198
    frame #13: 0x00005637a963c811 node`norm2_nfc_data_trieData + 15172

I now updated all dependencies (not sure if the fix even landed until 12.0.5 since there is no release note and I couldn't find the commit). So hopefully this won't happen again. Debugging SIGABRT is no fun 😅

@Prinzhorn
Copy link
Author

@jasnell @addaleax After many hours I'm now able to consistently reproduce this with vanilla Node (v14.16.1 and v16.0.0) with http/2 and no dependencies. There is a race condition when closing worker threads while http/2 is reading on a stream.

At some point I realized that gRPC was causing the SIGABRT if the Electron app was closing just when "STREAM: do read" was happening (I was able to reproduce it way more often when I cranked up the number gRPC calls).

Then I was able to strip away Electron and reproduce it just by worker.terminate(). This was giving the same SIGABRT I was getting in my Electron app (still with @grpc/grpc-js):

node[11971]: ../src/stream_base.cc:351:v8::MaybeLocal<v8::Value> node::StreamBase::CallJSOnreadMethod(ssize_t, v8::Local<v8::ArrayBuffer>, size_t, node::StreamBase::StreamBaseJSChecks): Assertion `onread->IsFunction()' failed.
 1: 0xa04200 node::Abort() [node]
 2: 0xa0427e  [node]
 3: 0xacfd1a  [node]
 4: 0xa2591e node::http2::Http2StreamListener::OnStreamRead(long, uv_buf_t const&) [node]
 5: 0xa3177c node::http2::Http2Session::OnDataChunkReceived(nghttp2_session*, unsigned char, int, unsigned char const*, unsigned long, void*) [node]
 6: 0x1871745 nghttp2_session_mem_recv [node]
 7: 0xa30c19 node::http2::Http2Session::ConsumeHTTP2Data() [node]
 8: 0xa30f6e node::http2::Http2Session::OnStreamRead(long, uv_buf_t const&) [node]
 9: 0xadc2b8 node::LibuvStreamWrap::OnUvRead(long, uv_buf_t const*) [node]
10: 0x1387237  [node]
11: 0x1387bf0  [node]
12: 0x138e5e5  [node]
13: 0x137c438 uv_run [node]
14: 0xabc4a3 node::worker::Worker::Run() [node]
15: 0xabcf28  [node]
16: 0x7f1cb1dad609  [/lib/x86_64-linux-gnu/libpthread.so.0]
17: 0x7f1cb1cd4293 clone [/lib/x86_64-linux-gnu/libc.so.6]

Then I went ahead and used the vanilla http/2 server and client and now I'm getting a slightly different crash but still:

STREAM 17984: ondata
STREAM 17984: emitReadable null
STREAM 17984: dest.write true
STREAM 17984: readableAddChunk HTTP2 17984: Http2Stream 7869 [Http2Session server]: initiating response

STREAM 17984: ondata
HTTP2 17984: Http2Stream 7869 [Http2Session server]: initiating response
STREAM 17984: dest.write true
STREAM 17984: readableAddChunk STREAM 17984: emitReadable_ false 0 true

STREAM 17984: ondata
STREAM 17984: emitReadable_ false 0 true
STREAM 17984: dest.write true
STREAM 17984: readableAddChunk STREAM 17984: flow null

STREAM 17984: ondata
STREAM 17984: flow null
STREAM 17984: dest.write true
STREAM 17984: maybeReadMore read 0
STREAM 17984: read 0
STREAM 17984: need readable true
STREAM 17984: length less than watermark true
STREAM 17984: do read
TIMER 17984: process timer lists 3802
TIMER 17984: timeout callback 2000
WORKER 17984: [0] terminates Worker with ID 1
FATAL ERROR: v8::FromJust Maybe value is Nothing.
TIMER 17984: 2000 list empty
 1: 0xa04200 node::Abort() [node]
 2: 0x94e4e9 node::FatalError(char const*, char const*) [node]
 3: 0xb794aa v8::Utils::ReportApiFailure(char const*, char const*) [node]
 4: 0xad0808 node::ReportWritesToJSStreamListener::OnStreamAfterReqFinished(node::StreamReq*, int) [node]
 5: 0xad0b4f node::WriteWrap::OnDone(int) [node]
 6: 0xa2f970  [node]
 7: 0xa31376 node::http2::Http2Session::OnStreamAfterWrite(node::WriteWrap*, int) [node]
 8: 0xad0b4f node::WriteWrap::OnDone(int) [node]
 9: 0xb4b456 node::TLSWrap::InvokeQueued(int, char const*) [node]
10: 0xb4c7c3 node::TLSWrap::EncOut() [node]
11: 0xb4da40  [node]
12: 0x9a3ff4 node::Environment::RunAndClearNativeImmediates(bool) [node]
13: 0x9a4687 node::Environment::CheckImmediate(uv_check_s*) [node]
14: 0x1383aa9  [node]
15: 0x137c448 uv_run [node]
16: 0xabc4a3 node::worker::Worker::Run() [node]
17: 0xabcf28  [node]
18: 0x7f2a8b9a5609  [/lib/x86_64-linux-gnu/libpthread.so.0]
19: 0x7f2a8b8cc293 clone [/lib/x86_64-linux-gnu/libc.so.6]
Aborted (core dumped)

This can be reproduced with a very simple repo here, it does not require any dependencies and uses the http/2 client and server example code right from the docs: https://github.com/Prinzhorn/electron-sigabrt

In this video was lucky and I got it immediately, it usually reproduces within a handful of tries:

vokoscreen-2021-05-03_08-39-08.mp4

@Prinzhorn Prinzhorn reopened this May 3, 2021
@Prinzhorn Prinzhorn changed the title stream_base.cc:351:MaybeLocal<v8::Value> node::StreamBase::CallJSOnreadMethod(ssize_t, Local<v8::ArrayBuffer>, size_t, node::StreamBase::StreamBaseJSChecks): Assertion `onread->IsFunction()' failed Calling worker.terminate() while http/2 reads on a stream crashes the whole process with FATAL ERROR / SIGABRT May 3, 2021
@addaleax
Copy link
Member

addaleax commented May 3, 2021

@Prinzhorn I don’t think that would be quite the same bug, but still, thanks for reporting. I think just changing the .FromJust() in the mentioned function to .FromMaybe(false) would work here.

@Prinzhorn
Copy link
Author

So you're saying the first crash (with gRPC) is something else that just happens to be triggered by the same conditions? To me that sounds like a Node issue as well, but since it requires @grpc/grpc-js should I report it there? Or would you look into it if you can reproduce that as well?

If you clone my repo you can go back to the gRPC version, it just makes a simple empty rpc call:

git checkout 45caa541d9d16bbeb1b7a9e31ebbd59e61381b7d
npm i

# In two terminals
npm run server
npm run client
vokoscreen-2021-05-03_10-15-42.mp4

@addaleax
Copy link
Member

addaleax commented May 3, 2021

@Prinzhorn It might be related in some way, but it’s definitely a different bug that would require a different solution. (I don’t think I have spare time to look into this in more detail, sorry.)

@Prinzhorn
Copy link
Author

@addaleax thanks, I'll look into this some more eventually and will open an issue with @grpc/grpc-js then

@novemberborn
Copy link

The FATAL ERROR does seem related to worker threads, not @grpc/grpc-js, or even HTTP2. It's failing AVA's test suite, see https://github.com/avajs/ava/runs/2757417407.

I can reproduce this by selecting a test that terminates a worker thread:

  • Clone avajs/ava@c57067b

  • Install dependencies

  • Repeat until failure:

    npx tap --grep='/fail-fast.+workerThreads: true.+timeout/' test-tap/api.js
Logs
test-tap/api.js 2> FATAL ERROR: v8::ToLocalChecked Empty MaybeLocal.
test-tap/api.js 2>  1: 0x1037bbee5 node::Abort() (.cold.1) [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2>  2: 0x102471dc9 node::Abort() [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2>  3: 0x102471f2f node::OnFatalError(char const*, char const*) [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2>  4: 0x1025f0090 v8::V8::ToLocalEmpty() [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2>  5: 0x10246cc4f node::fs::FSReqPromise<node::AliasedBufferBase<double, v8::Float64Array, void> >::Resolve(v8::Local<v8::Value>) [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2>  6: 0x102476f6f node::fs::AfterInteger(uv_fs_s*) [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2>  7: 0x102e68e98 uv__work_done [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2>  8: 0x102e6df9b uv__async_io [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2>  9: 0x102e81e4c uv__io_poll [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2> 10: 0x102e6e521 uv_run [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2> 11: 0x1023a3edf node::SpinEventLoop(node::Environment*) [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2> 12: 0x102516b31 node::worker::Worker::Run() [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2> 13: 0x10251a6b2 node::worker::Worker::StartThread(v8::FunctionCallbackInfo<v8::Value> const&)::$_3::__invoke(void*) [/usr/local/var/nodenv/versions/16.0.0/bin/node]
test-tap/api.js 2> 14: 0x7fff205ca954 _pthread_start [/usr/lib/system/libsystem_pthread.dylib]
test-tap/api.js 2> 15: 0x7fff205c64a7 thread_start [/usr/lib/system/libsystem_pthread.dylib]
 FAIL  test-tap/api.js 5 799.629ms
  command: /usr/local/var/nodenv/versions/16.0.0/bin/node
  args:
    - test-tap/api.js
  exitCode: null
  signal: SIGABRT

This was with 16.0.0 locally, but the CI failure was with 16.3.0.

It feels as if not all work is interrupted and a value is accessed unexpectedly.

@novemberborn
Copy link

One more observation, the PR this started failing in switched the worker thread to use ESM. This PR passed, and is using CJS.

@novemberborn
Copy link

Waiting to terminate the worker thread until we've received at least one message from it seems to help, see avajs/ava#2778.

@targos targos added the worker Issues and PRs related to Worker support. label Aug 9, 2021
airtable-keyhanvakil added a commit to airtable-keyhanvakil/node that referenced this issue Apr 26, 2022
`OnStreamAfterReqFinished` uses `v8::Object::Has` to check if it needs
to call `oncomplete`. `v8::Object::Has` needs to execute Javascript.
However when worker threads are involved, `OnStreamAfterReqFinished` may
be called after the worker thread termination has begun via
`worker.terminate()`. This makes `v8::Object::Has` return `Nothing`,
which triggers an assert.

This diff fixes the issue by simply defaulting us to `false` in the case
where `Nothing` is returned. This is sound because we can't execute
`oncomplete` anyway as the isolate is terminating.

Fixes: nodejs#38418
airtable-keyhanvakil added a commit to airtable-keyhanvakil/node that referenced this issue Apr 26, 2022
`OnStreamAfterReqFinished` uses `v8::Object::Has` to check if it needs
to call `oncomplete`. `v8::Object::Has` needs to execute Javascript.
However when worker threads are involved, `OnStreamAfterReqFinished` may
be called after the worker thread termination has begun via
`worker.terminate()`. This makes `v8::Object::Has` return `Nothing`,
which triggers an assert.

This diff fixes the issue by simply defaulting us to `false` in the case
where `Nothing` is returned. This is sound because we can't execute
`oncomplete` anyway as the isolate is terminating.

Fixes: nodejs#38418
nodejs-github-bot pushed a commit that referenced this issue May 6, 2022
`OnStreamAfterReqFinished` uses `v8::Object::Has` to check if it needs
to call `oncomplete`. `v8::Object::Has` needs to execute Javascript.
However when worker threads are involved, `OnStreamAfterReqFinished` may
be called after the worker thread termination has begun via
`worker.terminate()`. This makes `v8::Object::Has` return `Nothing`,
which triggers an assert.

This diff fixes the issue by simply defaulting us to `false` in the case
where `Nothing` is returned. This is sound because we can't execute
`oncomplete` anyway as the isolate is terminating.

Fixes: #38418

PR-URL: #42874
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
RafaelGSS pushed a commit that referenced this issue May 10, 2022
`OnStreamAfterReqFinished` uses `v8::Object::Has` to check if it needs
to call `oncomplete`. `v8::Object::Has` needs to execute Javascript.
However when worker threads are involved, `OnStreamAfterReqFinished` may
be called after the worker thread termination has begun via
`worker.terminate()`. This makes `v8::Object::Has` return `Nothing`,
which triggers an assert.

This diff fixes the issue by simply defaulting us to `false` in the case
where `Nothing` is returned. This is sound because we can't execute
`oncomplete` anyway as the isolate is terminating.

Fixes: #38418

PR-URL: #42874
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
juanarbol pushed a commit that referenced this issue May 31, 2022
`OnStreamAfterReqFinished` uses `v8::Object::Has` to check if it needs
to call `oncomplete`. `v8::Object::Has` needs to execute Javascript.
However when worker threads are involved, `OnStreamAfterReqFinished` may
be called after the worker thread termination has begun via
`worker.terminate()`. This makes `v8::Object::Has` return `Nothing`,
which triggers an assert.

This diff fixes the issue by simply defaulting us to `false` in the case
where `Nothing` is returned. This is sound because we can't execute
`oncomplete` anyway as the isolate is terminating.

Fixes: #38418

PR-URL: #42874
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
danielleadams pushed a commit that referenced this issue Jun 27, 2022
`OnStreamAfterReqFinished` uses `v8::Object::Has` to check if it needs
to call `oncomplete`. `v8::Object::Has` needs to execute Javascript.
However when worker threads are involved, `OnStreamAfterReqFinished` may
be called after the worker thread termination has begun via
`worker.terminate()`. This makes `v8::Object::Has` return `Nothing`,
which triggers an assert.

This diff fixes the issue by simply defaulting us to `false` in the case
where `Nothing` is returned. This is sound because we can't execute
`oncomplete` anyway as the isolate is terminating.

Fixes: #38418

PR-URL: #42874
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
targos pushed a commit that referenced this issue Jul 12, 2022
`OnStreamAfterReqFinished` uses `v8::Object::Has` to check if it needs
to call `oncomplete`. `v8::Object::Has` needs to execute Javascript.
However when worker threads are involved, `OnStreamAfterReqFinished` may
be called after the worker thread termination has begun via
`worker.terminate()`. This makes `v8::Object::Has` return `Nothing`,
which triggers an assert.

This diff fixes the issue by simply defaulting us to `false` in the case
where `Nothing` is returned. This is sound because we can't execute
`oncomplete` anyway as the isolate is terminating.

Fixes: #38418

PR-URL: #42874
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
targos pushed a commit that referenced this issue Jul 31, 2022
`OnStreamAfterReqFinished` uses `v8::Object::Has` to check if it needs
to call `oncomplete`. `v8::Object::Has` needs to execute Javascript.
However when worker threads are involved, `OnStreamAfterReqFinished` may
be called after the worker thread termination has begun via
`worker.terminate()`. This makes `v8::Object::Has` return `Nothing`,
which triggers an assert.

This diff fixes the issue by simply defaulting us to `false` in the case
where `Nothing` is returned. This is sound because we can't execute
`oncomplete` anyway as the isolate is terminating.

Fixes: #38418

PR-URL: #42874
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
guangwong pushed a commit to noslate-project/node that referenced this issue Oct 10, 2022
`OnStreamAfterReqFinished` uses `v8::Object::Has` to check if it needs
to call `oncomplete`. `v8::Object::Has` needs to execute Javascript.
However when worker threads are involved, `OnStreamAfterReqFinished` may
be called after the worker thread termination has begun via
`worker.terminate()`. This makes `v8::Object::Has` return `Nothing`,
which triggers an assert.

This diff fixes the issue by simply defaulting us to `false` in the case
where `Nothing` is returned. This is sound because we can't execute
`oncomplete` anyway as the isolate is terminating.

Fixes: nodejs/node#38418

PR-URL: nodejs/node#42874
Reviewed-By: Darshan Sen <raisinten@gmail.com>
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Antoine du Hamel <duhamelantoine1995@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
worker Issues and PRs related to Worker support.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

5 participants