Signal SIGSEGV in v8::internal::GlobalHandles::Create(v8::internal::Object*) () #393

Closed
legraphista opened this issue Nov 9, 2018 · 19 comments

@legraphista

legraphista commented Nov 9, 2018

Hi!

I've noticed sporadic crashes in V8 when calling a class constructor from the OnOK handler of an AsyncWorker. The crashes only seem to affect Node 10.x (tested on 10.5, 10.13, and 9.11.2). I'm running node-addon-api 1.6.0.

Stack trace: (gdb)

Thread 1 "node" received signal SIGSEGV, Segmentation fault.
0x0000000000e92f8b in v8::internal::GlobalHandles::Create(v8::internal::Object*) ()
(gdb) bt
#0  0x0000000000e92f8b in v8::internal::GlobalHandles::Create(v8::internal::Object*) ()
#1  0x0000000000ad8138 in v8::V8::GlobalizeReference(v8::internal::Isolate*, v8::internal::Object**) ()
#2  0x00000000008e62fd in (anonymous namespace)::v8impl::Reference::New(napi_env__*, v8::Local<v8::Value>, unsigned int, bool, void (*)(napi_env__*, void*, void*), void*, void*) ()
#3  0x00000000008ee37f in napi_wrap ()
#4  0x00007fffdfdf2d7e in Napi::ObjectWrap<DarknetImage>::ObjectWrap (this=0x2568890, callbackInfo=...) at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:2824
#5  0x00007fffdfdf021a in DarknetImage::DarknetImage (this=0x2568890, info=...) at ../src/DarknetImage.cc:37
#6  0x00007fffdfdf4954 in Napi::ObjectWrap<DarknetImage>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#1}::operator()() const (__closure=0x7fffffff96b0)
    at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:3221
#7  0x00007fffdfdf51e7 in Napi::details::WrapCallback<Napi::ObjectWrap<DarknetImage>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#1}>(Napi::ObjectWrap<DarknetImage>::ConstructorCallbackWrapper(napi_env__*, napi_callback_info__*)::{lambda()#1}) (callback=...) at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:104
#8  0x00007fffdfdf4a85 in Napi::ObjectWrap<DarknetImage>::ConstructorCallbackWrapper (env=0x25f93a0, info=0x7fffffff9730) at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:3219
#9  0x00000000008e6905 in (anonymous namespace)::v8impl::FunctionCallbackWrapper::Invoke(v8::FunctionCallbackInfo<v8::Value> const&) ()
#10 0x0000000000b5e71b in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<true>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#11 0x0000000000b60a7d in v8::internal::Builtins::InvokeApiFunction(v8::internal::Isolate*, bool, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*, v8::internal::Handle<v8::internal::HeapObject>) ()
#12 0x0000000000e702b1 in v8::internal::Execution::New(v8::internal::Isolate*, v8::internal::Handle<v8::internal::Object>, v8::internal::Handle<v8::internal::Object>, int, v8::internal::Handle<v8::internal::Object>*) ()
#13 0x0000000000afc9e6 in v8::Function::NewInstanceWithSideEffectType(v8::Local<v8::Context>, int, v8::Local<v8::Value>*, v8::SideEffectType) const ()
#14 0x0000000000afcd1c in v8::Function::NewInstance(v8::Local<v8::Context>, int, v8::Local<v8::Value>*) const ()
#15 0x00000000008ef175 in napi_new_instance ()
#16 0x00007fffdfde360c in Napi::Function::New (this=0x7fffffff9dc0, argc=4, args=0x7fffffff9ed0) at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:1747
#17 0x00007fffdfde35b9 in Napi::Function::New (this=0x7fffffff9dc0, args=...) at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:1737
#18 0x00007fffdfde4258 in Napi::FunctionReference::New (this=0x7fffdffff320 <DarknetImage::constructor>, args=...) at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:2521
#19 0x00007fffdfdf1eab in DarknetImageWorkers::RGB2DarknetImage::OnOK (this=0x1c8519c0) at ../src/DarknetImage.h:108
#20 0x00007fffdfde532d in Napi::AsyncWorker::OnWorkComplete(napi_env__*, napi_status, void*)::{lambda()#1}::operator()() const (__closure=0x7fffffff9fa8)
    at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:3622
#21 0x00007fffdfde6391 in Napi::details::WrapCallback<Napi::AsyncWorker::OnWorkComplete(napi_env__*, napi_status, void*)::{lambda()#1}>(Napi::AsyncWorker::OnWorkComplete(napi_env__*, napi_status, void*)::{lambda()#1}) (
    callback=...) at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:104
#22 0x00007fffdfde5435 in Napi::AsyncWorker::OnWorkComplete (status=napi_ok, this_pointer=0x1c8519c0) at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:3620
#23 0x00000000008e6e4c in (anonymous namespace)::uvimpl::Work::AfterThreadPoolWork(int) ()
#24 0x0000000000a42fb5 in uv__work_done (handle=0x24a3f50 <default_loop_struct+176>) at ../deps/uv/src/threadpool.c:313
#25 0x0000000000a4732f in uv__async_io (loop=0x24a3ea0 <default_loop_struct>, w=<optimized out>, events=<optimized out>) at ../deps/uv/src/unix/async.c:118
#26 0x0000000000a58018 in uv__io_poll (loop=loop@entry=0x24a3ea0 <default_loop_struct>, timeout=-1) at ../deps/uv/src/unix/linux-core.c:375
#27 0x0000000000a47c6b in uv_run (loop=0x24a3ea0 <default_loop_struct>, mode=UV_RUN_DEFAULT) at ../deps/uv/src/unix/core.c:370
#28 0x00000000008e5255 in node::Start(v8::Isolate*, node::IsolateData*, std::vector<std::string, std::allocator<std::string> > const&, std::vector<std::string, std::allocator<std::string> > const&) ()
#29 0x00000000008e34a2 in node::Start(int, char**) ()
#30 0x00007ffff6a96b97 in __libc_start_main (main=0x89dc10 <main>, argc=2, argv=0x7fffffffe328, init=<optimized out>, fini=<optimized out>, rtld_fini=<optimized out>, stack_end=0x7fffffffe318) at ../csu/libc-start.c:310
#31 0x000000000089dd45 in _start ()

(gdb) frame 4
#4  0x00007fffdfdf2d7e in Napi::ObjectWrap<DarknetImage>::ObjectWrap (this=0x2568890, callbackInfo=...) at /home/ubuntu/darknet-binding/node_modules/node-addon-api/napi-inl.h:2824
2824	  status = napi_wrap(env, wrapper, instance, FinalizeCallback, nullptr, &ref);

(gdb) info local
env = 0x25f93a0
wrapper = 0x7fffffff9980
status = napi_ok
ref = 0x25f93a0
instance = 0x2568890
instanceRef = 0x25f93a0
env = <optimized out>
wrapper = <optimized out>
status = <optimized out>
ref = <optimized out>
instance = <optimized out>
instanceRef = <optimized out>

(gdb) info args
this = 0x2568890
callbackInfo = @0x7fffffff95b0: {_staticArgCount = 6, _env = 0x25f93a0, _info = 0x7fffffff9730, _this = 0x7fffffff9980, _argc = 4, _argv = 0x7fffffff95e0, _staticArgs = {0x7fffffff9978, 0x7fffffff9970, 0x7fffffff9968,
    0x7fffffff9960, 0x24dd768, 0x24dd768}, _dynamicArgs = 0x0, _data = 0x0}
(gdb)


To the best of my knowledge, the following links correspond to the code path in the stack trace:

frame 4: https://github.com/nodejs/node-addon-api/blob/master/napi-inl.h#L2824

frame 5: https://github.com/legraphista/darknet-binding/blob/089917035a5b188197a3f71b6f7bc2a87fa3604b/src/DarknetImage.cc#L37

frame 19: https://github.com/legraphista/darknet-binding/blob/089917035a5b188197a3f71b6f7bc2a87fa3604b/src/DarknetImage.h#L102

Has this happened to anyone else, or am I doing funky stuff I shouldn't be doing?

Thanks

@addaleax
Member

@legraphista This is something that shouldn’t be happening, no. Could you provide a way to reproduce this?

@legraphista
Author

Will do! I'll throw together a demo project that illustrates the issue.

In the meantime, the same scenario also occasionally produces this stack trace: (gdb)

#0  0x000000000253fc50 in ?? ()
#1  0x00000000008a653e in (anonymous namespace)::v8impl::Reference::FinalizeCallback(v8::WeakCallbackInfo<(anonymous namespace)::v8impl::Reference> const&) ()
#2  0x0000000000e42a23 in v8::internal::GlobalHandles::DispatchPendingPhantomCallbacks(bool) ()
#3  0x0000000000e42c4a in v8::internal::GlobalHandles::PostGarbageCollectionProcessing(v8::internal::GarbageCollector, v8::GCCallbackFlags) ()
#4  0x0000000000e80d7b in v8::internal::Heap::PerformGarbageCollection(v8::internal::GarbageCollector, v8::GCCallbackFlags) ()
#5  0x0000000000e81c74 in v8::internal::Heap::CollectGarbage(v8::internal::AllocationSpace, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#6  0x0000000000e821fc in v8::internal::Heap::CollectAllGarbage(int, v8::internal::GarbageCollectionReason, v8::GCCallbackFlags) ()
#7  0x0000000000b14459 in v8::internal::MaybeHandle<v8::internal::Object> v8::internal::(anonymous namespace)::HandleApiCallHelper<false>(v8::internal::Isolate*, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::HeapObject>, v8::internal::Handle<v8::internal::FunctionTemplateInfo>, v8::internal::Handle<v8::internal::Object>, v8::internal::BuiltinArguments) ()
#8  0x0000000000b14fc9 in v8::internal::Builtin_HandleApiCall(int, v8::internal::Object**, v8::internal::Isolate*) ()
#9  0x0000325432e841bd in ?? ()
#10 0x000039348a240e61 in ?? ()
#11 0x0000325432e84121 in ?? ()
#12 0x00007fffffffd060 in ?? ()
#13 0x0000000000000006 in ?? ()
#14 0x00007fffffffd0f8 in ?? ()
#15 0x0000325432e93a09 in ?? ()
#16 0x00002f8f84a822e1 in ?? ()
#17 0x00002e6b4b682a39 in ?? ()
#18 0x0000000500000000 in ?? ()
#19 0x00002f8f84a82321 in ?? ()
#20 0x00002e6b4b682201 in ?? ()
#21 0x00002f8f84a822e1 in ?? ()
#22 0x00002f8f84a822e1 in ?? ()
#23 0x00002e6b4b682a39 in ?? ()
#24 0x000025289dd8e2f9 in ?? ()
#25 0x0000004300000000 in ?? ()
#26 0x00000e1d1f5d9ef9 in ?? ()
#27 0x000025289dd8e2f9 in ?? ()
#28 0x00000a5438a85409 in ?? ()
#29 0x00007fffffffd160 in ?? ()
#30 0x0000325432e93a09 in ?? ()
#31 0x00003130dac82201 in ?? ()
#32 0x00002f8f84a822e1 in ?? ()
#33 0x00002f8f84a822e1 in ?? ()
#34 0x00002f8f84a822e1 in ?? ()
#35 0x00002f8f84a822e1 in ?? ()
#36 0x000025289dd8e2f9 in ?? ()
#37 0x00002f8f84a822e1 in ?? ()
#38 0x0000008600000000 in ?? ()
#39 0x00000fbaa1602e51 in ?? ()
#40 0x00002e6b4b693ae1 in ?? ()
#41 0x00000a5438aa2511 in ?? ()
#42 0x00007fffffffd1d8 in ?? ()
#43 0x0000325432e93a09 in ?? ()
#44 0x00002f8f84a822e1 in ?? ()
#45 0x00003130dac82201 in ?? ()
#46 0x00002f8f84a822e1 in ?? ()
#47 0x00002e6b4b693ae1 in ?? ()
#48 0x00000a5438aa2511 in ?? ()
#49 0x00002f8f84a822e1 in ?? ()
#50 0x00000e1d1f5b3fa9 in ?? ()
#51 0x00002f8f84a82381 in ?? ()
#52 0x00003130dac82b71 in ?? ()
#53 0x0000007800000000 in ?? ()
#54 0x00000fbaa1602b21 in ?? ()
#55 0x00002e6b4b693a61 in ?? ()
#56 0x00000a5438aa2511 in ?? ()
#57 0x00007fffffffd220 in ?? ()
#58 0x0000325432e8c5a3 in ?? ()
#59 0x00002f8f84a822e1 in ?? ()
#60 0x00003130dac82201 in ?? ()
#61 0x00002f8f84a822e1 in ?? ()
#62 0x0000000000000000 in ?? ()


I have a hunch it might be from the v8's move of GC to a separate thread.

@addaleax
Member

@legraphista It’s hard to tell from the stack traces – this could be a bug in N-API, in V8 or in your code…

My best guess would be that this is some use-after-free bug for Persistent handles – Is there any chance you could run your code under valgrind or similar? That might give better information about where the source of the bug is, as opposed to the place where it shows up…

@legraphista
Author

legraphista commented Nov 16, 2018

Hi, as promised, I'm back with an example. In it I also describe a workaround of sorts: instead of moving data by storing it in Float32Arrays, I pass External pointers.

I'm sceptical that it's a use-after-free bug, since I'm not using the memory after freeing it, and freeing is handled by the destructor (and guarded against a double free).
https://gist.github.com/legraphista/f468aa73ba57eb8aab66466bda50a50c

@addaleax
Member

The valgrind output is pretty clear about this being a use-after-free situation – not necessarily in your code, though.

It sounds like the issue is something like this: After a GC run, one persistent handle finalizer callback (the one for the ObjectWrap<DarknetImage>) leads to the DarknetImage destructor being called, which in turn leads to the _original_data field being released from memory; and the JS object referred to by _original_data is collected in the same GC run, and its finalizer callback is still pending. When the finalizer callback for _original_data wants to execute, that doesn’t work, because _original_data itself has already been destroyed.
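
The ordering described above can be modelled without touching freed memory. The following is only a sketch in plain C++, not N-API or V8 code: `Engine`, `NativeImage`, and `SimulateSameGcPass` are hypothetical names, and reference records are tracked by id so that the second finalizer can detect that its record is gone instead of dereferencing it.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <utility>
#include <vector>

// Hypothetical stand-in for the engine: it owns the reference records and
// queues the finalizers that a single GC pass has scheduled.
struct Engine {
  std::set<int> live_refs;                     // ids of records still allocated
  std::vector<std::function<void()>> pending;  // finalizers queued by one GC pass
  int next_id = 0;
  int stale_accesses = 0;  // finalizers that found their own record gone

  int CreateRef() {
    live_refs.insert(++next_id);
    return next_id;
  }
  void DeleteRef(int id) { live_refs.erase(id); }

  // Queue a finalizer that needs its own record to still exist when it runs.
  void QueueFinalizer(int id, std::function<void()> body) {
    pending.push_back([this, id, body] {
      if (live_refs.count(id) == 0) {
        ++stale_accesses;  // real code would be touching freed memory here
        return;
      }
      body();
    });
  }

  void RunPendingFinalizers() {
    auto queue = std::move(pending);
    pending.clear();
    for (auto& finalizer : queue) finalizer();
  }
};

// Stand-in for the native object: its destructor releases the reference it
// holds to a JS value (the role _original_data plays in this thread).
struct NativeImage {
  Engine* engine;
  int data_ref;
  ~NativeImage() { engine->DeleteRef(data_ref); }
};

// One GC pass collects both the wrapper and the data object, so both
// finalizers are queued before either runs. Returns the stale access count.
int SimulateSameGcPass() {
  Engine engine;
  int wrapper_ref = engine.CreateRef();
  int data_ref = engine.CreateRef();
  auto* image = new NativeImage{&engine, data_ref};

  // The wrapper's finalizer destroys the native object, which deletes the
  // data reference while that reference's own finalizer is still queued.
  engine.QueueFinalizer(wrapper_ref, [image] { delete image; });
  engine.QueueFinalizer(data_ref, [] {});

  engine.RunPendingFinalizers();
  assert(engine.pending.empty());
  return engine.stale_accesses;
}
```

In this model `SimulateSameGcPass()` counts one stale access: the data finalizer runs after its record was deleted by the wrapper's finalizer. Whether the real crash follows exactly this path is the hypothesis under discussion, not something the sketch proves.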

I am not sure what to do about this; it seems like an issue that can occur in very generic situations with v8::Persistents… and I kind of wonder why we aren’t facing this kind of issue in Node.js core.

@legraphista
Author

A workaround that I've found is to call GC from JavaScript after each iteration (or every few iterations), like here. I've found it to be stable (at least in the limited testing I did).

@mhdawson
Member

Having looked at the issue, I think the top commit in this branch might resolve it, but I haven't tested it against your code yet:

https://github.com/mhdawson/io.js/tree/finalizer-order2

The main change is that if a request to delete a reference is made before the finalizer has run for the associated object, the delete is deferred until the finalizer runs. I think this makes sense for the case where we had a workaround in place for a finalizer callback calling delete on a reference, and I'm hoping it also resolves the issue you were seeing.
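
As a rough illustration of the deferred-delete idea, here is a minimal sketch in plain C++. It is not the code from the branch: `Reference`, `Stats`, `RequestDelete`, `RunFinalizer`, and `DeleteBeforeFinalizer` are hypothetical names, and the model assumes every reference eventually gets its finalizer run.

```cpp
#include <cassert>

// Hypothetical model of a reference whose memory must outlive its finalizer.
struct Reference {
  bool finalizer_ran = false;     // set once the GC has run the finalizer
  bool delete_requested = false;  // the owner asked for deletion early
};

// Counters so the behaviour can be observed from outside.
struct Stats {
  int freed = 0;
  int finalized = 0;
};

// Request deletion. If the finalizer has not run yet, only record the
// request and keep the memory alive. Returns true if freed immediately.
bool RequestDelete(Reference* ref, Stats& stats) {
  if (!ref->finalizer_ran) {
    ref->delete_requested = true;  // defer: the finalizer frees it later
    return false;
  }
  delete ref;
  ++stats.freed;
  return true;
}

// Called when the GC runs the finalizer. The reference is still valid here,
// even if its owner asked for deletion earlier.
void RunFinalizer(Reference* ref, Stats& stats) {
  ++stats.finalized;
  ref->finalizer_ran = true;
  if (ref->delete_requested) {
    delete ref;  // carry out the deferred delete
    ++stats.freed;
  }
}

// The problematic ordering from this thread: delete first, finalizer second.
Stats DeleteBeforeFinalizer() {
  Stats stats;
  Reference* ref = new Reference();
  bool freed_now = RequestDelete(ref, stats);
  assert(!freed_now);        // the delete was deferred
  assert(stats.freed == 0);  // memory is still alive for the finalizer
  RunFinalizer(ref, stats);
  return stats;
}
```

With this ordering the reference is freed exactly once, and only after its finalizer has run. A real implementation would also have to handle references whose finalizer never runs, which this sketch ignores.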

@mhdawson
Member

@legraphista could you try out that change and see if it resolves the issue for you?

@mhdawson
Member

Just noticed I missed pushing the commit to the branch; doing that now.

@mhdawson
Member

Branch updated.

@mhdawson
Member

mhdawson commented Nov 16, 2018

With respect to "I have a hunch it might be from the v8's move of GC to a separate thread": you could be right, if those changes affected when an object is identified as no longer referenced and that changed the timing of when the finalizer is enqueued to run.

@addaleax
Member

@mhdawson That looks like it could fix this issue, yes. 👍

@legraphista
Author

legraphista commented Nov 18, 2018

After some testing, I've come back with results:

| Node version | Linux 4.15.0-36-generic (Ubuntu 16) | MacOS 10.13.6 | Notes |
|---|---|---|---|
| v9.11.2 | survived 10k iter. | survived 10k iter. | - |
| v10.13.0 | crashes between 200-350 iter. | survived 10k iter. | GC seems to be lazy.* |
| v11.2.0 | crashes between 130-200 iter. | survived 10k iter. | - |
| v12.0.0-pre custom build ** | survived 10k iter. | survived 10k iter. | - |
| v12.0.0-pre b7e9804c90 *** | crashes between 130-200 iter. | survived 10k iter. | - |

  • * GC prefers high memory usage, occupying all available RAM until the process crashes from allocation errors.
  • ** Custom build based on mhdawson's branch from https://github.com/mhdawson/io.js/tree/finalizer-order2
  • *** Custom build based on node's master from https://github.com/mhdawson/io.js/tree/b7e9804c90ec1b834e88279ce06725c9dd9156a8 (the commit before the fix)

Each configuration was tested over multiple runs.

Testing was done on:

  • MacBook Pro Mid-2014 Intel I7-4980HQ
  • OVH g3-30 / Intel Xeon CPU E5-2640 v4 @ 2.40GHz

@legraphista
Author

legraphista commented Nov 18, 2018

I've added macOS, since I hadn't included it in my original testing. To my surprise, with the same scenario as on the Linux environment, I cannot reproduce the crash. The stress test finished multiple times without a hitch.

For both environments, versions 9, 10, and 11 were downloaded & installed with nvm. The v12 branch was compiled with llvm 9.0.0 (clang-900.0.39.2) on macOS and gcc version 7.3.0 on linux.

If deemed necessary, I could compile v10 and v11 locally on the linux box, and see if the issue persists.

@mhdawson
Member

@legraphista thanks, one more piece of data that would be great is v12.0.0-pre WITHOUT my change. If we see it fail there, then we'll be sure that it was my change that fixed it.

@legraphista
Author

legraphista commented Nov 19, 2018

Ah yes, of course, my bad.
Here are the results. I've also updated the table in the original comment.

| Node version | Linux 4.15.0-36-generic (Ubuntu 16) | MacOS 10.13.6 | Notes |
|---|---|---|---|
| v12.0.0-pre b7e9804c90 *** | crashes between 130-200 iter. | survived 10k iter. | - |

  • *** Custom build based on node's master from https://github.com/mhdawson/io.js/tree/b7e9804c90ec1b834e88279ce06725c9dd9156a8 (the commit before the fix)

@mhdawson
Member

@legraphista thanks. Will submit PR and ask V8 team members to comment as well.

legraphista added a commit to legraphista/darknet-bindings that referenced this issue Nov 19, 2018
mhdawson added a commit to mhdawson/io.js that referenced this issue Nov 19, 2018
Crashes were reported during finalization due to
the memory for a reference being deleted and the
finalizer running after the deletion.

This change ensures the deletion of the memory for
the reference only occurs after the finalizer has run.

Fixes: nodejs/node-addon-api#393
@gabrielschulhof
Contributor

This may be similar to nodejs/node#23999.

@mhdawson
Member

PR to fix: nodejs/node#24494

targos pushed a commit to nodejs/node that referenced this issue Nov 24, 2018
PR-URL: #24494
Reviewed-By: Anna Henningsen <anna@addaleax.net>
Reviewed-By: Franziska Hinkelmann <franziska.hinkelmann@gmail.com>
Reviewed-By: Refael Ackermann <refack@gmail.com>
rvagg pushed a commit to nodejs/node that referenced this issue Nov 28, 2018
refack pushed a commit to refack/node that referenced this issue Jan 14, 2019
mhdawson added a commit to mhdawson/io.js that referenced this issue Jan 18, 2019
Backport-PR-URL: nodejs#25572
mhdawson added a commit to mhdawson/io.js that referenced this issue Jan 30, 2019
Backport-PR-URL: nodejs#25574
BethGriggs pushed a commit to nodejs/node that referenced this issue Feb 5, 2019
rvagg pushed a commit to nodejs/node that referenced this issue Feb 28, 2019