v8 GC crash on node 4.4.7 #7654

jeroenvollenbrock · 2016-07-11T11:54:59Z

Version: v4.4.7 (nodesource)
Platform: Linux 4.1.13-HomeyOS-0.8.0 deps: update openssl to 1.0.1j #1 SMP Sun Dec 20 00:02:18 UTC 2015 armv7l GNU/Linux
OS: Debian Jessie based linux
Subsystem: V8 Garbage Collection

node sometimes crashes with a V8 Fatal error after running for about 5 hours:

Fatal error in ../deps/v8/src/heap/mark-compact.cc, line 3088

Check failed: large_object->IsHeapObject().

Stack Trace:

#0  0x00a18460 in v8::base::OS::Abort ()
#1  0x00a15938 in V8_Fatal ()
#2  0x005e69b4 in v8::internal::MarkCompactCollector::IsSlotInBlackObject ()
#3  0x005e8324 in v8::internal::SlotsBuffer::RemoveInvalidSlots ()
#4  0x005e84b8 in v8::internal::MarkCompactCollector::ClearInvalidStoreAndSlotsBufferEntries ()
#5  0x005f4abc in v8::internal::MarkCompactCollector::CollectGarbage ()
#6  0x005b19a4 in v8::internal::Heap::MarkCompact ()
#7  0x005c89f0 in v8::internal::Heap::PerformGarbageCollection ()

Registers at #2:

r0             0x0      0
r1             0x0      0
r2             0xad071c 11339548
r3             0xafb7dc 11515868
r4             0x0      0
r5             0x22ff088        36696200
r6             0x2301e4c        36707916
r7             0x10     16
r8             0x3fd    1021
r9             0x7ed1843c       2127660092
r10            0x1fd56f8        33380088
r11            0x7ed18424       2127660068
r12            0xf81ab4 16259764
sp             0x7ed183f8       0x7ed183f8
lr             0x5e69b4 6187444
pc             0x5e69b4 0x5e69b4 <_ZN2v88internal20MarkCompactCollector19IsSlotInBlackObjectEPNS0_4PageEPhPPNS0_10HeapObjectE+604>
cpsr           0x200f0010       537853968

I've made an attempt to debug this myself, it looks like the isHeapObject() check at line 3088 fails due to the value of r4 (0b0), expected r4 = (0b01). r4 contains the AND of r0 and 0b11, and r0 contains the return value of v8::internal::LargeObjectSpace::FindObject(Address a). Since r0 is also 0x0, this method apparently did not find the specified object in the large object space. When i take a look at the callstack, i can see this function was invoked through the ClearInvalidStoreAndSlotsBufferEntries call, which makes me wonder if the object should exist at all in the LargeObjectSpace.

Any help debugging the cause of this issue is appreciated. A coredump and remote debugging tools are available if required.

The text was updated successfully, but these errors were encountered:

bnoordhuis · 2016-07-11T12:36:39Z

Are you using native modules? Memory corruption in add-ons sometimes causes GC crashes.

Something to try is compiling from source (./configure && make -C out BUILDTYPE=Debug) and run with --verify_heap. That hopefully catches the bug closer to the source.

It's possible you are hitting a bug that was fixed in v8/v8@01b8fc8; --verify_heap should catch it.

When i take a look at the callstack, i can see this function was invoked through the ClearInvalidStoreAndSlotsBufferEntries call, which makes me wonder if the object should exist at all in the LargeObjectSpace.

If your question is whether it's legal for LO objects to show up in the store buffer - yes, it is.

jeroenvollenbrock · 2016-07-11T13:09:48Z

Hi Ben, I am indeed using a few native modules. I've been trying to systematically disable these modules in order to pinpoint the causing module, but this has not helped me so far as the crash continued to persist. I'll try running a debug build, but this might take a while since it takes a few hours to reproduce the problem.

If your question is whether it's legal for LO objects to show up in the store buffer - yes, it is.

Actually, i ment something else, the method ClearInvalidStoreAndSlotsBufferEntries implies the method is trying to clear invalid store and slot buffer entries. During this operation, it calls IsInvalidSlot, which calls IsSlotInLiveObject which calls IsSlotInBlackObject. However, IsSlotInBlack object seems to assume the Slot is a valid address by using the CHECK macro on the found object. If we are really talking about invalid slots, then what assures us the large object space (still) has a valid (Heap) object located at this slot?

bnoordhuis · 2016-07-11T15:11:15Z

Sounds plausible. The only scenario I can think of where that could happen is when a LO page is unmapped right before iterating over the store buffer. There is some special handling around LO page freeing in LargeObjectSpace::FreeUnmarkedObjects() and Heap::FreeQueuedChunks() that might be buggy.

If that's the case, when running with --log, I think you should see a delete,MemoryChunk,$addr in the log file right before the crash, where addr corresponds to the page address of the offending object (i.e., the object address rounded down to a multiple of 4096.)

jeroenvollenbrock · 2016-07-14T12:22:46Z

@bnoordhuis: I've been running a debug build for the past three days, but i have not been able to reproduce the crash using these builds yet. I've added another test setup with a release build of nodejs compiled with the v8_enable_verify_heap gyp flag and ran it with the --verify_heap argument, but this build is not crashing either. I've setup other devices to run a debug build and a (normal) release build that are looping through unittests of all individual native modules, but these setups have not crashed once either. In the meanwhile, i've received bugreports mentioning very irregular intervals, so i'm suspecting a race condition is involved homey/#690.

Do you have any other ideas?

ofrobots · 2016-07-15T16:50:18Z

/cc @nodejs/v8

ofrobots · 2016-07-15T16:59:54Z

@jeroenvollenbrock Are you able to run this with newer versions of Node to check if the problem still exists?

Looking at V8 ToT, the expectation seems to be that IsSlotInBlackObject is not supposed to be reached for LO space objects:

bool MarkCompactCollector::IsSlotInBlackObject(MemoryChunk* p, Address slot) {
  Space* owner = p->owner();
  DCHECK(owner != heap_->lo_space() && owner != nullptr);
  USE(owner);

It would be good to know if Node 5.x and 6.x still have the crash. I suspect it might been fixed as part of the store buffer changes that happens earlier this year.

jeroenvollenbrock · 2016-07-15T18:18:09Z

@ofrobots I'll deploy node.js 6.3.0 later today but i'm not completely sure how much time it is going to take to make all dependencies compatible. If i succeed, i'll allocate a couple of testdevices and let them run over the weekend, but I doubt if it will yield any results before next Monday though.

jeroenvollenbrock · 2016-07-26T13:27:02Z

@ofrobots I've been running a stress test using Node 6 for about 8 days, no crashes so far. I did have to update one native module (node-serialport), and started a second test using the newer version of this module on Node 4 which did not crash either. Since I'm getting close to the end of my merge windows, I've updated the module to its newest version, implemented a watchdog and configured it to upload a coredump and restart Node upon a segfault. This will hit our beta channel later this week (~1000 users), which should confirm whether or not this crash is really gone.

hannespayer · 2016-07-27T14:17:01Z

v4.4.7 (nodesource) indeed has the old remembered set implementation. There we also had to check for large objects when clearing slots.

A race between unmapping and clearing should not happen, since clearing happens before sweeping, i.e. before memory gets released. I am wondering if you are flushing out a bug in LargeObjectSpace::FindPage. You could print the whole chunk_map_ map before crashing, maybe that gives us some information.

jeroenvollenbrock · 2016-09-06T14:03:34Z

I've replace the native module node-serialport with my own implementation, and we've now been running on 4.4.7 for approximately 500000 device-hours without re-triggering the crash. I think it's therefore safe to assume the issue was caused by node-serialport.

MylesBorins · 2016-09-06T14:13:36Z

/cc @reconbot

reconbot · 2016-09-06T20:38:30Z

@jeroenvollenbrock I'm the maintainer of node serialport, can you share a bit more about what you found? I'd love to have it fixed asap.

Fishrock123 added v8 engine Issues and PRs related to the V8 dependency. v4.x labels Jul 11, 2016

saper mentioned this issue Aug 25, 2016

Webpack segmentation faults for newer versions of node-sass sass/node-sass#1676

Closed

jeroenvollenbrock closed this as completed Sep 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

v8 GC crash on node 4.4.7 #7654

v8 GC crash on node 4.4.7 #7654

jeroenvollenbrock commented Jul 11, 2016 •

edited by Fishrock123

Loading

bnoordhuis commented Jul 11, 2016

jeroenvollenbrock commented Jul 11, 2016

bnoordhuis commented Jul 11, 2016

jeroenvollenbrock commented Jul 14, 2016 •

edited

Loading

ofrobots commented Jul 15, 2016

ofrobots commented Jul 15, 2016

jeroenvollenbrock commented Jul 15, 2016 •

edited

Loading

jeroenvollenbrock commented Jul 26, 2016

hannespayer commented Jul 27, 2016

jeroenvollenbrock commented Sep 6, 2016

MylesBorins commented Sep 6, 2016

reconbot commented Sep 6, 2016 •

edited

Loading

v8 GC crash on node 4.4.7 #7654

v8 GC crash on node 4.4.7 #7654

Comments

jeroenvollenbrock commented Jul 11, 2016 • edited by Fishrock123 Loading

Fatal error in ../deps/v8/src/heap/mark-compact.cc, line 3088

Check failed: large_object->IsHeapObject().

bnoordhuis commented Jul 11, 2016

jeroenvollenbrock commented Jul 11, 2016

bnoordhuis commented Jul 11, 2016

jeroenvollenbrock commented Jul 14, 2016 • edited Loading

ofrobots commented Jul 15, 2016

ofrobots commented Jul 15, 2016

jeroenvollenbrock commented Jul 15, 2016 • edited Loading

jeroenvollenbrock commented Jul 26, 2016

hannespayer commented Jul 27, 2016

jeroenvollenbrock commented Sep 6, 2016

MylesBorins commented Sep 6, 2016

reconbot commented Sep 6, 2016 • edited Loading

jeroenvollenbrock commented Jul 11, 2016 •

edited by Fishrock123

Loading

jeroenvollenbrock commented Jul 14, 2016 •

edited

Loading

jeroenvollenbrock commented Jul 15, 2016 •

edited

Loading

reconbot commented Sep 6, 2016 •

edited

Loading