-
Notifications
You must be signed in to change notification settings - Fork 30.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
v8 GC crash on node 4.4.7 #7654
Comments
Are you using native modules? Memory corruption in add-ons sometimes causes GC crashes. Something to try is compiling from source ( It's possible you are hitting a bug that was fixed in v8/v8@01b8fc8;
If your question is whether it's legal for LO objects to show up in the store buffer - yes, it is. |
Hi Ben, I am indeed using a few native modules. I've been trying to systematically disable these modules in order to pinpoint the causing module, but this has not helped me so far as the crash continued to persist. I'll try running a debug build, but this might take a while since it takes a few hours to reproduce the problem.
Actually, i ment something else, the method ClearInvalidStoreAndSlotsBufferEntries implies the method is trying to clear invalid store and slot buffer entries. During this operation, it calls IsInvalidSlot, which calls IsSlotInLiveObject which calls IsSlotInBlackObject. However, IsSlotInBlack object seems to assume the Slot is a valid address by using the CHECK macro on the found object. If we are really talking about invalid slots, then what assures us the large object space (still) has a valid (Heap) object located at this slot? |
Sounds plausible. The only scenario I can think of where that could happen is when a LO page is unmapped right before iterating over the store buffer. There is some special handling around LO page freeing in LargeObjectSpace::FreeUnmarkedObjects() and Heap::FreeQueuedChunks() that might be buggy. If that's the case, when running with |
@bnoordhuis: I've been running a debug build for the past three days, but i have not been able to reproduce the crash using these builds yet. I've added another test setup with a release build of nodejs compiled with the v8_enable_verify_heap gyp flag and ran it with the --verify_heap argument, but this build is not crashing either. I've setup other devices to run a debug build and a (normal) release build that are looping through unittests of all individual native modules, but these setups have not crashed once either. In the meanwhile, i've received bugreports mentioning very irregular intervals, so i'm suspecting a race condition is involved homey/#690. Do you have any other ideas? |
/cc @nodejs/v8 |
@jeroenvollenbrock Are you able to run this with newer versions of Node to check if the problem still exists? Looking at V8 ToT, the expectation seems to be that bool MarkCompactCollector::IsSlotInBlackObject(MemoryChunk* p, Address slot) {
Space* owner = p->owner();
DCHECK(owner != heap_->lo_space() && owner != nullptr);
USE(owner); It would be good to know if Node 5.x and 6.x still have the crash. I suspect it might been fixed as part of the store buffer changes that happens earlier this year. |
@ofrobots I'll deploy node.js 6.3.0 later today but i'm not completely sure how much time it is going to take to make all dependencies compatible. If i succeed, i'll allocate a couple of testdevices and let them run over the weekend, but I doubt if it will yield any results before next Monday though. |
@ofrobots I've been running a stress test using Node 6 for about 8 days, no crashes so far. I did have to update one native module (node-serialport), and started a second test using the newer version of this module on Node 4 which did not crash either. Since I'm getting close to the end of my merge windows, I've updated the module to its newest version, implemented a watchdog and configured it to upload a coredump and restart Node upon a segfault. This will hit our beta channel later this week (~1000 users), which should confirm whether or not this crash is really gone. |
v4.4.7 (nodesource) indeed has the old remembered set implementation. There we also had to check for large objects when clearing slots. A race between unmapping and clearing should not happen, since clearing happens before sweeping, i.e. before memory gets released. I am wondering if you are flushing out a bug in LargeObjectSpace::FindPage. You could print the whole chunk_map_ map before crashing, maybe that gives us some information. |
I've replace the native module node-serialport with my own implementation, and we've now been running on 4.4.7 for approximately 500000 device-hours without re-triggering the crash. I think it's therefore safe to assume the issue was caused by node-serialport. |
/cc @reconbot |
@jeroenvollenbrock I'm the maintainer of node serialport, can you share a bit more about what you found? I'd love to have it fixed asap. |
node sometimes crashes with a V8 Fatal error after running for about 5 hours:
Fatal error in ../deps/v8/src/heap/mark-compact.cc, line 3088
Check failed: large_object->IsHeapObject().
Stack Trace:
Registers at #2:
I've made an attempt to debug this myself, it looks like the isHeapObject() check at line 3088 fails due to the value of r4 (0b0), expected r4 = (0b01). r4 contains the AND of r0 and 0b11, and r0 contains the return value of v8::internal::LargeObjectSpace::FindObject(Address a). Since r0 is also 0x0, this method apparently did not find the specified object in the large object space. When i take a look at the callstack, i can see this function was invoked through the ClearInvalidStoreAndSlotsBufferEntries call, which makes me wonder if the object should exist at all in the LargeObjectSpace.
Any help debugging the cause of this issue is appreciated. A coredump and remote debugging tools are available if required.
The text was updated successfully, but these errors were encountered: