npm --version segfaults on v7.7.3 linux ppc64 (BE) only ON RHEL 7 #11882

Closed
mattcolegate opened this issue Mar 16, 2017 · 29 comments
Labels
ppc Issues and PRs related to the Power architecture. v8 engine Issues and PRs related to the V8 dependency.

Comments

@mattcolegate

mattcolegate commented Mar 16, 2017

  • Version: 7.7.3
  • Platform: linux ppc64 (BE)
  • Subsystem: node

Running node --version works on this platform and version, but running npm --version causes a segmentation fault and core dump.

cc/ @gibfahn who helped with initial diagnosis

@gibfahn
Member

gibfahn commented Mar 16, 2017

To add some more info, the version downloaded was https://nodejs.org/dist/latest-v7.x/node-v7.7.3-linux-ppc64.tar.gz.

Running node -e 'console.log("hi")' also segfaults.

@gibfahn gibfahn added the ppc Issues and PRs related to the Power architecture. label Mar 16, 2017
@gibfahn gibfahn changed the title npm --version segfaults on v7.7.3 linux PPC64 only npm --version segfaults on v7.7.3 linux ppc64 (BE) only Mar 16, 2017
@bnoordhuis
Member

> Running node -e 'console.log("hi")' also segfaults.

Can you obtain a stack trace?

@sxa
Member

sxa commented Mar 16, 2017

Ouch - this is from a RHEL 7.0 box (EDIT: also occurs on 7.1)

#0  0x00003fffb7aaa4e0 in .__memset_power7 () from /lib64/power8/libc.so.6
#1  0x0000000010f69094 in ._ZN2v88internal18RegExpResultsCache5ClearEPNS0_10FixedArrayE ()
#2  0x0000000010cace24 in ._ZN2v88internal4Heap19MarkCompactPrologueEv ()
#3  0x0000000010cbf964 in ._ZN2v88internal4Heap11MarkCompactEv ()
#4  0x0000000010cca010 in ._ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE ()
#5  0x0000000010cca4f8 in ._ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE ()
#6  0x0000000010ccce30 in ._ZN2v88internal4Heap12ReserveSpaceEPNS0_4ListINS1_5ChunkENS0_25FreeStoreAllocationPolicyEEEPNS2_IPhS4_EE ()
#7  0x000000001104a598 in ._ZN2v88internal12Deserializer11DeserializeEPNS0_7IsolateE ()
#8  0x0000000010dcea9c in ._ZN2v88internal7Isolate4InitEPNS0_12DeserializerE ()
#9  0x00000000110560b4 in ._ZN2v88internal8Snapshot10InitializeEPNS0_7IsolateE ()
#10 0x000000001075cca8 in ._ZN2v87Isolate3NewERKNS0_12CreateParamsE ()
#11 0x00000000111f6f04 in ._ZN4node5StartEP9uv_loop_siPKPKciS5_ ()
#12 0x00000000111f650c in ._ZN4node5StartEiPPc ()
#13 0x000000001052b720 in .main ()
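
The frames above are Itanium-ABI mangled C++ names, carrying the leading dot that 64-bit big-endian PowerPC uses for function entry symbols. In practice `c++filt` is the tool for this. As a minimal sketch only (the `nested_name` helper is hypothetical and handles just the simple `_ZN...E` nested-name form, ignoring parameter types, templates and substitutions), the qualified function names can be recovered like this:

```python
import re

def nested_name(symbol):
    """Return the qualified name from a simple _ZN<len><id>...E mangled symbol.

    Illustrative only: anything this sketch cannot parse is returned unchanged.
    """
    mangled = symbol.lstrip(".")          # drop the ppc64 dot prefix
    if not mangled.startswith("_ZN"):
        return symbol
    pos, parts = 3, []
    while pos < len(mangled) and mangled[pos] != "E":
        match = re.match(r"\d+", mangled[pos:])
        if match is None:                 # not a length-prefixed identifier
            return symbol
        length = int(match.group())
        start = pos + match.end()
        parts.append(mangled[start:start + length])
        pos = start + length
    return "::".join(parts)

# Symbols copied from the backtrace in this thread.
frames = [
    "._ZN2v88internal18RegExpResultsCache5ClearEPNS0_10FixedArrayE",
    "._ZN2v88internal4Heap19MarkCompactPrologueEv",
    "._ZN2v87Isolate3NewERKNS0_12CreateParamsE",
    "._ZN4node5StartEiPPc",
]
for frame in frames:
    print(nested_name(frame))
```

Read top to bottom, the four sample frames come out as v8::internal::RegExpResultsCache::Clear, v8::internal::Heap::MarkCompactPrologue, v8::Isolate::New and node::Start.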

@mscdex mscdex added the v8 engine Issues and PRs related to the V8 dependency. label Mar 16, 2017
@bnoordhuis
Member

@sxa555 Can you post the output of disassemble and info registers?

@sxa
Member

sxa commented Mar 16, 2017

FYI I've built 7.7.3 from commit 9c68a69 locally and it doesn't fail in the same way. This is with the following version of gcc (have the build boxes been upgraded recently to 4.9? I haven't used that yet):

gcc (GCC) 4.8.2 20140120 (Red Hat 4.8.2-16)

@sxa
Member

sxa commented Mar 16, 2017

@bnoordhuis (For reference this output was just running "node" without any parameters)

(gdb) bt
#0  0x00003fffb7aaa4e0 in .__memset_power7 () from /lib64/power8/libc.so.6
#1  0x0000000010f69094 in ._ZN2v88internal18RegExpResultsCache5ClearEPNS0_10FixedArrayE ()
#2  0x0000000010cace24 in ._ZN2v88internal4Heap19MarkCompactPrologueEv ()
#3  0x0000000010cbf964 in ._ZN2v88internal4Heap11MarkCompactEv ()
#4  0x0000000010cca010 in ._ZN2v88internal4Heap24PerformGarbageCollectionENS0_16GarbageCollectorENS_15GCCallbackFlagsE ()
#5  0x0000000010cca4f8 in ._ZN2v88internal4Heap14CollectGarbageENS0_16GarbageCollectorENS0_23GarbageCollectionReasonEPKcNS_15GCCallbackFlagsE ()
#6  0x0000000010ccce30 in ._ZN2v88internal4Heap12ReserveSpaceEPNS0_4ListINS1_5ChunkENS0_25FreeStoreAllocationPolicyEEEPNS2_IPhS4_EE ()
#7  0x000000001104a598 in ._ZN2v88internal12Deserializer11DeserializeEPNS0_7IsolateE ()
#8  0x0000000010dcea9c in ._ZN2v88internal7Isolate4InitEPNS0_12DeserializerE ()
#9  0x00000000110560b4 in ._ZN2v88internal8Snapshot10InitializeEPNS0_7IsolateE ()
#10 0x000000001075cca8 in ._ZN2v87Isolate3NewERKNS0_12CreateParamsE ()
#11 0x00000000111f6f04 in ._ZN4node5StartEP9uv_loop_siPKPKciS5_ ()
#12 0x00000000111f650c in ._ZN4node5StartEiPPc ()
#13 0x000000001052b720 in .main ()
(gdb) info registers
r0             0x1	1
r1             0x3fffffffda00	70368744167936
r2             0x3fffb7be4410	70367531910160
r3             0xf	15
r4             0x0	0
r5             0x7ff	2047
r6             0x0	0
r7             0x30	48
r8             0x40	64
r9             0x400	1024
r10            0xf	15
r11            0x7	7
r12            0x800	2048
r13            0x3fffb7ffe190	70367536210320
r14            0x3fffffffe400	70368744170496
r15            0xffffffffba2e8ba3	18446744072538196899
r16            0x11fa4ab0	301615792
r17            0x7a940	502080
r18            0x7a940	502080
r19            0x0	0
r20            0x0	0
r21            0x1	1
r22            0x11f63a80	301349504
r23            0x2	2
r24            0x11f51e0e	301276686
r25            0x0	0
r26            0x0	0
r27            0x56f50	356176
r28            0x0	0
r29            0x11f63aa0	301349536
r30            0x11e1ce00	300011008
r31            0x3fffffffda00	70368744167936
pc             0x3fffb7aaa4e0	0x3fffb7aaa4e0 <.__memset_power7+64>
msr            0x800000010000d032	9223372041149796402
cr             0x44044841	1141131329
lr             0x10f69094	0x10f69094 <._ZN2v88internal18RegExpResultsCache5ClearEPNS0_10FixedArrayE+36>
ctr            0x3fffb7aaa4a0	70367530624160
xer            0x0	0
orig_r3        0xc00000000000908c	-4611686018427350900
trap           0x300	768
(gdb) disassemble
Dump of assembler code for function .__memset_power7:
   0x00003fffb7aaa4a0 <+0>:	cmpldi  cr7,r5,31
   0x00003fffb7aaa4a4 <+4>:	cmpldi  cr6,r5,8
   0x00003fffb7aaa4a8 <+8>:	mr      r10,r3
   0x00003fffb7aaa4ac <+12>:	rlwimi  r4,r4,8,16,23
   0x00003fffb7aaa4b0 <+16>:	rlwimi  r4,r4,16,0,15
   0x00003fffb7aaa4b4 <+20>:	ble     cr6,0x3fffb7aaa830 <.__memset_power7+912>
   0x00003fffb7aaa4b8 <+24>:	neg     r0,r3
   0x00003fffb7aaa4bc <+28>:	ble     cr7,0x3fffb7aaa7a0 <.__memset_power7+768>
   0x00003fffb7aaa4c0 <+32>:	andi.   r11,r10,7
   0x00003fffb7aaa4c4 <+36>:	rldimi  r4,r4,32,0
   0x00003fffb7aaa4c8 <+40>:	mr      r12,r5
   0x00003fffb7aaa4cc <+44>:	beq     0x3fffb7aaa500 <.__memset_power7+96>
   0x00003fffb7aaa4d0 <+48>:	clrldi  r0,r0,61
   0x00003fffb7aaa4d4 <+52>:	mtocrf  1,r0
   0x00003fffb7aaa4d8 <+56>:	subf    r5,r0,r5
   0x00003fffb7aaa4dc <+60>:	bns     cr7,0x3fffb7aaa4e8 <.__memset_power7+72>
=> 0x00003fffb7aaa4e0 <+64>:	stb     r4,0(r10)
   0x00003fffb7aaa4e4 <+68>:	addi    r10,r10,1
   0x00003fffb7aaa4e8 <+72>:	bne     cr7,0x3fffb7aaa4f4 <.__memset_power7+84>
   0x00003fffb7aaa4ec <+76>:	sth     r4,0(r10)
   0x00003fffb7aaa4f0 <+80>:	addi    r10,r10,2
   0x00003fffb7aaa4f4 <+84>:	ble     cr7,0x3fffb7aaa500 <.__memset_power7+96>
   0x00003fffb7aaa4f8 <+88>:	stw     r4,0(r10)
   0x00003fffb7aaa4fc <+92>:	addi    r10,r10,4
   0x00003fffb7aaa500 <+96>:	cmpldi  cr5,r5,255
   0x00003fffb7aaa504 <+100>:	li      r0,32
   0x00003fffb7aaa508 <+104>:	dcbtst  0,r10
   0x00003fffb7aaa50c <+108>:	cmpldi  cr6,r4,0
   0x00003fffb7aaa510 <+112>:	rldicl  r9,r5,61,3
   0x00003fffb7aaa514 <+116>:	crand   4*cr6+so,4*cr6+eq,4*cr5+gt
   0x00003fffb7aaa518 <+120>:	mtocrf  1,r9
   0x00003fffb7aaa51c <+124>:	bso     cr6,0x3fffb7aaa5e0 <.__memset_power7+320>
   0x00003fffb7aaa520 <+128>:	rldicl  r8,r5,59,5
   0x00003fffb7aaa524 <+132>:	clrldi  r11,r5,61
   0x00003fffb7aaa528 <+136>:	cmpldi  cr6,r11,0
   0x00003fffb7aaa52c <+140>:	cmpldi  cr1,r9,4
   0x00003fffb7aaa530 <+144>:	mtctr   r8
   0x00003fffb7aaa534 <+148>:	bne     cr7,0x3fffb7aaa560 <.__memset_power7+192>
   0x00003fffb7aaa538 <+152>:	std     r4,0(r10)
   0x00003fffb7aaa53c <+156>:	std     r4,8(r10)
   0x00003fffb7aaa540 <+160>:	addi    r10,r10,16
   0x00003fffb7aaa544 <+164>:	bns     cr7,0x3fffb7aaa570 <.__memset_power7+208>
   0x00003fffb7aaa548 <+168>:	std     r4,0(r10)
   0x00003fffb7aaa54c <+172>:	addi    r10,r10,8
   0x00003fffb7aaa550 <+176>:	mr      r12,r10
   0x00003fffb7aaa554 <+180>:	blt     cr1,0x3fffb7aaa5b0 <.__memset_power7+272>
   0x00003fffb7aaa558 <+184>:	b       0x3fffb7aaa570 <.__memset_power7+208>
   0x00003fffb7aaa55c <+188>:	ori     r2,r2,0
   0x00003fffb7aaa560 <+192>:	bns     cr7,0x3fffb7aaa570 <.__memset_power7+208>
   0x00003fffb7aaa564 <+196>:	std     r4,0(r10)
   0x00003fffb7aaa568 <+200>:	addi    r10,r10,8
   0x00003fffb7aaa56c <+204>:	ori     r2,r2,0
   0x00003fffb7aaa570 <+208>:	addi    r12,r10,32
   0x00003fffb7aaa574 <+212>:	std     r4,0(r10)
   0x00003fffb7aaa578 <+216>:	std     r4,8(r10)
   0x00003fffb7aaa57c <+220>:	std     r4,16(r10)
   0x00003fffb7aaa580 <+224>:	std     r4,24(r10)
   0x00003fffb7aaa584 <+228>:	bdz     0x3fffb7aaa5b0 <.__memset_power7+272>

@bnoordhuis
Member

Looks like an almost-nullptr bug. It tries to store a byte at the address in r10, which is 0xf:

0x00003fffb7aaa4e0 <+64>: stb r4,0(r10) # r10 == 0xf

r3 (the first function argument) is moved into r10 a few lines up, so it would seem Heap::MarkCompactPrologue() is calling RegExpResultsCache::Clear() with a bad FixedArray pointer. That's about all I can glean from it though; the root cause is probably elsewhere.
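
The reading above can be cross-checked against the register dump. This is a minimal sketch: the register values are copied from the `info registers` output earlier in the thread, while the helper name `looks_near_null` and the 4096-byte threshold are assumptions made for illustration.

```python
# Register values copied from the gdb `info registers` output in this thread.
regs = {"r0": 0x1, "r3": 0xF, "r4": 0x0, "r5": 0x7FF, "r10": 0xF, "r12": 0x800}

PAGE_SIZE = 4096  # assumed threshold; the page size on the failing box may differ

def looks_near_null(address, limit=PAGE_SIZE):
    """Hypothetical helper: small non-zero addresses are unlikely to be valid pointers."""
    return 0 < address < limit

# `mr r10,r3` copies the destination argument, so the two should agree.
assert regs["r10"] == regs["r3"]
print("destination near null:", looks_near_null(regs["r10"]))

# `neg r0,r3` followed by `clrldi r0,r0,61` keeps the low 3 bits of -dest,
# i.e. the number of bytes to store before the pointer is 8-byte aligned.
head = (-regs["r3"]) & 0x7
assert head == regs["r0"]

# `mr r12,r5` saved the requested length before `subf r5,r0,r5` reduced it.
assert regs["r12"] - head == regs["r5"]

# So the faulting call was presumably equivalent to:
print(f"memset(dest={regs['r3']:#x}, fill={regs['r4']}, len={regs['r12']:#x})")
```

The arithmetic is consistent with a 0x800-byte clear starting at address 0xf, which faults on its very first byte store.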

@mhdawson
Member

mhdawson commented Mar 16, 2017

The compiler version on the test/build boxes is:

root@test-osuosl-ubuntu14-ppc64-be-3:~# gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4
Copyright (C) 2013 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

@mhdawson
Member

The binaries seem to run ok on the machines on which they were built:

iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ uname -a
Linux test-osuosl-ubuntu14-ppc64-be-3 4.2.0-27-powerpc64-smp #32~14.04.1-Ubuntu SMP Fri Jan 22 15:47:25 UTC 2016 ppc64 ppc64 ppc64 GNU/Linux

@mhdawson
Member

iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ node -e 'console.log("hi")';
hi
iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ node --version
v7.7.3-nightly20170309c62798034a
iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ node -e 'console.log("hi")';
hi
iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$

@mhdawson
Member

Output just before the crash with LD_DEBUG=all:

     23377:     symbol=munmap;  lookup in file=/lib64/power8/libc.so.6 [0]
     23377:     binding file ./node [0] to /lib64/power8/libc.so.6 [0]: normal symbol `munmap' [GLIBC_2.3]
     23377:     symbol=mprotect;  lookup in file=./node [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/libdl.so.2 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/power8/librt.so.1 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/libstdc++.so.6 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/power8/libm.so.6 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/libgcc_s.so.1 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/power8/libpthread.so.0 [0]
     23377:     symbol=mprotect;  lookup in file=/lib64/power8/libc.so.6 [0]
     23377:     binding file ./node [0] to /lib64/power8/libc.so.6 [0]: normal symbol `mprotect' [GLIBC_2.3]
Segmentation fault (core dumped)

@mhdawson
Member

On the RHEL machine:

-sh-4.2$ /lib64/power8/libc.so.6
GNU C Library (GNU libc) stable release version 2.17, by Roland McGrath et al.
Copyright (C) 2012 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.8.3 20140911 (Red Hat 4.8.3-7).
Compiled on a Linux 3.10.0 system on 2015-01-19.
Available extensions:
        The C stubs add-on version 2.1.2.
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
        RT using linux kernel aio
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<http://www.gnu.org/software/libc/bugs.html>.

On the community machine:

iojs@test-osuosl-ubuntu14-ppc64-be-3:~/build/mtest/node-v7.7.3-nightly20170309c62798034a-linux-ppc64/bin$ /lib64/libc.so.6
GNU C Library (Ubuntu EGLIBC 2.19-0ubuntu6.9) stable release version 2.19, by Roland McGrath et al.
Copyright (C) 2014 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
Compiled by GNU CC version 4.8.4.
Compiled on a Linux 3.13.11 system on 2016-05-26.
Available extensions:
        crypt add-on version 2.1 by Michael Glad and others
        GNU Libidn by Simon Josefsson
        Native POSIX Threads Library by Ulrich Drepper et al
        BIND-8.2.3-T5B
libc ABIs: UNIQUE IFUNC
For bug reporting instructions, please see:
<https://bugs.launchpad.net/ubuntu/+source/eglibc/+bugs>.

@mhdawson
Member

I wonder if it's the glibc version. I believe we had run the community binaries on our RHEL 7 machines in the past, but possibly node is now using something that is not compatible across glibc versions 2.17 and 2.19.

@mhdawson
Member

@sxa555 do you have an environment where you can install a newer glibc on RHEL 7 and see if that makes a difference?

@mhdawson mhdawson changed the title npm --version segfaults on v7.7.3 linux ppc64 (BE) only npm --version segfaults on v7.7.3 linux ppc64 (BE) only ON RHEL 7 Mar 16, 2017
@mhdawson
Member

Updated title to indicate the crash is only on RHEL 7, as the binaries seem to run OK on Ubuntu 14 BE.

@richardlau
Member

We're able to run v7.5.0 binaries, but not v7.6.0 and later:

-bash-4.2$ node-v7.5.0-linux-ppc64/bin/node
> .exit
-bash-4.2$ node-v7.6.0-linux-ppc64/bin/node
Segmentation fault (core dumped)
-bash-4.2$

@bnoordhuis
Member

The V8 5.4 -> 5.5 upgrade in 61870b4 seems like the most obvious culprit. Can you check if that commit fails and the preceding commit works?

@richardlau
Member

For the 8.0.0 nightlies (from https://nodejs.org/download/nightly/):

-bash-4.2$ node-v8.0.0-nightly20170126a67a04d765-linux-ppc64/bin/node
> .exit
-bash-4.2$ node-v8.0.0-nightly20170127b19334e566-linux-ppc64/bin/node
Segmentation fault (core dumped)
-bash-4.2$

@richardlau
Member

I guess this is also pointing to the V8 5.4->5.5 update:

-bash-4.2$ git log a67a04d765..b19334e566 --oneline
b19334e test: expand test coverage of fs.js
bee83e0 test: expand test coverage of events.js
e71c278 url: stop exporting originFor()
ad6e778 benchmark: add benchmark for object properties
084acc8 test: check noAssert option in buf.write*()
24ef1e6 string_decoder: align UTF-8 handling with V8
007386e repl: remove workaround for function redefinition
c2c6ae5 test: move test-vm-function-redefinition to parallel
b37f55a deps: limit regress/regress-crbug-514081 v8 test
91ab09f src: update NODE_MODULE_VERSION to 52
2739185 deps: update V8 to 5.5.372.40
-bash-4.2$
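
The narrowing done by hand above (a last good build, a first bad build, then the commits in between) is a bisection. A minimal sketch, assuming an ordered list of builds and a caller-supplied check; the function names `first_bad` and `crashes` are illustrative, and `crashes` is a stand-in that simply pretends the V8 update is the culprit, which the thread suspects but does not prove:

```python
def first_bad(builds, is_bad):
    """Return the index of the first bad build.

    Assumes builds[0] is good, builds[-1] is bad, and that every build
    after the first bad one is also bad.
    """
    good, bad = 0, len(builds) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if is_bad(builds[mid]):
            bad = mid
        else:
            good = mid
    return bad

# Oldest to newest: the last good nightly, then the commits from the git log above.
commits = ["a67a04d765", "2739185", "91ab09f", "b37f55a", "c2c6ae5", "007386e",
           "24ef1e6", "084acc8", "ad6e778", "e71c278", "bee83e0", "b19334e"]

def crashes(commit):
    # Hypothetical stand-in for "run that build on RHEL 7 and see if it segfaults".
    return commits.index(commit) >= commits.index("2739185")

print(commits[first_bad(commits, crashes)])
```

With twelve entries this needs only three checks instead of eleven, which matters when each check means producing a release-style binary and running it on another machine.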

@richardlau
Member

> The V8 5.4 -> 5.5 upgrade in 61870b4 seems like the most obvious culprit. Can you check if that commit fails and the preceding commit works?

node works if we compile it and run it locally -- the failures are with the binaries from nodejs.org running locally.

@gibfahn
Member

gibfahn commented Mar 17, 2017

The gcc version for the test-osuosl-ubuntu14-ppc64_be_1 machine (should be the same as the release machines):

test-osuosl-ubuntu14-ppc64-be-3:~$ gcc --version
gcc (Ubuntu 4.8.4-2ubuntu1~14.04.3) 4.8.4

@sxa
Member

sxa commented Mar 17, 2017

RHEL 7.2 suffers the same symptoms. It is supplied with gcc/g++ 4.8.5-4, which is even later than the Ubuntu 14.04 one. That suggests it's either a compiler bug specific to Ubuntu's gcc version (I'm thinking unlikely but not impossible) or, more likely, the new V8 is triggering something that uses some functionality from glibc later than 2.17 (RHEL 7 has 2.17, Ubuntu 14.04 has 2.19).

Going the other way round, node 7 binaries built on RHEL 7 appear to run OK on Ubuntu 14.04 (possibly because they are built against an earlier glibc), so I wonder if a CentOS build machine (same as x64?) might be a better choice than Ubuntu 14.04. For the record, we have built with Ubuntu 14.04.1 on PPC-LE (note: the BE community machines are 14.04.5) and that appears to run OK on RHEL 7.

(I've also tried building my own glibc 2.19 on RHEL7 but that didn't execute properly with anything on the system)
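
One way to probe the glibc hypothesis is to look at which versioned glibc symbols the binary requires. A minimal sketch, assuming text in the format printed by `objdump -T` from binutils; the function name `max_glibc_version` and the sample lines are made up for illustration and are not output from the failing binary:

```python
import re

def max_glibc_version(objdump_output):
    """Return the highest GLIBC_x.y version tag found as a tuple of ints, or None."""
    versions = {
        tuple(int(part) for part in match.split("."))
        for match in re.findall(r"GLIBC_(\d+(?:\.\d+)+)", objdump_output)
    }
    return max(versions) if versions else None

# Made-up sample in the style of `objdump -T ./node` output.
sample = """
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.3   memset
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.3   mprotect
0000000000000000      DF *UND*  0000000000000000  GLIBC_2.17  clock_gettime
"""

needed = max_glibc_version(sample)
print("highest glibc symbol version required:", needed)
print("newer than RHEL 7's 2.17:", needed > (2, 17))
```

A binary that needs a symbol version newer than the installed glibc normally fails at load time with a "version not found" error rather than a segfault, so a clean result here would not rule out other differences between the two systems.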

@sxa
Member

sxa commented Mar 23, 2017

I've got my own clean Ubuntu 14.04 now that I can experiment with, and it replicates the (lack of) problem on that platform. That at least confirms it's not anything magic on the CI machines that's making it work ;-)

@sxa
Member

sxa commented Mar 28, 2017

Have tried building my own gcc/g++ (version 4.8.5) and that still causes a crash in the same place when run on RHEL7, so whatever we're seeing isn't an issue specific to Ubuntu's compiler.

Updating glibc on the RHEL7 box is "non-trivial", so I can't really recommend such a course of action (it needs the dynamic loader and other stuff updated).

@sxa
Member

sxa commented Mar 28, 2017

It's the Clear function at the end of https://github.com/nodejs/node/blob/v7.x/deps/v8/src/regexp/jsregexp.cc that's causing the memset call, which is invoked from line 1472 of https://github.com/nodejs/node/blob/v7.x/deps/v8/src/heap/heap.cc.

@sxa
Member

sxa commented Mar 28, 2017

For reference: I did also try using the headers from glibc 2.17 (the RHEL version) on the ubuntu 14.04 build system but that still seemed to cause a crash in the same place.

@gibfahn
Member

gibfahn commented Mar 31, 2017

@bnoordhuis quick ping in case you have any more suggestions, otherwise it looks like there's not much we can do here.

@bnoordhuis
Member

No other suggestions. I'll go ahead and close it out.

@mhdawson
Member

mhdawson commented Apr 5, 2017

An interesting link that tracks ABI compatibility between versions: https://abi-laboratory.pro/tracker/timeline/glibc/

Incompatible changes between 2.17 and 2.18 are related to:

  • pthread_mutex_s structures
  • anon-struct-siginfo.h

2.19 shows as 100% compatible with 2.17.

It's not obvious from the description of the crash that it is related to either of those two things.
