Recurrent intermittent travis test failure #4016
I can get this on my OSX box as well if I just run the tests. If there's anything I can do to help debug this, let me know.
So, had I paid more attention, I would have noticed that the problem occurs during the DSP tests, where the worker terminates. The following is sufficient to cause a segfault on two Linux systems that I tried:
julia> ;cd test
/home/kmsquire/Source/julia/test
julia> using Base.Test
julia> while true
include("dsp.jl")
end
Segmentation fault (core dumped)
It might be worth valgrinding this one with MEMDEBUG enabled. I did that earlier today (unrelatedly) and saw a fair number of invalid reads/writes, though I can't rule out that those were caused by my own changes.
@loladiro, will do. Right now, in the debugger, I can see that there's memory corruption.
julia> while true
include("dsp.jl")
end
Program received signal SIGSEGV, Segmentation fault.
0x00007ffff72529a4 in pool_alloc (p=0x7ffff7fcdb68) at gc.c:489
489 p->freelist = p->freelist->next;
Missing separate debuginfos, use: debuginfo-install ncurses-libs-5.7-3.20090208.el6.x86_64
(gdb) backtrace
#0 0x00007ffff72529a4 in pool_alloc (p=0x7ffff7fcdb68) at gc.c:489
#1 0x00007ffff72540ef in allocobj (sz=368) at gc.c:981
#2 0x00007ffff7242008 in _new_array (atype=0x67fa60, ndims=1, dims=0x7fffffffb7f0) at array.c:80
#3 0x00007ffff7242c95 in jl_alloc_array_1d (atype=0x67fa60, nr=40) at array.c:297
#4 0x00007ffff0e382a9 in ?? ()
#5 0x00007fffffffb930 in ?? ()
#6 0x01007ffff7242008 in ?? ()
#7 0x0000000003c48350 in ?? ()
#8 0x0000004e00000000 in ?? ()
#9 0x000000000000000b in ?? ()
#10 0x0000000000000008 in ?? ()
#11 0x000000000000000b in ?? ()
#12 0x00007fffffffb960 in ?? ()
#13 0x0000000200000100 in ?? ()
#14 0x0000000000ad90c0 in ?? ()
#15 0x0000000000000580 in ?? ()
#16 0x0000000000000000 in ?? ()
(gdb) print p
$1 = (pool_t *) 0x7ffff7fcdb68
(gdb) print *p
$2 = {osize = 384, pages = 0x3e5c380, freelist = 0x4009000000000000}
(gdb) print *(p.freelist)
Cannot access memory at address 0x4009000000000000
(gdb)
Here you go - https://gist.github.com/amitmurthy/6218188
Yup, that's the one I saw earlier today as well. I'm not quite sure, but I think it might be related to the size of the work array in gesdd that was changed recently.
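If it helps narrow this down, here is a minimal sketch (on a current Julia, where the LAPACK wrappers live in LinearAlgebra; the matrix size is illustrative only, not the one from the failing test) that hammers the gesdd path directly:

# Hypothetical stress loop: LAPACK.gesdd! overwrites its input and dispatches
# to ZGESDD for complex matrices, which is where the suspect work-array
# sizing lives.
using LinearAlgebra

for _ in 1:1000
    A = randn(ComplexF64, 40, 11)
    LAPACK.gesdd!('S', A)
end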
FWIW, I reported this upstream back when this problem originally appeared, and the documentation of ZGESDD was recently fixed (although the fix won't appear in an LAPACK release until sometime this summer).
Happens both on clang (here and here) and gcc (here).
It's also unclear where task.jl:797 is, since task.jl only has 164 lines; it is possibly from stream.jl:797. Other backtrace locations are iobuffer.jl:68 and stream.jl:609.
I was looking to see if there might be a race condition in IOBuffer, e.g., where isopen() becomes false in wait_nb before data is written, or the readnotify condition is notified before the buffer is filled, etc., but didn't see anything obvious.
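For illustration, here is a minimal sketch of the kind of ordering bug I was looking for, using a PipeBuffer and a plain Condition in place of the internal wait_nb/readnotify machinery (the names buf, cond, and reader are hypothetical, not code from stream.jl):

# Hypothetical sketch: if the reader is woken before the writer has appended
# its data, it observes an empty buffer even though the data arrives
# immediately afterwards.
buf  = PipeBuffer()
cond = Condition()

reader = @async begin
    wait(cond)              # woken by notify() below
    bytesavailable(buf)     # how many bytes the reader actually sees
end
yield()                     # let the reader task start waiting

notify(cond)                # wrong order: wake the reader first...
yield()                     # ...let it run before any data is written...
write(buf, UInt8[1, 2, 3])  # ...and only then fill the buffer

println(fetch(reader))      # prints 0: the reader woke to an empty buffer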