Occasional segfaults when running with @threads #44019

Closed
chipbuster opened this issue Feb 2, 2022 · 7 comments
Labels
heisenbug (This bug occurs unpredictably), multithreading (Base.Threads and related functionality)

Comments

@chipbuster
Contributor

chipbuster commented Feb 2, 2022

Executive summary: I occasionally get segfaults when running an @threads loop in Julia 1.7.1. It is a Heisenbug, and I'm still trying to get rr working on my system.

I've been trying to pare down the example code, but this is definitely a heisenbug and it often takes me several tries to confirm that the program still segfaults with a set of changes, so progress has been slow. I also have not been able to get an rr trace since rr refuses to run at all on my system at the moment, though I'll work towards getting that working in my spare time.

I'm running Arch Linux (I confirmed that the crash still happens after a reboot with no updates). Julia is installed from the julia-bin package, which downloads and unpacks the official binaries.

I see segfaults with 32 threads; they're much rarer with just 16, but can still occur after a very long time.
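Since the crash is intermittent, a rerun loop can take some of the tedium out of confirming that a pared-down script still fails. The following is a minimal POSIX shell sketch, not part of the original report: `run_until_fail` is a hypothetical helper name, and the attempt budget in the usage comment is an arbitrary assumption. It reruns a command until the first non-zero exit status (a segfault is reported that way) or until the budget is exhausted.

```shell
# Hypothetical helper: rerun a command until it exits non-zero
# or the attempt budget runs out.
# Usage: run_until_fail MAX_ATTEMPTS COMMAND [ARGS...]
# Returns 0 if a failure was observed, 1 if every attempt succeeded.
run_until_fail() {
    max_attempts=$1
    shift
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # Any non-zero exit status (including death by signal) counts as a failure.
        if ! "$@"; then
            echo "failed on attempt $attempt"
            return 0
        fi
        attempt=$((attempt + 1))
    done
    echo "no failure in $max_attempts attempts"
    return 1
}

# Example invocation (not executed here; the budget of 50 is an assumption):
#   run_until_fail 50 env JULIA_NUM_THREADS=32 julia gen_paramsweep_segfault.jl
```

Passing the thread count through `env` keeps the helper independent of how a particular shell treats variable assignments in front of function calls.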

Current code (I will continue to pare this down to get as minimal an example as possible):
using Base.Iterators
using Base.Threads
using Serialization
using Distributions

function bin_data(data, lo, hi, nbins)
    dx = (hi - lo) / nbins
    bins = ((data .- lo) ./ dx) .|> floor
    bins = UInt8.(bins)
    clamp.(bins, UInt8(0), UInt8(nbins))
end

l = SpinLock()
function compress_data(data)
    lock(l)
    tmpfn = tempname()
    unlock(l)
    write(tmpfn, data)
    run(
        pipeline(
            `xz -9e --keep --format=raw --suffix=.xz $(tmpfn)`,
            stdout = devnull,
            stderr = devnull,
        ),
    )
    nbytes = filesize(tmpfn * ".xz")
    rm(tmpfn * ".xz")
    rm(tmpfn)
    return nbytes
end

compressed_size_bytes(data) = compress_data(data)
compressed_size_bits(data) = compress_data(data) * 8

function emission_times_exp(n, k, Γ)
    η = (k + Γ) / (k * Γ)
    dist = Exponential(η)
    rand(dist, n)
end

function lose_data(lagtimes, γ)
    @assert(all(lagtimes .>= 0.0))
    ind = Int[]
    fixed_times = cumsum(lagtimes)
    for i = 1:length(lagtimes)
        x = rand()
        if x < γ
            push!(ind, i)
        end
    end
    detected_times = fixed_times[ind]
    detected_times |> diff
end

ns = [100_000, 1_000_000, 10_000_000]
# ns = [1_000]  # testing only
ks = [0.1, 0.5, 1.0, 5.0, 10.0]
Γs = [0.1, 0.5, 1.0, 5.0, 10.0]
γs = range(0.1, 1.0, step = 0.1)
ntrials = 1000

smrates = Iterators.product(ks, Γs) |> collect |> vec

l = SpinLock()
@threads for trialnum = 1:ntrials
    data = Dict()
    for p in smrates
        (k, Γ) = p
        for n in ns
            # nm_times = get_emission_dt(n, k, Γ)
            # mar_times = emission_times_exp(n, k, Γ)
            nm_times = 10.0 .* rand(n)
            mar_times = 10.0 .* rand(n)

            for γ in γs
                nm_lost = lose_data(nm_times, γ)
                mar_lost = lose_data(mar_times, γ)
                hi = max(maximum(nm_lost),maximum(mar_lost))

                @assert(all(nm_lost .>= 0.0))
                @assert(all(mar_lost .>= 0.0))

                nm_binned = bin_data(nm_lost, 0.0, hi, 100)
                mar_binned = bin_data(mar_lost, 0.0, hi, 100)

                nm_size = compressed_size_bytes(nm_binned)
                mar_size = compressed_size_bytes(mar_binned)

                experiment_index = (n = n, k = k, Γ = Γ, γ = γ, trial = trialnum)

                try
                    lock(l)
                    data[experiment_index] = (1.0, 1.0)
                finally
                    unlock(l)
                end
            end
        end
    end
    serialize("../data/compression_sweep_$(trialnum).jls", data)
    @info "Finishing trial $(trialnum)"
end
Output of `versioninfo()`
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: AMD Ryzen 9 5950X 16-Core Processor
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, znver3)
Stack trace from the one time in many that the program managed to print one before dying

The line that originates in my code (line 65 at the top) is just the @threads loop.

signal (11): Segmentation fault
in expression starting at /mnt/ssd-data/Experiments/02-2022/simple-photon-model/code/gen_paramsweep_segfault.jl:65
jl_uv_call_close_callback at /buildworker/worker/package_linux64/build/src/jl_uv.c:88 [inlined]
jl_uv_closeHandle at /buildworker/worker/package_linux64/build/src/jl_uv.c:111
uv__finish_close at /workspace/srcdir/libuv/src/unix/core.c:301
uv__run_closing_handles at /workspace/srcdir/libuv/src/unix/core.c:315
uv_run at /workspace/srcdir/libuv/src/unix/core.c:393
jl_process_events at /buildworker/worker/package_linux64/build/src/jl_uv.c:214
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:528
poptask at ./task.jl:827
wait at ./task.jl:836
Exception: julia killed by signal segmentation fault (core dumped)
[tty 13], line 1: E:JULIA_NUM_THREADS=32 julia gen_paramsweep_segfault.jl

Let me know if any other information would be helpful!

@giordano
Contributor

giordano commented Feb 3, 2022

> and I'm still trying to get rr working on my system.

Did you try BugReporting.jl already?

giordano added the heisenbug and multithreading labels on Feb 3, 2022
@chipbuster
Contributor Author

> Did you try BugReporting.jl already?

I had not. On trying, it appears to fail an assertion and crash:

Output of `julia --bug-report=rr-local segfaulting_file.jl`
rr: Saving execution to trace directory `/home/chipbuster/.local/share/rr/julia-0'.
[FATAL /workspace/srcdir/rr/src/RecordSession.cc:1478:inject_handled_signal()]
 (task 444527 (rec:444527) at time 38992)
 -> Assertion `t->stop_sig() == SIGTRAP' failed to hold. Got unexpected status 0x117f (STOP-SIGCHLD)
Tail of trace dump:
{
  real_time:23068.360423 global_time:38972, event:`SYSCALLBUF_FLUSH' tid:451294, ticks:1108222
  { syscall:'write', ret:0x17ca, size:0x10, desched:1 }
  { syscall:'rt_sigprocmask', ret:0x0, size:0x18 }
}
{
  real_time:23068.360438 global_time:38973, event:`PATCH_SYSCALL' tid:451294, ticks:1108222
rax:0x5d rbx:0x55ea44388340 rcx:0xffffffffffffffff rdx:0xffffffff rsi:0x3e8 rdi:0x6 rbp:0x1 rsp:0x7fffe9db8408 r8:0x55ea44388620 r9:0x17c9 r10:0x8 r11:0x246 r12:0x2d r13:0x55ea4437d540 r14:0x55ea4437dd00 r15:0x0 rip:0x7fa96c0bbac9 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7fa96bfc3400 gs_base:0x0
  { tid:451294, addr:0x7fa96c2285f4, length:0x4f }
  { tid:451294, addr:0x7fa96c0bbac9, length:0x5 }
  { tid:451294, addr:0x7fa96c0bbace, length:0x3 }
}
{
  real_time:23068.360440 global_time:38974, event:`SYSCALLBUF_RESET' tid:451294, ticks:1108222
}
{
  real_time:23068.360452 global_time:38975, event:`SYSCALL: fchown' (state:ENTERING_SYSCALL) tid:451294, ticks:1108228
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0xffffffff rsi:0x3e8 rdi:0x6 rbp:0x5d rsp:0x681ffde0 r8:0x55ea44388620 r9:0x17c9 r10:0x8 r11:0x246 r12:0x2d r13:0x55ea4437d540 r14:0x55ea4437dd00 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x5d fs_base:0x7fa96bfc3400 gs_base:0x0
}
{
  real_time:23068.360463 global_time:38976, event:`SYSCALL: fchown' (state:EXITING_SYSCALL) tid:451294, ticks:1108228
rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0xffffffff rsi:0x3e8 rdi:0x6 rbp:0x5d rsp:0x681ffde0 r8:0x55ea44388620 r9:0x17c9 r10:0x8 r11:0x246 r12:0x2d r13:0x55ea4437d540 r14:0x55ea4437dd00 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x5d fs_base:0x7fa96bfc3400 gs_base:0x0
}
{
  real_time:23068.360475 global_time:38977, event:`SYSCALL: fchown' (state:ENTERING_SYSCALL) tid:451294, ticks:1108239
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x3e8 rsi:0xffffffff rdi:0x6 rbp:0x5d rsp:0x681ffde0 r8:0x55ea44388620 r9:0x17c9 r10:0x8 r11:0x246 r12:0x2d r13:0x55ea4437d540 r14:0x55ea4437dd00 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x5d fs_base:0x7fa96bfc3400 gs_base:0x0
}
{
  real_time:23068.360485 global_time:38978, event:`SYSCALL: fchown' (state:EXITING_SYSCALL) tid:451294, ticks:1108239
rax:0x0 rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x3e8 rsi:0xffffffff rdi:0x6 rbp:0x5d rsp:0x681ffde0 r8:0x55ea44388620 r9:0x17c9 r10:0x8 r11:0x246 r12:0x2d r13:0x55ea4437d540 r14:0x55ea4437dd00 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x5d fs_base:0x7fa96bfc3400 gs_base:0x0
}
{
  real_time:23068.360508 global_time:38979, event:`PATCH_SYSCALL' tid:451294, ticks:1108244
rax:0x5b rbx:0x55ea44388340 rcx:0xffffffffffffffff rdx:0x3e8 rsi:0x1a4 rdi:0x6 rbp:0x1 rsp:0x7fffe9db8408 r8:0x55ea44388620 r9:0x17c9 r10:0x8 r11:0x246 r12:0x2d r13:0x55ea4437d540 r14:0x55ea4437dd00 r15:0x0 rip:0x7fa96c0ba2d9 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7fa96bfc3400 gs_base:0x0
  { tid:451294, addr:0x7fa96c228643, length:0x4f }
  { tid:451294, addr:0x7fa96c0ba2d9, length:0x5 }
  { tid:451294, addr:0x7fa96c0ba2de, length:0x3 }
}
{
  real_time:23068.360522 global_time:38980, event:`SYSCALLBUF_FLUSH' tid:451294, ticks:1108262
  { syscall:'fchmod', ret:0x0, size:0x10 }
}
{
  real_time:23068.360536 global_time:38981, event:`PATCH_SYSCALL' tid:451294, ticks:1108262
rax:0x118 rbx:0x55ea44388340 rcx:0xffffffffffffffff rdx:0x7fffe9db8410 rsi:0x0 rdi:0x6 rbp:0x1 rsp:0x7fffe9db8408 r8:0x55ea44388620 r9:0x17c9 r10:0x0 r11:0x246 r12:0x2d r13:0x55ea4437d540 r14:0x55ea4437dd00 r15:0x0 rip:0x7fa96c0bf1dc eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7fa96bfc3400 gs_base:0x0
  { tid:451294, addr:0x7fa96c228692, length:0x4f }
  { tid:451294, addr:0x7fa96c0bf1dc, length:0x5 }
  { tid:451294, addr:0x7fa96c0bf1e1, length:0x3 }
}
{
  real_time:23068.360538 global_time:38982, event:`SYSCALLBUF_RESET' tid:451294, ticks:1108262
}
{
  real_time:23068.360550 global_time:38983, event:`SYSCALLBUF_FLUSH' tid:451294, ticks:1108283
  { syscall:'utimensat', ret:0x0, size:0x10 }
}
{
  real_time:23068.360563 global_time:38984, event:`PATCH_SYSCALL' tid:451294, ticks:1108283
rax:0x3 rbx:0x55ea44388340 rcx:0xffffffffffffffff rdx:0x7fffe9db8410 rsi:0x0 rdi:0x6 rbp:0x1 rsp:0x7fffe9db8408 r8:0x55ea44388620 r9:0x17c9 r10:0x0 r11:0x246 r12:0x2d r13:0x55ea4437d540 r14:0x55ea4437dd00 r15:0x0 rip:0x7fa96c1a9805 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7fa96bfc3400 gs_base:0x0
  { tid:451294, addr:0x7fa96c2286e1, length:0x4f }
  { tid:451294, addr:0x7fa96c1a9805, length:0x5 }
  { tid:451294, addr:0x7fa96c1a980a, length:0x3 }
}
{
  real_time:23068.360565 global_time:38985, event:`SYSCALLBUF_RESET' tid:451294, ticks:1108283
}
{
  real_time:23068.360590 global_time:38986, event:`SYSCALLBUF_FLUSH' tid:451294, ticks:1108995
  { syscall:'close', ret:0x0, size:0x10 }
  { syscall:'close', ret:0x0, size:0x10 }
  { syscall:'rt_sigprocmask', ret:0x0, size:0x18 }
  { syscall:'close', ret:0x0, size:0x10 }
  { syscall:'close', ret:0x0, size:0x10 }
}
{
  real_time:23068.360608 global_time:38987, event:`PATCH_SYSCALL' tid:451294, ticks:1108995
rax:0xe7 rbx:0x7fa96c18d470 rcx:0xffffffffffffffff rdx:0x3c rsi:0xe7 rdi:0x0 rbp:0x0 rsp:0x7fffe9db83d8 r8:0xffffffffffffff88 r9:0x1 r10:0x5 r11:0x246 r12:0x7fa96c18d470 r13:0x1 r14:0x7fa96c18d948 r15:0x0 rip:0x7fa96c096f3f eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7fa96bfc3400 gs_base:0x0
  { tid:451294, addr:0x7fa96c228730, length:0x4f }
  { tid:451294, addr:0x7fa96c096f3f, length:0x5 }
  { tid:451294, addr:0x7fa96c096f44, length:0x3 }
}
{
  real_time:23068.360610 global_time:38988, event:`SYSCALLBUF_RESET' tid:451294, ticks:1108995
}
{
  real_time:23068.360682 global_time:38989, event:`SYSCALL: exit_group' (state:ENTERING_SYSCALL) tid:451294, ticks:1109001
rax:0xffffffffffffffda rbx:0x681fffa0 rcx:0xffffffffffffffff rdx:0x3c rsi:0xe7 rdi:0x0 rbp:0xe7 rsp:0x681ffde0 r8:0xffffffffffffff88 r9:0x1 r10:0x5 r11:0x246 r12:0x7fa96c18d470 r13:0x1 r14:0x7fa96c18d948 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xe7 fs_base:0x7fa96bfc3400 gs_base:0x0
  { tid:451294, addr:0x7fa96c1ec14c, length:0x1 }
  { tid:451294, addr:0x7fa96c1ec33a, length:0x1 }
}
{
  real_time:23068.360697 global_time:38990, event:`EXIT' tid:451294, ticks:1109001
}
{
  real_time:23068.361217 global_time:38991, event:`SIGNAL: SIGCHLD(async)' tid:444527, ticks:2364665664
rax:0x7fd7ccbb0180 rbx:0x7fd7cc5865d0 rcx:0x35df780 rdx:0x24b1fc0 rsi:0x7fd7cc5e00b0 rdi:0x7fd8437c2980 rbp:0x7fd7c39f2c00 rsp:0x7fd7c39f2bf0 r8:0x7fd7cc5e0030 r9:0x2 r10:0xfffffffffffffff0 r11:0x7fd84337e8d0 r12:0x7fd7cc586570 r13:0x7fd7c39f2d90 r14:0x0 r15:0x0 rip:0x7fd84337df10 eflags:0x202 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xffffffffffffffff fs_base:0x7fd843ce3000 gs_base:0x0
}
=== Start rr backtrace:
/home/chipbuster/.julia/artifacts/fed663d473a94b4d75db54e1a8e1cc8b42875680/bin/rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x249)[0x582cd9]
/home/chipbuster/.julia/artifacts/fed663d473a94b4d75db54e1a8e1cc8b42875680/bin/rr[0x5136ed]
/home/chipbuster/.julia/artifacts/fed663d473a94b4d75db54e1a8e1cc8b42875680/bin/rr(_ZN2rr13RecordSession20signal_state_changedEPNS_10RecordTaskEPNS0_9StepStateE+0x7ee)[0x53ea7e]
/home/chipbuster/.julia/artifacts/fed663d473a94b4d75db54e1a8e1cc8b42875680/bin/rr(_ZN2rr13RecordSession11record_stepEv+0x4eb)[0x542f9b]
/home/chipbuster/.julia/artifacts/fed663d473a94b4d75db54e1a8e1cc8b42875680/bin/rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0xa24)[0x534784]
/home/chipbuster/.julia/artifacts/fed663d473a94b4d75db54e1a8e1cc8b42875680/bin/rr(main+0x276)[0x4c3a06]
/usr/bin/../lib/libc.so.6(__libc_start_main+0xd5)[0x7f08ba689b25]
/home/chipbuster/.julia/artifacts/fed663d473a94b4d75db54e1a8e1cc8b42875680/bin/rr[0x4c3e3e]
=== End rr backtrace
Launch gdb with
  gdb '-l' '10000' '-ex' 'set sysroot /' '-ex' 'target extended-remote 127.0.0.1:51311' /usr/bin/julia

Interestingly, this is a different assertion from the one that trips when I run `rr record` normally, but either way rr still doesn't seem to be working.

@paulmelis
Contributor

Not sure how precise valgrind is in this case, but it consistently points to this location; it looks like (at least) a null pointer dereference.

snellius paulm@tcn116 17:23 ~$ valgrind --smc-check=all-non-file --suppressions=$HOME/c/julia-git/contrib/valgrind-julia.supp ~/software/julia-1.7.1/bin/julia -t 128 segfault.jl
==1988794== Memcheck, a memory error detector
==1988794== Copyright (C) 2002-2017, and GNU GPL'd, by Julian Seward et al.
==1988794== Using Valgrind-3.18.1 and LibVEX; rerun with -h for copyright info
==1988794== Command: /home/paulm/software/julia-1.7.1/bin/julia -t 128 segfault.jl
==1988794== 
--1988794-- WARNING: unhandled amd64-linux syscall: 1008
--1988794-- You may be able to write your own handler.
--1988794-- Read the file README_MISSING_SYSCALL_OR_IOCTL.
--1988794-- Nevertheless we consider this a bug.  Please report
--1988794-- it at http://valgrind.org/support/bug_reports.html.
==1988794== Warning: client switching stacks?  SP change: 0x311d17d8 --> 0xe2a5fff8
==1988794==          to suppress, use: --max-stackframe=2978539552 or greater
==1988794== Warning: invalid file descriptor -1 in syscall close()
==1988794== Warning: invalid file descriptor -1 in syscall close()
==1988794== Warning: client switching stacks?  SP change: 0x30bce7d8 --> 0xef554ff8
==1988794==          to suppress, use: --max-stackframe=3197659168 or greater
==1988794== Warning: client switching stacks?  SP change: 0x34dd57d8 --> 0xf3958ff8
==1988794==          to suppress, use: --max-stackframe=3199744032 or greater
==1988794==          further instances of this message will not be shown.
==1988794== Thread 3:
==1988794== Syscall param write(buf) points to uninitialised byte(s)
==1988794==    at 0x4F4A52D: syscall (in /usr/lib64/libc-2.28.so)
==1988794==  Address 0xef54e000 is in a rw- anonymous segment
==1988794== 
==1988794== Syscall param write(buf) points to unaddressable byte(s)
==1988794==    at 0x4F4A52D: syscall (in /usr/lib64/libc-2.28.so)
==1988794==  Address 0xef54e000 is in a rw- anonymous segment
==1988794== 
==1988794== Thread 80:
==1988794== Invalid read of size 8
==1988794==    at 0x5B87E84: maybe_collect (julia_threads.h:325)
==1988794==    by 0x5B87E84: jl_gc_big_alloc (gc.c:947)
==1988794==  Address 0xfffffffffe49ff10 is not stack'd, malloc'd or (recently) free'd
==1988794== 
==1988794== Thread 88:
==1988794== Invalid read of size 8
==1988794==    at 0x5B7CC12: jl_gc_state_set (julia_threads.h:325)
==1988794==    by 0x5B7CC12: jl_task_get_next (partr.c:523)
==1988794==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
==1988794== 
==1988794== 
==1988794== Process terminating with default action of signal 11 (SIGSEGV): dumping core
==1988794==  Access not within mapped region at address 0x0
==1988794==    at 0x5B7CC12: jl_gc_state_set (julia_threads.h:325)
==1988794==    by 0x5B7CC12: jl_task_get_next (partr.c:523)
==1988794==  If you believe this happened as a result of a stack
==1988794==  overflow in your program's main thread (unlikely but
==1988794==  possible), you can try to increase the size of the
==1988794==  main thread stack using the --main-stacksize= flag.
==1988794==  The main thread stack size used in this run was 16777216.
==1988794== 
==1988794== HEAP SUMMARY:
==1988794==     in use at exit: 603,293,029 bytes in 49,164 blocks
==1988794==   total heap usage: 1,022,326 allocs, 973,162 frees, 3,449,011,340 bytes allocated
==1988794== 
==1988794== LEAK SUMMARY:
==1988794==    definitely lost: 163 bytes in 12 blocks
==1988794==    indirectly lost: 0 bytes in 0 blocks
==1988794==      possibly lost: 1,273,212 bytes in 12,672 blocks
==1988794==    still reachable: 602,018,922 bytes in 36,477 blocks
==1988794==                       of which reachable via heuristic:
==1988794==                         newarray           : 56,448 bytes in 10 blocks
==1988794==                         multipleinheritance: 7,992 bytes in 15 blocks
==1988794==         suppressed: 732 bytes in 3 blocks
==1988794== Rerun with --leak-check=full to see details of leaked memory
==1988794== 
==1988794== Use --track-origins=yes to see where uninitialised values come from
==1988794== For lists of detected and suppressed errors, rerun with: -s
==1988794== ERROR SUMMARY: 15 errors from 4 contexts (suppressed: 46 from 6)
Segmentation fault

@chipbuster
Contributor Author

I've gotten a working version of rr on this system and gotten a crash while under rr in chaos mode. Unfortunately, the crash also took rr with it, so I'm not certain how useful the trace will wind up being.

A packed tarball of the trace can be found at https://www.dropbox.com/s/yv3zuvy7ojod7nb/gh-julia-44019-rr-1.tar.xz?dl=0

Console output when running `rr`
❯ E:JULIA_NUM_THREADS=32 /home/chipbuster/tmp/rr/rr/build/bin/rr record -h julia gen_paramsweep_segfault.jl
rr: Saving execution to trace directory `/home/chipbuster/.local/share/rr/julia-2'.
The futex facility returned an unexpected error code.

signal (6): Aborted
in expression starting at /mnt/ssd-data/Experiments/02-2022/simple-photon-model/code/gen_paramsweep_segfault.jl:65
gsignal at /usr/bin/../lib/libc.so.6 (unknown line)
abort at /usr/bin/../lib/libc.so.6 (unknown line)
__libc_message at /usr/bin/../lib/libc.so.6 (unknown line)
__libc_fatal at /usr/bin/../lib/libc.so.6 (unknown line)
__futex_abstimed_wait_common64 at /usr/bin/../lib/libpthread.so.0 (unknown line)
pthread_cond_wait at /usr/bin/../lib/libpthread.so.0 (unknown line)
uv_cond_wait at /workspace/srcdir/libuv/src/unix/thread.c:847
jl_task_get_next at /buildworker/worker/package_linux64/build/src/partr.c:517
poptask at ./task.jl:827
wait at ./task.jl:836
wait at ./condition.jl:123
wait at ./process.jl:627
success at ./process.jl:489
jfptr_success_38103.clone_1 at /usr/lib/julia/sys.so (unknown line)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
#run#701 at ./process.jl:446
run at ./process.jl:444 [inlined]
compress_data at /mnt/ssd-data/Experiments/02-2022/simple-photon-model/code/gen_paramsweep_segfault.jl:19
compressed_size_bytes at /mnt/ssd-data/Experiments/02-2022/simple-photon-model/code/gen_paramsweep_segfault.jl:32
unknown function (ip: 0x2f743451b6a2)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
macro expansion at /mnt/ssd-data/Experiments/02-2022/simple-photon-model/code/gen_paramsweep_segfault.jl:86 [inlined]
#26#threadsfor_fun at ./threadingconstructs.jl:85
#26#threadsfor_fun at ./threadingconstructs.jl:52
unknown function (ip: 0x2f743451468f)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:877
Allocations: 7565182 (Pool: 7562324; Big: 2858); GC: 7
[FATAL /home/chipbuster/tmp/rr/rr/src/Task.cc:833:enter_syscall() errno: ESRCH]
 (task 79279 (rec:79279) at time 64560)
 -> Assertion `!ptrace_event()' failed to hold.
Tail of trace dump:
{
  real_time:2259.130080 global_time:64540, event:`SYSCALL: rt_sigaction' (state:EXITING_SYSCALL) tid:98677, ticks:992
rax:0x0 rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x258e1a7496f0 rsi:0x258e1a749650 rdi:0x1d rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0x258e1a749840 r9:0x0 r10:0x8 r11:0x246 r12:0x7b3f0e2229a0 r13:0x258e1a749d90 r14:0x0 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x5a4d3a3a1640 gs_base:0x0
  { tid:98677, addr:0x258e1a7496f0, length:0x20 }
}
{
  real_time:2259.130100 global_time:64541, event:`SYSCALL: rt_sigaction' (state:ENTERING_SYSCALL) tid:98677, ticks:1018
rax:0xffffffffffffffda rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x258e1a7496f0 rsi:0x258e1a749650 rdi:0x1e rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0x258e1a749840 r9:0x0 r10:0x8 r11:0x246 r12:0x7b3f0e2229a0 r13:0x258e1a749d90 r14:0x0 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x5a4d3a3a1640 gs_base:0x0
}
{
  real_time:2259.130111 global_time:64542, event:`SYSCALL: rt_sigaction' (state:EXITING_SYSCALL) tid:98677, ticks:1018
rax:0x0 rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x258e1a7496f0 rsi:0x258e1a749650 rdi:0x1e rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0x258e1a749840 r9:0x0 r10:0x8 r11:0x246 r12:0x7b3f0e2229a0 r13:0x258e1a749d90 r14:0x0 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x5a4d3a3a1640 gs_base:0x0
  { tid:98677, addr:0x258e1a7496f0, length:0x20 }
}
{
  real_time:2259.130131 global_time:64543, event:`SYSCALL: rt_sigaction' (state:ENTERING_SYSCALL) tid:98677, ticks:1044
rax:0xffffffffffffffda rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x258e1a7496f0 rsi:0x258e1a749650 rdi:0x1f rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0x258e1a749840 r9:0x0 r10:0x8 r11:0x246 r12:0x7b3f0e2229a0 r13:0x258e1a749d90 r14:0x0 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x5a4d3a3a1640 gs_base:0x0
}
{
  real_time:2259.130140 global_time:64544, event:`SYSCALL: rt_sigaction' (state:EXITING_SYSCALL) tid:98677, ticks:1044
rax:0x0 rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x258e1a7496f0 rsi:0x258e1a749650 rdi:0x1f rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0x258e1a749840 r9:0x0 r10:0x8 r11:0x246 r12:0x7b3f0e2229a0 r13:0x258e1a749d90 r14:0x0 r15:0x0 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0xd fs_base:0x5a4d3a3a1640 gs_base:0x0
  { tid:98677, addr:0x258e1a7496f0, length:0x20 }
}
{
  real_time:2259.130161 global_time:64545, event:`SYSCALLBUF_FLUSH' tid:98677, ticks:1186
  { syscall:'rt_sigprocmask', ret:0x0, size:0x18 }
}
{
  real_time:2259.130175 global_time:64546, event:`SYSCALL: execve' (state:ENTERING_SYSCALL) tid:98677, ticks:1186
rax:0xffffffffffffffda rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c222620 r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c2225e8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0
}
{
  real_time:2259.130177 global_time:64547, event:`SYSCALLBUF_RESET' tid:98677, ticks:1186
}
{
  real_time:2259.130211 global_time:64548, event:`SYSCALL: execve' (state:EXITING_SYSCALL) tid:98677, ticks:1186
rax:0xfffffffffffffffe rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c222620 r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c2225e8 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x0 st7:0x0 ymm0:0x726574737562706968632f656d6f682f ymm1:0x73726174732f74706d6f72706c6c6568 ymm2:0x0 ymm3:0x4e4f49535345530065646f632f6c6564 ymm4:0x43492e2f706d742f403a63696863726f ymm5:0xffff0000000000000000 ymm6:0x43492e2f53455300403a63632f636564 ymm7:0x0 ymm8:0x6574737562706968632f656d6f682f3a ymm9:0x0 ymm10:0xffffffffffffffffffffffffffffffff ymm11:0xffffffffffffffffffffffffffffffff ymm12:0xffffffffffffffffffffffffffffffff ymm13:0xffffffffffffffffffffffffffffffff ymm14:0x0 ymm15:0xf46e13817ba4291b733353fcd73bdee0
}
{
  real_time:2259.130243 global_time:64549, event:`SYSCALL: execve' (state:ENTERING_SYSCALL) tid:98677, ticks:1213
rax:0xffffffffffffffda rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c22265b r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c222621 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0
}
{
  real_time:2259.130271 global_time:64550, event:`SYSCALL: execve' (state:EXITING_SYSCALL) tid:98677, ticks:1213
rax:0xfffffffffffffffe rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c22265b r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c222621 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x0 st7:0x0 ymm0:0x726574737562706968632f656d6f682f ymm1:0x696873726174732f74706d6f72706c6c ymm2:0x0 ymm3:0x4e4f49535345530065646f632f6c6564 ymm4:0x43492e2f706d742f403a63696863726f ymm5:0xffff0000000000000000 ymm6:0x43492e2f53455300403a63632f636564 ymm7:0x0 ymm8:0x7261742f70696873726174732f74706d ymm9:0x0 ymm10:0xffffffffffffffffffffffffffffffff ymm11:0xffffffffffffffffffffffffffffffff ymm12:0xffffffffffffffffffffffffffffffff ymm13:0xffffffffffffffffffffffffffffffff ymm14:0x0 ymm15:0xf46e13817ba4291b733353fcd73bdee0
}
{
  real_time:2259.130303 global_time:64551, event:`SYSCALL: execve' (state:ENTERING_SYSCALL) tid:98677, ticks:1239
rax:0xffffffffffffffda rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c22266d r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c22265c rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0
}
{
  real_time:2259.130328 global_time:64552, event:`SYSCALL: execve' (state:EXITING_SYSCALL) tid:98677, ticks:1239
rax:0xfffffffffffffffe rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c22266d r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c22265c rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x0 st7:0x0 ymm0:0x69622f77657262656d6f682f74706f2f ymm1:0x6e69622f77657262656d6f682f74706f ymm2:0x0 ymm3:0x4e4f49535345530065646f632f6c6564 ymm4:0x43492e2f706d742f403a63696863726f ymm5:0xffff0000000000000000 ymm6:0x43492e2f53455300403a63632f636564 ymm7:0x0 ymm8:0x69622f77657262656d6f682f74706f2f ymm9:0x0 ymm10:0xffffffffffffffffffffffffffffffff ymm11:0xffffffffffffffffffffffffffffffff ymm12:0xffffffffffffffffffffffffffffffff ymm13:0xffffffffffffffffffffffffffffffff ymm14:0x0 ymm15:0xf46e13817ba4291b733353fcd73bdee0
}
{
  real_time:2259.130358 global_time:64553, event:`SYSCALL: execve' (state:ENTERING_SYSCALL) tid:98677, ticks:1266
rax:0xffffffffffffffda rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c222689 r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c22266e rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0
}
{
  real_time:2259.130383 global_time:64554, event:`SYSCALL: execve' (state:EXITING_SYSCALL) tid:98677, ticks:1266
rax:0xfffffffffffffffe rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c222689 r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c22266e rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x0 st7:0x0 ymm0:0x726574737562706968632f656d6f682f ymm1:0x6e69622f6f677261632e2f7265747375 ymm2:0x0 ymm3:0x4e4f49535345530065646f632f6c6564 ymm4:0x43492e2f706d742f403a63696863726f ymm5:0xffff0000000000000000 ymm6:0x43492e2f53455300403a63632f636564 ymm7:0x0 ymm8:0x2f656d6f682f3a6e69622f6f67726163 ymm9:0x0 ymm10:0xffffffffffffffffffffffffffffffff ymm11:0xffffffffffffffffffffffffffffffff ymm12:0xffffffffffffffffffffffffffffffff ymm13:0xffffffffffffffffffffffffffffffff ymm14:0x0 ymm15:0xf46e13817ba4291b733353fcd73bdee0
}
{
  real_time:2259.130426 global_time:64555, event:`SYSCALL: execve' (state:ENTERING_SYSCALL) tid:98677, ticks:1292
rax:0xffffffffffffffda rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c2226a5 r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c22268a rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0
}
{
  real_time:2259.130453 global_time:64556, event:`SYSCALL: execve' (state:EXITING_SYSCALL) tid:98677, ticks:1292
rax:0xfffffffffffffffe rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c2226a5 r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c22268a rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x0 st7:0x0 ymm0:0x726574737562706968632f656d6f682f ymm1:0x6e69622f6c61636f6c2e2f7265747375 ymm2:0x0 ymm3:0x4e4f49535345530065646f632f6c6564 ymm4:0x43492e2f706d742f403a63696863726f ymm5:0xffff0000000000000000 ymm6:0x43492e2f53455300403a63632f636564 ymm7:0x0 ymm8:0x726574737562706968632f656d6f682f ymm9:0x0 ymm10:0xffffffffffffffffffffffffffffffff ymm11:0xffffffffffffffffffffffffffffffff ymm12:0xffffffffffffffffffffffffffffffff ymm13:0xffffffffffffffffffffffffffffffff ymm14:0x0 ymm15:0xf46e13817ba4291b733353fcd73bdee0
}
{
  real_time:2259.130484 global_time:64557, event:`SYSCALL: execve' (state:ENTERING_SYSCALL) tid:98677, ticks:1319
rax:0xffffffffffffffda rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c2226b4 r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c2226a6 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0
}
{
  real_time:2259.130508 global_time:64558, event:`SYSCALL: execve' (state:EXITING_SYSCALL) tid:98677, ticks:1319
rax:0xfffffffffffffffe rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c2226b4 r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c2226a6 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0 st0:0x0 st1:0x0 st2:0x0 st3:0x0 st4:0x0 st5:0x0 st6:0x0 st7:0x0 ymm0:0x3a3a3a3a3a3a3a3a3a3a3a3a3a3a3a3a ymm1:0xff0000000000 ymm2:0x0 ymm3:0x4e4f49535345530065646f632f6c6564 ymm4:0x43492e2f706d742f403a63696863726f ymm5:0xffff0000000000000000 ymm6:0x43492e2f53455300403a63632f636564 ymm7:0x0 ymm8:0x6c61636f6c2f7273752f3a6e69622f6c ymm9:0x0 ymm10:0xffffffffffffffffffffffffffffffff ymm11:0xffffffffffffffffffffffffffffffff ymm12:0xffffffffffffffffffffffffffffffff ymm13:0xffffffffffffffffffffffffffffffff ymm14:0x0 ymm15:0xf46e13817ba4291b733353fcd73bdee0
}
{
  real_time:2259.130538 global_time:64559, event:`SYSCALL: execve' (state:ENTERING_SYSCALL) tid:98677, ticks:1346
rax:0xffffffffffffffda rbx:0x7efd4c6acfa0 rcx:0xffffffffffffffff rdx:0x1b71760 rsi:0x54f90baf40b0 rdi:0x258e1a7496d0 rbp:0x7efd4c6acfa0 rsp:0x7efd4c6acd60 r8:0xfff r9:0x7ffd0c2226bd r10:0x8 r11:0x246 r12:0x54f90baf40b0 r13:0x1b71760 r14:0x54f90b6d02f8 r15:0x7ffd0c2226b5 rip:0x70000002 eflags:0x246 cs:0x33 ss:0x2b ds:0x0 es:0x0 fs:0x0 gs:0x0 orig_rax:0x3b fs_base:0x5a4d3a3a1640 gs_base:0x0
}
=== Start rr backtrace:
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr9GdbServer15emergency_debugEPNS_4TaskE+0x93b)[0x55f76c42f1cb]
/home/chipbuster/tmp/rr/rr/build/bin/rr(+0xb857f)[0x55f76c43f57f]
/home/chipbuster/tmp/rr/rr/build/bin/rr(+0xb904b)[0x55f76c44004b]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr4Task13enter_syscallEv+0x315)[0x55f76c51b9a5]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr18AutoRemoteSyscalls12syscall_baseEiRNS_9RegistersE+0x18a)[0x55f76c3e55ca]
/home/chipbuster/tmp/rr/rr/build/bin/rr(+0x1da861)[0x55f76c561861]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr4Task17unmap_buffers_forERNS_18AutoRemoteSyscallsEPS0_NS_10remote_ptrI14syscallbuf_hdrEE+0x4e)[0x55f76c51654e]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr4Task9post_execERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_+0x6b2)[0x55f76c51c952]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr10RecordTask9post_execEv+0xc8)[0x55f76c4b27c8]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr13RecordSession19handle_ptrace_eventEPPNS_10RecordTaskEPNS0_9StepStateEPNS0_12RecordResultEPb+0x285)[0x55f76c463bf5]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr13RecordSession11record_stepEv+0x477)[0x55f76c46fb97]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x871)[0x55f76c45f7d1]
/home/chipbuster/tmp/rr/rr/build/bin/rr(main+0x1c8)[0x55f76c3d0e98]
/usr/lib/libc.so.6(__libc_start_main+0xd5)[0x7f71bc7e6b25]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_start+0x2e)[0x55f76c3d104e]
=== End rr backtrace
Launch gdb with
  gdb '-l' '10000' '-ex' 'set sysroot /' '-ex' 'target extended-remote 127.0.0.1:13743' /usr/bin/julia

ls
[FATAL /home/chipbuster/tmp/rr/rr/src/log.cc:430:emergency_debug()] Can't resume execution from invalid state
=== Start rr backtrace:
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr13dump_rr_stackEv+0x41)[0x55f76c549691]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr15notifying_abortEv+0x49)[0x55f76c54de19]
/home/chipbuster/tmp/rr/rr/build/bin/rr(+0x1da570)[0x55f76c561570]
/home/chipbuster/tmp/rr/rr/build/bin/rr(+0xb85bd)[0x55f76c43f5bd]
/home/chipbuster/tmp/rr/rr/build/bin/rr(+0xb904b)[0x55f76c44004b]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr4Task13enter_syscallEv+0x315)[0x55f76c51b9a5]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr18AutoRemoteSyscalls12syscall_baseEiRNS_9RegistersE+0x18a)[0x55f76c3e55ca]
/home/chipbuster/tmp/rr/rr/build/bin/rr(+0x1da861)[0x55f76c561861]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr4Task17unmap_buffers_forERNS_18AutoRemoteSyscallsEPS0_NS_10remote_ptrI14syscallbuf_hdrEE+0x4e)[0x55f76c51654e]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr4Task9post_execERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEES8_+0x6b2)[0x55f76c51c952]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr10RecordTask9post_execEv+0xc8)[0x55f76c4b27c8]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr13RecordSession19handle_ptrace_eventEPPNS_10RecordTaskEPNS0_9StepStateEPNS0_12RecordResultEPb+0x285)[0x55f76c463bf5]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr13RecordSession11record_stepEv+0x477)[0x55f76c46fb97]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_ZN2rr13RecordCommand3runERSt6vectorINSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEESaIS7_EE+0x871)[0x55f76c45f7d1]
/home/chipbuster/tmp/rr/rr/build/bin/rr(main+0x1c8)[0x55f76c3d0e98]
/usr/lib/libc.so.6(__libc_start_main+0xd5)[0x7f71bc7e6b25]
/home/chipbuster/tmp/rr/rr/build/bin/rr(_start+0x2e)[0x55f76c3d104e]
=== End rr backtrace
Exception: /home/chipbuster/tmp/rr/rr/build/bin/rr killed by signal aborted (core dumped)

@vtjnash
Sponsor Member

vtjnash commented Mar 3, 2022

v1.7.1 is a very old release to be relying on for threading. You will need to try again with a newer version of Julia (preferably nightly), and possibly update rr too.

@vtjnash vtjnash closed this as completed Mar 3, 2022
@chipbuster
Contributor Author

Can confirm that this appears to have been solved by the upgrade to 1.7.2.

@chipbuster
Contributor Author

I spoke too soon. The segfault takes longer to occur than it used to, but still occurs, both on 1.7.2 and nightly.

Should I open a new issue?
