GC error (probable corruption) #43567

Open
DatName opened this issue Dec 27, 2021 · 6 comments

DatName commented Dec 27, 2021

I have a relatively big multithreaded application which runs fine on Julia 1.6.4, but segfaults on 1.7 and 1.7.1.
I will try to create a minimal example that reproduces this segfault, but for now I only have the console log:

GC error (probable corruption) :
Allocations: 480045702 (Pool: 479950106; Big: 95596); GC: 244
Array{
!!! ERROR in jl_ -- ABORTING !!!
0x7f4734343100: Queued root: 0x7f46a8784010 :: 0x7f46d8f494b0 (bits: 3)
        of type 
!!! ERROR in jl_ -- ABORTING !!!
0x7f4734343118: Queued root: 0x7f46a861c010 :: 0x7f46d8f494b0 (bits: 3)
        of type 
!!! ERROR in jl_ -- ABORTING !!!
0x7f4734343130: Queued root: 0x7f46dbafa650 :: 0x7f46d7de01a0 (bits: 3)
        of type 
!!! ERROR in jl_ -- ABORTING !!!
0x7f4734343148: Queued root: 0x7f4677dd8ad0 :: 0x7f46d7de01a0 (bits: 3)
        of type 

....

!!! ERROR in jl_ -- ABORTING !!!
0x7f4734344660: Queued root: 0x7f46adf04e70 :: 0x7f476597cc40 (bits: 3)
        of type 
!!! ERROR in jl_ -- ABORTING !!!
0x7f4734344678:  r-- Stack frame 0x7f46c3676240 -- 1 of 6 (direct)
0x7f47343446a0:   `- Stack frame 0x7f4652fcf060 -- 124 of 298 (direct)


signal (6): Aborted
in expression starting at none:0
pthread_kill at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
raise at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
abort at /lib/x86_64-linux-gnu/libc.so.6 (unknown line)
gc_assert_datatype_fail at /buildworker/worker/package_linux64/build/src/gc.c:1657
gc_mark_loop at /buildworker/worker/package_linux64/build/src/gc.c:2711
_jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:3039
jl_gc_collect at /buildworker/worker/package_linux64/build/src/gc.c:3248
maybe_collect at /buildworker/worker/package_linux64/build/src/gc.c:882 [inlined]
jl_gc_pool_alloc at /buildworker/worker/package_linux64/build/src/gc.c:1209
export_event at /path/src/events/process_events.jl:99
process_event at /path/src/events/process_events.jl:12
guarded_process_event at /path/src/server/state/start.jl:387
unknown function (ip: 0x7f46a42d1cd2)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
consume_output_events at /path/src/server/state/start.jl:381
unknown function (ip: 0x7f46c5e8facd)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
macro expansion at /path/src/task_utils/generic_handler.jl:64 [inlined]
#35 at /home/.julia/packages/ThreadPools/hwwUU/src/macros.jl:261
unknown function (ip: 0x7f46c5e8d2df)
_jl_invoke at /buildworker/worker/package_linux64/build/src/gf.c:2247 [inlined]
jl_apply_generic at /buildworker/worker/package_linux64/build/src/gf.c:2429
jl_apply at /buildworker/worker/package_linux64/build/src/julia.h:1788 [inlined]
start_task at /buildworker/worker/package_linux64/build/src/task.c:877
Allocations: 480045702 (Pool: 479950106; Big: 95596); GC: 244
Aborted (core dumped)

julia> versioninfo()
Julia Version 1.7.1
Commit ac5cc99908 (2021-12-22 19:35 UTC)
Platform Info:
  OS: Linux (x86_64-pc-linux-gnu)
  CPU: Intel(R) Core(TM) i7-10870H CPU @ 2.20GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, skylake)

Keno commented Dec 27, 2021

Unfortunately, this will not be debuggable without a reproducer or an rr trace.

DatName commented Dec 27, 2021

I see.
When I run it with

export JULIA_NUM_THREADS=12
./julia --bug-report=rr-local

the program just stalls on a non-blocking call:

julia> start!(ctx)
[ Info: Listening on: 0.0.0.0:26000


^CERROR: InterruptException:
Stacktrace:
  [1] poptask(W::Base.InvasiveLinkedListSynchronized{Task})
    @ Base ./task.jl:827
  [2] wait()
    @ Base ./task.jl:836
  [3] wait(c::Base.GenericCondition{Base.Threads.SpinLock})
    @ Base ./condition.jl:123
  [4] wait(x::Base.Process)
    @ Base ./process.jl:627
  [5] success
    @ ./process.jl:489 [inlined]
  [6] run(::Cmd; wait::Bool)
    @ Base ./process.jl:446
  [7] run
    @ ./process.jl:444 [inlined]
  [8] (::BugReporting.var"#7#8"{Nothing, Tuple{Cmd, Vector{String}}})(rr_path::String)
    @ BugReporting ~/.julia/packages/BugReporting/7auqP/src/BugReporting.jl:132
  [9] (::JLLWrappers.var"#2#3"{BugReporting.var"#7#8"{Nothing, Tuple{Cmd, Vector{String}}}, String})()
    @ JLLWrappers ~/.julia/packages/JLLWrappers/bkwIo/src/runtime.jl:49
 [10] withenv(::JLLWrappers.var"#2#3"{BugReporting.var"#7#8"{Nothing, Tuple{Cmd, Vector{String}}}, String}, ::Pair{String, String}, ::Vararg{Pair{String, String}})
    @ Base ./env.jl:172
 [11] withenv_executable_wrapper(f::Function, executable_path::String, PATH::String, LIBPATH::String, adjust_PATH::Bool, adjust_LIBPATH::Bool)
    @ JLLWrappers ~/.julia/packages/JLLWrappers/bkwIo/src/runtime.jl:48
 [12] #invokelatest#2
    @ ./essentials.jl:716 [inlined]
 [13] invokelatest
    @ ./essentials.jl:714 [inlined]
 [14] #rr#7
    @ ~/.julia/packages/JLLWrappers/bkwIo/src/products/executable_generators.jl:7 [inlined]
 [15] rr
    @ ~/.julia/packages/JLLWrappers/bkwIo/src/products/executable_generators.jl:7 [inlined]
 [16] #rr_record#6
    @ ~/.julia/packages/BugReporting/7auqP/src/BugReporting.jl:122 [inlined]
 [17] rr_record
    @ ~/.julia/packages/BugReporting/7auqP/src/BugReporting.jl:119 [inlined]
 [18] make_interactive_report(report_type::String, ARGS::Vector{String})
    @ BugReporting ~/.julia/packages/BugReporting/7auqP/src/BugReporting.jl:208
 [19] #invokelatest#2
    @ ./essentials.jl:716 [inlined]
 [20] invokelatest
    @ ./essentials.jl:714 [inlined]
 [21] report_bug(kind::String)
    @ InteractiveUtils ~/code/julia/julia-1.7.1/share/julia/stdlib/v1.7/InteractiveUtils/src/InteractiveUtils.jl:397
 [22] exec_options(opts::Base.JLOptions)
    @ Base ./client.jl:233
 [23] _start()
    @ Base ./client.jl:495

Could this be by any chance related?

Keno commented Dec 27, 2021

> Could this be by any chance related?

Perhaps, but the backtrace is from the outer process, not from where it's actually blocked. Also, rr can make things slow, so you may just need to let it run for a while.

@JeffBezanson

You can also try running with --check-bounds=yes.
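
One way this kind of heap corruption can arise (not confirmed as the cause here) is an `@inbounds` access with an out-of-range index, which skips the bounds check and may touch memory outside the array. Running with `--check-bounds=yes` forces the check, so the same access raises a `BoundsError` instead. The sketch below is only an illustration: `sum_first` is a hypothetical function, and `Base.JLOptions()` is an internal API used here just to show which setting is active.

```julia
# Hypothetical example of an unchecked loop; the caller is trusted to pass n <= length(v).
function sum_first(v::Vector{Float64}, n::Int)
    total = 0.0
    for i in 1:n
        # The bounds check is elided here unless Julia runs with --check-bounds=yes.
        @inbounds total += v[i]
    end
    return total
end

v = collect(1.0:10.0)
total = sum_first(v, 10)   # index stays in range, so this is safe under any setting

# Internal API (assumption: not a stable interface). check_bounds is 0 for the
# default, 1 for --check-bounds=yes, and 2 for --check-bounds=no.
bounds_forced = Base.JLOptions().check_bounds == 1
println("sum = ", total, ", bounds checks forced: ", bounds_forced)
```

Calling `sum_first(v, 11)` would be undefined behavior by default, while under `--check-bounds=yes` it should fail with a `BoundsError` that points at the offending line.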

aeisman commented Apr 29, 2022

I've had a similar problem with 1.7.2. Downgrading to 1.6.6 LTS resolved it, so it does appear to be specific to the Julia version.

DilumAluthge commented Apr 29, 2022

I talked with Aaron out-of-band, and here are some more details on the code he ran:

He has a function gwas_extract_snps defined as follows:

function gwas_extract_snps(gwas_fh,gwas_keep_fh,keep_snp_set,delim)
    # extract keep_snp_set of snps from a gwas file
    gwas_io = GZip.open(gwas_fh)
    gwas_keep_io = open(gwas_keep_fh,"w")
    i = 1
    for line in eachline(gwas_io)
        snp = split(line,delim)[2]
        if in(snp,keep_snp_set)
            write(gwas_keep_io,line*"\n")
        end
        i += 1
        if (i % 1000000) == 0
            #println(i)
        end
    end
    close(gwas_io)
    close(gwas_keep_io)
end
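
Since the real data cannot be shared, a package-free variant run on synthetic input might be a starting point for a reproducer. This is only a sketch under assumptions: `extract_snps_plain` and `write_synthetic_gwas` are hypothetical names, plain-text I/O stands in for `GZip.open`, and the three-column row layout is invented. It has not been shown to trigger the crash.

```julia
# Hypothetical stand-in for gwas_extract_snps. Plain-text I/O replaces GZip.open
# so that the sketch needs no packages.
function extract_snps_plain(in_path, out_path, keep_snp_set, delim)
    open(in_path, "r") do in_io
        open(out_path, "w") do out_io
            for line in eachline(in_io)
                fields = split(line, delim)
                # Column 2 holds the SNP identifier, as in the function above.
                # Unlike the original, rows with fewer than 2 columns are skipped.
                if length(fields) >= 2 && fields[2] in keep_snp_set
                    println(out_io, line)
                end
            end
        end
    end
    return out_path
end

# Write n synthetic rows of the form "chr1<delim>rs<i><delim>0.5" (invented layout).
function write_synthetic_gwas(path, n, delim)
    open(path, "w") do io
        for i in 1:n
            println(io, join(("chr1", "rs$i", "0.5"), delim))
        end
    end
    return path
end

dir = mktempdir()
in_path = write_synthetic_gwas(joinpath(dir, "gwas.tsv"), 1000, "\t")
out_path = joinpath(dir, "gwas_keep.tsv")
keep = Set(["rs10", "rs500", "rs999"])
extract_snps_plain(in_path, out_path, keep, "\t")
kept_lines = readlines(out_path)
```

The `do` blocks close both files even if an exception is thrown partway through, which the explicit `close` calls above do not guarantee.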

And then he has a Distributed for loop of the form:

Distributed.@distributed vcat for met in met_arr_keep
    #download file from google bucket
    #run gwas_extract_snps()
    #delete original file
end
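
The shape of that loop can be sketched with placeholder work in the body. Everything inside the loop is an assumption: the identifiers in `met_arr_keep` are invented, and writing and deleting a temporary file stands in for the download, extraction, and cleanup steps. With no workers added, the loop runs on the main process; starting Julia with `-p 2` would match the reported setup.

```julia
using Distributed

met_arr_keep = ["met_a", "met_b", "met_c"]   # hypothetical identifiers

# Each iteration returns a one-element vector; vcat reduces them into one vector.
results = @distributed vcat for met in met_arr_keep
    path = tempname()
    write(path, "placeholder data for $met\n")   # stands in for the download
    nbytes = filesize(path)                      # stands in for gwas_extract_snps()
    rm(path)                                     # delete the original file
    [(met, nbytes, myid())]                      # myid() records which process ran it
end

for (met, nbytes, pid) in results
    println(met, ": ", nbytes, " bytes on process ", pid)
end
```

Because the loop body uses only Base and Distributed functions, it needs no `@everywhere` definitions; a helper such as `gwas_extract_snps` would have to be defined on every worker first.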

This table shows whether or not he gets the segfault. ✅ means no segfault. ❌ means he encountered the segfault.

| Julia version | @distributed | -p | Result | Notes |
|---|---|---|---|---|
| 1.6.6 | yes | 2 | ✅ | Command-line |
| 1.7.2 | yes | 2 | ❌ | Command-line |
| 1.7.2 | no | 1 | | REPL |

His data cannot be shared publicly, unfortunately, so we don't have an MWE.
