Freeze in GC for multithreaded code #15620

Closed · kpamnany opened this issue Mar 25, 2016 · 22 comments

Labels: bug, multithreading, needs more info


@kpamnany (Contributor)

This is on Linux using commit d72842a, which is 13 days old.

Here's the backtrace:

[inline] at /global/u1/k/kpamnany/julia/src/gc.c:364
jl_wait_for_gc at /global/u1/k/kpamnany/julia/src/gc.c:2327
[inline] at /global/u1/k/kpamnany/julia/src/gc.c:1181
__pool_alloc at /global/u1/k/kpamnany/julia/src/gc.c:2445
unknown function (ip: 0x2aacb3fdc038)
[inline] at /global/u1/k/kpamnany/julia/src/julia_internal.h:69
jl_call_method_internal at /global/u1/k/kpamnany/julia/src/gf.c:1848
[inline] at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/SensitiveFloats.jl:63
ElboIntermediateVariables at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/ElboDeriv.jl:132
[inline] at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/ElboDeriv.jl:102
tile_predicted_image at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/ElboDeriv.jl:896
unknown function (ip: 0x2aacd78bd3ee)
[inline] at ./boot.jl:331
trim_source_tiles at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/ModelInit.jl:676
unknown function (ip: 0x2aacd78c13e4)
unknown function (ip: 0x2aacd78c1609)
[inline] at /global/u1/k/kpamnany/julia/src/julia_internal.h:69
jl_call_method_internal at /global/u1/k/kpamnany/julia/src/gf.c:1848
[inline] at ./boot.jl:331
#833###_threadsfor#8141 at ./threadingconstructs.jl:43
unknown function (ip: 0x2aacd78a1750)
[inline] at /global/u1/k/kpamnany/julia/src/julia_internal.h:69
jl_call_method_internal at /global/u1/k/kpamnany/julia/src/gf.c:1848
[inline] at /global/u1/k/kpamnany/julia/src/julia.h:1381
jl_eh_restore_state at /global/u1/k/kpamnany/julia/src/threading.c:137
ti_threadfun at /global/u1/k/kpamnany/julia/src/threading.c:234
uv__thread_start at /global/u1/k/kpamnany/julia/deps/srccache/libuv/src/uv-common.c:270

I see a FIXME in __pool_alloc(), which is at gc.c:1185 on master; not sure if this is the issue.

Running inside gdb, I consistently get a segfault:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aacb1fbc700 (LWP 35284)]
0x00002aaaaae5c772 in jl_gc_safepoint () at ./julia_threads.h:265
265         size_t v = *jl_gc_signal_page;
(gdb) bt
#0  0x00002aaaaae5c772 in jl_gc_safepoint () at ./julia_threads.h:265
#1  jl_generate_fptr (li=li@entry=0x2aaabd8e4170) at /global/u1/k/kpamnany/julia/src/codegen.cpp:1072
#2  0x00002aaaaadefe12 in jl_call_unspecialized (nargs=3, args=0x2aacb1e82ed8, meth=<optimized out>, sparam_vals=0x2aaab0734010) at /global/u1/k/kpamnany/julia/src/gf.c:859
#3  jl_apply_generic (args=0x2aacb1e82ed8, nargs=<optimized out>) at /global/u1/k/kpamnany/julia/src/gf.c:1845
#4  0x00002aacd2c045c3 in #833###_threadsfor#8298 () at strings/io.jl:73
#5  julia_#833###_threadsfor#8298_24586 () at threadingconstructs.jl:43
#6  0x00002aacd2c05750 in jlcall_#833###_threadsfor#8298_24586 ()
#7  0x00002aaaaadefdbf in jl_call_method_internal (nargs=1, args=0x2aaab073c2b8, meth=<optimized out>) at /global/u1/k/kpamnany/julia/src/julia_internal.h:69
#8  jl_apply_generic (args=args@entry=0x2aaab073c2b8, nargs=<optimized out>) at /global/u1/k/kpamnany/julia/src/gf.c:1848
#9  0x00002aaaaae2d6dd in jl_apply (nargs=<optimized out>, args=0x2aaab073c2b8) at /global/u1/k/kpamnany/julia/src/julia.h:1263
#10 ti_run_fun (args=0x2aaab073c2b0) at /global/u1/k/kpamnany/julia/src/threading.c:138
#11 0x00002aaaaae2da65 in ti_threadfun (arg=arg@entry=0x2aacb4000dd0) at /global/u1/k/kpamnany/julia/src/threading.c:233
#12 0x00002aaaaaead527 in uv__thread_start (arg=<optimized out>) at /global/u1/k/kpamnany/julia/deps/srccache/libuv/src/uv-common.c:267

@JeffBezanson, @vtjnash, @yuyichao.

@kpamnany added the bug and multithreading labels Mar 25, 2016
@vtjnash (Member) commented Mar 25, 2016

The segfault is expected (it's a safepoint trigger that forces that thread into GC). It seems at least one thread may not have reached a call to jl_gc_collect or a safepoint trigger (there aren't enough of them currently)?

@carnaval (Contributor)

I don't think we ever did the "every backedge gets a safepoint" thing? It's going to be a pain to have that not prevent vectorization (and inserting them after opts is unsafe, since you might be in the middle of a critical section).

@kpamnany (Contributor, Author)

So I can't look at state in a debugger then? Is there a workaround so I can figure out where the laggard thread is stuck?

@carnaval (Contributor)

To avoid the segfault, you can ask gdb to ignore it and pass it back to our signal handler with something like `handle 11 nostop noprint pass`.
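
For instance (a minimal sketch; signal 11 is SIGSEGV, so the symbolic form below is equivalent to the numeric one suggested above):

```
(gdb) handle SIGSEGV nostop noprint pass
(gdb) continue
```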

@yuyichao (Contributor)

The signal handling is mentioned in the debugging doc. The FIXME should be irrelevant.

The freeze can happen if you have an infinite wait loop in C or Julia without any allocation.

@yuyichao added the needs more info label Mar 25, 2016
@yuyichao (Contributor)

Also, is there any code to reproduce this?

@kpamnany (Contributor, Author)

I haven't been able to isolate it enough to find a code snippet; Celeste is pretty big.

If I understand this correctly, every thread must reach a safepoint before GC can run. So if thread 1 is busy in some tight loop, perhaps waiting for thread 2 to do something, but thread 2 is at a safepoint in the GC, the application will freeze like this?

@carnaval (Contributor)

Yep. A workaround would be to insert a call to the runtime from time to time in the tight loop. The proper solution is to have codegen generate safepoints in every loop.
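
A minimal sketch of that workaround, assuming the runtime exposes a `jl_gc_safepoint` entry point callable via ccall (if it doesn't, any small allocating call in the loop body serves the same purpose); `flag` is a hypothetical shared atomic that another thread eventually sets:

```julia
using Base.Threads

flag = Atomic{Int}(0)  # hypothetical flag set by another thread

# Tight spin-wait loop that would otherwise never reach a safepoint
# and so would block stop-the-world GC indefinitely.
while flag[] == 0
    # Workaround: periodically call into the runtime so this thread
    # can be stopped at a safepoint when GC needs to run.
    ccall(:jl_gc_safepoint, Void, ())
end
```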

@yuyichao (Contributor)

> I haven't been able to isolate it enough to find a code snippet; Celeste is pretty big.

Bigger ones are fine too. Assuming you are allowed to post it, of course...

> If I understand this correctly, every thread must reach a safepoint before GC can run. So if thread 1 is busy in some tight loop, perhaps waiting for thread 2 to do something, but thread 2 is at a safepoint in the GC, the application will freeze like this?

Correct. In order to fix this, we need GC safepoint (and transition) support in codegen. The runtime part of this is almost done (with a missing sync at the beginning of the GC to force a write barrier on other threads). The codegen part is not there. I haven't had a chance to go through the current codegen and figure out where to add the necessary pieces yet.

@yuyichao (Contributor)

See the `# Temporary solution before we have gc transition support in codegen.` comment for the temporary hack used in Base. (Note that once we have GC transition support this code will lead to undefined behavior; I'm thinking of just removing/renaming these functions at that time.) As a slightly better workaround, I believe we can also put this around all the atomic operations, since a pure Julia dead loop can only be woken up by another thread if it is synchronized using atomics. That will have some overhead (one volatile load per atomic op) and won't cover synchronization in C, of course.

@kpamnany (Contributor, Author)

I think I see. So codegen will insert safepoints in generated code? But this won't help if a thread is blocked in a C library or in a system call, right?

@carnaval (Contributor)

C calls to random libraries and system calls will be safe in the sense that GC can run concurrently with them, but I don't think this has been implemented yet either (safe regions).

@yuyichao (Contributor)

> But this won't help if a thread is blocked in a C library or in a system call, right?

Not with safepoints only, but it will with GC transitions. See the system mutex impl for an idea of what the code would look like, before optimization, when we have GC transition support. See my summary in the original PR for the plan forward.

@kpamnany (Contributor, Author)

Thanks for the explanations, guys.

Maybe this is a dumb question, but have you considered the opposite approach -- entering and leaving unsafe regions explicitly? Then the default thread state would be safe, and when safe, the thread could be signaled for GC synchronization. This would eliminate the need to wait for threads to reach safe points, but would require waiting for them to leave unsafe regions. Would there be too many unsafe regions?

@yuyichao (Contributor)

That is exactly the plan.

@kpamnany (Contributor, Author)

Okay then!

I'm trying to isolate this further and will update or close this when I understand the freeze better.

@yuyichao (Contributor)

Back to my desk...

> Would there be too many unsafe regions?

So the plan is to do codegen in exactly this way (GC-safe by default, and mark critical unsafe regions). Since each transition needs a store (unless we have good unwinding and stack maps etc.), we would like to minimize the transitions we actually emit, in a post-codegen optimization, by running more code in the unsafe region. We just need to make sure that the additional code in the unsafe region doesn't contain anything that has to run in a safe region (loops, or Julia-unaware ccalls, for example).

I'll need to write some notes about the plan in more detail, although currently I don't feel like advertising it too much before I actually sit down and implement it...

@kpamnany (Contributor, Author) commented Apr 2, 2016

Please confirm: if I'm calling out to a C library from Julia, I should insert `gc_state = ccall(:jl_gc_safe_enter, Int8, ())` before the call, and `ccall(:jl_gc_safe_leave, Void, (Int8,), gc_state)` when it returns? If the C code is touching Julia managed memory, or if it is calling back into Julia, then this would be wrong. Correct?

@yuyichao (Contributor) commented Apr 2, 2016

> Please confirm: if I'm calling out to a C library from Julia, I should insert `gc_state = ccall(:jl_gc_safe_enter, Int8, ())` before the call, and `ccall(:jl_gc_safe_leave, Void, (Int8,), gc_state)` when it returns?

Correct. The codegen support part is basically to insert this automatically (and merge them).

> If the C code is touching Julia managed memory, or if it is calling back into Julia, then this would be wrong. Correct?

You can read or write isbits-typed slots (`Vector{Int8}`, `type A a::Float64 end`), or read (but not write; writing requires triggering the write barrier) any managed memory.

Please also note that the ccall conversion should not have any allocation either.
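
A minimal sketch of the pattern confirmed above (`long_running_fn` and `libfoo` are hypothetical placeholders for a real C library call):

```julia
# Hypothetical C call wrapped in a GC-safe region, per the pattern above.
function call_long_running()
    gc_state = ccall(:jl_gc_safe_enter, Int8, ())  # enter GC-safe region
    try
        # GC may run concurrently from here on, so this call must not
        # allocate Julia memory or touch heap references.
        ccall((:long_running_fn, "libfoo"), Void, ())
    finally
        ccall(:jl_gc_safe_leave, Void, (Int8,), gc_state)  # restore previous state
    end
end
```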

@kpamnany (Contributor, Author) commented Aug 4, 2016

@yuyichao: can you expand on this:

> You can read or write isbits-typed slots (`Vector{Int8}`, `type A a::Float64 end`), or read (but not write; writing requires triggering the write barrier) any managed memory.

We seem to be seeing this freeze in some other code that is calling out to FFTW from multiple threads. FFTW writes into Julia managed memory. So can you explain this write barrier, or point me at an explanation, please?

@yuyichao (Contributor) commented Aug 4, 2016

It's a little hard to say without actually seeing the code. A few comments I can make now:

  1. If one of the threads is frozen waiting on a lock (or, in general, on something that cannot make progress unless another thread does) in a C library, I'd guess there is at least one other thread holding the same lock. Assuming the C library itself is thread-safe and doesn't deadlock (IIRC FFTW is only thread-safe when executing a plan; are you creating plans from multiple threads?), the C library must be calling back into Julia code (or running Julia code with some locks in the C library held). Is this the case? Otherwise, the locks in the C library are lower-level than the Julia ones, and while the GC has to wait for the ccall to return, there shouldn't be a deadlock.

  2. Writing to managed memory is fine; writing to managed memory that contains heap references is not, unless the library is already aware of Julia (e.g. the Julia runtime itself or certain embedding applications) or the caller has made sure that it is safe to do so. (See the sketch after this list.)

    This is already required by the generational GC before threading (which is why calling memcpy on arrays with heap references is unsafe), so it's not really a new requirement. The write barrier is what the generational GC requires, and I mentioned it just to say that any code that wants to change object references on the heap should already be Julia-aware. I don't think FFTW overwrites any heap references in the managed memory, so it should be able to run simultaneously with the GC and shouldn't need to worry about the write barrier.
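
To illustrate point 2 (a hedged sketch; `fill_buf` and `libfoo` are hypothetical stand-ins for a Julia-unaware C library such as FFTW):

```julia
# Fine: the buffer has an isbits element type, so it contains no heap
# references; a Julia-unaware C library may write into it while the
# calling thread sits in a GC-safe region.
a = Vector{Float64}(1024)
ccall((:fill_buf, "libfoo"), Void, (Ptr{Float64}, Csize_t), a, length(a))

# Not fine: this array's slots are heap references. A Julia-unaware
# library overwriting them would bypass the generational GC's write
# barrier (the same reason memcpy on such arrays is unsafe).
b = Vector{Any}(16)
```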

@kpamnany (Contributor, Author) commented Mar 4, 2017

Haven't seen this in Celeste in a long while. Closing.

@kpamnany closed this as completed Mar 4, 2017