Freeze in GC for multithreaded code #15620

Closed · kpamnany opened this issue Mar 25, 2016 · 22 comments

Labels: bug, multithreading, needs more info


@kpamnany (Contributor)

This is on Linux using commit d72842a, which is 13 days old.

Here's the backtrace:

[inline] at /global/u1/k/kpamnany/julia/src/gc.c:364
jl_wait_for_gc at /global/u1/k/kpamnany/julia/src/gc.c:2327
[inline] at /global/u1/k/kpamnany/julia/src/gc.c:1181
__pool_alloc at /global/u1/k/kpamnany/julia/src/gc.c:2445
unknown function (ip: 0x2aacb3fdc038)
[inline] at /global/u1/k/kpamnany/julia/src/julia_internal.h:69
jl_call_method_internal at /global/u1/k/kpamnany/julia/src/gf.c:1848
[inline] at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/SensitiveFloats.jl:63
ElboIntermediateVariables at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/ElboDeriv.jl:132
[inline] at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/ElboDeriv.jl:102
tile_predicted_image at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/ElboDeriv.jl:896
unknown function (ip: 0x2aacd78bd3ee)
[inline] at ./boot.jl:331
trim_source_tiles at /global/homes/k/kpamnany/.julia/v0.5/Celeste/src/ModelInit.jl:676
unknown function (ip: 0x2aacd78c13e4)
unknown function (ip: 0x2aacd78c1609)
[inline] at /global/u1/k/kpamnany/julia/src/julia_internal.h:69
jl_call_method_internal at /global/u1/k/kpamnany/julia/src/gf.c:1848
[inline] at ./boot.jl:331
#833###_threadsfor#8141 at ./threadingconstructs.jl:43
unknown function (ip: 0x2aacd78a1750)
[inline] at /global/u1/k/kpamnany/julia/src/julia_internal.h:69
jl_call_method_internal at /global/u1/k/kpamnany/julia/src/gf.c:1848
[inline] at /global/u1/k/kpamnany/julia/src/julia.h:1381
jl_eh_restore_state at /global/u1/k/kpamnany/julia/src/threading.c:137
ti_threadfun at /global/u1/k/kpamnany/julia/src/threading.c:234
uv__thread_start at /global/u1/k/kpamnany/julia/deps/srccache/libuv/src/uv-common.c:270

I see a FIXME in __pool_alloc(), which is at gc.c:1185 on master; not sure if this is the issue.

Running inside gdb, I consistently get a segfault:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x2aacb1fbc700 (LWP 35284)]
0x00002aaaaae5c772 in jl_gc_safepoint () at ./julia_threads.h:265
265         size_t v = *jl_gc_signal_page;
(gdb) bt
#0  0x00002aaaaae5c772 in jl_gc_safepoint () at ./julia_threads.h:265
#1  jl_generate_fptr (li=li@entry=0x2aaabd8e4170) at /global/u1/k/kpamnany/julia/src/codegen.cpp:1072
#2  0x00002aaaaadefe12 in jl_call_unspecialized (nargs=3, args=0x2aacb1e82ed8, meth=<optimized out>, sparam_vals=0x2aaab0734010) at /global/u1/k/kpamnany/julia/src/gf.c:859
#3  jl_apply_generic (args=0x2aacb1e82ed8, nargs=<optimized out>) at /global/u1/k/kpamnany/julia/src/gf.c:1845
#4  0x00002aacd2c045c3 in #833###_threadsfor#8298 () at strings/io.jl:73
#5  julia_#833###_threadsfor#8298_24586 () at threadingconstructs.jl:43
#6  0x00002aacd2c05750 in jlcall_#833###_threadsfor#8298_24586 ()
#7  0x00002aaaaadefdbf in jl_call_method_internal (nargs=1, args=0x2aaab073c2b8, meth=<optimized out>) at /global/u1/k/kpamnany/julia/src/julia_internal.h:69
#8  jl_apply_generic (args=args@entry=0x2aaab073c2b8, nargs=<optimized out>) at /global/u1/k/kpamnany/julia/src/gf.c:1848
#9  0x00002aaaaae2d6dd in jl_apply (nargs=<optimized out>, args=0x2aaab073c2b8) at /global/u1/k/kpamnany/julia/src/julia.h:1263
#10 ti_run_fun (args=0x2aaab073c2b0) at /global/u1/k/kpamnany/julia/src/threading.c:138
#11 0x00002aaaaae2da65 in ti_threadfun (arg=arg@entry=0x2aacb4000dd0) at /global/u1/k/kpamnany/julia/src/threading.c:233
#12 0x00002aaaaaead527 in uv__thread_start (arg=<optimized out>) at /global/u1/k/kpamnany/julia/deps/srccache/libuv/src/uv-common.c:267

@JeffBezanson, @vtjnash, @yuyichao.

@kpamnany added the bug and multithreading labels Mar 25, 2016
@vtjnash (Member) commented Mar 25, 2016

The segfault is expected (it's a safepoint trigger that forces that thread into GC). It seems at least one thread may not have reached a call to jl_gc_collect or a safepoint trigger (there aren't enough of them currently)?

@carnaval (Contributor)

I don't think we ever did the "every backedge gets a safepoint" thing? It's going to be a pain to have that not prevent vectorization (and inserting them after opts is unsafe, since you might be in the middle of a critical section).

@kpamnany (Contributor, Author)

So I can't look at state in a debugger then? Is there a workaround so I can figure out where the laggard thread is stuck?

@carnaval (Contributor)

To avoid the segfault, you can ask gdb to ignore it and pass it back to our signal handler with something like `handle 11 nostop noprint pass`.
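
For instance (a minimal sketch; signal 11 is SIGSEGV, so the symbolic form below is equivalent to the numeric one suggested above):

```
(gdb) handle SIGSEGV nostop noprint pass
(gdb) continue
```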

@yuyichao (Contributor)

The signal handling is mentioned in the debugging doc. The FIXME should be irrelevant.

The freeze can happen if you have an infinite wait loop in C or Julia without any allocation.

@yuyichao added the needs more info label Mar 25, 2016
@yuyichao (Contributor)

Also, is there any code to reproduce this?

@kpamnany (Contributor, Author)

I haven't been able to isolate it enough to find a code snippet; Celeste is pretty big.

If I understand this correctly, every thread must reach a safepoint before GC can run. So if thread 1 is busy in some tight loop, perhaps waiting for thread 2 to do something, but thread 2 is at a safepoint in the GC, the application will freeze like this?

@carnaval (Contributor)

Yep. A workaround would be to insert a call to the runtime from time to time in the tight loop. The proper solution is to have codegen generate safepoints in every loop.
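
A minimal sketch of that workaround, assuming the runtime exposes a `jl_gc_safepoint` entry point callable via ccall (if it doesn't, any small allocating call in the loop body serves the same purpose); `flag` is a hypothetical shared atomic that another thread eventually sets:

```julia
using Base.Threads

flag = Atomic{Int}(0)  # hypothetical flag set by another thread

# Tight spin-wait loop that would otherwise never reach a safepoint
# and so would block stop-the-world GC indefinitely.
while flag[] == 0
    # Workaround: periodically call into the runtime so this thread
    # can be stopped at a safepoint when GC needs to run.
    ccall(:jl_gc_safepoint, Void, ())
end
```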

@yuyichao (Contributor)

> I haven't been able to isolate it enough to find a code snippet; Celeste is pretty big.

Bigger ones are fine too. Assuming you are allowed to post it, of course...

> If I understand this correctly, every thread must reach a safepoint before GC can run. So if thread 1 is busy in some tight loop, perhaps waiting for thread 2 to do something, but thread 2 is at a safepoint in the GC, the application will freeze like this?

Correct. In order to fix this, we need GC safepoint (and transition) support in codegen. The runtime part of this is almost done (with a missing sync at the beginning of the GC to force a write barrier on other threads). The codegen part is not there. I haven't had a chance to go through the current codegen and figure out where to add the necessary pieces yet.

@yuyichao (Contributor)

See the `# Temporary solution before we have gc transition support in codegen.` comment for the temporary hack used in Base. (Note that once we have GC transition support this code will lead to undefined behavior; I'm thinking of just removing/renaming these functions at that time.) As a slightly better workaround, I believe we can also put this around all the atomic operations, since a pure Julia dead loop can only be woken up by another thread if it is synchronized using atomics. That will have some overhead (one volatile load per atomic op) and won't cover synchronization in C, of course.

@kpamnany (Contributor, Author)

I think I see. So codegen will insert safepoints in generated code? But this won't help if a thread is blocked in a C library or in a system call, right?

@carnaval (Contributor)

C calls to random libraries and system calls will be safe in the sense that GC can run concurrently with them, but I don't think this has been implemented yet either (safe regions).

@yuyichao (Contributor)

> But this won't help if a thread is blocked in a C library or in a system call, right?

Not with safepoints only, but it will with GC transitions. See the system mutex impl for an idea of what the code would look like, before optimization, when we have GC transition support. See my summary in the original PR for the plan forward.

@kpamnany (Contributor, Author)

Thanks for the explanations, guys.

Maybe this is a dumb question, but have you considered the opposite approach -- entering and leaving unsafe regions explicitly? Then the default thread state would be safe, and when safe, the thread could be signaled for GC synchronization. This would eliminate the need to wait for threads to reach safe points, but would require waiting for them to leave unsafe regions. Would there be too many unsafe regions?

@yuyichao (Contributor)

That is exactly the plan.

@kpamnany (Contributor, Author)

Okay then!

I'm trying to isolate this further and will update or close this when I understand the freeze better.

@yuyichao (Contributor)

Back to my desk...

> Would there be too many unsafe regions?

So the plan is to do codegen in exactly this way (GC-safe by default, and mark critical unsafe regions). Since each transition needs a store (unless we have good unwinding and stack maps etc.), we would like to minimize the transitions we actually emit, in a post-codegen optimization, by running more code in the unsafe region. We just need to make sure that the additional code in the unsafe region doesn't contain anything that has to run in a safe region (loops, or Julia-unaware ccalls, for example).

I'll need to write some notes about the plan in more detail, although currently I don't feel like advertising it too much before I actually sit down and implement it...

@kpamnany (Contributor, Author) commented Apr 2, 2016

Please confirm: if I'm calling out to a C library from Julia, I should insert `gc_state = ccall(:jl_gc_safe_enter, Int8, ())` before the call, and `ccall(:jl_gc_safe_leave, Void, (Int8,), gc_state)` when it returns? If the C code is touching Julia managed memory, or if it is calling back into Julia, then this would be wrong. Correct?

@yuyichao (Contributor) commented Apr 2, 2016

> Please confirm: if I'm calling out to a C library from Julia, I should insert `gc_state = ccall(:jl_gc_safe_enter, Int8, ())` before the call, and `ccall(:jl_gc_safe_leave, Void, (Int8,), gc_state)` when it returns?

Correct. The codegen support part is basically to insert this automatically (and merge them).

> If the C code is touching Julia managed memory, or if it is calling back into Julia, then this would be wrong. Correct?

You can read or write isbits-typed slots (`Vector{Int8}`, `type A a::Float64 end`), or read (but not write; writing requires triggering the write barrier) any managed memory.

Please also note that the ccall conversion should not have any allocation either.
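
A minimal sketch of the pattern confirmed above (`long_running_fn` and `libfoo` are hypothetical placeholders for a real C library call):

```julia
# Hypothetical C call wrapped in a GC-safe region, per the pattern above.
function call_long_running()
    gc_state = ccall(:jl_gc_safe_enter, Int8, ())  # enter GC-safe region
    try
        # GC may run concurrently from here on, so this call must not
        # allocate Julia memory or touch heap references.
        ccall((:long_running_fn, "libfoo"), Void, ())
    finally
        ccall(:jl_gc_safe_leave, Void, (Int8,), gc_state)  # restore previous state
    end
end
```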

@kpamnany (Contributor, Author) commented Aug 4, 2016

@yuyichao: can you expand on this:

> You can read or write isbits-typed slots (`Vector{Int8}`, `type A a::Float64 end`), or read (but not write; writing requires triggering the write barrier) any managed memory.

We seem to be seeing this freeze in some other code that is calling out to FFTW from multiple threads. FFTW writes into Julia managed memory. So can you explain this write barrier, or point me at an explanation, please?

@yuyichao (Contributor) commented Aug 4, 2016

It's a little hard to say without actually seeing the code. A few comments I can make now:

  1. If one of the threads is frozen waiting on a lock (or, in general, on something that cannot make progress unless another thread does) in a C library, I'd guess there is at least one other thread holding the same lock. Assuming the C library itself is thread-safe and doesn't deadlock (IIRC FFTW is only thread-safe when executing a plan; are you creating plans from multiple threads?), the C library must be calling back into Julia code (or running Julia code with some locks in the C library held). Is this the case? Otherwise, the locks in the C library are lower-level than the Julia ones, and while the GC has to wait for the ccall to return, there shouldn't be a deadlock.

  2. Writing to managed memory is fine; writing to managed memory that contains heap references is not, unless the library is already aware of Julia (e.g. the Julia runtime itself or certain embedding applications) or the caller has made sure that it is safe to do so. (See the sketch after this list.)

    This is already required by the generational GC before threading (which is why calling memcpy on arrays with heap references is unsafe), so it's not really a new requirement. The write barrier is what the generational GC requires, and I mentioned it just to say that any code that wants to change object references on the heap should already be Julia-aware. I don't think FFTW overwrites any heap references in the managed memory, so it should be able to run simultaneously with the GC and shouldn't need to worry about the write barrier.
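
To illustrate point 2 (a hedged sketch; `fill_buf` and `libfoo` are hypothetical stand-ins for a Julia-unaware C library such as FFTW):

```julia
# Fine: the buffer has an isbits element type, so it contains no heap
# references; a Julia-unaware C library may write into it while the
# calling thread sits in a GC-safe region.
a = Vector{Float64}(1024)
ccall((:fill_buf, "libfoo"), Void, (Ptr{Float64}, Csize_t), a, length(a))

# Not fine: this array's slots are heap references. A Julia-unaware
# library overwriting them would bypass the generational GC's write
# barrier (the same reason memcpy on such arrays is unsafe).
b = Vector{Any}(16)
```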

@kpamnany (Contributor, Author) commented Mar 4, 2017

Haven't seen this in Celeste in a long while. Closing.

@kpamnany closed this as completed Mar 4, 2017