WIP: Fixing the @sync/@async disappearing exception issue #38916
An example of code that currently hangs and is fixed by this PR:

    c = Channel(0)
    @sync begin
        @async begin
            println("$(current_task())")
            put!(c,0)
            println("Done $(current_task())")
        end
        @async begin
            println("$(current_task())")
            undefined()
            take!(c)
            println("Done $(current_task())")
        end
        println("$(current_task())")
    end
Welcome, thanks for looking at this! The main approach I can think of to solve this is the existing There is also an issue with the implementation here, in that it busy waits, e.g.
@JeffBezanson - the existing This PR just moves to parallel monitoring to remove that order dependence. But it does not change the semantics - the PR
Agreed on the
On master I see no CPU utilization for that.
Oh, I see what you're saying. Yes, the parallel monitor is "active" in the sense that it keeps checking the Task status while that Task is waiting. In the original implementation with this scenario, it ends with neither Task scheduled to do anything at all. But a control-C is still detected (presumably through Thanks!
Waiting in a slightly less busy manner is generally not what one wants to do. Rather, waiting should be event-driven: the process should do nothing until some event occurs which causes it to stop having to wait (timer, input, etc.).
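To make the distinction in the comment above concrete, here is a minimal sketch (not code from this PR) that contrasts a polling wait with an event-driven one. The names `wait_polling` and `wait_event_driven` and the `interval` keyword are hypothetical, chosen only for illustration; `istaskdone`, `sleep`, `wait`, and `fetch` are standard Base functions.

```julia
# Polling: the waiter wakes up every `interval` seconds to re-check the task.
# It never spins at 100% CPU, but it still runs periodically for no reason.
function wait_polling(t::Task; interval=0.01)
    polls = 0
    while !istaskdone(t)
        sleep(interval)   # yields to the scheduler, then checks again
        polls += 1
    end
    return polls
end

# Event-driven: the waiter is descheduled until the task itself finishes.
# `wait` throws a TaskFailedException if the task ended with an error.
function wait_event_driven(t::Task)
    wait(t)
    return nothing
end

t1 = @async (sleep(0.1); :done)
polls = wait_polling(t1)      # number of wake-ups needed before t1 finished

t2 = @async (sleep(0.1); :done)
wait_event_driven(t2)         # no wake-ups until t2 completes
```

The polling version does work proportional to how long the task runs, while the event-driven version does none until the scheduler resumes it.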
But that passivity is what leads to the lock in the first place. Task A launches Tasks B and C, and begins waiting specifically for Task B to awaken it. Task B waits for a Channel that Task C is supposed to write to. But Task C throws an error first. So no Tasks are scheduled, A is frozen until B wakes it, B is frozen until C wakes it, and C is frozen waiting to hand an exception back to A. The only way to beat this is a
My original version went a different route - adding a "wake up the parent" function to each Task and
Okay guys, thanks for the feedback. I just pushed a commit that moves the task monitor loop from active/
BTW, I tried getting a
Also, I did not test this latest commit against
Force-pushed from 6cdb94d to 43e0897.
Moving to WIP until I can debug some random build check failures on Windows (passes on win64 at home). Will get rid of / squash the commits before I move back to RFC.
@JeffBezanson, @StefanKarpinski - okay, after wrapping my head around the lower-level scheduling details, I saw that I should move the above structure to a channel "completion notification" system that counts the remaining tasks... which sounded familiar. So I went back to look at my So no need for this PR - you already have it covered in Experimental. Why not move to Base, BTW? If you guys want, when Experimental does move over, I can move the lockup tests here into a different PR just for regression purposes. Thanks for the learnings, guys.
FWIW, when I combine the structure I envisioned (which created an extra "completion or exception" Channel) with the more efficient loop in Experimental, I end up with the below. Only advantage is that it mimics the exception behavior of the current

    # Start a monitor task that waits on `t` and reports the outcome to `rs`:
    # `nothing` on normal completion, or the caught exception on failure.
    function _schedule_parent(t, rs)
        schedule(Task(() -> begin
            ex = nothing
            try
                wait(t)
            catch e
                ex = e
            finally
                put!(rs, ex)
            end
        end))
    end

    function sync_end(c::Channel{Any})
        try
            n = 0                       # number of monitored tasks still outstanding
            isready(c) || return
            while true
                t = take!(c)
                if t isa Exception
                    # A task failed: gather any other exceptions already queued, then throw.
                    c_ex = CompositeException([t])
                    while isready(c)
                        t = take!(c)
                        t isa Exception && push!(c_ex, t)
                    end
                    throw(c_ex)
                elseif isnothing(t)
                    # A monitored task completed normally.
                    n -= 1
                    n == 0 && !isready(c) && break
                else
                    # A new task to watch: count it and start its monitor.
                    n += 1
                    _schedule_parent(t, c)
                end
            end
        finally
            close(c)
        end
        nothing
    end
Fixing #32677. This is the "disappearing exception" problem in the `@sync`/`@async` structure. The problem occurs because `sync_end` currently waits for the tasks in order, and therefore locks up when an earlier task is waiting for a later task that throws an exception.

This mod changes `sync_end` so that it monitors Tasks/Futures in parallel, so the order no longer matters and this specific lockup condition cannot occur. Lockups due to poorly constructed Channels or other resources are not dealt with; this is just a bandage to help the programmer, making sure exceptions propagate rather than hang. It is one step towards a more complete structured concurrency solution. I recommend unifying the Task / Future interfaces as that happens, btw.

This has been tested against `1.5-release`, `1.6-release`, and `master`, and tests have been added to check the lockup condition for `@async`, `Threads.@spawn`, and `Distributed.@spawnat`. If you would prefer that this go into Experimental, just let me know and I can move and resubmit.
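
As a rough illustration of the "monitor in parallel" idea described above, the sketch below waits on a set of tasks through one monitor task per child and a shared results channel, so a failure is reported regardless of task order. It is a simplified stand-in and not the `sync_end` change in this PR: the helper name `wait_all_failfast` is hypothetical, it handles only `Task`s (no Futures), and it rethrows the first failure instead of building a `CompositeException`.

```julia
# Hypothetical helper: wait for all tasks, rethrowing as soon as any one fails.
function wait_all_failfast(tasks::Vector{Task})
    # Buffered so that monitor tasks never block on put!, even after we stop taking.
    results = Channel{Any}(length(tasks))
    for t in tasks
        @async begin
            try
                wait(t)                 # event-driven: resumes only when t finishes
                put!(results, nothing)  # normal completion
            catch e
                put!(results, e)        # a failed task surfaces as TaskFailedException
            end
        end
    end
    for _ in tasks
        r = take!(results)              # outcomes arrive in completion order
        r isa Exception && throw(r)
    end
    return nothing
end

c = Channel(0)
blocked = @async take!(c)        # listed first; waits for a value that never arrives
failing = @async error("boom")   # listed second; fails immediately

caught = try
    wait_all_failfast([blocked, failing])
    nothing
catch e
    e
end

close(c)   # release the blocked task so it does not linger
```

Waiting on `blocked` and then `failing` in order would hang here, whereas the monitors let the failure from `failing` reach the caller first.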