Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add TaskFailedException to propagate backtrace of failed task in wait #32814

Merged
merged 1 commit into from
Aug 13, 2019

Conversation

JeffBezanson
Copy link
Member

@JeffBezanson JeffBezanson commented Aug 7, 2019

Before:

julia> t = @async rand(2)[3]; wait(t)
ERROR: BoundsError: attempt to access 2-element Array{Float64,1} at index [3]
Stacktrace:
 [1] try_yieldto(::typeof(Base.ensure_rescheduled), ::Base.RefValue{Task}) at ./task.jl:552
 [2] wait() at ./task.jl:609
 [3] wait(::Base.GenericCondition{Base.Threads.SpinLock}) at ./condition.jl:107
 [4] wait(::Task) at ./task.jl:214
 [5] top-level scope at REPL[1]:1

After:

julia> t = @async rand(2)[3]; wait(t)
ERROR: FailedTaskException:
BoundsError: attempt to access 2-element Array{Float64,1} at index [3]
Stacktrace:
 [1] getindex at ./array.jl:742 [inlined]
 [2] (::getfield(Main, Symbol("##3#4")))() at ./task.jl:332
Stacktrace:
 [1] wait(::Task) at ./task.jl:251
 [2] top-level scope at REPL[1]:1

So the location of the failure inside the waited-for task is now visible.

I'm not sure if a CapturedException is the best way to do this, but it was within easy reach. Maybe it would make sense to use the exception stack somehow? @c42f

@JeffBezanson JeffBezanson added error handling Handling of exceptions by Julia or the user multithreading Base.Threads and related functionality labels Aug 7, 2019
@c42f
Copy link
Member

c42f commented Aug 7, 2019

Interesting. Doing it this way with CapturedException is nice because it's simple. But it can definitely loose the root cause in some cases.

Thinking out loud, one option would be to import the exception stack from the failed task into the current task. But that seems quite confusing because there's no way to tell which exceptions arose on which task.

Another option is to have a FailedTaskException(t::Task) which wraps the failed task itself. Then the entire root cause would be available and could be reported using Base.catch_stack(t). This seems clear and no less annoying to pattern match the nested exceptions inside.

Going further, it would be nice if wait() and Base.sync_end() were consistent — at least if we want wait() to allow waiting on multiple tasks. In that case a CompositeException wrapping some FailedTaskException seems like the consistent choice. The python trio people have more or less the same issue with their MultiError (python-trio/trio#611) and they've concluded that MultiError should always be thrown even when only a single child task fails. I tend to agree with the reasoning there. See also C#'s AggregateException.

@JeffBezanson
Copy link
Member Author

Another option is to have a FailedTaskException(t::Task) which wraps the failed task itself.

+1 This sounds good.

@JeffBezanson JeffBezanson changed the title propagate backtrace of failed task in wait RFC: add FailedTaskException to propagate backtrace of failed task in wait Aug 7, 2019
Copy link
Member

@c42f c42f left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me. Needs a couple of tests for show of FailedTaskException.

while isa(ex.task.exception, FailedTaskException)
pushfirst!(stacks, ex.task.backtrace)
ex = ex.task.exception
end
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, this surfaces the underlying error through multiple levels of wait(). That's nice:

julia> t1 = @async sum()
       t2 = @async wait(t1)
       wait(t2)

ERROR: FailedTaskException:
MethodError: no method matching sum()
Closest candidates are:
  sum(::Tuple{Any,Vararg{Any,N} where N}) at tuple.jl:379
  sum(::StepRangeLen) at twiceprecision.jl:536
  sum(::AbstractRange{#s66} where #s66<:Real) at range.jl:978
  ...
Stacktrace:
 [1] (::getfield(Main, Symbol("##23#24")))() at ./task.jl:332
Stacktrace:
 [1] wait(::Task) at ./task.jl:251
 [2] (::getfield(Main, Symbol("##25#26")))() at ./task.jl:332
Stacktrace:
 [1] wait(::Task) at ./task.jl:251
 [2] top-level scope at REPL[5]:3

stdlib/Distributed/test/distributed_exec.jl Outdated Show resolved Hide resolved
stdlib/Serialization/test/runtests.jl Outdated Show resolved Hide resolved
test/worlds.jl Outdated Show resolved Hide resolved
rethrow()
if isa(r, Task)
_wait(r)
if istaskfailed(r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice improvement not having the extra catch and rethrow.

@JeffBezanson JeffBezanson added the triage This should be discussed on a triage call label Aug 8, 2019
@StefanKarpinski StefanKarpinski added the minor change Marginal behavior change acceptable for a minor release label Aug 8, 2019
NEWS.md Outdated
@@ -37,6 +37,8 @@ New library functions
Standard library changes
------------------------

* When `wait` (or `@sync` etc.) is called on a failing `Task`, the exception is propagated as a
`FailedTaskException` wrapping the task ([#32814]).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You might want a few words about why this is an improvement (i.e. you no longer lose the backtrace).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(should also mention fetch in the docs)

base/task.jl Outdated
var = esc(sync_varname)
quote
local ref = $(esc(expr))
if $(Expr(:isdefined, var))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should perhaps not include the isdefined check and just assert that there is a sync pool?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, right.

@JeffBezanson
Copy link
Member Author

Hm, now I notice we added ProcessFailedException. So maybe the name should be swapped around to TaskFailedException?

@c42f
Copy link
Member

c42f commented Aug 8, 2019

Hmm ProcessFailedException was #27900 (CC @oxinabox). It's very similar in intent though it can hold multiple failed processes rather than using CompositeException. If we wanted one exception type to do double duty what would we call it? TaskFailedException seems confusing if the wrapped object happened to be a Process. Move right along nothing to see here 🤦‍♂️

@JeffBezanson
Copy link
Member Author

I'm not saying to combine them, just use the same word order: ProcessFailedException and TaskFailedException, instead of ProcessFailedException and FailedTaskException.

@StefanKarpinski
Copy link
Member

+1 to having the word order match: ProcessFailedException and TaskFailedException.

@JeffBezanson JeffBezanson changed the title RFC: add FailedTaskException to propagate backtrace of failed task in wait RFC: add TaskFailedException to propagate backtrace of failed task in wait Aug 8, 2019
if !isa(r, Task) || (isa(r, Task) && !istaskfailed(r))
rethrow()
if isa(r, Task)
_wait(r)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

x-ref: #32677

If the first task is blocking on a second task and the second task is throwing an exception, we will never see the exception and @sync will block forever.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👍 but out of scope for this PR.

@@ -323,12 +323,12 @@ end
ct = current_task()
testerr = ErrorException("expected")
Copy link
Contributor

@oxinabox oxinabox Aug 9, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Probably out of scope for this PR,
but do you think
we should have a custom error type defined in Test stdlib for testing these things,
so it is harder to get false positives from other errors being thrown?

(If so i can open an issue)

@JeffBezanson JeffBezanson changed the title RFC: add TaskFailedException to propagate backtrace of failed task in wait add TaskFailedException to propagate backtrace of failed task in wait Aug 9, 2019
@JeffBezanson JeffBezanson merged commit df1ee8c into master Aug 13, 2019
@JeffBezanson JeffBezanson deleted the jb/taskbt branch August 13, 2019 18:25
@JeffBezanson JeffBezanson removed the triage This should be discussed on a triage call label Aug 13, 2019
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
error handling Handling of exceptions by Julia or the user minor change Marginal behavior change acceptable for a minor release multithreading Base.Threads and related functionality
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants