remove "unhandled task failure" message printing #27722

JeffBezanson · 2018-06-21T23:30:30Z

This is a repeat of #12736; I'm going to propose this again.

With the new multi-threading system approaching, it will be helpful to remove as much logic as possible from task startup and shutdown. Ideally we'd like to remove task_done_hook entirely. It's bad because every task has to pay for everything in that hook whether it needs the feature or not. Instead there can be just one mechanism (waiting in a task's completion queue) you can use to do whatever you need when a task exits. I plan to remove TASKDONE_HOOKS as well in a future PR.
Other asynchronous programming systems don't do this. One source I found says that C# handles this on task GC, which has also been proposed. I think that's still too complex and costly, and also somewhat arbitrary: if you want to see exceptions, make sure you don't keep any references to the task? If you're going to go to that trouble, it's easier to add a try-catch or start a monitoring task.
I'll say it again: having to set a SUPPRESS_EXCEPTION_PRINTING flag sometimes is just silly. No such thing should ever exist.

staticfloat · 2018-06-22T05:00:03Z

So I guess the flipside is that we should just make it easy for users to launch tasks with wrappers around them; e.g. there should be a @schedule_and_print_errors or something that wraps a Task in nice things like this; that gives us fast Tasks if we want it, with the user being able to decide what should or should not receive the special attentions of an exception catcher/printer.

StefanKarpinski · 2018-06-22T16:53:31Z

Or do something like trio: #6283 (comment). Basic premise: tasks must form a tree and all child tasks must terminate before their parent task does. There's an escape hatch, however, where you can spawn a task that outlives the current task by spawning it explicitly into a "nursery" which then becomes responsible for that task. This way errors can propagate up the task tree in a well-defined manner and there's always some task that's responsible for it. Note this comment from the trio docs:

This is why you have to use async with to get a nursery: it gives us a way to make sure that the child calls can’t run away and get lost. One reason this is important is that if there’s a bug or other problem in one of the children, and it raises an exception, then it lets us propagate that exception into the parent; in many other frameworks, exceptions like this are just discarded. Trio never discards exceptions.

Yes, that's exactly the property we want. It would also mean that cancelling tasks would make sense without chaos: when you cancel a task it cancels all its child tasks; it doesn't matter how many subtasks some task spawned in order to do its work, if you cancel a task they're all neatly killed.

vtjnash · 2018-06-22T17:39:18Z

should just make it easy for users to launch tasks with wrappers around them

+1. I think we need a better approach to launching tasks, as Stefan suggested above, before we remove this.

StefanKarpinski · 2018-06-22T18:28:16Z

In considering this, I'm not sure if we'd need a "nursery" object type since we have first-class tasks: if you want to spawn a task that outlives the current task, you could be required to explicitly given it a parent task which is responsible for it (gets its unhandled exceptions, cancels it if cancelled). All you really need is a task tree which exceptions propagate up and cancellations propagate down. With normal blocking APIs, the parent is the caller; with some APIs you can give an explicit parent task.

JeffBezanson · 2018-06-22T18:41:46Z

I believe the way Trio ensures exceptions are handled is roughly that it requires all tasks to be spawned inside the equivalent of our @sync, which calls wait on all of them at the end of the block. Simply assigning a parent to a task is not enough, since you still have to decide when to print (or otherwise handle) the child task's exception. AFAIU, in Trio you still must reach the parent task's call to wait before anything happens with the child exception. It does not spontaneously get printed as a result of the wait not having been reached yet. Indeed, the only way to implement @sync or something like Trio's nurseries in julia is to use the SUPPRESS_EXCEPTION_PRINTING flag. Without it there's no hope of getting the right behavior.

Put differently, the only way Trio argues against this change is if it has a SUPPRESS_EXCEPTION_PRINTING flag, and that flag has generally been considered a good API.

StefanKarpinski · 2018-06-22T19:42:43Z

Simply assigning a parent to a task is not enough, since you still have to decide when to print (or otherwise handle) the child task's exception.

My understanding is that if a child task dies, the error propagates into the parent. If the parent handles it and continues, all is well (the child is still dead, but everything else continues). If the parent doesn't handle the child's exception, then it too dies, which kills off all its children—i.e. the sibling tasks of the original child task—and the error propagates further up to the grandparent of the original child. If no task ever handles the exception, it eventually reaches the root of the task tree and the entire process terminates with the child's exception as the cause. Hence the claim "Trio never discards exceptions"—either some task handles an exception or it kills the process.

JeffBezanson · 2018-06-22T19:52:20Z

If the parent doesn't handle the child's exception, then it too dies

I assume that happens when the parent task exits? I.e. when a task exits, it waits on all of its child tasks, unless they were explicitly "detached" somehow. That might be ok. But it still doesn't argue for printing an exception as soon as a task dies before anybody has called wait on it, so I still want to merge this PR.

StefanKarpinski · 2018-06-22T20:01:26Z

so I still want to merge this PR.

I'm not really arguing against this PR but I do think that we should do something so that every exception must be handled. It's true that the trio approach does not entail a failed task printing its error as soon as it ends, it just has a comparable effect because if no one catches the error then the root task prints its error and quits.

I.e. when a task exits, it waits on all of its child tasks, unless they were explicitly "detached" somehow.

I'm not 100% clear on that. The parent may be "blocked" until all of its children return which is a somewhat different model than what we've had. But I could be wrong.

JeffBezanson · 2018-06-22T20:17:55Z

if no one catches the error then the root task prints its error and quits.

I'm fine with something like this in theory. But the whole problem is defining what it means for nobody to catch the error --- how long do you wait for somebody to catch the error? And Trio does provide an escape hatch:

The only way to violate causality (which is an important feature, just one that needs to be handled carefully) is to explicitly create a nursery object in one task and then pass it into another task.
(from https://trio.readthedocs.io/en/latest/design.html)

I'm not sure I fully understand this, but I believe it means you can have a "rogue" nursery sitting around with possibly-failed tasks in it, waiting (possibly forever) for somebody to look at it.

JeffBezanson · 2018-06-25T20:42:23Z

Does anybody consider this a breaking API change? Seems like a borderline case.

JeffBezanson · 2018-06-28T18:37:41Z

Triage seems to be ok with this. We can try to do something better in the future, possibly handling it in the gc.

vtjnash · 2018-06-28T19:19:28Z

:,( The end of a useful feature that has helped find numerous bugs and broken tests. I will impatiently await the day when we get that something better in the future!

JeffBezanson · 2018-06-28T19:26:13Z

Are you ok with printing on gc? Or anything that's not a race condition? The race condition is really what I object to most here.

StefanKarpinski · 2018-06-28T19:33:03Z

I think printing on GC seems like a good compromise.

vtjnash · 2018-06-28T19:35:34Z

Yes, I agree that it is nice to remove the race condition, although since it only happens when the code fails, it seems like it would be rare to observe in practice. Handling it in the GC (or at least, spawning a task to do the printing) seems good.

remove "unhandled task failure" message printing

ff64586

ararslan added parallelism Parallel or distributed computation error handling Handling of exceptions by Julia or the user labels Jun 21, 2018

JeffBezanson added the triage This should be discussed on a triage call label Jun 25, 2018

JeffBezanson merged commit e55f9b5 into master Jun 28, 2018

JeffBezanson deleted the jb/taskfailure branch June 28, 2018 18:37

JeffBezanson removed the triage This should be discussed on a triage call label Jun 28, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

remove "unhandled task failure" message printing #27722

remove "unhandled task failure" message printing #27722

JeffBezanson commented Jun 21, 2018

staticfloat commented Jun 22, 2018

StefanKarpinski commented Jun 22, 2018 •

edited

Loading

vtjnash commented Jun 22, 2018

StefanKarpinski commented Jun 22, 2018 •

edited

Loading

JeffBezanson commented Jun 22, 2018

StefanKarpinski commented Jun 22, 2018 •

edited

Loading

JeffBezanson commented Jun 22, 2018

StefanKarpinski commented Jun 22, 2018

JeffBezanson commented Jun 22, 2018

JeffBezanson commented Jun 25, 2018

JeffBezanson commented Jun 28, 2018

vtjnash commented Jun 28, 2018

JeffBezanson commented Jun 28, 2018

StefanKarpinski commented Jun 28, 2018

vtjnash commented Jun 28, 2018

remove "unhandled task failure" message printing #27722

remove "unhandled task failure" message printing #27722

Conversation

JeffBezanson commented Jun 21, 2018

staticfloat commented Jun 22, 2018

StefanKarpinski commented Jun 22, 2018 • edited Loading

vtjnash commented Jun 22, 2018

StefanKarpinski commented Jun 22, 2018 • edited Loading

JeffBezanson commented Jun 22, 2018

StefanKarpinski commented Jun 22, 2018 • edited Loading

JeffBezanson commented Jun 22, 2018

StefanKarpinski commented Jun 22, 2018

JeffBezanson commented Jun 22, 2018

JeffBezanson commented Jun 25, 2018

JeffBezanson commented Jun 28, 2018

vtjnash commented Jun 28, 2018

JeffBezanson commented Jun 28, 2018

StefanKarpinski commented Jun 28, 2018

vtjnash commented Jun 28, 2018

StefanKarpinski commented Jun 22, 2018 •

edited

Loading

StefanKarpinski commented Jun 22, 2018 •

edited

Loading

StefanKarpinski commented Jun 22, 2018 •

edited

Loading