API Request : Interrupt and terminate a task #6283

amitmurthy · 2014-03-27T15:19:34Z

Reference : https://groups.google.com/d/msg/julia-users/12eYgWWLzfY/7fxXH5cWX2MJ

simonster · 2014-03-27T15:26:50Z

Related: #4037

JeffBezanson · 2014-03-27T21:33:25Z

Also related: #1700

We can already do this using schedule. The only problem is that the task can catch the exception and retry; there is no way to be sure the task ends. Maybe the interface can just be t.state = :done. We just need to update schedule to drop finished tasks.
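To make the concern concrete, here is a minimal sketch, assuming a task that is blocked on a Condition and a current Julia 1.x release. `stubborn_worker` and `events` are hypothetical names used only for illustration. It injects an InterruptException with `schedule(t, exc; error=true)` and shows that the task can catch it and carry on:

```julia
# Hypothetical worker that treats an interrupt as a reason to retry, not to stop.
function stubborn_worker(cond::Condition, events::Vector{Symbol})
    while true
        try
            wait(cond)                    # blocks until notified or until an exception is injected
            push!(events, :notified)
            return :finished
        catch err
            err isa InterruptException || rethrow()
            push!(events, :interrupted)   # swallow the interrupt and loop again
        end
    end
end

cond = Condition()
events = Symbol[]
t = @async stubborn_worker(cond, events)
yield()                                   # let the task start and block in wait(cond)

schedule(t, InterruptException(); error=true)  # throw into the blocked task
yield()                                   # the task catches it and blocks again

still_running = !istaskdone(t)            # the interrupt did not end the task
notify(cond)                              # only a normal wake-up lets it finish
result = fetch(t)
```

Note that the sketch only injects the exception while the task is blocked; doing this to a task in an arbitrary state is not something the sketch claims to be safe.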

amitmurthy · 2014-03-28T07:47:59Z

We should probably still have terminate(t::Task) = (t.state = :done; :ok) defined. Just seems odd that in this particular case, we expect the user to access a member field directly, while in other cases, a Task object is effectively used as an opaque handle.

JeffBezanson · 2014-03-28T08:00:54Z

Nah, that won't be sufficient anyway. We should remove the task from whatever wait queue it is in, so it can be GC'd.

kshyatt · 2016-09-15T02:24:18Z

Bumping. Do we have this functionality yet? Do we want it?

amitmurthy · 2016-09-15T03:42:38Z

Don't have it yet. We should. Killing a task should also release whatever resource it is waiting on - file fd, socket, remote reference, etc.

vtjnash · 2016-09-15T04:01:26Z

I don't think we should add this, since "releasing whatever resource" is generally impractical and buggy. I don't know of any APIs that have this sort of feature but don't warn you not to use it due to the infeasibility of cleaning up state afterwards:
http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686717%28v=vs.85%29.aspx
https://internals.rust-lang.org/t/thread-cancel-support/3056 (Rust doesn't have it; this is a discussion of why not)

amitmurthy · 2016-09-15T04:19:25Z

After reading those links, an appropriate solution would be to define an interrupt(t::Task).

Currently it should just throw an InterruptException in the target task if it is waiting on I/O, remote reference, Condition, etc. For compute bound tasks, if and when we have tasks scheduled on different threads, it could send an interrupt signal to the specific thread (if possible).

rofinn · 2016-11-22T18:40:47Z

+1 for interrupt(t::Task). @amitmurthy Do you have any idea of how to throw the InterruptException on the task? I feel like this solution would also help me figure out how to implement a stacktrace(t::Task) method (i.e., run stacktrace() in the task for debugging).

yuyichao · 2016-11-22T18:49:36Z

stacktrace(::Task) is much easier than interrupt(::Task)

vtjnash · 2017-01-24T20:51:47Z

Another reference on this topic is https://news.ycombinator.com/item?id=13470452

Note that the currently preferred mechanism for aborting another Task is to close the shared resource and let the runtime clean it up synchronously. This has the benefit of being reliable, easy to code, and already available. It is also less racy, since close is stateful (once closed, the resource remains closed) rather than an edge-driven event, and a closed resource is typically also an expected condition (so it doesn't require any extra effort to handle).
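As an illustration of the close-the-resource approach, here is a minimal sketch using a Channel as the shared resource. The names `ch`, `results`, and `consumer` are made up for the example. The consumer is never interrupted; it simply sees the channel close and finishes on its own:

```julia
ch = Channel{Int}(0)          # unbuffered channel shared by producer and consumer
results = Int[]

consumer = @async begin
    for x in ch               # iteration ends once the channel is closed and drained
        push!(results, x)
    end
    :drained
end

put!(ch, 1)
put!(ch, 2)
close(ch)                     # "abort" the consumer by closing what it is waiting on

status = fetch(consumer)
```

Because the closed state persists, it does not matter whether the consumer is already waiting when close is called or only reaches the channel later.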

amitmurthy · 2017-01-25T03:45:22Z

That works for libuv resources implementing close and Channels only. For tasks waiting on a remotecall or waiting on a Future/RemoteChannel users have no access to the Condition variables the task is waiting on. And implementing close(::Condition) which would invalidate all current and future calls on a Condition object I think is not correct. If we do that we may as well have interrupt(::Task) call close on the waiting condition which would bring us back to the issue of proper cleanup in the libuv case. Right?

vtjnash · 2017-01-25T04:10:03Z

No, it would still be different because it would no longer be specific to intended interruption. For example, you might end up aborting a call to close or showerror instead of the intended job.

I think the remotecall functions generally have async versions which return a handle to the Channel? I think in most other cases, the resource is passed in as an argument which gives the caller some leverage. In the worst case for remotecall, since the resource argument is worker-pid, you could rmprocs(p) to kill / close the connection to that remote worker.

amitmurthy · 2017-01-25T04:30:18Z

> I think the remotecall functions generally have async versions which return a handle to the Channel

The calls return a Future and a wait on a Future results in a remote task that waits on the backing channel waiting for data. On the caller we are waiting on a Condition which will be triggered by a response from the remote wait.

In a statement like @async remotecall_fetch(....) we only have access to a Task object. rmprocs(p) seems like an overkill but is probably the correct way to do it currently, as we don't have a means to interrupt the specific remote task.

vtjnash · 2017-01-25T04:45:40Z

Right, but I thought that's why remotecall is available. And terminating that Task wouldn't actually notify the remote worker to stop, but might confuse / corrupt it when it tries to report its results. I realize that ensuring cancel-ability may require thinking about how it'll work and threading out handles to the objects that can be used to stop the work. But I don't see how it could be done any other way. There's no guarantee that in the @async example there that it's not actually implemented as @async wait(@async remotecall_fetch()) (hopefully not intentionally...), so all that killing the Task directly accomplishes is destroying the monitoring process.

s2maki · 2018-04-27T16:03:17Z

Someone has written an article that draws an equivalence of @schedule-like behavior to the evils of goto.

https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful

There is a Julia-specific thread to this conversation at https://discourse.julialang.org/t/schedule-considered-harmful/10540

As someone who learned programming on old-fashioned BASICs like Applesoft BASIC on the Apple II+, and then had to move to procedural programming in C, I initially had the same gut-response to this article as I did back then. I'd call it "instinctive repulsion". But of course we do all now recognize goto as evil, so I reset my thinking and gave it an honest review.

In doing so, I have come to the conclusion that the article makes some great points. There is no good way to universally handle uncaught exceptions at the top of the @schedule other than to drop them on the floor. There isn't really a global (or even local) repository of outstanding Tasks that were created by @schedule, so you can't really even know what's running, or what may have been left out there by a black-box function call you have made.

But the reason I mention this article in this particular discussion is that I think having to support such unbounded @schedule calls may be one of the reasons that the ability to cancel a task is so hard. I haven't reviewed the Trio Python library itself or delved into the details of the Nursery concept as described other than to recognize it as similar to wrapping @async in @sync. But it does occur to me that there may be some concepts in there relating to "checkpoints" that begin to enable task cancels (https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints). Perhaps, if @schedule itself is dropped and the only way to schedule a task is to wrap an @async inside a @sync, then dealing with the resulting fallout of cleaning up resources a task is holding on to may become easier.

Anyhow, just some food for thought here. I'm not necessarily proposing anything, but rather hoping to move the idea of task cancellation back into active discussion.

StefanKarpinski · 2018-05-08T14:10:34Z

I think we should look closely at the Trio approach to I/O in the future—i.e. post 1.0 (so this may have to be optional in 1.x or it may have to wait until 2.0). It has some really nice properties, including:

- Every task spawned within a function finishes before that function returns unless you spawn the task within an explicit "task nursery" that outlives the spawning function.
- There's a natural parent task to handle every child task failure—no more task failures disappearing into the void. This has been a frequent point of contention between @JeffBezanson and myself; the Trio approach provides a nice clean solution that could make both of us happy.
- There's a clear structure for cancellation of tasks and subtasks: if you kill a task, that also kills all subtasks. In particular, this means that instead of having timeout arguments on every possible blocking operation, you can do external cancellation of blocking tasks correctly—this composes better and means you don't have to wait until a potentially indefinite chain of timers expires.
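For comparison, the first two properties can be roughly approximated today with @sync wrapped around @async, though only for tasks lexically inside the block and without any cancellation. The following is a sketch under that assumption, not a rendering of the Trio model; `squares_concurrently` is a hypothetical name:

```julia
# Every task started inside the @sync block has finished before the function returns.
function squares_concurrently(n::Integer)
    out = zeros(Int, n)
    @sync for i in 1:n
        @async begin
            sleep(0.01)       # stand-in for blocking I/O
            out[i] = i^2
        end
    end
    return out
end

squares = squares_concurrently(4)

# A failing child is reported to the parent instead of disappearing.
caught = try
    @sync begin
        @async error("child failed")
        @async sleep(0.01)
    end
    nothing
catch err
    err                       # on the 1.x releases I know of, a CompositeException
end
```

What this does not give you is the third property: the sibling that is still sleeping is waited for, not cancelled.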

The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together some time since Nathaniel is a much more effective explainer of and advocate for the Trio model than I am. (Hope you don't mind the ping, Nathaniel!)

njsmith · 2018-05-11T03:11:25Z

> The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together

Sounds like a fun time to me :-)

njsmith · 2019-02-10T08:25:50Z

> The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together some time since Nathaniel is a much more effective explainer of and advocate for the Trio model than I am.

BTW, we've just created a virtual room for cross-language discussions of structured concurrency, in case anyone is interested: https://trio.discourse.group/c/structured-concurrency

StefanKarpinski · 2019-02-11T00:36:04Z

Cool, Nathan, I’ve joined :)

c42f · 2019-02-12T06:10:37Z

Nice, I've joined as well. Reading over the blog post I think this approach makes a lot of sense. (As a side note — it also means that it was "correct" to inherit loggers from their parent task. Phew!)

tkf · 2019-08-08T05:07:21Z

I think cancellation in Trio works nicely because Python forces you to write await which becomes the (potential) checkpoints. As await can only be used inside functions marked by async, you don't need to care about cancellation inside blocking (non-async) functions.

Are there any plans/ideas for (1) how to make non-I/O (compute-intensive) functions cancellable and (2) how to mark them as such in a way that the callers can know that they have to prepare for cancellation? I guess passing around cancellation tokens (see also https://vorpus.org/blog/timeouts-and-cancellation-for-humans/) and handling exit manually as in Go's errgroup would be an option. But it sounds like a very clumsy API to use.
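To show what the token-passing option might look like in practice, here is a minimal sketch. `CancelToken`, `cancel!`, `iscancelled`, and `spin` are all hypothetical names, not anything in Base, and the polling interval (once per loop iteration) is an arbitrary choice:

```julia
# Hypothetical cooperative cancellation token.
struct CancelToken
    flag::Threads.Atomic{Bool}
end
CancelToken() = CancelToken(Threads.Atomic{Bool}(false))

cancel!(tok::CancelToken) = (tok.flag[] = true; nothing)
iscancelled(tok::CancelToken) = tok.flag[]

# A compute-bound loop that polls the token once per iteration.
function spin(tok::CancelToken)
    iterations = 0
    while !iscancelled(tok)
        iterations += 1
        yield()               # gives other tasks on this thread a chance to run
    end
    return iterations
end

tok = CancelToken()
worker = @async spin(tok)
for _ in 1:5
    yield()                   # let the worker make some progress
end
cancel!(tok)                  # request cancellation; the worker exits at its next check
iterations = fetch(worker)
```

The clumsiness is visible even here: every function on the call path has to accept the token and remember to check it.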

BTW, I played around a bit to see what a Trio-like API would look like and how passing around a "nursery" would work (although it's more like Go's errgroup as there is no cancellation support). Here is a demo:

function bg(nursery)
    @with_nursery nursery begin
        println("in nursery")
        Threads.@spawn begin
            sleep(0.1)
            Threads.@spawn begin
                sleep(0.1)
                println("world")
            end
            println("hello")
        end
    end
    println("out of nursery")
end

function demo()
    @sync begin
        println("launching background tasks")
        bg(@get_nursery)
        bg(@get_nursery)
        println("synchronising background tasks")
    end
    println("synchronized background tasks")
end

Quick-and-dirty implementation of `@get_nursery` and `@with_nursery` (no cancellation support at all!):

struct TaskVector
    tasks::Vector{Any}
    lock::Threads.SpinLock
end

TaskVector(tasks) = TaskVector(tasks, Threads.SpinLock())

function Base.push!(v::TaskVector, x)
    lock(v.lock)
    try
        return push!(v.tasks, x)
    finally
        unlock(v.lock)
    end
end

macro get_nursery()
    var = esc(Base.sync_varname)
    quote
        TaskVector($var)
    end
end

macro with_nursery(nursery, body)
    ex = quote
        let $(Base.sync_varname) = $nursery
            $body
        end
    end
    return esc(ex)
end

demo() should print

launching background tasks
in nursery
out of nursery
in nursery
out of nursery
synchronising background tasks
hello
hello
world
world
synchronized background tasks

tkf · 2019-08-08T05:13:31Z

I think Kotlin could be interesting here since it does structured concurrency without async/await keywords. Kotlin's approach to making computation cancellable is to call yield function or checking isActive variable. See: https://kotlinlang.org/docs/reference/coroutines/cancellation-and-timeouts.html#making-computation-code-cancellable

But it looks like there is no mechanism for callers to know if the function is cancellable?

tkf · 2019-08-12T00:20:08Z

I started implementing a very minimal version of structured concurrency here https://github.com/tkf/Awaits.jl. Basically I implemented what I mentioned in the last bits of #32677 (comment). The idea is to define a macro @await body which is expanded to (roughly speaking)

ans = $body
if ans isa Exception
    cancel!($context)
    return ans
end
ans

where $context is a variable that tracks cancellation tokens (and tasks). There is also @go body for (a thread safe version of) @spawn @await body. Another important construct is @check which expands to

shouldstop($context) && return Cancelled()

where Cancelled <: Exception. This way, compute-intensive functions can define checkpoints manually by inserting @check. Those functions can also "throw" an error by returning an Exception when they hit a condition where the whole computation should stop. Callers of such cancellable functions can wrap the call with @await which then explicitly marks a checkpoint. Sub-computations can be cancelled individually by something like

context = @cancelscope begin
    @go ...
    @go ...
end
cancel!(context)

I also wrote some minimal documentation https://tkf.github.io/Awaits.jl/dev/ and tests https://github.com/tkf/Awaits.jl/blob/master/test/test_simple.jl
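For readers who want the gist without the macros, the pattern can be sketched with plain functions. This is not the Awaits.jl API: `Context`, `run_checked`, `slow_sum`, and `failing` are hypothetical stand-ins, and `cancel!` and `shouldstop` are re-implemented here only so the example is self-contained:

```julia
struct Cancelled <: Exception end

# Stand-in for `$context`: a shared stop flag.
struct Context
    stop::Threads.Atomic{Bool}
end
Context() = Context(Threads.Atomic{Bool}(false))
cancel!(ctx::Context) = (ctx.stop[] = true; nothing)
shouldstop(ctx::Context) = ctx.stop[]

# Compute-bound worker with a manual checkpoint per iteration (the role of `@check`).
function slow_sum(ctx::Context, xs)
    acc = 0
    for x in xs
        shouldstop(ctx) && return Cancelled()
        acc += x
        yield()
    end
    return acc
end

# A worker that "throws" by returning an exception value.
failing(ctx::Context) = ErrorException("bad input")

# The role of `@await`: an exception result cancels the context and is passed on.
function run_checked(f, ctx::Context)
    ans = f(ctx)
    ans isa Exception && cancel!(ctx)
    return ans
end

ctx = Context()
a = @async run_checked(c -> slow_sum(c, 1:1_000_000), ctx)
b = @async run_checked(failing, ctx)
ra, rb = fetch(a), fetch(b)   # the failure in b stops a at its next checkpoint
```

The failing sibling never throws; it returns an exception value, which flips the shared flag so that the long-running sibling gives up at its next checkpoint.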

AhmedSalih3d · 2022-08-28T08:25:35Z

Is this the reason I cannot stop a FileEvent (BetterFileWatching.jl) task and it keeps on working in the background whatever I do?

Kind regards

vtjnash · 2022-08-31T21:55:31Z

No, that just sounds like a BetterFileWatching.jl API problem. The stdlib FileWatching shouldn't have that issue.

ihnorton mentioned this issue Dec 30, 2014

Problems interrupting pmap with Ctrl-C #6752

Closed

amitmurthy mentioned this issue Oct 20, 2015

First pass at a Go-style select statement #13661

Closed

5 tasks

kshyatt added the io Involving the I/O subsystem: libuv, read, write, etc. label Nov 24, 2016

samoconnor mentioned this issue Nov 10, 2017

timeout macro -- what happens to the task that timed out? memory leak? JuliaWeb/HTTP.jl#114

Closed

StefanKarpinski mentioned this issue Jun 22, 2018

remove "unhandled task failure" message printing #27722

Merged

tlienart mentioned this issue Mar 31, 2019

Refactoring without filewatching JuliaDocs/LiveServer.jl#7

Closed

tkf mentioned this issue Aug 10, 2019

Error Handling in Tasks #32677

Closed

timholy mentioned this issue Sep 7, 2019

Test and fix for issue #354 timholy/Revise.jl#355

Merged

vtjnash mentioned this issue May 7, 2020

Ctrl-C does not work when running multi-threaded code #35524

Open

vtjnash mentioned this issue Jun 15, 2020

wait() with timeout #36217

Open

tclements mentioned this issue Oct 9, 2020

Short circuit for reading mseed with many gaps jpjones76/SeisIO.jl#62

Closed

Seelengrab mentioned this issue May 31, 2023

signal handling: User-defined interrupt handlers #49541

Open

7 tasks

Drvi mentioned this issue Dec 7, 2023

Allow @testitem to set its own timeout JuliaTesting/ReTestItems.jl#129

Merged
