Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

API Request : Interrupt and terminate a task #6283

Open
amitmurthy opened this issue Mar 27, 2014 · 26 comments
Open

API Request : Interrupt and terminate a task #6283

amitmurthy opened this issue Mar 27, 2014 · 26 comments
Labels
io Involving the I/O subsystem: libuv, read, write, etc.

Comments

@amitmurthy
Copy link
Contributor

Reference : https://groups.google.com/d/msg/julia-users/12eYgWWLzfY/7fxXH5cWX2MJ

@simonster
Copy link
Member

Related: #4037

@JeffBezanson
Copy link
Member

Also related: #1700

We can already do this using schedule. The only problem is that the task can catch the exception and retry; there is no way to be sure the task ends. Maybe the interface can just be t.state = :done. We just need to update schedule to drop finished tasks.

@amitmurthy
Copy link
Contributor Author

We should probably still have terminate(t::Task) = (t.state = :done; :ok) defined. Just seems odd that in this particular case, we expect the user to access a member field directly, while in other cases, a Task object is effectively used as an opaque handle.

@JeffBezanson
Copy link
Member

Nah, that won't be sufficient anyway. We should remove the task from whatever wait queue it is in, so it can be GC'd.

@kshyatt
Copy link
Contributor

kshyatt commented Sep 15, 2016

Bumping. Do we have this functionality yet? Do we want it?

@amitmurthy
Copy link
Contributor Author

Don't have it yet. We should. Killing a task should also release whatever resource it is waiting on - file fd, socket, remote reference, etc.

@vtjnash
Copy link
Member

vtjnash commented Sep 15, 2016

I don't think we should add this, since "releasing whatever resource" is generally impractical and buggy. I don't know of any APIs that have this sort of feature but don't warn you not to use it due to the infeasibility of cleaning up state afterwards:
http://docs.oracle.com/javase/1.5.0/docs/guide/misc/threadPrimitiveDeprecation.html
https://msdn.microsoft.com/en-us/library/windows/desktop/ms686717%28v=vs.85%29.aspx
https://internals.rust-lang.org/t/thread-cancel-support/3056 (rust doesn't have it, this is a discussion on why not)

@amitmurthy
Copy link
Contributor Author

amitmurthy commented Sep 15, 2016

After reading those links, an appropriate solution would be to define an interrupt(t::Task).

Currently it should just throw an InterruptException in the target task if it is waiting on I/O, remote reference, Condition, etc. For compute bound tasks, if and when we have tasks scheduled on different threads, it could send an interrupt signal to the specific thread (if possible).

@rofinn
Copy link
Contributor

rofinn commented Nov 22, 2016

+1 for interrupt(t::Task). @amitmurthy Do you have any idea of how to throw the InterruptException on the task? I feel like this solution would also help me figure out how to implement a stacktrace(t::Task) method (ie: run stacktrace() in the task for debugging).

@yuyichao
Copy link
Contributor

stacktrace(::Task) is much easier than interrupt(::Task)

@kshyatt kshyatt added the io Involving the I/O subsystem: libuv, read, write, etc. label Nov 24, 2016
@vtjnash
Copy link
Member

vtjnash commented Jan 24, 2017

Another reference on this topic is https://news.ycombinator.com/item?id=13470452

Note, that the current preferred mechanism of aborting another Task is to close the shared resource and let the runtime clean it up synchronously. This has the benefit of being reliable, easy to code, and already exists. It also is less racy, since close is stateful (once closed, the resource remains closed) rather than an edge-driven event, and typically also an expected condition (so it doesn't require any extra effort to handle).

@amitmurthy
Copy link
Contributor Author

That works for libuv resources implementing close and Channels only. For tasks waiting on a remotecall or waiting on a Future/RemoteChannel users have no access to the Condition variables the task is waiting on. And implementing close(::Condition) which would invalidate all current and future calls on a Condition object I think is not correct. If we do that we may as well have interrupt(::Task) call close on the waiting condition which would bring us back to the issue of proper cleanup in the libuv case. Right?

@vtjnash
Copy link
Member

vtjnash commented Jan 25, 2017

No, it would still be different because it would no longer be specific to intended interruption. For example, you might end up aborting a call to close or showerror instead of the intended job.

I think the remotecall functions generally have async versions which return a handle to the Channel? I think in most other cases, the resource is passed in as an argument which gives the caller some leverage. In the worst case for remotecall, since the resource argument is worker-pid, you could rmprocs(p) to kill / close the connection to that remote worker.

@amitmurthy
Copy link
Contributor Author

I think the remotecall functions generally have async versions which return a handle to the Channel

The calls return a Future and a wait on a Future results in a remote task that waits on the backing channel waiting for data. On the caller we are waiting on a Condition which will be triggered by a response from the remote wait.

In a statement like @async remotecall_fetch(....) we only have access to a Task object. rmprocs(p) seems like an overkill but is probably the correct way to do it currently, as we don't have a means to interrupt the specific remote task.

@vtjnash
Copy link
Member

vtjnash commented Jan 25, 2017

Right, but I thought that's why remotecall is available. And terminating that Task wouldn't actually notify the remote worker to stop, but might confuse / corrupt it when it tries to report it's results. I realize that ensuring cancel-ability may require thinking about how it'll work and threading out handles to the objects that can be used to stop the work. But I don't see how it could be done any other way. There's no guarantee that in the @async example there that it's not actually implemented as @async wait(@async remotecall_fetch()) (hopefully not intentionally...), so all that killing the Task directly accomplishes is destroying the monitoring process.

@s2maki
Copy link
Contributor

s2maki commented Apr 27, 2018

Someone has written an article that draws an equivalence of @schedule-like behavior to the evils of goto.

https://vorpus.org/blog/notes-on-structured-concurrency-or-go-statement-considered-harmful

There is a Julia-specific thread to this conversation at https://discourse.julialang.org/t/schedule-considered-harmful/10540

As someone who learned programming on old-fashioned BASICs like Applesoft BASIC on the Apple II+, and then had to move to procedural programming in C, I initially had the same gut-response to this article as I did back then. I'd call it "instinctive repulsion". But of course we do all now recognize goto as evil, so I reset my thinking and gave it an honest review.

In doing so, I have come to the conclusion that the article makes some great points. There is no good way to universally handle uncaught exceptions at the top of the @schedule other than to drop them on the floor. There isn't really a global (or even local) repository of outstanding Tasks that were created by @schedule, so you can't really even know what's running, or what may have been left out there by a black-box function call you have made.

But the reason I mention this article in this particular discussion is that I think having to support such unbounded @schedule calls may be one of the reasons that the ability to cancel a task is so hard. I haven't reviewed the Trio Python library itself or delved into the details of the Nursery concept as described other than to recognize it as similar to wrapping @async in @sync. But it does occur to me that there may be some concepts in there relating to "checkpoints" that begin to enable task cancels. (https://trio.readthedocs.io/en/latest/reference-core.html#checkpoints). Perhaps, if @schedule itself is dropped and the only way to schedule a task is to wrap an @async inside a @sync, then dealing with the resulting fallout cleaning up resources a task is holding on to may become easier.

Anyhow, just some food for thought here. I'm not necessarily proposing anything, but rather hoping to move the idea of task cancellation back into active discussion.

@StefanKarpinski
Copy link
Member

StefanKarpinski commented May 8, 2018

I think we should look closely at the Trio approach to I/O in the future—i.e. post 1.0 (so this may have to be optional in 1.x or it may have to wait until 2.0). It has some really nice properties, including:

  1. Every task spawned within a function finishes before that function returns unless you spawn the task within an explicit "task nursery" that outlives the spawning function.

  2. There's a natural parent task to handle every child task failure—no more task failures disappearing into the void. This has been a frequent point of contention between @JeffBezanson and myself; the Trio approach provides a nice clean solution that could make both of us happy.

  3. There's a clear structure for cancellation of tasks and subtasks: if you kill a task, that also kills all subtasks. In particular, this means that instead of having timeout arguments on every possible blocking operation, you can do external cancellation of blocking tasks correctly—this composes better and means you don't have to wait until a potentially indefinite chain of timers expires.

The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together some time since Nathaniel is a much more effective and explainer of and advocate for the Trio model than I am. (Hope you don't mind the ping, Nathaniel!)

@njsmith
Copy link

njsmith commented May 11, 2018

The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together

Sounds like a fun time to me :-)

@njsmith
Copy link

njsmith commented Feb 10, 2019

The most effective way forward may be to see if I can get @JeffBezanson and @njsmith in a room together some time since Nathaniel is a much more effective and explainer of and advocate for the Trio model than I am.

BTW, we've just created a virtual room for cross-language discussions of structured concurrency, in case anyone is interested: https://trio.discourse.group/c/structured-concurrency

@StefanKarpinski
Copy link
Member

Cool, Nathan, I’ve joined :)

@c42f
Copy link
Member

c42f commented Feb 12, 2019

Nice, I've joined as well. Reading over the blog post I think this approach makes a lot of sense. (As a side note — it also means that it was "correct" to inherit loggers from their parent task. Phew!)

@tkf
Copy link
Member

tkf commented Aug 8, 2019

I think cancellation in Trio works nicely because Python forces you to write await which becomes the (potential) checkpoints. As await can only be used inside functions marked by async, you don't need to care about cancellation inside blocking (non-async) functions.

Is there any plans/ideas for (1) how to make non-I/O (compute-intensive) functions cancellable and (2) how to mark them as such in a way that the callers can know that they have to prepare for cancellation? I guess passing around cancellation tokens (see also https://vorpus.org/blog/timeouts-and-cancellation-for-humans/) and handling exit manually as in Go's errgroup would be an option. But it sounds like a very clumsy API to use.

BTW, I played around a bit to see how Trio-like API would look like and how passing around "nursery" would work (although it's more like Go's errgroup as there is no cancellation support). Here is a demo:

function bg(nursery)
    @with_nursery nursery begin
        println("in nursery")
        Threads.@spawn begin
            sleep(0.1)
            Threads.@spawn begin
                sleep(0.1)
                println("world")
            end
            println("hello")
        end
    end
    println("out of nursery")
end

function demo()
    @sync begin
        println("launching background tasks")
        bg(@get_nursery)
        bg(@get_nursery)
        println("synchronising background tasks")
    end
    println("synchronized background tasks")
end
Quick-and-dirty implementation of `@get_nursery` and `@with_nursery` (no cancellation support at all!))
struct TaskVector
    tasks::Vector{Any}
    lock::Threads.SpinLock
end

TaskVector(tasks) = TaskVector(tasks, Threads.SpinLock())

function Base.push!(v::TaskVector, x)
    lock(v.lock)
    try
        return push!(v.tasks, x)
    finally
        unlock(v.lock)
    end
end

macro get_nursery()
    var = esc(Base.sync_varname)
    quote
        TaskVector($var)
    end
end

macro with_nursery(nursery, body)
    ex = quote
        let $(Base.sync_varname) = $nursery
            $body
        end
    end
    return esc(ex)
end

demo() should print

launching background tasks
in nursery
out of nursery
in nursery
out of nursery
synchronising background tasks
hello
hello
world
world
synchronized background tasks

@tkf
Copy link
Member

tkf commented Aug 8, 2019

I think Kotlin could be interesting here since it does structured concurrency without async/await keywords. Kotlin's approach to making computation cancellable is to call yield function or checking isActive variable. See: https://kotlinlang.org/docs/reference/coroutines/cancellation-and-timeouts.html#making-computation-code-cancellable

But it looks there is no mechanism for callers to know if the function is cancellable?

@tkf
Copy link
Member

tkf commented Aug 12, 2019

I started implementing a very minimal version of structured concurrency here https://github.com/tkf/Awaits.jl. Basically I implemented what I mentioned in the last bits of #32677 (comment). The idea is to define a macro @await body which is expanded to (roughly speaking)

ans = $body
if ans isa Exception
    cancel!($context)
    return ans
end
ans

where $context is a variable that tracks cancellation tokens (and tasks). There is also @go body for (a thread safe version of) @spawn @await body. Another important construct is @check which expands to

shouldstop($context) && return Cancelled()

where Cancelled <: Exception. This way, compute-intensive functions can define checkpoints manually by inserting @check. Those functions can also "throw" an error by returning an Exception when they hit a condition where the whole computation should stop. Callers of such cancellable functions can wrap the call with @await which then explicitly marks a checkpoint. Sub-computations can be cancelled individually by something like

context = @cancelscope begin
    @go ...
    @go ...
end
cancel!(context)

I also wrote some minimal documentation https://tkf.github.io/Awaits.jl/dev/ and tests https://github.com/tkf/Awaits.jl/blob/master/test/test_simple.jl

@AhmedSalih3d
Copy link

Is this the reason I cannot stop a FileEvent (BetterFileWatching.jl) task and it keeps on working in the background w/e I do?

Kind regards

@vtjnash
Copy link
Member

vtjnash commented Aug 31, 2022

No, that just sounds like a BetterFileWatching.jl API problem. The stdlib FileWatching shouldn't have that issue

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
io Involving the I/O subsystem: libuv, read, write, etc.
Projects
None yet
Development

No branches or pull requests