idleworker() function. #14736

samoconnor · 2016-01-20T05:31:22Z

idleworker() returns a worker that is not currently busy (not being waited for by remotecall_fetch or remotecall_wait).

If there are no idle workers, idleworker() blocks until a worker is available.

This is intended as a more general interface for the type of dynamic scheduling provided by pmap.
e.g. pmap might eventually be a simple combination of amap() and idleworker() as described in #14843...

remote(f) = (args...) -> remotecall_fetch(f, idleworker(), args...)
pmap(f, c...) = amap(remote(f), c...; max=nworkers())

Implementation notes:

Uses the existing RemoteValue.waitingfor field to identify busy workers.
Fixes remotecall_fetch and remotecall_wait to set rv.waitingfor back to 0 after waiting.
Adds ProcessGroup.worker_is_idle::Condition to enable waiting for multiple busy workers. idleworker() waits on worker_is_idle. remotecall_fetch and remotecall_wait notify worker_is_idle just after setting rv.waitingfor = 0.
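The bookkeeping in these notes can be modelled in a single process. The following is only a sketch under assumptions, not the PR's code: `busy`, `idleworker_sketch` and `run_on` are hypothetical names standing in for RemoteValue.waitingfor, idleworker() and remotecall_fetch, and no remote workers are involved.

```julia
# Hypothetical model: worker id => whether a call on it is currently being waited for.
busy = Dict(1 => false, 2 => false)

# Plays the role of ProcessGroup.worker_is_idle.
worker_is_idle = Condition()

# Return an id that is not busy, blocking until one becomes available.
function idleworker_sketch()
    while true
        for w in sort!(collect(keys(busy)))
            busy[w] || return w
        end
        wait(worker_is_idle)
    end
end

# Stand-in for remotecall_fetch: mark busy, run f, then clear the flag and notify.
function run_on(f, w)
    busy[w] = true
    try
        return f()
    finally
        busy[w] = false
        notify(worker_is_idle)
    end
end

# Four jobs compete for two "workers"; each records which id it ran on.
used = Int[]
@sync for job in 1:4
    @async begin
        w = idleworker_sketch()
        run_on(w) do
            sleep(0.05)
            push!(used, w)
        end
    end
end
```

Because tasks are scheduled cooperatively, nothing can run between idleworker_sketch() returning and run_on marking the id busy. A multi-process implementation does not get that guarantee for free.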

samoconnor · 2016-01-20T08:14:07Z

I'm not sure how to proceed with writing test cases for this.
As far as I understand it, the test infrastructure uses pmap.
I assume that this means that the state of workers(), and whether they are busy, is dependent on the other tests that are running.
Is there a standard approach to writing test code that needs to have control over all the workers?

tkelman · 2016-01-20T08:17:24Z

see test/parallel_exec.jl

edit: sorry, I could've sworn we used to have it set up so parallel.jl waited until other test workers were done then ran at the end, but it looks like that might not be the case any more. hard to tell

samoconnor · 2016-01-20T11:02:13Z

> see test/parallel_exec.jl

It looks like test/parallel.jl spawns a whole new Julia runtime to execute test/parallel_exec.jl so the workers() should be private to that test regardless of execution order.

samoconnor · 2016-01-21T03:05:05Z

CI passing now.

There may well be a more elegant way to implement idleworker(). This 1st attempt is intended to be minimally disruptive.

Any comments on the interface before I continue to move in the direction described here: #12943 (comment)?

@StefanKarpinski, @JeffBezanson, @jakebolewski, @amitmurthy ?

yuyichao · 2016-01-21T03:08:32Z

test/parallel_exec.jl

+
+    w = workers()
+    @test length(w) == 3
+                                        t1 = now()


The format here is very weird.

samoconnor · 2016-01-21

Hi @yuyichao, I apologise if this looks weird to you.
I found that putting the temporal comments and assertions into a separate column made it easier to understand.

yuyichao · 2016-01-21

Please follow the code formatting guideline.

amitmurthy · 2016-01-27T08:05:24Z

1. Since we allow all-to-all communication, checking waitingfor on the calling process will not be enough to identify "busy" workers. Worker m could have spawned a fetch from worker n.
2. Remote references are returned by the @spawn macros as well as by remotecall. They may not be waited upon at the time of checking waitingfor.

samoconnor · 2016-01-27T09:03:02Z

Hi @amitmurthy,

Thanks for the feedback.

Re: 1.

I suspect that with all-to-all communication there is no way to identify an idle worker. There would always be a race condition where some other node makes the "idle" node busy right after it is identified as idle.
However, it seems that the pmap style of central dispatch is a common enough use pattern to warrant supporting functions. Some users will want domain-specific all-to-all dispatch and scheduling algorithms, but those users will have to really understand what they are doing to be efficient and to avoid deadlocks. I think the most common use case is likely to involve worker nodes processing work for a central controller.

What if, instead of being called idleworker(), this function were just called worker()?
The documentation could say that best efforts will be made to return a worker that is not busy; and that the returned worker is not locally being waited for by remotecall_fetch or remotecall_wait.

Re: 2.

I suppose that it wouldn't be too hard to have a flag to keep track of which workers are busy with locally originated asynchronous remotecalls or @spawns.

I wonder whether it might be better to encourage use of the blocking remotecall_fetch mechanism with @async if asynchronous behaviour is needed. This would be consistent with the ::IO subsystem, where (nearly) everything is blocking but can be wrapped in @async.
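As a rough illustration of that pattern (a sketch, not code from this PR): each blocking remotecall_fetch is wrapped in @async and the caller waits with @sync. It assumes a current Julia with the Distributed standard library and the remotecall_fetch(f, id, args...) argument order. With no workers added, workers() is just [1], so it runs in a single process.

```julia
using Distributed

# Round-robin over whatever workers exist; this choice of worker is illustrative only.
ws = workers()
squares = Vector{Int}(undef, 4)
@sync for i in 1:4
    w = ws[mod1(i, length(ws))]
    # The call blocks, but only inside its own task.
    @async squares[i] = remotecall_fetch(x -> x^2, w, i)
end
```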

In any case, what I'm aiming for here is a function to call when I want to ask for "a worker that I haven't already given a job to." I'm open to suggestions about what the function should be called.

samoconnor · 2016-01-28T23:23:38Z

@amitmurthy, another thing to note is that this PR does not export idleworker.
My initial intention was to put the pieces in place for a cleaner version of #12943 without exporting anything new.

Perhaps we can leave the design of a public "get me a worker" API for later.

amitmurthy · 2016-01-29T06:50:33Z

Wouldn't it be simpler to run off a Q at an application level?

A contrived example:

addprocs(4)

# Create a Q accessible remotely
rr = RemoteChannel(()->Channel(128))

# Make it global on all workers
for p in procs()
    @spawnat p global WorkQ = rr
end

# Mark initial availability
for p in workers()
    @spawnat p put!(WorkQ, myid())
end

# Find a free worker, assign work, and have the worker mark availability again.
for i in rand(1:10, 20)
    p = take!(WorkQ)
    remotecall(p, i) do t
        println("sleep : ", t, " seconds")
        sleep(t)
        println("DONE!")
        put!(WorkQ, myid())
        nothing
    end
end

samoconnor · 2016-01-29T10:43:51Z

Hi @amitmurthy, using a shared queue to keep track of available workers makes a lot of sense.

What do you think of having a built-in default worker queue, something like this...

type WorkerPool
    channel::RemoteChannel{Channel{Int}}
end

function WorkerPool(workers::Vector{Int})

    # Create a shared queue of workers...
    pool = RemoteChannel(()->Channel{Int}(128)) 

    # Check that workers are not already part of a pool...
    check = () -> if :_worker_pool in names(Main)
        error("Worker $(myid()) already in a WorkerPool!")
    end
    foreach(fetch, [@spawnat w check() for w in workers])

    # Put each worker into the pool...
    for w in workers
        put!(pool, w)
        @spawnat w global _worker_pool = pool
    end

    WorkerPool(pool)
end

WorkerPool(n::Integer) = WorkerPool(addprocs(n))
WorkerPool() = WorkerPool(addprocs())

Base.take!(pool::WorkerPool) = take!(pool.channel)

function Base.remotecall_fetch(f, pool::WorkerPool, args...)
    l = (args...)->try f(args...) finally put!(_worker_pool, myid()) end
    remotecall_fetch(l, take!(pool), args...)
end

default_worker_pool() = _default_worker_pool
global _default_worker_pool = WorkerPool(workers())

function Base.remotecall_fetch(f, args...)
    remotecall_fetch(f, default_worker_pool(), args...)
end
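For comparison, current Julia releases ship a WorkerPool type in the Distributed standard library with a similar shape. The snippet below is only a usage sketch of that library type, not the code proposed above. With no extra workers added, the pool holds just process 1.

```julia
using Distributed

# Build a pool from the available worker ids.
pool = WorkerPool(workers())

# take! blocks until a worker id is free; put! hands it back.
w = take!(pool)
put!(pool, w)

# Passing the pool instead of an id takes a free worker, runs the call,
# and returns the worker to the pool afterwards.
total = remotecall_fetch(+, pool, 1, 2)
```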

amitmurthy · 2016-01-30T03:39:00Z

A worker pool will definitely be useful. I suspect there will be some debate about whether to include it in Base or have it as part of an external package.

Since we do not yet have a "standard library" or "standard packages", I am OK with having it in Base for now.

samoconnor · 2016-02-16T20:03:35Z

superseded by #15073

Add idleworker() function.

1607000

tkelman added the "needs tests" label on Jan 20, 2016

tkelman added the "parallelism" label and removed the "needs tests" label on Jan 20, 2016

samoconnor force-pushed the idleworker branch from 2570022 to b1e2d84 on January 20, 2016 11:18

Test for idleworker()

c6ba369

samoconnor force-pushed the idleworker branch from b1e2d84 to c6ba369 on January 20, 2016 13:26

yuyichao reviewed Jan 21, 2016

samoconnor mentioned this pull request Jan 29, 2016

RFC: Simplifying and generalising pmap #14843

Closed

samoconnor closed this Feb 16, 2016
