RFC/WIP: "for-loop" compliant @parallel for [ci skip] #20094

Closed

Conversation

amitmurthy
Contributor

This takes a different approach to addressing some of the concerns with @parallel discussed in #19578.

@parallel for differs from a regular for-loop by

  • implementing reducer functionality
  • treating the last line of the body as the value to be reduced
  • returning before the loop completes execution (in the non-reducer case)
  • not updating local arrays: folks are used to looping over local arrays and expect them to be updated, but this only works with shared arrays

This PR:

  • deprecates the reducer mode of @parallel for
  • provides a way for the user to explicitly specify accumulators and use them in the loop body
  • allows waiting on the accumulator(s) for the final result

The syntax is a bit more verbose, but there is much less scope for confusion or misplaced expectations.

@parallel reducer for x in unit_range
  body
end

will now be written as

acc = ParallelAccumulator(reducer, length)
@accumulate acc @parallel for x in unit_range
  body
  push!(acc, iteration_value)
end
result = wait(acc)
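
For a concrete (if trivial) example, a parallel sum of squares would change as follows (a sketch using the constructor form shown above; the count of 100 matches the loop length):

# old, deprecated reducer form
sumsq = @parallel (+) for i in 1:100
  i^2
end

# proposed form: an explicit accumulator pushed to from the body
acc = ParallelAccumulator(+, 100)
@accumulate acc @parallel for i in 1:100
  push!(acc, i^2)
end
sumsq = wait(acc)    # 338350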

Multiple accumulators can also be specified as a vector

a1 = ParallelAccumulator{Int}(+, 10)
a2 = ParallelAccumulator{Int}(*, 10)
@accumulate [a1,a2] @parallel for i in 1:10
  push!(a1, i)
  push!(a2, i)
end
results = [wait(a1), wait(a2)]

Updating shared arrays works as before; there is no need for ParallelAccumulators in that case. However, ParallelAccumulators can be used in multi-node scenarios, which shared memory cannot address.
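
For example, the existing shared-memory pattern keeps working unchanged (a minimal sketch; compute_value is a hypothetical per-element function, and depending on the Julia version the constructor is spelled SharedArray(Float64, 100) or SharedArray{Float64}(100)):

S = SharedArray(Float64, 100)
@sync @parallel for i in 1:100
  S[i] = compute_value(i)
end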

As before, the input range is partitioned across workers; each worker performs a local reduction, with a final reduction on the caller.
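
For illustration, the chunking is roughly along these lines (split_into_chunks is a hypothetical stand-in for the internal splitrange call seen in the review hunk below):

# Split a range of length len into nchunks roughly equal contiguous chunks, one per worker.
function split_into_chunks(len::Int, nchunks::Int)
    q, r = divrem(len, nchunks)
    chunks = UnitRange{Int}[]
    lo = 1
    for i in 1:nchunks
        hi = lo + q - 1 + (i <= r ? 1 : 0)
        push!(chunks, lo:hi)
        lo = hi + 1
    end
    return chunks
end

split_into_chunks(10, 3)    # => [1:4, 5:7, 8:10]
# Each worker reduces its chunk locally; the caller then reduces the per-worker results.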

I feel this syntax and loop behavior are more in line with a regular for-loop. Updating arrays from the body is still not supported (except for shared arrays, of course), but ParallelAccumulators do cover that need.

  • New exports: ParallelAccumulator, @accumulate
  • wait(::ParallelAccumulator) - waits for and returns the value of the distributed computation
  • ParallelAccumulator(reducer, length) - takes the reducer function and the count of values to be reduced over
  • push!(accumulator, value) - applies the reducer to the value and stores the result. Local ParallelAccumulators push results back to the caller only once, when the local range iteration is complete (see the simplified sketch below).
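
A much-simplified, single-process sketch of these push!/wait semantics (the real type additionally tracks per-destination lengths and channels results from workers back to the caller; the names below are illustrative, not the PR's internals):

type SimpleAccumulator{T}
    reducer::Function
    remaining::Int        # number of values still expected
    value::Nullable{T}    # running reduction, empty until the first push!
    cond::Condition       # released once all expected values have arrived
end

SimpleAccumulator(T::DataType, reducer::Function, n::Int) =
    SimpleAccumulator{T}(reducer, n, Nullable{T}(), Condition())

function Base.push!(acc::SimpleAccumulator, v)
    acc.value = isnull(acc.value) ? Nullable(v) : Nullable(acc.reducer(get(acc.value), v))
    acc.remaining -= 1
    acc.remaining == 0 && notify(acc.cond)
    return acc
end

function Base.wait(acc::SimpleAccumulator)
    acc.remaining > 0 && wait(acc.cond)
    return get(acc.value)
end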

Feedback on the overall API, suggestions on the syntax, and bikeshedding of names are welcome.

Note that with this PR the final result of both @accumulate acc for-loop and @accumulate acc @parallel for-loop is the same, as long as the loop only returns data via accumulators.

@amitmurthy
Contributor Author

@ararslan ararslan added the parallelism Parallel or distributed computation label Jan 17, 2017
chunks = splitrange(lenR, workers())
accums = get(task_local_storage(), :JULIA_ACCUMULATOR, ())
if accums !== ()
accums = accums[1]
Member

@ararslan ararslan Jan 17, 2017

Wouldn't it be preferable to use a different variable name, since here (and below) accums is changing type?

Contributor Author

Noted. However, I'll request that we first discuss the API and implementation model. Detailed code review can follow later.

@tkelman tkelman added needs docs Documentation for this change is required needs tests Unit tests are required for this change labels Jan 17, 2017
# A function which returns a length value when input the destination pid.
# Used to serialize the same object with different length values depending
# on the destination pid.
destf::Nullable{Function}
Contributor

@tkelman tkelman Jan 18, 2017

this could use a more descriptive name

and a description of what it means when the nullable fields are null

new(f, len, len, initial, initial, destf, chnl)
end

set_destf(pacc::ParallelAccumulator, f::Function) = (pacc.destf = f; pacc)
Contributor

!

return get(pacc.value)
end

function reset(pacc::ParallelAccumulator)
Contributor

!

Contributor Author

Another specialization of the existing exported reset. I am OK with a new export reset! too.

throw(ArgumentError(string(
"@accumulate : ",
"First argument must be a variable name pointing to a ParallelAccumulator ",
"or a vector of variable names pointing to ParallelAccumulators. ",
Contributor

why a vector rather than a tuple?

Contributor Author

When I think "list of accumulators", a vector comes naturally to mind. Tuple or vector, whatever is more natural (or both) can be made to work.

Contributor Author

Have removed the checks. It can be any collection of ParallelAccumulators.

Contributor Author

The specific checks were not required.

julia> acc = map(i->ParallelAccumulator{Array}(vcat, 10), 1:10);

julia> @accumulate acc @parallel for i in 1:10
         foreach(x->push!(x, myid()), acc)
       end

julia> [wait(a) for a in acc]
10-element Array{Array{Int64,1},1}:
 [4,4,5,5,2,2,2,3,3,3]
 [4,4,5,5,3,3,3,2,2,2]
 [4,4,3,3,3,5,5,2,2,2]
 [4,4,3,3,3,5,5,2,2,2]
 [4,4,3,3,3,5,5,2,2,2]
 [4,4,3,3,3,5,5,2,2,2]
 [4,4,3,3,3,5,5,2,2,2]
 [4,4,3,3,3,5,5,2,2,2]
 [4,4,3,3,3,5,5,2,2,2]
 [4,4,3,3,3,5,5,2,2,2]

@amitmurthy
Contributor Author

@tkelman , thanks for the review.

At a higher level, would folks like @parallel to change in this direction? Any other syntax possibilities to make @parallel for more like a regular for-loop and at the same time easy to use in a distributed fashion?

@amitmurthy amitmurthy added the needs decision A decision on this change is needed label Jan 19, 2017
@amitmurthy amitmurthy added this to the 0.6.0 milestone Jan 19, 2017
@amitmurthy
Contributor Author

Will add tests and docs if there is consensus on this design.

@amitmurthy
Contributor Author

I am inclined to take silence as consent.

Pinging a couple more folks for feedback - @ViralBShah, @alanedelman

@tkelman
Contributor

tkelman commented Jan 24, 2017

The reduction forms of @parallel are a bit awkward, but I'm not sure it's urgent that we rush in a redesign right now. It would possibly be better to iterate on these early in a release cycle, or to start moving all of this code to a package ASAP so it isn't tied to the language's releases any more, rather than feature-freezing on something still new and experimental for the duration of 0.6.

@amitmurthy
Contributor Author

No rush, but while we are still pre-feature-freeze I don't see any reason not to go ahead with this change, provided it has buy-in from folks - it is not a major code change or revamp.

#19578 has been open for some time now and some discussion has taken place, most of it agreeing that the reducer aspect of @parallel for must be removed.

@shashi
Contributor

shashi commented Jan 24, 2017

Are we going to use @parallel arbitrary_expression to mean @everywhere arbitrary_expression? I remember this was on the parallel roadmap document or somewhere...

@amitmurthy
Contributor Author

Are we going to use @parallel arbitrary_expression to mean @everywhere arbitrary_expression?

No. Only the reducer functionality of @parallel for is being removed, to make it more like a regular for-loop. To make it useful in distributed scenarios where shared arrays are not an option, a new mechanism via ParallelAccumulators is being provided.

loop = args[1]
elseif na==2
elseif na == 2
depwarn("@parallel with a reducer is deprecated. Use ParallelAccumulators for reduction.", :@parallel)
Contributor

it'll be a conflict magnet so it can wait a bit, but whenever this proceeds it would be good to leave a comment in the appropriate section of deprecated.jl as a reminder to remove this code

@shashi
Contributor

shashi commented Jan 24, 2017

Why is @accumulate acc needed?

@amitmurthy
Contributor Author

The same ParallelAccumulator object on the caller is serialized a little differently to each worker. Specifically, during serialization the length of the sub-range to be processed on the remote node is sent, depending on the target pid. This is done via a custom serialize implementation that looks up the list of accumulators specified in @accumulate acc and sets up a callback that returns the correct length for a given pid during serialization. The list of accumulators is passed via task-local storage.
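
Roughly, the macro only has to make the accumulator list discoverable from task-local storage while the @parallel expression runs. A sketch (not the PR's exact expansion, but it uses the :JULIA_ACCUMULATOR key visible in the review hunk above):

macro accumulate(accs, expr)
    quote
        # Make the accumulator(s) visible to the custom serializer for the
        # duration of the wrapped expression, then restore the previous value.
        task_local_storage(:JULIA_ACCUMULATOR, ($(esc(accs)),)) do
            $(esc(expr))
        end
    end
end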

The alternative, which would avoid requiring @accumulate acc, is to parse the body of @parallel for and extract all variables bound to ParallelAccumulator objects in pfor - after all macros in the body have been suitably expanded, I guess. How simple/complicated would that be?

@amitmurthy
Contributor Author

FWIW, there used to be code to identify variables of type RemoteRef in a thunk -

julia/base/multi.jl

Lines 1256 to 1265 in 7681878

if isa(env,Tuple)
    for v in env
        if isa(v,Box)
            v = v.contents
        end
        if isa(v,RemoteRef)
            p = v.where; break
        end
    end
end

If anyone can point out how to do this in the current codebase, I can work with it, and we should be able to remove @accumulate acc and auto-detect accumulators used in the @parallel for-loop body.

@shashi
Contributor

shashi commented Jan 24, 2017

How about:

  1. When serializing for a @parallel for loop, keep track of workers you are sending the accumulator to
  2. Workers report back updates to the acc when they are done executing their part of the for loop
  3. Once all workers to whom the accumulator was sent report back, reduce the answers at the master and notify the condition that releases wait(acc)...

I can see one problem with this: the reduction over the workers' results is not a tree-reduce (does preduce do that?). But that's not impossible to implement this way either.

@amitmurthy
Contributor Author

Workers report back updates to the acc when they are done executing their part of the for loop.

This is the issue. The accumulators need to know when the for-loop body is done on the worker. That is the information captured in the custom serialization of ParallelAccumulators.

The PR does 1, 2 and 3 exactly the way you mention. For step 2, the count to wait for is sent based on the remote pid.

@amitmurthy
Contributor Author

Have removed @accumulate. Looking much better now.

julia> g1 = ParallelAccumulator{Int}(+, 10);

julia> g2 = ParallelAccumulator{Int}(*, 10);

julia> function foo()
           l1 = ParallelAccumulator{String}(string, 10);
           l2 = ParallelAccumulator{Array}(vcat, 10);
           @parallel for i in 1:10
               push!(g1, i)
               push!(g2, i)
               push!(l1, i)
               push!(l2, (i, myid()))
           end

           results = [wait(x) for x in [g1,g2,l1,l2]]
       end
foo (generic function with 1 method)

julia> foo()
4-element Array{Any,1}:
      55                                                                                
 3628800                                                                                
        "78910456123"                                                                   
        Tuple{Int64,Int64}[(1,2),(2,2),(3,2),(4,3),(5,3),(6,3),(9,5),(10,5),(7,4),(8,4)]

Only one new export now: the type ParallelAccumulator.

@@ -1359,6 +1360,7 @@ export
@threadcall,

# multiprocessing
@accumulate,
Contributor

no longer needed?

Contributor Author

Not required.

A final cleanup, docs and tests are pending.

@StefanKarpinski
Member

bump

@amitmurthy
Contributor Author

Will take a day or two more. Reimplementing this in a simpler fashion.

@ViralBShah
Member

It would be really nice to get this into 0.6.

@amitmurthy
Contributor Author

Simpler/cleaner implementation cooking!

@amitmurthy
Contributor Author

Superseded by #20259

@amitmurthy amitmurthy closed this Jan 26, 2017
@StefanKarpinski StefanKarpinski deleted the amitm/parfor branch January 26, 2017 18:58
@StefanKarpinski StefanKarpinski restored the amitm/parfor branch January 26, 2017 18:58
@amitmurthy amitmurthy deleted the amitm/parfor branch January 26, 2017 19:13