RFC/WIP: "for-loop" compliant @parallel for [ci skip] #20094
Conversation
chunks = splitrange(lenR, workers())
accums = get(task_local_storage(), :JULIA_ACCUMULATOR, ())
if accums !== ()
    accums = accums[1]
Wouldn't it be preferable to use a different variable name, since here (and below) accums is changing type?
Noted. However, I'll request that we first discuss the API and implementation model. Detailed code review can follow later.
# A function which returns a length value when given the destination pid.
# Used to serialize the same object with different length values depending
# on the destination pid.
destf::Nullable{Function}
This could use a more descriptive name, and a description of what it means when the nullable fields are null.
new(f, len, len, initial, initial, destf, chnl)
end

set_destf(pacc::ParallelAccumulator, f::Function) = (pacc.destf = f; pacc)
! (i.e., this mutates pacc, so a name like set_destf! would follow the convention)
    return get(pacc.value)
end

function reset(pacc::ParallelAccumulator)
!
Another specialization of the existing exported reset. I am OK with a new export reset! too.
throw(ArgumentError(string(
    "@accumulate : ",
    "First argument must be a variable name pointing to a ParallelAccumulator ",
    "or a vector of variable names pointing to ParallelAccumulators. ",
why a vector rather than a tuple?
When I think "list of accumulators", a vector comes naturally to mind. Tuple or vector, whatever is more natural (or both) can be made to work.
Have removed the checks. It can be any collection of ParallelAccumulators.
The specific checks were not required.
julia> acc = map(i->ParallelAccumulator{Array}(vcat, 10), 1:10);
julia> @accumulate acc @parallel for i in 1:10
foreach(x->push!(x, myid()), acc)
end
julia> [wait(a) for a in acc]
10-element Array{Array{Int64,1},1}:
[4,4,5,5,2,2,2,3,3,3]
[4,4,5,5,3,3,3,2,2,2]
[4,4,3,3,3,5,5,2,2,2]
[4,4,3,3,3,5,5,2,2,2]
[4,4,3,3,3,5,5,2,2,2]
[4,4,3,3,3,5,5,2,2,2]
[4,4,3,3,3,5,5,2,2,2]
[4,4,3,3,3,5,5,2,2,2]
[4,4,3,3,3,5,5,2,2,2]
[4,4,3,3,3,5,5,2,2,2]
@tkelman, thanks for the review. At a higher level, would folks like
Force-pushed from 5854b3d to c720d73.
Will add tests and docs if there is consensus on this design.
I am inclined to take silence as consent. Pinging a couple more folks for feedback - @ViralBShah, @alanedelman
The reduction forms of
No rush, but as long as we are in pre-feature freeze I don't see any reason not to go ahead with this change as long as it has buy-in from folks - it is not like it is a major code change or revamp. #19578 has been open for some time now and some discussion has taken place, most of it agreeing that the reducer aspect of
Are we going to use
No. Only the reducer functionality of
loop = args[1]
elseif na==2
elseif na == 2
    depwarn("@parallel with a reducer is deprecated. Use ParallelAccumulators for reduction.", :@parallel)
It'll be a conflict magnet so it can wait a bit, but whenever this proceeds, it would be good to leave a comment in the appropriate section of deprecated.jl as a reminder to remove this code.
Why is
The same ParallelAccumulator object on the caller is serialized a little differently to each worker. Specifically, during serialization, depending on the target pid, the length of the sub-range to be processed on the remote node is sent. This is done via a custom serialize implementation that looks up the list of accumulators specified in @accumulate.

The alternative to not requiring
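To make the mechanism concrete, here is a minimal sketch of the idea, not the PR's code. It is written against current Julia with the Serialization and Distributed stdlibs; Serialization.serialize_type and Distributed.worker_id_from_socket are internal (non-exported) functions assumed to behave as in the stdlib sources, and the type and field names are illustrative.

using Distributed, Serialization

# Illustrative stand-in for a ParallelAccumulator: the caller knows, per worker,
# how many values that worker will push (destf maps pid -> sub-range length).
mutable struct AccumSketch
    reducer::Function
    len::Int                         # count of values expected at the destination
    destf::Union{Function,Nothing}   # set on the caller only; nothing on workers
end

function Serialization.serialize(s::AbstractSerializer, a::AccumSketch)
    # Internal API: map the serializer's stream (s.io) to the destination worker's pid.
    pid = Distributed.worker_id_from_socket(s.io)
    len = a.destf === nothing ? a.len : a.destf(pid)
    Serialization.serialize_type(s, AccumSketch)   # internal helper for custom type tags
    serialize(s, a.reducer)
    serialize(s, len)                # each worker receives only its own length
end

function Serialization.deserialize(s::AbstractSerializer, ::Type{AccumSketch})
    reducer = deserialize(s)
    len = deserialize(s)
    AccumSketch(reducer, len, nothing)   # destf is not meaningful on the worker
end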
FWIW, there used to be code to identify variables of a given type in the loop body; see lines 1256 to 1265 in 7681878.
If anyone can point me to how to do it in the current codebase, I can work with it and we should be able to remove
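For illustration only, this is the kind of syntactic scan such code might perform: walk the quoted loop body and collect the symbols that a caller-supplied predicate identifies as accumulators. It is a hypothetical helper, not the old Base code referenced above.

# Hypothetical helper: recursively walk an expression and collect the symbols
# for which `isacc` returns true (e.g. names known to be ParallelAccumulators).
function find_accumulator_symbols(ex, isacc::Function, found::Set{Symbol} = Set{Symbol}())
    if ex isa Symbol && isacc(ex)
        push!(found, ex)
    elseif ex isa Expr
        foreach(a -> find_accumulator_symbols(a, isacc, found), ex.args)
    end
    return found
end

# Usage sketch: detect `acc` in a quoted loop body.
body = :(for i in 1:10; push!(acc, i^2); end)
find_accumulator_symbols(body, s -> s === :acc)   # Set([:acc])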
How about:
I can see one problem with this: the reduce on each worker's result is not tree-reduce (does
This is the issue. The accumulators need to know when the for-loop body is done on the worker. That is the information captured in the custom serialization of ParallelAccumulators. The PR does 1, 2 and 3 exactly the way you mention. For step 2, the count to wait for is sent depending on the remote pid.
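A rough sketch of that three-step flow (local reduction, one push per worker when its range is done, final reduction on the caller), using plain Distributed primitives rather than the PR's types; the function names and the two-worker split are illustrative.

using Distributed
addprocs(2)                          # assumption: two local workers for the example

# Steps 1 and 2: each worker reduces its local values, then pushes a single
# partial result once its local range is complete.
@everywhere function local_accumulate(reducer, values, chnl)
    acc = reduce(reducer, values)
    put!(chnl, acc)
end

# Step 3: the caller waits for one partial result per worker and reduces them.
function final_reduce(reducer, chnl, nparts)
    parts = [take!(chnl) for _ in 1:nparts]
    return reduce(reducer, parts)
end

chnl = RemoteChannel(() -> Channel{Any}(nworkers()))
ranges = [1:5, 6:10]                 # the caller knows each worker's sub-range length
for (w, r) in zip(workers(), ranges)
    remote_do(local_accumulate, w, +, collect(r), chnl)
end
final_reduce(+, chnl, length(ranges))   # == sum(1:10) == 55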
Force-pushed from c720d73 to 4561ba7.
Force-pushed from 4561ba7 to 2862032.
Have removed
One new export of type
@@ -1359,6 +1360,7 @@ export
    @threadcall,

# multiprocessing
    @accumulate,
no longer needed?
Not required.
A final cleanup, docs and tests are pending.
bump
Will take a day or two more. Reimplementing this in a simpler fashion.
It would be really nice to get this into 0.6.
Simpler/cleaner implementation cooking!
Superseded by #20259
This addresses some of the concerns with @parallel as discussed at #19578, in a different manner. A @parallel for loop differs from a regular for-loop, and this PR reworks how reductions are expressed with it: the reducer argument to @parallel is deprecated in favor of ParallelAccumulators. The syntax is a bit more verbose, but there is much less scope for confusion or misplaced expectations.

A reduction previously written with the reducer form of @parallel for will now be written with a ParallelAccumulator and the @accumulate macro. Multiple accumulators can also be specified as a vector.
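A hedged before/after sketch, based on the example and API described earlier in this thread; the exact ParallelAccumulator constructor signature and type parameter are assumptions taken from that example and may differ in the final code.

# Old form: @parallel with a reducer (deprecated by this PR).
total = @parallel (+) for i in 1:100
    i^2
end

# New form sketched here: reduce through a ParallelAccumulator.
acc = ParallelAccumulator{Int}(+, 100)      # reducer and count of values to be pushed
@accumulate acc @parallel for i in 1:100
    push!(acc, i^2)
end
total = wait(acc)                           # waits for and returns the reduced value

# Multiple accumulators, specified as a vector:
accs = [ParallelAccumulator{Int}(+, 100), ParallelAccumulator{Int}(max, 100)]
@accumulate accs @parallel for i in 1:100
    push!(accs[1], i^2)
    push!(accs[2], i)
end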
Updating shared arrays works as before; there is no need for ParallelAccumulators in that case. However, ParallelAccumulators can be used in multi-node scenarios which shared memory cannot address.
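For contrast, a small sketch of the shared-memory case mentioned above, which keeps working without accumulators (0.6-era syntax; assumes local workers were added with addprocs):

s = SharedArray{Float64}(100)    # visible to the caller and all local workers
@sync @parallel for i in 1:100
    s[i] = i^2                   # direct in-place updates are fine for shared memory
end
sum(s)                           # 338350.0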
As before, the input range is partitioned across workers, local reductions are performed on each worker, and a final reduction happens on the caller.

I feel this syntax and loop behavior is more in line with a regular for-loop. Updating arrays from the loop body is still not allowed (unless they are shared, of course); ParallelAccumulators cover that need.
The new API consists of ParallelAccumulator and @accumulate:

- wait(::ParallelAccumulator) - waits for and returns the value of the distributed computation
- ParallelAccumulator(reducer, length) - takes the reducer function and the count of values to be reduced over
- push!(accumulator, value) - applies the reducer over value and stores the result; local ParallelAccumulators push results to the caller only once, when the local range iteration is complete

Feedback on the overall API and suggestions towards syntax, bikeshedding names are welcome.
Note that with this, the final result of both @accumulate acc for-loop and @accumulate acc @parallel for-loop is the same, as long as the loop only returns data via accumulators.
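An illustrative example of that equivalence, under the same assumptions about the constructor as in the earlier sketch:

acc = ParallelAccumulator{Int}(+, 10)
@accumulate acc for i in 1:10             # plain for-loop
    push!(acc, i)
end
wait(acc)                                 # 55

acc2 = ParallelAccumulator{Int}(+, 10)
@accumulate acc2 @parallel for i in 1:10  # distributed loop
    push!(acc2, i)
end
wait(acc2)                                # 55, since data flows only through the accumulator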