Correct way to parallelize this code? #9
Comments
What |
xref this thread, btw: https://discourse.julialang.org/t/improving-an-algorithm-that-compute-gps-distances/38213/38 |
Hi, I ran into what I guess is a similar issue. I've started looking at threading, and hoped to make things as simple as possible. From the comment here JuliaLang/julia#19777 (comment) I thought
The result is that the
Edit: I'm using Strided v1.1.1 and Julia 1.5.3. |
I can reproduce your timings and observe that something is going wrong with threading. When printing out the |
Although, digging a bit deeper, I don't think that's really the issue. Given the large number of allocations, there seems to be something going on with type inference. Note that I have my own mechanism for implementing broadcasting, which is different from the one in Base, and apparently the way I do it fails to be inferable for such a complicated right-hand side. This is not because it cannot deal with complicated functions, but because it retains the whole expression, and the expression itself is too complicated. So if I change your code to
f(x) = sin(x) + cos(x) * exp(x) - exp(x^2) * sin(2*x) + tan(3*x)
function dostuff_broadcast(x)
return @. f(x)
end
function dostuff_strided(x)
return @strided @. f(x)
end
function dostuff_unsafe_strided(x)
return @unsafe_strided x @. f(x)
end
function dostuff_threaded(x)
result = similar(x)
@threads for i ∈ eachindex(x)
result[i] = f(x[i])
end
return result
end
I obtain
As a side remark, note that with
function dostuff_unsafe_strided(x)
y = similar(x)
@unsafe_strided y x @. y = f(x)
return y
end
to get a normal |
I guess this has to do with some compiler heuristics on the complexity of tuples and nested parametric types. |
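One way to check whether a kernel like this is inferable is sketched below. It uses Base broadcasting only, which is not where the problem described above occurs, so it illustrates the diagnostic rather than reproducing the Strided.jl behaviour. `f` is taken from the comment above; `apply_f` is a hypothetical name used for illustration.

```julia
using Test  # standard library; provides @inferred

f(x) = sin(x) + cos(x) * exp(x) - exp(x^2) * sin(2*x) + tan(3*x)

# Hypothetical helper: broadcast the single function f over x.
apply_f(x) = f.(x)

x = rand(1000)

# @inferred throws an error if the inferred return type
# does not match the type of the value actually returned.
y = @inferred apply_f(x)

apply_f(x)                     # warm-up call so compilation is not measured
bytes = @allocated apply_f(x)  # roughly one output array when inference succeeds
```

If inference fails, `@inferred` errors and the allocation count is typically far larger than the size of the output array.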
Thanks for the tips @Jutho , that helps a lot! I noticed that on 10 threads the 'by hand' |
There is a constant But indeed, the analysis performed by Strided.jl also has some overhead, and is quite generic, trying to deal with a combination of permuted arrays, so that there are several nested loops without a clear preferable loop order. The simple case of parallelising a plain broadcast operation was not my main concern when implementing this package. Maybe that particular case can be separated out early in the analysis, but there are probably already other packages which are much better at this. |
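For the plain elementwise case, a hand-written alternative can be sketched with Base threading only. This is not how Strided.jl works internally; `chunked_map!` and its `ntasks` keyword are hypothetical names, and the sketch assumes vectors with matching axes.

```julia
using Base.Threads: @spawn, nthreads

f(x) = sin(x) + cos(x) * exp(x) - exp(x^2) * sin(2*x) + tan(3*x)

# Hypothetical helper: apply g elementwise, spawning one task per contiguous chunk.
function chunked_map!(g, result::AbstractVector, x::AbstractVector;
                      ntasks::Integer = nthreads())
    axes(result) == axes(x) ||
        throw(DimensionMismatch("result and x must have the same axes"))
    isempty(x) && return result
    chunksize = cld(length(x), ntasks)  # ceiling division: at most ntasks chunks
    @sync for chunk in Iterators.partition(eachindex(x), chunksize)
        @spawn for i in chunk
            @inbounds result[i] = g(x[i])
        end
    end
    return result
end

x = rand(10^6)
y = chunked_map!(f, similar(x), x)
```

Spawning one task per chunk keeps the scheduling cost independent of the array length. Whether this beats `@threads` or a plain broadcast depends on the cost of `f` per element and on how many threads Julia was started with.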
👍 thanks for the info!
Sorry, I'm very new to Julia... if it's obvious what any of those other packages are, I'd be interested to know. |
I have not been following all the recent developments very actively myself. I think you want to check out things like LoopVectorization.jl, which will also do other optimizations |
Original function:
Version 1, runs on a single core:
Version 2, uses all the cores but somehow runs 10x slower: