Fixes tmapreduce and tmapadd for v0.7 using batch loads #3
Conversation
@mohamed82008 Can we close #2 then and simply focus on this (I understand that this is a superset of changes)?
The two PRs touch different parts of the code, so they can be merged one after another, and any rebase needed would be very simple in case I make changes to #2 that confuse git. If you want to focus on one big PR, then you can close #2 and review this, but I think #2 is less "controversial" than this one in the sense that it has no functional change, just an upgrade and prettifying. It's your call!
This is a superset of changes, yes, but if #2 is merged and this is rebased, then it would be touching a different part of the code.
Can you please rebase this PR? Then I will review it and merge. Thanks!
Done.
Thx. I will merge it and then clean up and let you know to have a look.
I do not understand why this is needed - it seems that it should not change anything.
It was triggering some closure bug in one of the functions, if I remember correctly. This helped speed things up a little. You may want to double check.
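For context, the "closure bug" being discussed is the well-known captured-variable inference issue in Julia closures. The sketch below is adapted from the Julia manual's performance tips, not from this PR: a variable that is assigned more than once gets captured as a box, and rebinding it with let avoids that.

function abmult(r::Int)
    if r < 0
        r = -r           # r is assigned more than once, so the closure
    end                  # below captures it as a Core.Box
    f = x -> x * r
    return f
end

function abmult2(r0::Int)
    r = r0
    if r < 0
        r = -r
    end
    f = let r = r        # let introduces a fresh binding that is never
        x -> x * r       # reassigned, so it is captured unboxed
    end
    return f
end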
Tests are not passing anymore on my machine. It's the
So also can you check this on your testsets:
Actually I cannot see where closure problems should appear - it could rather be a type stability problem. The only situation where this would make a difference is if
Inference is failing with that one.
But it is failing in the same place as in your code. You only "cover the problem" by forcing a conversion at the very end. If you run
(to see this you have to make sure that you run the actual function, not a wrapper generated when keyword arguments are processed).
Actually - this is where batch processing seems to help most, as
Ya it is still speeding up even though inference is struggling. Would be interesting to read
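For reference, a benchmark of the kind being discussed might look like the following. This is a hypothetical sketch: BenchmarkTools is assumed, and the tmapreduce call signature is taken from the dump later in the thread.

using BenchmarkTools, KissThreading
src = rand(10^7)
@btime mapreduce(log, +, $src)                # single-threaded baseline
@btime tmapreduce(log, +, $src; init = 0.0)   # threaded, batched version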
Seems to be inferring fine. The one with the
Body::Float64
1 ── %1 = Base.getfield(%%#temp#, :init)::Float64 │╻ getindex
└─── goto 3 if not false │
2 ── nothing │
3 ┄─ %4 = Base.arraylen(%%src)::Int64 │╻ length
│ %5 = Base.sitofp(Float64, %4)::Float64 ││╻╷╷╷ sqrt
│ %6 = Base.lt_float(%5, 0.0)::Bool │││╻ sqrt
└─── goto 5 if not %6 ││││
4 ── invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, %5::Float64) ││││
└─── unreachable ││││
5 ── %10 = Base.Math.sqrt_llvm(%5)::Float64 ││││
└─── goto 6 ││││
6 ── goto 7 │││
7 ── %13 = Base.mul_float(10.0, %10)::Float64 │││╻ *
│ %14 = Base.rint_llvm(%13)::Float64 │││╻ round
│ %15 = Base.le_float(-9.223372036854776e18, %14)::Bool ││││╻ <=
└─── goto 9 if not %15 ││││
8 ── %17 = Base.lt_float(%14, 9.223372036854776e18)::Bool ││││╻ <
└─── goto 10 ││││
9 ── nothing │
10 ┄ %20 = φ (8 => %17, 9 => false)::Bool ││││
└─── goto 12 if not %20 ││││
11 ─ %22 = Base.fptosi(Int64, %14)::Int64 ││││╻ unsafe_trunc
└─── goto 13 ││││
12 ─ %24 = invoke Base.InexactError(:trunc::Symbol, Int64::Any, %14::Any)::InexactError ││││
│ Base.throw(%24) ││││
└─── unreachable ││││
13 ─ goto 14 │││
14 ─ %28 = Base.slt_int(%22, %4)::Bool │││╻ <
│ %29 = Base.ifelse(%28, %22, %4)::Int64 │││
└─── goto 15 ││
15 ─ goto 16 if not false │╻ isempty
16 ┄ %32 = Base.slt_int(0, 1)::Bool ││╻╷╷╷ iterate
└─── goto 18 if not %32 │││┃│ iterate
17 ─ goto 19 ││││┃ iterate
18 ─ invoke Base.getindex(()::Tuple{}, 1::Int64) │││││
└─── unreachable │││││
19 ─ goto 20 ││││
20 ─ goto 21 ││╻ iterate
21 ─ goto 22 ││
22 ─ nothing │
│ %41 = invoke KissThreading.:(#tmapreduce#7)(%1::Float64, %29::Int64, %%::Function, %%f::typeof(log), %%op::typeof(+), %%src::Array{Float64,1})::Float64 │
└─── return %41
I don't understand this line but this is the only
invoke Base.InexactError(:trunc::Symbol, Int64::Any, %14::Any)::InexactError
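Reading the dump above, that branch is most likely the overflow guard inside round(Int, x): converting a Float64 to Int64 throws InexactError when the value does not fit. The surrounding arraylen/sqrt/mul_float/rint_llvm instructions suggest a batch-size computation along these lines (a guess reconstructed from the IR, not the actual KissThreading source):

# plausible reconstruction from the IR; the real default_batch_size may differ
default_batch_size(n) = min(round(Int, 10sqrt(n)), n)

# round(Int, x) carries an InexactError branch for values outside the Int64 range:
julia> round(Int, 1.0e300)
ERROR: InexactError: trunc(Int64, 1.0e300)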
function tmapreduce(f::Function, op::Function, src::AbstractVector; init::T, batch_size=default_batch_size(length(src))) where T
    r = deepcopy(init)
    i = Threads.Atomic{Int}(1)   # shared counter: next unclaimed index in src
    l = Threads.SpinLock()       # protects the final reduction into r
    ls = length(src)
    nt = Threads.nthreads()
    # `let r::T = r` rebinds r with a declared type inside the closure created
    # by @threads; this is the workaround being debated above
    return let r::T = r;
        Threads.@threads for j in 1:nt
            # claim the first batch; atomic_add! returns the old counter value
            k = Threads.atomic_add!(i, batch_size)
            k > ls && continue
            # seed the thread-local accumulator with one element,
            # so op needs no neutral element
            x = f(src[k])
            range = (k+1):min(k+batch_size-1, ls)
            for idx in range
                x = op(x, f(src[idx]))
            end
            # keep claiming fresh batches until src is exhausted
            k = Threads.atomic_add!(i, batch_size)
            while k ≤ ls
                range = k:min(k+batch_size-1, ls)
                for idx in range
                    x = op(x, f(src[idx]))
                end
                k = Threads.atomic_add!(i, batch_size)
            end
            # merge this thread's partial result under the lock
            Threads.lock(l)
            r = op(r, x)
            Threads.unlock(l)
        end
        r
    end
end
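For concreteness, the call that produced the dump above appears to be of this form (argument types taken from the invoke line: f = log, op = +, src::Array{Float64,1}, init::Float64):

src = rand(10^6)
tmapreduce(log, +, src; init = 0.0)   # threaded sum of log.(src)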
Here is a MWE of the problem:
Non threaded version of the same (without
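The MWE itself did not survive in this transcript; a hypothetical example of the same class of problem (a variable reassigned inside a Threads.@threads body, which is lowered to a closure) would be:

function g(src)
    s = 0.0
    Threads.@threads for i in 1:length(src)
        s += src[i]   # reassigning a captured variable: s becomes a Core.Box
    end               # (also a data race; this only illustrates inference)
    s
end
# @code_warntype g(rand(10)) then reports s::Core.Box and Body::Any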
Can you pass me the exact command that you sent to Julia where you have no type inference problems?
OK - I see where you do not see the problem. You are passing me the inference of the wrapper function. Actually the last line is our function - it is only called in your dump, but you do not see what happens inside:
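To inspect the inner function rather than the keyword-argument wrapper, one option is the sketch below. The mangled name #tmapreduce#7 and the argument order (init, batch_size, the function itself, f, op, src) are taken from the invoke line in the dump; this is not a command from the thread.

src = rand(10^6)
inner = getfield(KissThreading, Symbol("#tmapreduce#7"))
# any Int batch size works for inspecting inference
@code_warntype inner(0.0, 1024, KissThreading.tmapreduce, log, +, src)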
I see. Btw, the Ref version:
julia> function f()
           x = Ref(1)                # mutation goes through the Ref's contents,
           l = Threads.SpinLock()    # so x itself is never reassigned
           Threads.@threads for i in 1:Threads.nthreads()
               Threads.lock(l)
               x[] += i
               Threads.unlock(l)
           end
           x
       end
Let us see if we get some help here: https://discourse.julialang.org/t/type-inference-in-closures/12544. |
Can you post
Sure. Btw you can find the Ref workaround together with some cleanup in my fork's master branch. I made a separate branch with Travis setup https://github.com/mohamed82008/KissThreading.jl/tree/travis just to see how things are.
Someone beat me to it!
Refs seem to work fine with arrays btw. I don't understand your concern.
Oh I see what you mean, it converts it to a pointer.
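As an aside, wrapping an array in a Ref does not copy it; the RefValue simply holds a reference to the same array. A minimal REPL sketch (not from the thread):

julia> a = [1, 2, 3];

julia> r = Ref(a);     # RefValue{Array{Int64,1}} pointing at the same array

julia> r[][1] = 10;    # mutate through the Ref

julia> a
3-element Array{Int64,1}:
 10
  2
  3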
This PR is based on #2. It dispatches batch loads to threads, achieving near-linear (sometimes super-linear) scaling in test cases. Feedback appreciated!
Closes #1.