Fixes tmapreduce and tmapadd for v0.7 using batch loads #3
Conversation
@mohamed82008 Can we close #2 then and simply focus on this (I understand that this is a superset of changes)?
The two PRs touch different parts of the code, so they can be merged one after another, and any rebase needed would be very simple in case I make changes to #2 that confuse git. If you want to focus on one big PR, then you can close #2 and review this, but I think #2 is less "controversial" than this one in the sense that it has no functional change, just an upgrade and prettifying. It's your call!
This is a superset of changes, yes, but if #2 is merged and this is rebased, then it would be touching a different part of the code.
Can you please rebase this PR? Then I will review it and merge. Thanks!
Done.
Thx. I will merge it and then clean up and let you know to have a look.
I do not understand why this is needed - it seems that it should not change anything.
It was triggering some closure bug in one of the functions, if I remember correctly. This helped speed things up a little. You may want to double check.
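For context, the "closure bug" being discussed is the well-known captured-variable inference issue in Julia closures. The sketch below is adapted from the Julia manual's performance tips, not from this PR: a variable that is assigned more than once gets captured as a box, and rebinding it with let avoids that.

function abmult(r::Int)
    if r < 0
        r = -r           # r is assigned more than once, so the closure
    end                  # below captures it as a Core.Box
    f = x -> x * r
    return f
end

function abmult2(r0::Int)
    r = r0
    if r < 0
        r = -r
    end
    f = let r = r        # let introduces a fresh binding that is never
        x -> x * r       # reassigned, so it is captured unboxed
    end
    return f
end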
Tests are not passing anymore on my machine. It's the
So also can you check this on your testsets:
Actually I cannot see where closure problems should appear - it could rather be a type stability problem. The only situation where this would make a difference is if
Inference is failing with that one.
But it is failing in the same place as in your code. You only "cover the problem" by forcing a conversion at the very end. If you run
(to see this you have to make sure that you run the actual function, not a wrapper generated when keyword arguments are processed).
Actually - this is where batch processing seems to help most, as
Ya it is still speeding up even though inference is struggling. Would be interesting to read
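For reference, a benchmark of the kind being discussed might look like the following. This is a hypothetical sketch: BenchmarkTools is assumed, and the tmapreduce call signature is taken from the dump later in the thread.

using BenchmarkTools, KissThreading
src = rand(10^7)
@btime mapreduce(log, +, $src)                # single-threaded baseline
@btime tmapreduce(log, +, $src; init = 0.0)   # threaded, batched version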
Seems to be inferring fine. The one with the
Body::Float64
1 ── %1 = Base.getfield(%%#temp#, :init)::Float64 │╻ getindex
└─── goto 3 if not false │
2 ── nothing │
3 ┄─ %4 = Base.arraylen(%%src)::Int64 │╻ length
│ %5 = Base.sitofp(Float64, %4)::Float64 ││╻╷╷╷ sqrt
│ %6 = Base.lt_float(%5, 0.0)::Bool │││╻ sqrt
└─── goto 5 if not %6 ││││
4 ── invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, %5::Float64) ││││
└─── unreachable ││││
5 ── %10 = Base.Math.sqrt_llvm(%5)::Float64 ││││
└─── goto 6 ││││
6 ── goto 7 │││
7 ── %13 = Base.mul_float(10.0, %10)::Float64 │││╻ *
│ %14 = Base.rint_llvm(%13)::Float64 │││╻ round
│ %15 = Base.le_float(-9.223372036854776e18, %14)::Bool ││││╻ <=
└─── goto 9 if not %15 ││││
8 ── %17 = Base.lt_float(%14, 9.223372036854776e18)::Bool ││││╻ <
└─── goto 10 ││││
9 ── nothing │
10 ┄ %20 = φ (8 => %17, 9 => false)::Bool ││││
└─── goto 12 if not %20 ││││
11 ─ %22 = Base.fptosi(Int64, %14)::Int64 ││││╻ unsafe_trunc
└─── goto 13 ││││
12 ─ %24 = invoke Base.InexactError(:trunc::Symbol, Int64::Any, %14::Any)::InexactError ││││
│ Base.throw(%24) ││││
└─── unreachable ││││
13 ─ goto 14 │││
14 ─ %28 = Base.slt_int(%22, %4)::Bool │││╻ <
│ %29 = Base.ifelse(%28, %22, %4)::Int64 │││
└─── goto 15 ││
15 ─ goto 16 if not false │╻ isempty
16 ┄ %32 = Base.slt_int(0, 1)::Bool ││╻╷╷╷ iterate
└─── goto 18 if not %32 │││┃│ iterate
17 ─ goto 19 ││││┃ iterate
18 ─ invoke Base.getindex(()::Tuple{}, 1::Int64) │││││
└─── unreachable │││││
19 ─ goto 20 ││││
20 ─ goto 21 ││╻ iterate
21 ─ goto 22 ││
22 ─ nothing │
│ %41 = invoke KissThreading.:(#tmapreduce#7)(%1::Float64, %29::Int64, %%::Function, %%f::typeof(log), %%op::typeof(+), %%src::Array{Float64,1})::Float64 │
└─── return %41
I don't understand this line but this is the only
invoke Base.InexactError(:trunc::Symbol, Int64::Any, %14::Any)::InexactError
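Reading the dump above, that branch is most likely the overflow guard inside round(Int, x): converting a Float64 to Int64 throws InexactError when the value does not fit. The surrounding arraylen/sqrt/mul_float/rint_llvm instructions suggest a batch-size computation along these lines (a guess reconstructed from the IR, not the actual KissThreading source):

# plausible reconstruction from the IR; the real default_batch_size may differ
default_batch_size(n) = min(round(Int, 10sqrt(n)), n)

# round(Int, x) carries an InexactError branch for values outside the Int64 range:
julia> round(Int, 1.0e300)
ERROR: InexactError: trunc(Int64, 1.0e300)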
function tmapreduce(f::Function, op::Function, src::AbstractVector; init::T, batch_size=default_batch_size(length(src))) where T
    r = deepcopy(init)
    i = Threads.Atomic{Int}(1)   # shared counter: next unclaimed index in src
    l = Threads.SpinLock()       # protects the final reduction into r
    ls = length(src)
    nt = Threads.nthreads()
    # `let r::T = r` rebinds r with a declared type inside the closure created
    # by @threads; this is the workaround being debated above
    return let r::T = r;
        Threads.@threads for j in 1:nt
            # claim the first batch; atomic_add! returns the old counter value
            k = Threads.atomic_add!(i, batch_size)
            k > ls && continue
            # seed the thread-local accumulator with one element,
            # so op needs no neutral element
            x = f(src[k])
            range = (k+1):min(k+batch_size-1, ls)
            for idx in range
                x = op(x, f(src[idx]))
            end
            # keep claiming fresh batches until src is exhausted
            k = Threads.atomic_add!(i, batch_size)
            while k ≤ ls
                range = k:min(k+batch_size-1, ls)
                for idx in range
                    x = op(x, f(src[idx]))
                end
                k = Threads.atomic_add!(i, batch_size)
            end
            # merge this thread's partial result under the lock
            Threads.lock(l)
            r = op(r, x)
            Threads.unlock(l)
        end
        r
    end
end
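For concreteness, the call that produced the dump above appears to be of this form (argument types taken from the invoke line: f = log, op = +, src::Array{Float64,1}, init::Float64):

src = rand(10^6)
tmapreduce(log, +, src; init = 0.0)   # threaded sum of log.(src)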
Here is a MWE of the problem:
Non threaded version of the same (without
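The MWE itself did not survive in this transcript; a hypothetical example of the same class of problem (a variable reassigned inside a Threads.@threads body, which is lowered to a closure) would be:

function g(src)
    s = 0.0
    Threads.@threads for i in 1:length(src)
        s += src[i]   # reassigning a captured variable: s becomes a Core.Box
    end               # (also a data race; this only illustrates inference)
    s
end
# @code_warntype g(rand(10)) then reports s::Core.Box and Body::Any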
Can you pass me the exact command that you sent to Julia where you have no type inference problems?
OK - I see where you do not see the problem. You are passing me the inference of the wrapper function. Actually the last line is our function - it is only called in your dump, but you do not see what happens inside:
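To inspect the inner function rather than the keyword-argument wrapper, one option is the sketch below. The mangled name #tmapreduce#7 and the argument order (init, batch_size, the function itself, f, op, src) are taken from the invoke line in the dump; this is not a command from the thread.

src = rand(10^6)
inner = getfield(KissThreading, Symbol("#tmapreduce#7"))
# any Int batch size works for inspecting inference
@code_warntype inner(0.0, 1024, KissThreading.tmapreduce, log, +, src)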
I see. Btw, the Ref version:
julia> function f()
           x = Ref(1)                # mutation goes through the Ref's contents,
           l = Threads.SpinLock()    # so x itself is never reassigned
           Threads.@threads for i in 1:Threads.nthreads()
               Threads.lock(l)
               x[] += i
               Threads.unlock(l)
           end
           x
       end
Let us see if we get some help here: https://discourse.julialang.org/t/type-inference-in-closures/12544. |
Can you post
Sure. Btw you can find the Ref workaround together with some cleanup in my fork's master branch. I made a separate branch with Travis setup https://github.com/mohamed82008/KissThreading.jl/tree/travis just to see how things are.
Someone beat me to it!
Refs seem to work fine with arrays btw. I don't understand your concern.
Oh I see what you mean, it converts it to a pointer.
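As an aside, wrapping an array in a Ref does not copy it; the RefValue simply holds a reference to the same array. A minimal REPL sketch (not from the thread):

julia> a = [1, 2, 3];

julia> r = Ref(a);     # RefValue{Array{Int64,1}} pointing at the same array

julia> r[][1] = 10;    # mutate through the Ref

julia> a
3-element Array{Int64,1}:
 10
  2
  3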
This PR is based on #2. It dispatches batch loads to threads, achieving near-linear (sometimes super-linear) scaling in test cases. Feedback appreciated!
Closes #1.