Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fixes tmapreduce and tmapadd for v0.7 using batch loads #3

Merged
merged 3 commits into from
Jul 20, 2018
Merged

Fixes tmapreduce and tmapadd for v0.7 using batch loads #3

merged 3 commits into from
Jul 20, 2018

Conversation

mohamed82008
Copy link
Owner

This PR is based on #2. It does batch load dispatch to threads achieving near linear (sometimes super-linear) scaling in test cases. Feedback appreciated!

Closes #1.

@bkamins
Copy link
Collaborator

bkamins commented Jul 11, 2018

@mohamed82008 Can we close #2 then and simply focus on this (I understand that this is a superset of chages)?

@mohamed82008
Copy link
Owner Author

The 2 PRs touch different parts of the code, so they can be merged one after another, and any rebase needed would be very simple in case I make changes to #2 that confuses git. If you want to focus on one big PR, then you can close #2 and review this but I think #2 is less "controversial" than this one in the sense that it doesn't have any functional change, just an upgrade and prettifying. It's your call!

@mohamed82008
Copy link
Owner Author

This is a superset of changes yes, but if #2 is merged and this is rebased, then it would be touching a different part of the code.

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

Can you please rebase this PR? Then I will review it and merge. Thanks!
After this is merged I will probably do some code refactoring to prepare the package to be registered.

@mohamed82008
Copy link
Owner Author

Done.

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

Thx. I will merge it and then clean up and let you know to have a look.

@bkamins bkamins merged commit 38730e2 into mohamed82008:master Jul 20, 2018
@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

I do not understand why:

return let r::T = r;

is needed? It seems that it should not change anything.

@mohamed82008
Copy link
Owner Author

It was triggering some closure bug if I remember correctly, in one of the functions. This helped speed things up a little. You may want to double check.

@mohamed82008
Copy link
Owner Author

Tests are not passing anymore on my machine. It's the _randjump stuff.

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

So also can you check this on your testsets:

function tmapreduce(f::Function, op::Function, src::AbstractVector; init,
                    batch_size=default_batch_size(length(src)))
    r = deepcopy(init)
    i = Threads.Atomic{Int}(1)
    l = Threads.SpinLock()
    ls = length(src)
    Threads.@threads for j in 1:Threads.nthreads()
        k = Threads.atomic_add!(i, batch_size)
        k > ls && continue
        x = f(src[k])
        for idx in (k+1):min(k+batch_size-1, ls)
            x = op(x, f(src[idx]))
        end
        k = Threads.atomic_add!(i, batch_size)
        while k ≤ ls
            for idx in k:min(k+batch_size-1, ls)
                x = op(x, f(src[idx]))
            end
            k = Threads.atomic_add!(i, batch_size)
        end
        Threads.lock(l)
        r = op(r, x)
        Threads.unlock(l)
    end
    r
end

Actually I cannot see where closure problems should appear - it could be rather type stability problem. The only situation where this would make a difference is if typeof(op(r,x)) is not typeof(r), but then also your assertion r::T might fail as it might be impossible to convert op(r,x) to typeof(r).

@mohamed82008
Copy link
Owner Author

Inference is failing with that one.

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

But it is failing in the same place as in your code. You only "cover the problem" by forcing a conversion at the very end. If you run @code_warntype on a full version of your code you will see at the end something like:

  14 ┄ %44 = Core.getfield(%9, :contents)::Any                                                                 ││
  │          Core.typeassert(%44, :($(Expr(:static_parameter, 1))))                                            ││
  │    %46 = π (%44, Int64)                                                                                    ││
  └───       return %46

(to see this you have to make sure that you run the actual function not a wrapper generated when keyword arguments are processed).

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

Actually - this is where batch processing seems to help most, as x is inferred correctly and we run op(r, x) rarely when processing in batches so this is not a big issue. But I will try to make inference work correctly.

@mohamed82008
Copy link
Owner Author

Ya it is still speeding up even though inference is struggling. Would be interesting to read Base.mapreduce

@mohamed82008
Copy link
Owner Author

Seems to be inferring fine. The one with the ::T trick.

Body::Float64
 1 ── %1  = Base.getfield(%%#temp#, :init)::Float64                                                                                                                                      │╻     getindex
 └───       goto 3 if not false2 ──       nothing3 ┄─ %4  = Base.arraylen(%%src)::Int64                                                                                                                                                  │╻     length
 │    %5  = Base.sitofp(Float64, %4)::Float64                                                                                                                                            ││╻╷╷╷  sqrt
 │    %6  = Base.lt_float(%5, 0.0)::Bool                                                                                                                                                 │││╻     sqrt
 └───       goto 5 if not %6                                                                                                                                                             ││││
 4 ──       invoke Base.Math.throw_complex_domainerror(:sqrt::Symbol, %5::Float64)                                                                                                       ││││
 └───       unreachable                                                                                                                                                                  ││││
 5 ── %10 = Base.Math.sqrt_llvm(%5)::Float64                                                                                                                                             ││││
 └───       goto 6                                                                                                                                                                       ││││
 6 ──       goto 7                                                                                                                                                                       │││
 7 ── %13 = Base.mul_float(10.0, %10)::Float64                                                                                                                                           │││╻     *%14 = Base.rint_llvm(%13)::Float64                                                                                                                                                 │││╻     round
 │    %15 = Base.le_float(-9.223372036854776e18, %14)::Bool                                                                                                                              ││││╻     <=
 └───       goto 9 if not %15                                                                                                                                                            ││││
 8 ── %17 = Base.lt_float(%14, 9.223372036854776e18)::Bool                                                                                                                               ││││╻     <
 └───       goto 10                                                                                                                                                                      ││││
 9 ──       nothing10%20 = φ (8 => %17, 9 => false)::Bool                                                                                                                                               ││││
 └───       goto 12 if not %20                                                                                                                                                           ││││
 11%22 = Base.fptosi(Int64, %14)::Int64                                                                                                                                               ││││╻     unsafe_trunc
 └───       goto 13                                                                                                                                                                      ││││
 12%24 = invoke Base.InexactError(:trunc::Symbol, Int64::Any, %14::Any)::InexactError                                                                                                 ││││
 │          Base.throw(%24)                                                                                                                                                              ││││
 └───       unreachable                                                                                                                                                                  ││││
 13 ─       goto 14                                                                                                                                                                      │││
 14%28 = Base.slt_int(%22, %4)::Bool                                                                                                                                                  │││╻     <%29 = Base.ifelse(%28, %22, %4)::Int64                                                                                                                                             │││
 └───       goto 15                                                                                                                                                                      ││
 15 ─       goto 16 if not false                                                                                                                                                         │╻     isempty
 16%32 = Base.slt_int(0, 1)::Bool                                                                                                                                                     ││╻╷╷╷  iterate
 └───       goto 18 if not %32                                                                                                                                                           │││┃│    iterate
 17 ─       goto 19                                                                                                                                                                      ││││┃     iterate
 18 ─       invoke Base.getindex(()::Tuple{}, 1::Int64)                                                                                                                                  │││││
 └───       unreachable                                                                                                                                                                  │││││
 19 ─       goto 20                                                                                                                                                                      ││││
 20 ─       goto 21                                                                                                                                                                      ││╻     iterate
 21 ─       goto 22                                                                                                                                                                      ││
 22nothing                                                                                                                                                                      │
 │    %41 = invoke KissThreading.:(#tmapreduce#7)(%1::Float64, %29::Int64, %%::Function, %%f::typeof(log), %%op::typeof(+), %%src::Array{Float64,1})::Float64                            │
 └───       return %41                          

@mohamed82008
Copy link
Owner Author

I don't understand this line but this is the only Any I can see.

invoke Base.InexactError(:trunc::Symbol, Int64::Any, %14::Any)::InexactError

@mohamed82008
Copy link
Owner Author

function tmapreduce(f::Function, op::Function, src::AbstractVector; init::T, batch_size=default_batch_size(length(src))) where T 
    r = deepcopy(init)
    i = Threads.Atomic{Int}(1)
    l = Threads.SpinLock()
    ls = length(src)
    nt = Threads.nthreads()
    return let r::T = r;
        Threads.@threads for j in 1:nt
            k = Threads.atomic_add!(i, batch_size)
            k > ls && continue
            x = f(src[k])
            range = (k+1):min(k+batch_size-1, ls)
            for idx in range
                x = op(x, f(src[idx]))
            end
            k = Threads.atomic_add!(i, batch_size)
            while k  ls
                range = k:min(k+batch_size-1, ls)
                for idx in range
                    x = op(x, f(src[idx]))
                end
                k = Threads.atomic_add!(i, batch_size)
            end
            Threads.lock(l)
            r = op(r, x)
            Threads.unlock(l)
        end
        r
    end
end

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

Here is a MWE of the problem:

function f()
    x = 1
    l = Threads.SpinLock()

    Threads.@threads for i in 1:Threads.nthreads()
        Threads.lock(l)
        x = x + i
        Threads.unlock(l)
    end
    x
end

Non threaded version of the same (without Threads.@threads) does correct type inference. This is an issue with closures as you have suggested, but let trick here does not help. I we do not solve it I will ask on Discourse.

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

Can you pass me the exact command that you sent to Julia where you have no type inference problems?

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

OK - I see where you do not see the problem. You are passing mi the inference of the wrapper function. Actually the last line is our function - it is only called in your dump, but you do not see what happens inside:

invoke KissThreading.:(#tmapreduce#7)(%1::Float64, %29::Int64, %%::Function, %%f::typeof(log), %%op::typeof(+), %%src::Array{Float64,1})::Float64                            │
 └───       return %41      

@mohamed82008
Copy link
Owner Author

I see. Btw the Ref workaround works in the above example. I really have no sense for why some of these workarounds work and others don't.

 julia> function f()
           x = Ref(1)
           l = Threads.SpinLock()
           Threads.@threads for i in 1:Threads.nthreads()
               Threads.lock(l)
               x[] += i
               Threads.unlock(l)
           end
           x
       end

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

Let us see if we get some help here: https://discourse.julialang.org/t/type-inference-in-closures/12544.

@bkamins
Copy link
Collaborator

bkamins commented Jul 20, 2018

Can you post Ref on Discourse as this is very nice. However, I avoid it in my code, because currently Ref does not work in general, e.g. you cannot use it with arrays unfortunately (but maybe there is no choice).

@mohamed82008
Copy link
Owner Author

Sure. Btw you can find the Ref workaround together with some cleanup in my fork's master branch. I made a separate branch with Travis setup https://github.com/mohamed82008/KissThreading.jl/tree/travis just to see how things are.

@mohamed82008
Copy link
Owner Author

Someone beat me to it!

@mohamed82008
Copy link
Owner Author

Refs seem to work fine with arrays btw. I don't understand your concern.

@mohamed82008
Copy link
Owner Author

Oh I see what you mean, it converts it to a pointer.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants