Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

sum(function, tuple) is slow #30465

Closed
cossio opened this issue Dec 20, 2018 · 19 comments
Closed

sum(function, tuple) is slow #30465

cossio opened this issue Dec 20, 2018 · 19 comments
Labels
performance Must go faster

Comments

@cossio
Copy link
Contributor

cossio commented Dec 20, 2018

julia> hsum(x...) = sum(abs2.(float.(x)));
julia> gsum(x...) = sum(y -> abs2(float(y)), x);

julia> @btime gsum(1.,2.,3.,4.,5.,6.,7.,8.)
  10.907 ns (0 allocations: 0 bytes)
204.0
julia> @btime hsum(1.,2.,3.,4.,5.,6.,7.,8.)
  2.436 ns (0 allocations: 0 bytes)
204.0

gsum should not be so slow compared to hsum, right?

@KristofferC
Copy link
Member

#30421, they use different summation implementation.

@stevengj stevengj added the performance Must go faster label Dec 20, 2018
@stevengj
Copy link
Member

Maybe we should have a specialized sum(function, tuple) method similar to our specialized sum(tuple)?

@cossio
Copy link
Contributor Author

cossio commented Dec 20, 2018

@KristofferC Here I don't have a sum of a generator, but of a Tuple. I'm not sure it's the same as the other issue.

@stevengj
Copy link
Member

I think that when you do sum(f, tuple) it actually constructs a generator?

@nalimilan
Copy link
Member

I think that when you do sum(f, tuple) it actually constructs a generator?

No, it calls mapfoldl(y -> abs2(float(y)), Base.add_sum, x). The difference is that we have a special sum method for tuples:

sum(x::Tuple{Any, Vararg{Any}}) = +(x...)

@cossio
Copy link
Contributor Author

cossio commented Dec 20, 2018

What about defining:

sum(f, x::Tuple{Any, Vararg{Any}}) = +(f.(x)...)

@cossio
Copy link
Contributor Author

cossio commented Dec 20, 2018

This allocates for some reason. Any ideas?

julia> sum1(f, x::Tuple}) = +(f.(x)...)
julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
  562.674 ns (10 allocations: 288 bytes)
0.1761616497223787
julia> @btime sum(sin, (1.,2.,3.,4.,5.))
  56.020 ns (0 allocations: 0 bytes)
0.1761616497223787

@stevengj
Copy link
Member

Probably sufficiently large tuples get heap-allocated?

@cossio
Copy link
Contributor Author

cossio commented Dec 20, 2018

Probably sufficiently large tuples get heap-allocated?

julia> @btime sum1(sin, (1.,2.))
  20.082 ns (0 allocations: 0 bytes)
1.7507684116335782

julia> @btime sum1(sin, (1.,2.,3.))
  30.955 ns (0 allocations: 0 bytes)
1.8918884196934453

julia> @btime sum1(sin, (1.,2.,3.,4.))
  40.153 ns (0 allocations: 0 bytes)
1.1350859243855171

julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
  474.749 ns (10 allocations: 288 bytes)
0.1761616497223787

I don't understand this.

@stevengj
Copy link
Member

I don't know why the threshold is a 5-tuple, but suppose you have a tuple of 1000 elements — where would the result of f.(x) be stored? Clearly there must be some threshold beyond which the temporary tuple is heap-allocated, no?

@cossio
Copy link
Contributor Author

cossio commented Dec 20, 2018

I think there should be no temporary tuple at all. The operation can be done in-place.

@cossio cossio changed the title sum(y -> f(g(x)), x) slow sum(y -> f(g(x)), x) slow Dec 20, 2018
@stevengj
Copy link
Member

stevengj commented Dec 20, 2018

It seems too optimistic to hope that the compiler should always be able to figure out that it can eliminate the tuple construction.

The following is based on the Base.afoldl code used in Base.+(a,b,c,d...), and seems to always be fast without allocating:

mapafoldl(F,op,a) = F(a)
mapafoldl(F,op,a,b) = op(F(a),F(b))
mapafoldl(F,op,a,b,c...) = op(op(F(a),F(b)), mapafoldl(F, op, c...))
function mapafoldl(F,op,a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,qs...)
    y = op(op(op(op(op(op(op(op(op(op(op(op(op(op(op(F(a),F(b)),F(c)),F(d)),F(e)),F(f)),F(g)),F(h)),F(i)),F(j)),F(k)),F(l)),F(m)),F(n)),F(o)),F(p))
    for x in qs; y = op(y,F(x)); end
    y
end
mysum(f, x::Tuple) = mapafoldl(f, +, x...)

@stevengj
Copy link
Member

stevengj commented Dec 20, 2018

(Indeed, it seems like we could use the above to define an efficient mapfoldl for tuples, which would give us a fast mapreduce, sum, etcetera.)

@stevengj
Copy link
Member

(Note that the above is not quite right for a general mapfoldl because it is not left-associative. I'll put together a PR with a proper version.)

@stevengj stevengj changed the title sum(y -> f(g(x)), x) slow sum(function, tuple) is slow Dec 21, 2018
@Nosferican
Copy link
Contributor

sum(f, x::Tuple{Any, Vararg{Any}}) = +(f.(x)...)

should probably be

sum(f, x::Tuple{Any, Vararg{Any}}) = sum(f, x for x in x)

to avoid the unnecessary materialization.

@nalimilan
Copy link
Member

I don't think it matters for tuples, no allocation happens anyway.

@chethega
Copy link
Contributor

This allocates for some reason. Any ideas?

You have an argument of type <:Function that does not appear in head position in the body of the function definition. Therefore it does not get specialized (@code_native sum1(...) shows you the fully realized code instead of the actually existing code).

julia> using BenchmarkTools
julia> sum1(f, x::Tuple) = +(f.(x)...)
julia> sum2(f::FF, x::Tuple) where FF = +(f.(x)...)

julia> @btime sum1(sin, (1.,2.,3.,4.,5.))
  597.669 ns (10 allocations: 288 bytes)
0.1761616497223787

julia> @btime sum2(sin, (1.,2.,3.,4.,5.))
  62.396 ns (0 allocations: 0 bytes)
0.1761616497223787

@cossio
Copy link
Contributor Author

cossio commented Jan 3, 2019

@chethega I don't understand the difference. I thought ::F where F was equivalent to ::Any.

@KristofferC
Copy link
Member

It works the same w.r.t dispatch but it forces specialization on the argument.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
performance Must go faster
Projects
None yet
Development

No branches or pull requests

6 participants