
Tuple type member matrix vs. simple matrix performance difference #13816

Closed
GravityAssisted opened this issue Oct 29, 2015 · 5 comments
Labels
performance Must go faster

Comments

@GravityAssisted

Let's say I define a tuple type with two members a, b:

immutable TupleTest{A,B}
    a::A
    b::B
end

tv = TupleTest(rand(3,3),rand(4,2))

Next, I also define a simple matrix:

tc = rand(4,2)

such that typeof(tc) and typeof(tv.b) give the same result.

typeof(tc) == typeof(tv.b)
> true

Now benchmarking them I get the following results:

@time for i in 1:10^6
    tv.b+tv.b*2.0;
end
  0.215183 seconds (3.00 M allocations: 274.658 MB, 10.95% gc time)

@time for i in 1:10^6
    tc+tc*2.0;
end
  0.179003 seconds (3.00 M allocations: 274.658 MB, 11.28% gc time)

The tuple type version is 15%-20% slower. Is this due to the overhead of using tuple types, and is that overhead constant, or does it grow with problem size? I don't understand how tuple types work internally, but I read somewhere that there is no overhead if you are using the same concrete types in the computation. Does that statement apply here as well? Pardon me if this question sounds naive; I am trying to understand the reason for the performance difference.

Assuming that the overhead of using tuple types is constant, then if my computation within the loop is more complex, the relative performance difference should decrease, which would be nice...

I am on Julia 0.4

thanks,
Nitin

@tkelman
Contributor

tkelman commented Oct 29, 2015

Try hoisting the field access outside of the loop:

tvb = tv.b
for i in 1:10^6
    tvb+tvb*2.0;
end

@GravityAssisted
Author

@tkelman that does fix it, thanks! But how should one know to do that, and why does it work?

As an end user, I am afraid there might be many such small tricks that are not in the docs, which makes me doubt the performance of my code. What I mean is, I expect the compiler to do such low-level optimizations.
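(Editor's aside, not part of the original comment: one way to spot this class of problem yourself is `@code_warntype`, which highlights values the compiler cannot infer to a concrete type. A minimal sketch in Julia 0.4-era syntax, reusing the `TupleTest` definition from above; the helper `g` is hypothetical:)

```julia
# Sketch: reads of a non-const global cannot be type-inferred, so the
# compiler treats `tv` as `Any` and falls back to dynamic dispatch.
immutable TupleTest{A,B}
    a::A
    b::B
end
tv = TupleTest(rand(3,3), rand(4,2))

g() = tv.b + tv.b*2.0   # `g` reads the non-const global `tv`
@code_warntype g()      # the body is inferred as `Any`, a red flag
```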

@tkelman tkelman added the performance Must go faster label Oct 29, 2015
@tkelman
Contributor

tkelman commented Oct 29, 2015

Ref #9755 - I would think with an immutable this should be working already? It's a known issue that will hopefully be fixed with future compiler optimizations (#3440), but there may have been a regression here, or maybe it only happens automatically if you use @inbounds or run with -O or use newer LLVM. Not sure.

@JeffBezanson
Member

This is due to the slowness of accessing global variables. If you put the two loops inside a function, there is no difference:

julia> function f(tv,tc)
           @time for i in 1:10^6
               tv.b+tv.b*2.0;
           end
           @time for i in 1:10^6
               tc+tc*2.0;
           end
       end
f (generic function with 1 method)

julia> f(tv,tc)
  0.134492 seconds (3.00 M allocations: 274.658 MB, 8.79% gc time)
  0.132891 seconds (3.00 M allocations: 274.658 MB, 6.81% gc time)
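(Editor's aside, not from the thread: besides moving the work into a function, declaring the globals `const` also removes the penalty, because a `const` global's type cannot change and can therefore be inferred. A hedged sketch in Julia 0.4-era syntax; the names `ctv`/`ctc` are illustrative:)

```julia
# Sketch: `const` globals have a fixed type, so the loop body can
# dispatch on the concrete matrix types instead of going through
# dynamic dispatch on `Any` at every iteration.
immutable TupleTest{A,B}
    a::A
    b::B
end

const ctv = TupleTest(rand(3,3), rand(4,2))
const ctc = rand(4,2)

@time for i in 1:10^6
    ctv.b + ctv.b*2.0
end
@time for i in 1:10^6
    ctc + ctc*2.0
end
```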

@tkelman
Contributor

tkelman commented Oct 30, 2015

closing as a milder-than-usual duplicate of #8870

@tkelman tkelman closed this as completed Oct 30, 2015