
Tuple type member matrix vs. simple matrix performance difference #13816

Closed
GravityAssisted opened this issue Oct 29, 2015 · 5 comments
Labels
performance Must go faster

Comments

@GravityAssisted

Let's say I define a tuple type with two members a, b:

immutable TupleTest{A,B}
    a::A
    b::B
end

tv = TupleTest(rand(3,3),rand(4,2))

Next, I also define a simple matrix:

tc = rand(4,2)

such that typeof(tc) and typeof(tv.b) give the same result.

typeof(tc) == typeof(tv.b)
> true

Now benchmarking them I get the following results:

@time for i in 1:10^6
    tv.b+tv.b*2.0;
end
  0.215183 seconds (3.00 M allocations: 274.658 MB, 10.95% gc time)

@time for i in 1:10^6
    tc+tc*2.0;
end
  0.179003 seconds (3.00 M allocations: 274.658 MB, 11.28% gc time)

The tuple type version is 15%-20% slower. Is this due to the overhead of using tuple types, and is that overhead constant, or does it grow with problem size? I don't understand how tuple types work internally, but I read somewhere that there is no overhead if you are using the same concrete types in the computation. Does that statement apply here as well? Pardon me if this question sounds naive; I am trying to understand the reason for the performance difference.

Assuming that the overhead of using tuple types is constant, then if my computation within the loop is more complex, the relative performance difference should decrease, which would be nice...

I am on Julia 0.4

thanks,
Nitin

@tkelman
Contributor

tkelman commented Oct 29, 2015

Try hoisting the field access outside of the loop:

tvb = tv.b
for i in 1:10^6
    tvb+tvb*2.0;
end

@GravityAssisted
Author

@tkelman that does fix it, thanks! But how should one know to do that, and why does it work?

As an end user, I am afraid there might be many such small tricks that are not in the docs, which makes me doubt the performance of my code. What I mean is, I expect the compiler to do such low-level optimizations.
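(Editor's aside, not part of the original comment: one way to spot this class of problem yourself is `@code_warntype`, which highlights values the compiler cannot infer to a concrete type. A minimal sketch in Julia 0.4-era syntax, reusing the `TupleTest` definition from above; the helper `g` is hypothetical:)

```julia
# Sketch: reads of a non-const global cannot be type-inferred, so the
# compiler treats `tv` as `Any` and falls back to dynamic dispatch.
immutable TupleTest{A,B}
    a::A
    b::B
end
tv = TupleTest(rand(3,3), rand(4,2))

g() = tv.b + tv.b*2.0   # `g` reads the non-const global `tv`
@code_warntype g()      # the body is inferred as `Any`, a red flag
```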

@tkelman tkelman added the performance Must go faster label Oct 29, 2015
@tkelman
Contributor

tkelman commented Oct 29, 2015

Ref #9755 - I would think with an immutable this should be working already? It's a known issue that will hopefully be fixed with future compiler optimizations (#3440), but there may have been a regression here, or maybe it only happens automatically if you use @inbounds or run with -O or use newer LLVM. Not sure.

@JeffBezanson
Member

This is due to the slowness of accessing global variables. If you put the two loops inside a function, there is no difference:

julia> function f(tv,tc)
           @time for i in 1:10^6
               tv.b+tv.b*2.0;
           end
           @time for i in 1:10^6
               tc+tc*2.0;
           end
       end
f (generic function with 1 method)

julia> f(tv,tc)
  0.134492 seconds (3.00 M allocations: 274.658 MB, 8.79% gc time)
  0.132891 seconds (3.00 M allocations: 274.658 MB, 6.81% gc time)
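(Editor's aside, not from the thread: besides moving the work into a function, declaring the globals `const` also removes the penalty, because a `const` global's type cannot change and can therefore be inferred. A hedged sketch in Julia 0.4-era syntax; the names `ctv`/`ctc` are illustrative:)

```julia
# Sketch: `const` globals have a fixed type, so the loop body can
# dispatch on the concrete matrix types instead of going through
# dynamic dispatch on `Any` at every iteration.
immutable TupleTest{A,B}
    a::A
    b::B
end

const ctv = TupleTest(rand(3,3), rand(4,2))
const ctc = rand(4,2)

@time for i in 1:10^6
    ctv.b + ctv.b*2.0
end
@time for i in 1:10^6
    ctc + ctc*2.0
end
```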

@tkelman
Contributor

tkelman commented Oct 30, 2015

closing as a milder-than-usual duplicate of #8870

@tkelman tkelman closed this as completed Oct 30, 2015