Outline over/underflow functionality in ComplexF64 division for performance #29699

thchr · 2018-10-18T00:04:32Z

Besides some very minor restructuring, the only notable change here is basically the manual outlining (@noinline) of the functionality that takes care of over/underflow scaling (see e.g. #29688 (comment)).

This improves the speed of the very common case of non-over/underflowing input (and slightly reduces the speed of the over/underflowing case, by ~2-3 ns, on the other hand, due to an additional non-inlined function call)

Before:

@btime /(2.3-4.2im, 3.1+1.5im)
  11.377 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

After:

julia> @btime /(2.3-4.2im, 3.1+1.5im)
  5.119 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

This type of micro-optimization could probably be done for a lot of input-checked functions, whenever one has a reasonable expectation that a certain class of input-values are far more likely than others.
Personally, I do think it makes the code somewhat less readable/more verbose, so I'm not sure if it is worthwhile. Still, I thought I'd make this PR as an example, at least for discussion.

StefanKarpinski · 2018-10-18T12:20:11Z

base/complex.jl

+
+# sub-functionality for /(z::ComplexF64, w::ComplexF64)
+function cdiv(a::Float64, b::Float64, c::Float64, d::Float64)
+    abs(d)<=abs(c) ? (p,q)=robust_cdiv1(a,b,c,d) : ((p,q)=robust_cdiv1(b,a,d,c); q=-q)


This seems like it would be clearer using multiline if/else syntax. I'm a big fan of the ternary operator, but once you're doing assignments and using ( ; ) to chain multiple expressions in one of the branches, that's really taking it too far. I know this one-liner was in the original code but now that it's in its own function, it seems better to write it out clearly :)

Fixed, thanks!

base/complex.jl

KristofferC · 2018-10-18T16:15:44Z

base/complex.jl

-    # division operations
-    abs(d)<=abs(c) ? ((p,q)=robust_cdiv1(a,b,c,d)  ) : ((p,q)=robust_cdiv1(b,a,d,c); q=-q)
-    return ComplexF64(p*s,q*s) # undo scaling
+    return a,b,c,d,s
 end
 function robust_cdiv1(a::Float64, b::Float64, c::Float64, d::Float64)


This doesn't seem to get inlined.

You could use @inline to force the issue.

KristofferC · 2018-10-18T16:17:57Z

base/complex.jl

+end
+
+# sub-functionality for /(z::ComplexF64, w::ComplexF64)
+function cdiv(a::Float64, b::Float64, c::Float64, d::Float64)


This doesn't seem to get inlined

If I check with @code_typed I'm only seeing invoked-calls to robust_cdiv1 and scaling_cdiv (but no calls to cdiv)- I assume that means they're the only non-inlined calls?
I guess it might be preferable to have robust_cdiv1 inlined though.

Worth benchmarking, but presumably you'd want the main body with a non-inlined slow case for normalization and then the normalization body with everything inlined into it.

thchr · 2018-10-18T23:18:27Z

I added @inline to both cdiv() and robust_cdiv1() which, as far as I can tell from @code_typed succeeds in inlining all function bodies (except, of course, the outlined scaling_cdiv()). Thanks for pointing out the necessity of doing that---I mistakenly assumed the compiler was so aggressive I didn't need to check that.

The new timings are: (with input a=2.3-4.2im and b=3.1+1.5im; I shouldn't have included them explicitly in the calls before---that, in a way that I don't understand, makes the revised version better than it probably is)

Before

julia> @btime /($a,$b)
  10.239 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

First post (without inlined cdiv() and robust_cdiv1())

julia> @btime /($a,$b)
  7.964 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

After (with inlined cdiv() and robust_cdiv1())

julia> @btime /($a,$b)
  7.395 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

So there was indeed a bit more to be gained by forcing inlining!

thchr · 2018-10-30T18:55:40Z

@KristofferC @StefanKarpinski : Does this need further review?

thchr added 2 commits October 17, 2018 18:24

Merge branch 'JuliaLang-master'

7ec4fa5

outline over/underflow functionality in ComplexF64 division

52312d1

thchr force-pushed the complex-division-outline branch from 0279934 to 52312d1 Compare October 18, 2018 01:42

StefanKarpinski reviewed Oct 18, 2018

View reviewed changes

Swap over-long ternary for conventional if-else syntax

f69448d

KristofferC reviewed Oct 18, 2018

View reviewed changes

force inline of hot subfunctions

5659e23

KristofferC merged commit 9704474 into JuliaLang:master Oct 30, 2018

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Outline over/underflow functionality in ComplexF64 division for performance #29699

Outline over/underflow functionality in ComplexF64 division for performance #29699

thchr commented Oct 18, 2018

StefanKarpinski Oct 18, 2018

thchr Oct 18, 2018

KristofferC Oct 18, 2018

StefanKarpinski Oct 18, 2018

KristofferC Oct 18, 2018

thchr Oct 18, 2018

StefanKarpinski Oct 18, 2018

thchr commented Oct 18, 2018 •

edited

Loading

thchr commented Oct 30, 2018

Outline over/underflow functionality in ComplexF64 division for performance #29699

Outline over/underflow functionality in ComplexF64 division for performance #29699

Conversation

thchr commented Oct 18, 2018

StefanKarpinski Oct 18, 2018

Choose a reason for hiding this comment

thchr Oct 18, 2018

Choose a reason for hiding this comment

KristofferC Oct 18, 2018

Choose a reason for hiding this comment

StefanKarpinski Oct 18, 2018

Choose a reason for hiding this comment

KristofferC Oct 18, 2018

Choose a reason for hiding this comment

thchr Oct 18, 2018

Choose a reason for hiding this comment

StefanKarpinski Oct 18, 2018

Choose a reason for hiding this comment

thchr commented Oct 18, 2018 • edited Loading

thchr commented Oct 30, 2018

thchr commented Oct 18, 2018 •

edited

Loading