ComplexF64 division: combine four if-statements into two if-elseif-statements #29042

thchr · 2018-09-04T20:22:04Z

This commit combines what was previously four if-statements into two if-elseif-statements in the implementation of over/underflow-proof complex division, i.e. of /(z::ComplexF64, w::ComplexF64). This should allow branching to terminate sooner: in practice, testing with a pair of random numbers, I get a 1.22× speedup with this change.

The rewrite from four if-statements to two if-elseif-statements is allowable since floatmax(Float64)/2 > floatmin(Float64)*2/eps(Float64), so the previous ab-pairs of if-statements cannot be reached simultaneously; similarly for the cd-pairs.
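As a quick sanity check of the inequality above, the following sketch evaluates both thresholds in Julia. The names `upper`, `lower` and `scaling_branch` are illustrative only and do not appear in base/complex.jl.

```julia
# Thresholds used by the scaling checks (variable names are illustrative).
ov = floatmax(Float64)            # largest finite Float64
un = floatmin(Float64)            # smallest positive normal Float64

upper = ov / 2                    # "scale down" applies when the magnitude is >= upper
lower = un * 2 / eps(Float64)     # "scale up" applies when the magnitude is <= lower

# The two ranges do not overlap, so at most one branch can apply per operand pair.
thresholds_disjoint = upper > lower

# Hypothetical helper: which scaling branch a given magnitude would take.
scaling_branch(m::Float64) = m >= upper ? :down : (m <= lower ? :up : :none)
```

Since `upper > lower`, a magnitude that satisfies the first condition can never satisfy the second, which is what makes an if-elseif chain equivalent to the two separate if-statements.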

There's also some minor restructuring, just to make the function a simpler read.

Overall, complex division still seems surprisingly slow - certainly, the slowdown is greater than what was quoted in the algorithm's paper. Most of that might be due to additional branching though.

thchr · 2018-09-04T21:56:34Z

By the way, the current implementation seems to be a strict port of LAPACK's implementation of the algorithm; probably following the initial suggestion in #5072.
I'm not sure whether there are additional tricks that could be played to further improve the speed?

Right now, the checked version is ~7-9× slower than the naive, unchecked version. With the above change it is still ~5-7× slower.

KristofferC · 2018-09-04T22:03:30Z

You can always use @fastmath if you don't care for the accuracy.
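A minimal sketch of that trade-off, assuming the default `/` for ComplexF64 is the checked implementation discussed in this PR and that `@fastmath` replaces it with an unchecked variant. The wrapper names `checked_div` and `fast_div` are hypothetical.

```julia
# Default division: protected against intermediate over/underflow.
checked_div(z::ComplexF64, w::ComplexF64) = z / w

# @fastmath swaps `/` for a fast variant without the scaling checks.
fast_div(z::ComplexF64, w::ComplexF64) = @fastmath z / w

z, w = 1.0 + 2.0im, 3.0 - 4.0im
checked_div(z, w)     # approximately -0.2 + 0.4im
fast_div(z, w)        # agrees for well-scaled inputs like these

# Close to the overflow threshold the checked version still returns a sensible
# result; the fast version makes no such promise.
huge = 1e300 + 1e300im
checked_div(huge, huge)   # approximately 1.0 + 0.0im
```

For inputs far from the over/underflow limits the two agree to rounding error, so the choice is mainly about how much the extreme cases matter in a given application.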

thchr · 2018-09-04T22:05:55Z

Right, I realized that rather late today; but thanks!
Still, I guess it's of interest to have the standard implementation as fast as possible?

KristofferC · 2018-09-04T22:09:19Z

Of course, but (correct me if I am wrong) I felt there was a hint of questioning whether the performance penalty was really worth it to gain this extra precision. I just wanted to point out that there is already a way for you to make that choice by using the @fastmath macro.

thchr · 2018-09-04T22:41:55Z

You're not wrong at all: I even asked on Slack earlier (and had it explained to me) - it's good to be less ignorant now than I was earlier in the day :). It's great to know that the @fastmath macro enables this choice so neatly; thanks!

If I'm being entirely honest, my main interest in /(z,w) was just trying to understand why that "simple" operation required so many lines of code :).

StefanKarpinski · 2018-09-04T22:51:31Z

We definitely want this to be as fast as possible. @simonbyrne and @stevengj may find this of interest and/or have useful feedback.

simonbyrne · 2018-09-04T23:29:21Z

In general, it looks good. Would be a good case to add to https://github.com/JuliaCI/BaseBenchmarks.jl.

Two more things that might be worth trying:

1. Since we don't really care about NaN handling here, it may be faster to do

absa = abs(a); absb = abs(b)
ab = absa >= absb ? absa : absb

2. We could get it down to one branch by always scaling, i.e. something like

if ab >= floatmax(Float64)/(2*bs)
    a/=2; b/=2; s*=2  # scale down a,b
else
    a*=bs;   b*=bs;   s/=bs   # scale up a,b
end

simonbyrne · 2018-09-04T23:55:27Z

base/complex.jl

@@ -360,8 +360,8 @@ inv(z::Complex{<:Union{Float16,Float32}}) =
 #             c + i*d
 function /(z::ComplexF64, w::ComplexF64)
    a, b = reim(z); c, d = reim(w)
-    ab = max(abs(a), abs(b))
-    cd = max(abs(c), abs(d))
+    @fastmath ab = max(abs(a), abs(b))


simonbyrne · Sep 4, 2018

Note @fastmath isn't really valid here: it implies that we can assume a or b are never Inf, NaN or subnormal.

thchr · Sep 4, 2018

Ah, you're quick. I thought I could already put my lessons above to use.

thchr · Sep 4, 2018

Maybe I'm misunderstanding the nature of the @fastmath macro here: I thought it simply swapped the max function for max_fast (and same for abs, though abs and abs_fast appear identical)?

thchr · Sep 5, 2018

Regardless, I'll swap it to your initial suggestion instead, thanks.

simonbyrne · Sep 5, 2018

@fastmath is probably okay in this case, but occasionally it can cause some problems when the compiler gets carried away. Probably better to be explicit.
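To illustrate what "being explicit" buys here, the sketch below compares `max` with a plain comparison. The helper name `maxabs_explicit` is hypothetical; it only mirrors the idea suggested earlier in the thread.

```julia
# Hypothetical helper: maximum of two magnitudes via a plain comparison.
function maxabs_explicit(a::Float64, b::Float64)
    absa = abs(a); absb = abs(b)
    return absa >= absb ? absa : absb
end

maxabs_explicit(-3.0, 2.0)      # 3.0, same as max(abs(-3.0), abs(2.0))

# The two approaches differ only when a NaN is involved:
max(abs(NaN), abs(1.0))         # NaN: max propagates NaN
maxabs_explicit(NaN, 1.0)       # 1.0: NaN >= 1.0 is false, so absb is selected
maxabs_explicit(1.0, NaN)       # NaN: 1.0 >= NaN is false, so absb (NaN) is selected
```

The explicit form is a single compare-and-select with no NaN propagation logic, and unlike `@fastmath` it does not give the compiler licence to make wider assumptions about the operands.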

thchr · 2018-09-05T00:27:30Z

Thanks @simonbyrne! Regarding your suggestions:

1. This is great - that indeed leads to further speedup (~1.2-1.3×)! Wonderful. And it seems reasonable not to care about NaN-handling; if any components are NaN, what follows is bound to go haywire anyway. Added it to the PR.
2. Took me a little while to understand the change to the initial if-check: I think I get it now. I tried it out with the substitutions you suggested (below). Unfortunately, it doesn't appear to improve the speed - instead, it lowers it (by the same factor as above, but in the wrong direction; tested for rand(ComplexF64) input only though). Maybe the reduced branch isn't worth the extra scaling operations?

    if ab >= halfov/bs
        a/=two;  b/=two;  s*=two  # scale down a,b
    else
        a*=bs;   b*=bs;   s/=bs   # scale up a,b
    end
    if cd >= halfov/bs
        c*=half; d*=half; s*=half # scale down c,d
    else
        c*=bs;   d*=bs;   s*=bs   # scale up c,d
    end

simonbyrne · 2018-09-05T17:06:49Z

> Unfortunately, it doesn't appear to improve the speed - instead, it lowers it

That's a pity, but thanks for trying. Pipelining & branch prediction make it difficult to figure out this sort of micro-optimisation.

simonbyrne

Looks great, thanks!

From a purely stylistic point of view, you could probably also get rid of the half and two values: the compiler is able to optimise simple things like x/2 and x*2 into the optimal form. But that's not a big deal.
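Putting the pieces of this thread together, the overall shape of the function could look roughly like the sketch below: explicit maxima, two if-elseif blocks, and literal 0.5/2.0 instead of half/two variables. This is a self-contained illustration under my own assumptions, not the code merged in this PR; the names `cdiv_sketch`, `cdiv1_sketch` and `cdiv2_sketch` are hypothetical, and the two helpers are my reading of the usual robust-division steps.

```julia
# Helper: one component of the quotient, guarding against underflow in b*r.
function cdiv2_sketch(a, b, c, d, r, t)
    if r != 0
        br = b * r
        return br != 0 ? (a + br) * t : a * t + (b * t) * r
    else
        return (a + d * (b / c)) * t
    end
end

# Helper: divide (a + ib) by (c + id), assuming abs(d) <= abs(c).
function cdiv1_sketch(a, b, c, d)
    r = d / c
    t = 1.0 / (c + d * r)
    return cdiv2_sketch(a, b, c, d, r, t), cdiv2_sketch(b, -a, c, d, r, t)
end

function cdiv_sketch(z::ComplexF64, w::ComplexF64)
    a, b = reim(z); c, d = reim(w)
    absa = abs(a); absb = abs(b); ab = absa >= absb ? absa : absb  # max without NaN handling
    absc = abs(c); absd = abs(d); cd = absc >= absd ? absc : absd
    halfov    = 0.5 * floatmax(Float64)               # overflow threshold
    twoun_eps = floatmin(Float64) * 2.0 / eps(Float64) # underflow threshold
    bs = 2.0 / (eps(Float64) * eps(Float64))          # scaling factor
    s = 1.0
    if ab >= halfov
        a *= 0.5; b *= 0.5; s *= 2.0   # scale down a,b
    elseif ab <= twoun_eps
        a *= bs;  b *= bs;  s /= bs    # scale up a,b
    end
    if cd >= halfov
        c *= 0.5; d *= 0.5; s *= 0.5   # scale down c,d
    elseif cd <= twoun_eps
        c *= bs;  d *= bs;  s *= bs    # scale up c,d
    end
    if abs(d) <= abs(c)
        p, q = cdiv1_sketch(a, b, c, d)
    else
        p, q = cdiv1_sketch(b, a, d, c)
        q = -q
    end
    return ComplexF64(p * s, q * s)    # undo scaling
end
```

For example, `cdiv_sketch(1.0 + 2.0im, 3.0 - 4.0im)` should agree with `(1.0 + 2.0im) / (3.0 - 4.0im)` up to rounding, and operands near `floatmax(Float64)` exercise the scale-down branches.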

thchr · 2018-09-05T17:33:40Z

Good point; I had just kept the two and half around to reduce code churn: I don't like them either -- off they go.

I just realized that, in principle, the tricks above (+two and half variables as well) ought to be done for the inv(w::ComplexF64) method as well. Do you prefer that here or in a separate PR?

simonbyrne · 2018-09-05T18:18:22Z

> I just realized that, in principle, the tricks above (+two and half variables as well) ought to be done for the inv(w::ComplexF64) method as well. Do you prefer that here or in a separate PR?

A separate PR might be easier (@ me on it).

thchr · 2018-09-18T17:09:22Z

The Travis failure seems unrelated. Does this need further review?

KristofferC · 2018-09-18T17:32:22Z

@simonbyrne please merge if this looks ok to you

simonbyrne · 2018-09-18T17:35:18Z

Thanks!

ComplexF64 division: combine four if-statements into two if-elseif-statements

8913022

JeffBezanson added the complex Complex numbers label Sep 4, 2018

ararslan requested a review from stevengj September 4, 2018 21:33

ararslan added the maths Mathematical functions label Sep 4, 2018

add @fastmath to magnitude check; NaN handling not needed

bf5df6e

simonbyrne reviewed Sep 4, 2018

don't use @fastmath; use explicit route instead

32a0eb9

simonbyrne approved these changes Sep 5, 2018

remove unnecessary two & half variables

283528f

thchr mentioned this pull request Sep 5, 2018

ComplexF64 inv: combine if-statements and use faster max without NaN handling #29058

Closed

simonbyrne merged commit 8dd3326 into JuliaLang:master Sep 18, 2018
