Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ComplexF64 division: combine four if-statements into two if-elseif-statements #29042

Merged
merged 4 commits into from
Sep 18, 2018

Conversation

thchr
Copy link
Contributor

@thchr thchr commented Sep 4, 2018

This commit combines what was previously four if-statements into two if-elseif-statements in the implementation of over/underflow-proof complex division, i.e. of /(z::ComplexF64, w::ComplexF64). This should allow branching to terminate sooner: in practice, testing with a pair of random numbers, I get a 1.22× speedup with this change.

The rewrite from four if-statements to two if-elseif-statements is allowable since floatmax(Float64)/2 > floatmin(Float64)*2/eps(Float64), so the previous ab-pairs of if-statements cannot be reached simultaneously; similarly for the cd-pairs.

There's also some minor restructuring, just to make the function a simpler read.

Overall, complex division still seems surprisingly slow - certainly, the slowdown is greater than what was quoted in the algorithm's paper. Most of that might be due to additional branching though.

@JeffBezanson JeffBezanson added the complex Complex numbers label Sep 4, 2018
@ararslan ararslan added the maths Mathematical functions label Sep 4, 2018
@thchr
Copy link
Contributor Author

thchr commented Sep 4, 2018

By the way, the current implementation seems to be a strict port of LAPACK's implementation of the algorithm; probably following the initial suggestion in #5072.
I'm not sure whether there are additional tricks that could be played to further improve the speed?

Right now, the checked version is ~7-9× slower than the naive, unchecked version. With the above change it is still ~5-7× slower.

@KristofferC
Copy link
Sponsor Member

You can always use @fastmath if you don't care for the accuracy.

@thchr
Copy link
Contributor Author

thchr commented Sep 4, 2018

Right, I realized that rather late today; but thanks!
Still, I guess it's of interest to have the standard implementation as fast as possible?

@KristofferC
Copy link
Sponsor Member

Of course, but (correct me if I am wrong) I felt there was a hint of questioning regarding if the performance penalty was really worth it to gain this extra precision. I just wanted to point out that there is already a way for you to make that choice by using the @fastmath macro.

@thchr
Copy link
Contributor Author

thchr commented Sep 4, 2018

You're not wrong at all: I even asked on Slack earlier (and had it explained to me) - it's good to be less ignorant now than I was earlier in the day :). It's great to know that the @fastmath macro enables this choice so neatly; thanks!

If I'm being entirely honest, my main interest in /(z,w) was just trying to understand why that "simple" operation required so many lines of code :).

@StefanKarpinski
Copy link
Sponsor Member

We definitely want this to be as fast as possible. @simonbyrne and @stevengj may find this of interest and/or have useful feedback.

@simonbyrne
Copy link
Contributor

In general, it looks good. Would be a good case to add to https://github.com/JuliaCI/BaseBenchmarks.jl.

Two more things that might be worth trying:

  1. Since we don't really care about NaN handling here, it may be faster to do
absa = abs(a); absb = abs(b)
ab = a >= b ? a : b
  1. We could get it down to one branch by always scaling, i.e. something like
if ab >= floatmax(Float64)/(2*bs)
    a/=2; b/=2; s*=2  # scale down a,b
else
    a*=bs;   b*=bs;   s/=bs   # scale up a,b
end

base/complex.jl Outdated
@@ -360,8 +360,8 @@ inv(z::Complex{<:Union{Float16,Float32}}) =
# c + i*d
function /(z::ComplexF64, w::ComplexF64)
a, b = reim(z); c, d = reim(w)
ab = max(abs(a), abs(b))
cd = max(abs(c), abs(d))
@fastmath ab = max(abs(a), abs(b))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note @fastmath isn't really valid here: it implies that we can assume a or b are never Inf, NaN or subnormal.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah, you're quick. I thought I could already put my lessons above to use.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I'm misunderstanding the nature of the @fastmath macro here: I thought it simply swapped the max function for max_fast (and same for abs, though abs and abs_fast appear identical)?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regardless, I'll swap it to your initial suggestion instead, thanks.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@fastmath is probably okay in this case, but occasionally it can cause some problems when the compiler gets carried away. Probably better to be explicit.

@thchr
Copy link
Contributor Author

thchr commented Sep 5, 2018

Thanks @simonbyrne! Regarding your suggestions:

  1. This is great - that indeed leads to further speedup (~1.2-1.3×)! Wonderful. And it seems reasonable not to care about NaN-handling; if any components are NaN, what follows is bound to go haywire anyway. Added it to the PR.

  2. Took me a little while to understand the change to the initial if-check: I think I get it now.
    I tried it out with the substitutions you suggested (below). Unfortunately, it doesn't appear to improve the speed - instead, it lowers it (by the same factor as above, but in the wrong direction; tested for rand(ComplexF64) input only though). Maybe the reduced branch isn't worth the extra scaling operations?

    if ab >= halfov/bs
        a/=two;  b/=two;  s*=two  # scale down a,b
    else
        a*=bs;   b*=bs;   s/=bs   # scale up a,b
    end
    if cd >= halfov/bs
        c*=half; d*=half; s*=half # scale down c,d
    else
        c*=bs;   d*=bs;   s*=bs   # scale up c,d
    end

@simonbyrne
Copy link
Contributor

Unfortunately, it doesn't appear to improve the speed - instead, it lowers it

That's a pity, but thanks for trying. Pipelining & branch prediction make it difficult to figure out these sort of micro-optimisations.

Copy link
Contributor

@simonbyrne simonbyrne left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great, thanks!

From a purely stylistic point of view, you could probably also get rid of the half and two values: the compiler is able to optimise simple things like x/2 and x*2 into the optimal form. But that's not a big deal.

@thchr
Copy link
Contributor Author

thchr commented Sep 5, 2018

Good point; I had just kept the two and half around to reduce code churn: I don't like them either -- off they go.

I just realized that, in principle, the tricks above (+two and half variables as well) ought to be done for the inv(w::ComplexF64) method as well. Do you prefer that here or in a separate PR?

@simonbyrne
Copy link
Contributor

I just realized that, in principle, the tricks above (+two and half variables as well) ought to be done for the inv(w::ComplexF64) method as well. Do you prefer that here or in a separate PR?

A separate PR might be easier (@ me on it).

@thchr
Copy link
Contributor Author

thchr commented Sep 18, 2018

The Travis failure seems unrelated. Does this need further review?

@KristofferC
Copy link
Sponsor Member

@simonbyrne please merge if this looks ok to you

@simonbyrne simonbyrne merged commit 8dd3326 into JuliaLang:master Sep 18, 2018
@simonbyrne
Copy link
Contributor

Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
complex Complex numbers maths Mathematical functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants