Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Outline over/underflow functionality in ComplexF64 division for performance #29699

Merged
merged 4 commits into from
Oct 30, 2018

Conversation

thchr
Copy link
Contributor

@thchr thchr commented Oct 18, 2018

Besides some very minor restructuring, the only notable change here is basically the manual outlining (@noinline) of the functionality that takes care of over/underflow scaling (see e.g. #29688 (comment)).

This improves the speed of the very common case of non-over/underflowing input (and slightly reduces the speed of the over/underflowing case, by ~2-3 ns, on the other hand, due to an additional non-inlined function call)

Before:

@btime /(2.3-4.2im, 3.1+1.5im)
  11.377 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

After:

julia> @btime /(2.3-4.2im, 3.1+1.5im)
  5.119 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

This type of micro-optimization could probably be done for a lot of input-checked functions, whenever one has a reasonable expectation that a certain class of input-values are far more likely than others.
Personally, I do think it makes the code somewhat less readable/more verbose, so I'm not sure if it is worthwhile. Still, I thought I'd make this PR as an example, at least for discussion.

@thchr thchr force-pushed the complex-division-outline branch from 0279934 to 52312d1 Compare October 18, 2018 01:42
base/complex.jl Outdated

# sub-functionality for /(z::ComplexF64, w::ComplexF64)
function cdiv(a::Float64, b::Float64, c::Float64, d::Float64)
abs(d)<=abs(c) ? (p,q)=robust_cdiv1(a,b,c,d) : ((p,q)=robust_cdiv1(b,a,d,c); q=-q)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like it would be clearer using multiline if/else syntax. I'm a big fan of the ternary operator, but once you're doing assignments and using ( ; ) to chain multiple expressions in one of the branches, that's really taking it too far. I know this one-liner was in the original code but now that it's in its own function, it seems better to write it out clearly :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks!

base/complex.jl Show resolved Hide resolved
base/complex.jl Outdated
# division operations
abs(d)<=abs(c) ? ((p,q)=robust_cdiv1(a,b,c,d) ) : ((p,q)=robust_cdiv1(b,a,d,c); q=-q)
return ComplexF64(p*s,q*s) # undo scaling
return a,b,c,d,s
end
function robust_cdiv1(a::Float64, b::Float64, c::Float64, d::Float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to get inlined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You could use @inline to force the issue.

base/complex.jl Outdated
end

# sub-functionality for /(z::ComplexF64, w::ComplexF64)
function cdiv(a::Float64, b::Float64, c::Float64, d::Float64)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem to get inlined

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I check with @code_typed I'm only seeing invoked-calls to robust_cdiv1 and scaling_cdiv (but no calls to cdiv)- I assume that means they're the only non-inlined calls?
I guess it might be preferable to have robust_cdiv1 inlined though.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Worth benchmarking, but presumably you'd want the main body with a non-inlined slow case for normalization and then the normalization body with everything inlined into it.

@thchr
Copy link
Contributor Author

thchr commented Oct 18, 2018

I added @inline to both cdiv() and robust_cdiv1() which, as far as I can tell from @code_typed succeeds in inlining all function bodies (except, of course, the outlined scaling_cdiv()). Thanks for pointing out the necessity of doing that---I mistakenly assumed the compiler was so aggressive I didn't need to check that.

The new timings are: (with input a=2.3-4.2im and b=3.1+1.5im; I shouldn't have included them explicitly in the calls before---that, in a way that I don't understand, makes the revised version better than it probably is)

Before

julia> @btime /($a,$b)
  10.239 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

First post (without inlined cdiv() and robust_cdiv1())

julia> @btime /($a,$b)
  7.964 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

After (with inlined cdiv() and robust_cdiv1())

julia> @btime /($a,$b)
  7.395 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

So there was indeed a bit more to be gained by forcing inlining!

@thchr
Copy link
Contributor Author

thchr commented Oct 30, 2018

@KristofferC @StefanKarpinski : Does this need further review?

@KristofferC KristofferC merged commit 9704474 into JuliaLang:master Oct 30, 2018
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants