Outline over/underflow functionality in ComplexF64 division for performance #29699
base/complex.jl
Outdated

# sub-functionality for /(z::ComplexF64, w::ComplexF64)
function cdiv(a::Float64, b::Float64, c::Float64, d::Float64)
    abs(d)<=abs(c) ? (p,q)=robust_cdiv1(a,b,c,d) : ((p,q)=robust_cdiv1(b,a,d,c); q=-q)
This seems like it would be clearer using multiline `if/else` syntax. I'm a big fan of the ternary operator, but once you're doing assignments and using `( ; )` to chain multiple expressions in one of the branches, that's really taking it too far. I know this one-liner was in the original code but now that it's in its own function, it seems better to write it out clearly :)
Fixed, thanks!
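
For illustration, the multiline form suggested above might look like the following sketch. `smith_step` is a hypothetical stand-in for `robust_cdiv1` (a plain Smith-style step without any extra robustness handling), and `cdiv_sketch` is likewise an assumed name, so this is not the code that was committed.

```julia
# Hypothetical stand-in for robust_cdiv1: one Smith-style step that
# computes (a + b*im) / (c + d*im), intended for abs(d) <= abs(c).
function smith_step(a::Float64, b::Float64, c::Float64, d::Float64)
    r = d / c
    t = 1.0 / (c + d * r)
    return (a + b * r) * t, (b - a * r) * t
end

# The ternary one-liner written out as a multiline if/else.
function cdiv_sketch(a::Float64, b::Float64, c::Float64, d::Float64)
    if abs(d) <= abs(c)
        p, q = smith_step(a, b, c, d)
    else
        # Swap the roles of real and imaginary parts, then fix the sign.
        p, q = smith_step(b, a, d, c)
        q = -q
    end
    return p, q
end
```

Both branches assign `p` and `q`, so the single `return` at the end serves either case.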
base/complex.jl
Outdated
# division operations
abs(d)<=abs(c) ? ((p,q)=robust_cdiv1(a,b,c,d) ) : ((p,q)=robust_cdiv1(b,a,d,c); q=-q)
return ComplexF64(p*s,q*s) # undo scaling
return a,b,c,d,s
end
function robust_cdiv1(a::Float64, b::Float64, c::Float64, d::Float64)
This doesn't seem to get inlined.
You could use `@inline` to force the issue.
base/complex.jl
Outdated
end

# sub-functionality for /(z::ComplexF64, w::ComplexF64)
function cdiv(a::Float64, b::Float64, c::Float64, d::Float64)
This doesn't seem to get inlined
If I check with `@code_typed` I'm only seeing invoked-calls to `robust_cdiv1` and `scaling_cdiv` (but no calls to `cdiv`) - I assume that means they're the only non-inlined calls?
I guess it might be preferable to have `robust_cdiv1` inlined though.
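
One way to make this kind of check mechanical is to count the `:invoke` expressions left in the optimized IR, as in the rough sketch below. The helper `count_invokes` and the toy functions are hypothetical, and the layout of `CodeInfo` is a compiler internal that may differ between Julia versions, so treat this as a diagnostic aid rather than a stable API.

```julia
using InteractiveUtils  # provides code_typed outside the REPL

# Hypothetical toy functions: one helper is outlined, the other inlined.
@noinline slow_helper(x::Float64) = x * x + 1.0
@inline fast_helper(x::Float64) = x * x + 1.0

calls_slow(x::Float64) = slow_helper(x) + 1.0
calls_fast(x::Float64) = fast_helper(x) + 1.0

# Count statements in the optimized IR that are still `invoke` calls,
# i.e. calls that were not inlined into the caller.
function count_invokes(f, argtypes)
    ci = first(first(code_typed(f, argtypes)))  # CodeInfo of the first match
    return count(st -> st isa Expr && st.head === :invoke, ci.code)
end

println(count_invokes(calls_slow, (Float64,)))
println(count_invokes(calls_fast, (Float64,)))
```

A function that does not show up as an `invoke` in its caller has been inlined there, which matches the reading of the `@code_typed` output above.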
Worth benchmarking, but presumably you'd want the main body with a non-inlined slow case for normalization and then the normalization body with everything inlined into it.
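
A minimal sketch of that structure, under stated assumptions: all names here (`kernel`, `kernel_ordered`, `div_rescaled`, `div_sketch`) and both thresholds are hypothetical, the rescaling only looks at the divisor, and zeros or non-finite inputs get no special treatment. It shows the shape of the split, not the implementation in `base/complex.jl`.

```julia
# Hypothetical thresholds for "the divisor is in a safe range".
const LO_SKETCH = 2.0^-500
const HI_SKETCH = 2.0^500

# Inlined kernel: one Smith-style step, intended for abs(d) <= abs(c).
@inline function kernel(a::Float64, b::Float64, c::Float64, d::Float64)
    r = d / c
    t = 1.0 / (c + d * r)
    return (a + b * r) * t, (b - a * r) * t
end

@inline function kernel_ordered(a, b, c, d)
    if abs(d) <= abs(c)
        return kernel(a, b, c, d)
    else
        p, q = kernel(b, a, d, c)
        return p, -q
    end
end

# Outlined rare path: rescale the divisor by a power of two, divide with
# the kernel inlined into this body, then undo the scaling.
@noinline function div_rescaled(a, b, c, d)
    s = max(abs(c), abs(d)) > HI_SKETCH ? 2.0^-600 : 2.0^600
    p, q = kernel_ordered(a, b, c * s, d * s)
    return ComplexF64(p * s, q * s)  # z/w == (z/(s*w)) * s
end

# Main body: common case handled inline, rare case is a single call.
function div_sketch(z::ComplexF64, w::ComplexF64)
    a, b = reim(z)
    c, d = reim(w)
    m = max(abs(c), abs(d))
    if !(LO_SKETCH <= m <= HI_SKETCH)
        return div_rescaled(a, b, c, d)
    end
    p, q = kernel_ordered(a, b, c, d)
    return ComplexF64(p, q)
end
```

The common case pays only for one range check, while the rescaling code stays out of the caller's body.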
I added […]. The new timings are (with input […]):

Before:
julia> @btime /($a,$b)
  10.239 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

First post (without inlined […]):
julia> @btime /($a,$b)
  7.964 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

After (with inlined […]):
julia> @btime /($a,$b)
  7.395 ns (0 allocations: 0 bytes)
0.06998313659359187 - 1.3887015177065767im

So there was indeed a bit more to be gained by forcing inlining!
@KristofferC @StefanKarpinski: Does this need further review?
Besides some very minor restructuring, the only notable change here is the manual outlining (`@noinline`) of the functionality that takes care of over/underflow scaling (see e.g. #29688 (comment)). This speeds up the very common case of non-over/underflowing input. On the other hand, it slows the over/underflowing case slightly, by ~2-3 ns, due to an additional non-inlined function call.
Before:
After:
This type of micro-optimization could probably be applied to many input-checked functions, whenever one can reasonably expect a certain class of input values to be far more likely than others.
Personally, I do think it makes the code somewhat less readable/more verbose, so I'm not sure if it is worthwhile. Still, I thought I'd make this PR as an example, at least for discussion.
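
As a generic illustration of the same idea on an input-checked function, the sketch below moves the rarely taken error path into a `@noinline` helper so the common path stays small. Both `throw_nonpositive` and `log_ratio_sketch` are hypothetical names invented for this example, and whether the split pays off in a given case would need benchmarking.

```julia
# Rare path, kept out of line: building and throwing the error.
@noinline throw_nonpositive(x) =
    throw(DomainError(x, "expected a positive value"))

# Common path: a cheap check, then the actual work.
function log_ratio_sketch(x::Float64, y::Float64)
    (x > 0 && y > 0) || throw_nonpositive(min(x, y))
    return log(x) - log(y)
end
```

The cost is the one noted above: one more function to read, in exchange for a leaner hot path.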