fix hypot with more than two arguments #30301
Conversation
Can you add a test case? There probably are more efficient ways to do this (similar to how …).
@simonbyrne Added a test. You are right: this is correct, but about 5 times slower than the previous implementation. I can try to make it faster with some advice. On the other hand, is there a reason against …?
The main issue is that …
It seems to me then that … But for now, just merge this?
See also the discussions in #27141 and #27251. I would have thought that something like this would be faster:

```julia
function hypot2(x::Number...)
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y/maxabs), x))
end
```

but …
Is there any difference between …?
One way forward would be to: …
As a side note here, I was recently reading up on MKL's super fast reduction to bi- and tridiagonal matrices. They mentioned that, at some point, one of the bottlenecks becomes …
I don't have time to go that deep into this. My suggestion is to merge this (since the current implementation just overflows), and later implement a more complete solution (as in @andreasnoack's or @simonbyrne's suggestions).
@cossio,

```julia
julia> x = rand(10);

julia> @btime maximum($x);
  15.835 ns (0 allocations: 0 bytes)

julia> @btime maximum(abs, $x);
  17.352 ns (0 allocations: 0 bytes)
```

That's why it confuses me that it is allocating in the code above.
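A minimal way to pin down the allocation, assuming the `hypot2` definition quoted above (timings and byte counts are machine-dependent):

```julia
julia> using BenchmarkTools

julia> @btime hypot2(3, 4, 5);       # varargs call: splatting plus the closure over maxabs

julia> @allocated hypot2(3, 4, 5)    # bytes allocated per call; run twice to exclude compilation
```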
@stevengj Could be because of the anonymous function in the sum? This allocates less:

```julia
function hypot3(x::Number...)
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    t = zero(maxabs)
    for y in x
        t += abs2(y/maxabs)
    end
    return maxabs * sqrt(t)
end
```
And this version is still better:

```julia
function hypot4(x::Number...)
    maxabs = -Inf
    for y in x
        maxabs = max(maxabs, abs(y))
    end
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    t = zero(maxabs)
    for y in x
        t += abs2(y/maxabs)
    end
    return maxabs * sqrt(t)
end
```
Maybe I'm missing something, but it seems most of the issues are due to argument splatting. E.g., the following variation on @stevengj's implementation is allocation free:

```julia
function _hypot(x::NTuple{T,<:Number}) where T
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y/maxabs), x))
end
hypot5(x::Number...) = _hypot(promote(x...))
```

```julia
julia> @btime hypot5(3,4,5)
  22.299 ns (0 allocations: 0 bytes)
7.0710678118654755
```

I imagine there is a better way to do this dance - e.g. to avoid the last bit of extra splatting - but I thought it might be useful to note here anyway.
This is the fastest implementation I've found so far, based on @thchr's comment about splatting and @andreasnoack's comment about speculative execution:

```julia
function _hypot6(x::NTuple{N,<:Number}) where N
    simple = sum(y -> abs2(float(y)), x)
    isinf(simple) || iszero(simple) || return sqrt(simple)
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y/maxabs), x))
end
hypot6(x::Number...) = _hypot6(promote(x...))
```

It's about 30% faster than … However, type inference is failing for it: … Anyway, @thchr's …
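A quick way to see the inference failure mentioned here, assuming the `hypot6` definition above:

```julia
julia> using Test

julia> @inferred hypot6(3, 4, 5)      # throws if the return type cannot be inferred concretely

julia> @code_warntype hypot6(3, 4, 5)   # shows which intermediate types come out as Any/Union
```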
Building again on @stevengj's latest version, this does not allocate and infers - and is faster still - but, admittedly, is not a very pretty thing:

```julia
function _hypot7(x::NTuple{N,<:Number}) where N
    simple = sum(y -> abs2(float(y)), x)
    isinf(simple) || iszero(simple) || return sqrt(simple)
    return __hypot7(x)
end
function __hypot7(x::NTuple{N,<:Number}) where N
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y/maxabs), x))
end
hypot7(x::Number...) = _hypot7(promote(x...))
```

```julia
julia> @btime hypot7(3,4,5)
  10.899 ns (0 allocations: 0 bytes)
7.0710678118654755
```

[I guess this version has an additional function-call overhead cost (to …)]
@thchr very nice! But I don't see why this is slower than the original version,

```julia
hypot0(x::Number...) = sqrt(sum(abs2(y) for y in x))
```

```julia
julia> @btime hypot0(3,4,5)
  3.990 ns (0 allocations: 0 bytes)
7.0710678118654755

julia> @btime hypot7(3,4,5)
  11.216 ns (0 allocations: 0 bytes)
7.0710678118654755
```

EDIT: This version is as fast, if we only rewrite:

```julia
function _hypot7(x::NTuple{N,<:Number}) where N
    simple = sum(abs2(y) for y in x)
    isinf(simple) || iszero(simple) || return sqrt(simple)
    return __hypot7(x)
end
```
Those seem negligible. See edit in my comment just above yours.
That particular benchmark is very specific to having … Quite a few unexpected inference gotchas and fragility in this example, somehow... EDIT: The …
Then we can just leave …

The …

Indeed, there should probably be a …
@stevengj Ok. Also added the test for …
What did the final benchmark timings end up like?
(force-pushed from 3ce778b to 7699f7d)
I fixed the comments @simonbyrne @stevengj. … which is at least as fast as the original implementation. I also did some minor changes to the two-argument version (#31922, @cfborges), because it wasn't dealing well with dimensionful numbers (note that the …).

If there are no further issues (and the tests pass), can we get this merged? The original implementation is broken anyway, so this is a net improvement.
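For illustration only (Unitful.jl and the specific values are assumptions, not part of the PR): the scaled algorithm divides each argument by `maxabs`, so `sqrt` sees a plain float and the unit re-enters through `maxabs`, which is what makes dimensionful inputs work:

```julia
julia> using Unitful

julia> hypot(3.0u"m", 4.0u"m")   # expected: 5.0 m, since y/maxabs is dimensionless
```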
I cannot tell why the doctest build fails. Should I worry?
It looks like there have been some nice compiler improvements over time!

```julia
hypot(x::Number, xs::Number...) = _hypot(float.(promote(x, xs...))...)
function _hypot(x::T...) where {T<:Number}
    maxabs = maximum(abs, x)
    isnan(maxabs) && any(isinf, x) && return T(Inf)
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y / maxabs), x))
end
```

This now passes all tests and is as fast or faster than the current implementation. Note that it also passes the tests added in #31922 by @cfborges for the two-arg version, and is faster. What do you guys think? Should we replace it all with this? It is lovely and simple.
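A few illustrative calls against the version above (values chosen here, not taken from the test suite):

```julia
julia> hypot(1, 2, 2)     # expect 3.0: the varargs path scales by maxabs = 2.0

julia> hypot(NaN, Inf)    # expect Inf: the isnan(maxabs) && any(isinf, x) branch gives the IEEE result

julia> hypot(0.0, 0.0)    # expect 0.0: handled by the iszero(maxabs) early return
```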
(force-pushed from 9c16f9e to e6fb8c6)
Turns out the doctest failing was important.
I went ahead and committed the simplified varargs version, keeping the more complicated version with a separate two-argument method in a separate commit in case we want to roll back. I think we should just go ahead with this simpler code.
CI is failing with an ambiguity error that I don't understand.
I would make two general comments.

First: the current two-argument hypot code achieves (or nearly achieves, if there is no fma) the level of accuracy requested in the IEEE 754 standard (i.e., a correctly rounded result). There is no reason to undo that and replace it with an algorithm that doesn't.

Second: the IEEE 754 standard specifies hypot as a two-argument function. I am of the opinion that more than two arguments amounts to the 2-norm, and that should be a different function. One man's point of view; your mileage may vary.
If there is a test case where the current two-argument version achieves more accuracy, it would be nice to add it to the tests explicitly. Can you suggest an example, @cfborges? I can add it in this PR. I was actually surprised the simple implementation passed all the tests that are already there.

As for the multiple-argument version, I agree that we can just use LinearAlgebra.norm. But the multi-argument hypot was present before this PR for some reason, and I think removing it at this point would be a breaking change. Perhaps the best solution is to keep the current two-argument version and use #30301 (comment) for the multi-argument version.
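For reference, the existing library route for the n-argument case mentioned above; the expected value is the same √50 quoted earlier in the thread:

```julia
julia> using LinearAlgebra

julia> norm((3, 4, 5))   # generic 2-norm over an iterable of numbers
7.0710678118654755
```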
(force-pushed from 2ab5ce9 to c37b2ed)
This needs a rebase, but @stevengj I assume you're still in favor?
Fixes #27141: The previous code led to under/overflow.
Before this change:
After this change:
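The kind of behavior meant by the before/after above, with values chosen here for illustration (the old varargs code computed `sqrt(sum(abs2, x))` directly, as in `hypot0` earlier in the thread):

```julia
# Before this change: squaring extreme inputs over/underflows before the sqrt.
hypot(3e300, 4e300, 0.0)    # gave Inf
hypot(3e-300, 4e-300, 0.0)  # gave 0.0

# After this change: scaling by the largest magnitude keeps intermediates representable.
hypot(3e300, 4e300, 0.0)    # 5.0e300
hypot(3e-300, 4e-300, 0.0)  # 5.0e-300
```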