Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix hypot with more than two arguments #30301

Closed
wants to merge 3 commits into from
Closed

Conversation

cossio
Copy link
Contributor

@cossio cossio commented Dec 7, 2018

Fixes #27141: The previous code led to under/overflow.

Before this change:

a = 10^10
hypot(a, a, a, a)
# ERROR: DomainError with -5.828369621610136e18:

After this change:

a = 10^10
hypot(a,a,a,a)
# 2.0e10

@cossio cossio changed the title fix hypot with multiple arguments fix hypot with more than two arguments Dec 7, 2018
@simonbyrne
Copy link
Contributor

Can you add a test case?

There probably are more efficient ways to do this (similar to how norm is implemented in LinearAlgebra), but this is a net improvement.

@cossio
Copy link
Contributor Author

cossio commented Dec 7, 2018

@simonbyrne Added a test.

You are right this is correct but about 5 times slower than the previous implementation. I can try to make it faster with some advice.

On the other hand, is there a reason against hypot(x...) = norm(x)?

@JeffBezanson JeffBezanson added the maths Mathematical functions label Dec 7, 2018
@simonbyrne
Copy link
Contributor

The main issue is that norm is defined in the LinearAlgebra stdlib.

@cossio
Copy link
Contributor Author

cossio commented Dec 7, 2018

It seems to me then that hypot could be removed in favor of norm.

But for now just merge this?

@stevengj
Copy link
Member

stevengj commented Dec 8, 2018

See also the discussions in #27141 and #27251.

I would have thought that something like this would be faster

function hypot2(x::Number...)
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y/maxabs), x))
end

but @btime says it is allocating for some reason when I call hypot2(3,4,5), even though @code_warntype doesn't find any type instabilities. What's going wrong?

base/math.jl Outdated Show resolved Hide resolved
@cossio
Copy link
Contributor Author

cossio commented Dec 10, 2018

Is there any difference between hypot(x...) and norm(x)? What's the point of having two functions here?

@cossio
Copy link
Contributor Author

cossio commented Dec 10, 2018

@stevengj maximum allocates. See #28928

@simonbyrne
Copy link
Contributor

simonbyrne commented Dec 10, 2018

Is there any difference between hypot(x...) and norm(x)? What's the point of having two functions here?

norm is the reducing version of hypot (similar to maximum vs max, sum vs +, etc). norm also includes much more functionality (e.g. handling arrays of arrays).

@simonbyrne
Copy link
Contributor

simonbyrne commented Dec 10, 2018

One way forward would be to:

  1. write an efficient implementation of mapreduce(f, ::typeof(hypot), x) in Base,
  2. define hypot(x::Number...) = mapreduce(identity, hypot, x), and
  3. define LinearAlgebra.generic_norm2(x) = mapreduce(norm, hypot, x).

@andreasnoack
Copy link
Member

andreasnoack commented Dec 10, 2018

As a side note here, I was recently reading up on MKL's super fast reduction to bi- and tridiagonal matrices. They mentioned that, at some point, one of the bottlenecks becomes norm and for MKL they have implemented a different norm that speculatively computes nrmsq = dot(x,x) and checks if the value is "safe" and then only computes the slow scaled version if nrmsq is outside the safe bounds and otherwise just returns sqrt(nrmsq). In most cases, this is much faster and in rare cases, it is much slower.

@cossio
Copy link
Contributor Author

cossio commented Dec 13, 2018

I don't have time to go that deep into this. My suggestion is to merge this (since the current implementation just overflows), and later implement a more complete solution (as in @andreasnoack or @simonbyrne suggestions).

@stevengj
Copy link
Member

stevengj commented Dec 13, 2018

@cossio, maximum only allocates when the result is an array, not when the result is a scalar.

julia> x = rand(10);

julia> @btime maximum($x);
  15.835 ns (0 allocations: 0 bytes)

julia> @btime maximum(abs, $x);
  17.352 ns (0 allocations: 0 bytes)

That's why it confuses me that it is allocating in the code above.

@cossio
Copy link
Contributor Author

cossio commented Dec 13, 2018

@stevengj Could be because of the anonymous function in the sum?

function hypot3(x::Number...)
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    t = zero(maxabs)
    for y in x
        t += abs2(y/maxabs)
    end
    return maxabs * sqrt(t)
end

allocates less:

julia> @btime hypot3(3,4,5)
  1.933 μs (8 allocations: 272 bytes)
7.0710678118654755

julia> @btime hypot2(3,4,5)
  2.732 μs (14 allocations: 384 bytes)
7.0710678118654755

@cossio
Copy link
Contributor Author

cossio commented Dec 13, 2018

And this version is still better:

function hypot4(x::Number...)
           maxabs = -Inf
           for y in x
               maxabs = max(maxabs, abs(y))
           end
           (iszero(maxabs) || isinf(maxabs)) && return maxabs
           t = zero(maxabs)
           for y in x
               t += abs2(y/maxabs)
           end
           return maxabs * sqrt(t)
end
julia> @btime hypot4(3,4,5)
  28.573 ns (1 allocation: 16 bytes)
7.0710678118654755

@thchr
Copy link
Contributor

thchr commented Dec 13, 2018

Maybe I'm missing something, but it seems most of the issues are due to argument splatting. E.g., the following variation on @stevengj's implementation is allocation free:

function _hypot(x::NTuple{T,<:Number}) where T
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y/maxabs), x))
end
hypot5(x::Number...) = _hypot(promote(x...))
julia> @btime hypot5(3,4,5)
  22.299 ns (0 allocations: 0 bytes)
7.0710678118654755

I imagine there is a better way to do this dance - e.g. to avoid the last bit of extra splatting - but I thought it might be useful to note here anyway.

@stevengj
Copy link
Member

stevengj commented Dec 13, 2018

This is the fastest implementation I've found so far, based on @thchr's comment about splatting and @andreasnoack's comment about speculative execution:

function _hypot6(x::NTuple{N,<:Number}) where N
    simple = sum(y -> abs2(float(y)), x)
    isinf(simple) || iszero(simple) || return sqrt(simple)
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y/maxabs), x))
end
hypot6(x::Number...) = _hypot6(promote(x...))

It's about 30% faster than reduce(hypot, x) for 3 arguments and about 3× faster for 5 arguments.

However, type inference is failing for it: @code_warntype says that the inferred return type for _hypot6((3,4,5)) is Any, which is mysterious to me. Is this an inference bug?

Anyway, @thchr's hypot5 above is almost as fast, is simpler, and doesn't hit any inference problems, so I would go with that for now.

@thchr
Copy link
Contributor

thchr commented Dec 13, 2018

Building again on @stevengj's latest version, this does not allocate and infers - and is faster still - but, admittedly, is not a very pretty thing:

function _hypot7(x::NTuple{N,<:Number}) where N
    simple = sum(y -> abs2(float(y)), x)
    isinf(simple) || iszero(simple) || return sqrt(simple)
    return __hypot7(x)
end
function __hypot7(x::NTuple{N,<:Number}) where N
    maxabs = float(maximum(abs, x))
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y/maxabs), x))
end
hypot7(x::Number...) = _hypot7(promote(x...))
@btime hypot7(3,4,5)
  10.899 ns (0 allocations: 0 bytes)
7.0710678118654755

[I guess this version has an additional function-call overhead cost (to ___hypot7) if the speculative part fails]

@cossio
Copy link
Contributor Author

cossio commented Dec 13, 2018

@thchr very nice! But I don't see why this is slower than the original version,

hypot0(x::Number...) = sqrt(sum(abs2(y) for y in x))
julia> @btime hypot0(3,4,5)
  3.990 ns (0 allocations: 0 bytes)
7.0710678118654755

julia> @btime hypot7(3,4,5)
  11.216 ns (0 allocations: 0 bytes)
7.0710678118654755

EDIT: This version is as fast, if we only rewrite _hypot7 as follows:

function _hypot7(x::NTuple{N,<:Number}) where N
    simple = sum(abs2(y) for y in x)
    isinf(simple) || iszero(simple) || return sqrt(simple)
    return __hypot7(x)
end
julia> @btime hypot7(3,4,5)
  3.980 ns (0 allocations: 0 bytes)
7.0710678118654755

@stevengj
Copy link
Member

stevengj commented Dec 13, 2018

@vtjnash, why does type-inference succeed for hypot7(3,4,5) and not for hypot6(3,4,5) above? That doesn't seem right.

@cossio, there is the overhead of the extra isinf and iszero checks.

@cossio
Copy link
Contributor Author

cossio commented Dec 13, 2018

@cossio, there is the overhead of the extra isinf and iszero checks.

Those seem negligible. See edit in my comment just above yours.

@thchr
Copy link
Contributor

thchr commented Dec 13, 2018

@cossio, there is the overhead of the extra isinf and iszero checks.

Those seem negligible. See edit in my comment just above yours.

That particular benchmark is very specific to having ::Integer inputs, however. The simple = sum(y -> abs2(float(y)), x) is actually somewhat faster than the generator version for ::Float64 inputs (I guess float inputs would be more common; though, overall, the vararg hypot function seems unlikely to see a lot of action). Still, the difference is a little odd...

Quite a few unexpected inference gotcha's and fragility in this example, somehow...

EDIT: The sum(y -> abs2(float(y)), x) and sum(abs2(y) for y in x) differences for ::Float64 arguments are probably just noise. The former is as fast as the latter also for ::Int arguments if the float conversion is dropped; I guess it actually isn't needed here, or?

@cossio
Copy link
Contributor Author

cossio commented Dec 13, 2018

The sum(y -> abs2(float(y)), x) and sum(abs2(y) for y in x) differences for ::Float64 arguments are probably just noise. The former is as fast as the latter also for ::Int arguments if the float conversion is dropped; I guess it actually isn't needed here, or?

Then we can just leave sum(abs2(y) for y in x)? Looks more readable to me.

@stevengj
Copy link
Member

stevengj commented Dec 13, 2018

The former is as fast as the latter also for ::Int arguments if the float conversion is dropped; I guess it actually isn't needed here, or?

The float conversion is needed before computing abs2, because otherwise you can get spurious wraparound:

julia> abs2(10^18)
-5527149226598858752

julia> abs2(float(10^18))
1.0e36

Indeed, there should probably be a hypot test for hypot(i,i) ≈ i*√2 and hypot(i,i,i) ≈ i*√3 where i = typemax(Int), which is another bug in the current version.

@cossio
Copy link
Contributor Author

cossio commented Dec 13, 2018

@stevengj Ok. Also added the test for typemax(Int).

test/math.jl Outdated Show resolved Hide resolved
@thchr
Copy link
Contributor

thchr commented May 30, 2019

What did the final benchmark timings end up like?

@cossio cossio force-pushed the fix_hypot branch 5 times, most recently from 3ce778b to 7699f7d Compare January 12, 2020 22:33
@cossio
Copy link
Contributor Author

cossio commented Jan 12, 2020

I fixed the comments @simonbyrne @stevengj .
Using the example benchmark we were using at the beginning I get @thchr

julia> @btime hypot(3,4,5)
  1.539 ns (0 allocations: 0 bytes)
7.0710678118654755

which is at least as fast as the original implementation.

I also did some minor changes to the two-argument version (#31922, @cfborges), because it wasn't dealing well with dimensionful numbers (note that the Furlong tests were disabled for some reason). Now the two-argument version also passes the Furlong tests.

If there are no further issues (and tests pass) can we get this merged? The original implementation is broken anyway so this is a net improvement.

@cossio
Copy link
Contributor Author

cossio commented Jan 13, 2020

I cannot tell why the doctest build fails. Should I worry?

@cossio cossio requested a review from simonbyrne January 13, 2020 10:22
@cossio
Copy link
Contributor Author

cossio commented Jan 13, 2020

It looks like there have been some nice compiler improvements over time!
The following simpler code (originally suggested by @stevengj):

hypot(x::Number, xs::Number...) = _hypot(float.(promote(x, xs...))...)
function _hypot(x::T...) where {T<:Number}
    maxabs = maximum(abs, x)
    isnan(maxabs) && any(isinf, x) && return T(Inf)
    (iszero(maxabs) || isinf(maxabs)) && return maxabs
    return maxabs * sqrt(sum(y -> abs2(y / maxabs), x))
end

now passes all tests and is as fast or faster than the current implementation. Note that it also passes the tests added in #31922 @cfborges for the two-arg version, and is faster. What do you guys think? Should we replace it all with this? It is lovely and simple.

@cossio cossio force-pushed the fix_hypot branch 2 times, most recently from 9c16f9e to e6fb8c6 Compare January 14, 2020 11:31
@cossio
Copy link
Contributor Author

cossio commented Jan 14, 2020

Turns out the doctest failing was important. hypot(3, 4im) should return real 5, but was returning complex 5 + 0im. I fixed that and added two examples like this as ordinary tests.

@cossio
Copy link
Contributor Author

cossio commented Jan 14, 2020

I went ahead a committed the simplified varargs version, keeping the more complicated version with a separate two-args method in a separate commit in case we want to rollback.

I think we should just go ahead with this simpler code.

@cossio
Copy link
Contributor Author

cossio commented Jan 14, 2020

CI is failing with an ambiguity error that I don't understand.

@cfborges
Copy link
Contributor

I would make two general comments:

First: The current two argument hypot code achieves (or nearly achieves if there is no fma) a level of accuracy that is requested in the IEEE754 standard (i.e. correctly rounded result). There is no reason to undo that and replace it with an algorithm that doesn't.

Second: The IEEE754 standard indicates that hypot should be a two argument function. I am of the opinion that more than two arguments leads to the 2-norm and that should be a different function.

One man's point of view. Your mileage may vary.

@cossio
Copy link
Contributor Author

cossio commented Jan 14, 2020

If there is a test-case where the current two-args version achieves more accuracy, it would be nice to add it to the tests explicitly. Can you suggest an example @cfborges? I can add it in this PR. I was actually surprised the simple implementation passed all the tests that are already there.

As to the multiple args version, I agree that we can just use LinearAlgebra.norm. But for some reason the multi-args hypot was present before this PR and I think removing it at this point would be a breaking change.

Perhaps the best solution is to keep the current two-args version and use #30301 (comment) for the multi-args version.

@cossio cossio force-pushed the fix_hypot branch 2 times, most recently from 2ab5ce9 to c37b2ed Compare January 15, 2020 22:51
@Keno
Copy link
Member

Keno commented Jun 3, 2020

This needs a rebase, but @stevengj I assume you're still in favor?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
maths Mathematical functions
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Vararg version of hypot does overflow and underflow on master
10 participants