
inline ^ for literal powers of numbers #20637

Closed
wants to merge 3 commits

Conversation

stevengj
Member

Taking advantage of #20530, this PR inlines x^p for literal powers p of numbers where power_by_squaring is normally called, excluding floating-point numbers, which already use an LLVM intrinsic (#19890). This has the advantages discussed in #20527.

(Needs tests, doc updates.)
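For readers unfamiliar with the mechanism from #20530: literal exponents are lowered so that the exponent can be dispatched on as a type, which is what allows each power to be unrolled and type-stable. A minimal sketch (hypothetical `litpow` name, not this PR's code, written in current `Val(p)` syntax):

```julia
# Sketch of literal-power dispatch (hypothetical, not Base code):
# the exponent lives in the type, so each method can be specialized.
litpow(x::Number, ::Val{2}) = x * x
litpow(x::Number, ::Val{3}) = x * x * x
# A negative literal exponent can return a float type-stably, because
# the compiler knows the exponent when it infers the return type:
litpow(x::Integer, ::Val{-2}) = inv(float(x) * float(x))

litpow(3, Val(2))   # 9
litpow(2, Val(-2))  # 0.25
```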

base/gmp.jl Outdated
@@ -416,6 +416,10 @@ function ^(x::BigInt, y::Culong)
ccall((:__gmpz_pow_ui, :libgmp), Void, (Ptr{BigInt}, Ptr{BigInt}, Culong), &z, &x, y)
return z
end
@generated function ^{p}(x::BigInt, ::Type{Val{p}})
p < 0 && return :(inv(x)^p)
Contributor

^(-p)

Member

This really need not be a generated function. Constant folding should be able to take care of this just fine.

Member Author

Would constant folding eliminate the type instability?

base/intfuncs.jl Outdated
# for numbers, inline the power_by_squaring method for literal p.
# (this also allows integer^-p to give a floating-point result
# in a type-stable fashion).
@generated function ^{p}(x::Number, ::Type{Val{p}})
Sponsor Member

This really shouldn't be @generated. It's too likely to cause runtime performance issues with limited type information and potentially other issues with compile-time convergence / termination.

The power_by_squaring definition here is also inaccurate except for ::Integer (LLVM already implements this optimization for those) and Complex{<:Integer}. This definition would likely also be slower for non-bitstypes, such as BigInt.
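For context, power_by_squaring is the generic fallback being compared against throughout this thread; a simplified sketch of the pattern for p ≥ 0 (not Base's exact definition, which also handles negative exponents and other corner cases):

```julia
# Simplified power-by-squaring (a sketch, not Base.power_by_squaring):
function pow_by_squaring(x, p::Integer)
    p < 0 && throw(DomainError(p, "this sketch requires p ≥ 0"))
    r = one(x)
    while p > 0
        isodd(p) && (r *= x)  # multiply in x^(2^k) for each set bit of p
        x *= x                # one squaring per bit of p
        p >>= 1
    end
    return r
end

pow_by_squaring(3, 5)  # 243
```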

Member Author

@stevengj Feb 16, 2017

BigInt overrides this. We are already using power_by_squaring for Complex{Float} and in all other cases where this method is called, so there will be no accuracy regressions. "Inaccurate" seems like an exaggeration, since the roundoff errors grow extremely slowly, as O(sqrt(log(p))) I think.

Can you give an example of where this would cause a runtime performance issue compared to calling power_by_squaring?

Member Author

@stevengj Feb 16, 2017

It seems analogous to saying we should default to sum_kbn because sum(::Array) is "inaccurate." Yes, sum_kbn is slightly more accurate, but only slightly in most cases, and it is vastly slower.

@GlenHertz
Contributor

Should it be: p < 0 && return :(inv(x)^abs(p)) ?

@StefanKarpinski
Sponsor Member

What I had in mind when I described this as an experimental feature was that we don't use it by default but have a LiteralPowers package that can define this sort of thing for people to try out. Other packages like units packages could also define methods for literal powers.

@simonbyrne
Contributor

Does this mean that:

n = -2
y = 2^n

and

y = 2^(-2)

will give different results? If so, I'm not sure I'm so keen about that.

@stevengj
Member Author

@simonbyrne, yes, one case throws an error, as opposed to both cases. This was discussed in #20527.

@stevengj
Member Author

stevengj commented Feb 17, 2017

@StefanKarpinski, if we only use it in one package that hardly anyone uses, then we won't really get experience with whether it causes confusion, so we won't be able to decide whether to really use this in 1.0. If we use it for 2^-2, we'll learn quickly whether ordinary users get upset by this working differently from 2^p.

@stevengj
Member Author

Updated to use optimal addition chains for powers < 256. Also updated so that @fastmath uses this too.

@felixrehren
Contributor

Optimal addition chains are so cool! 😍
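To illustrate why addition chains beat plain power-by-squaring for some exponents, here is a hand-written sketch (not the PR's generated code): x^15 takes six multiplies by the binary method, but only five via the addition chain 1, 2, 3, 6, 12, 15.

```julia
# x^15 via the addition chain 1 → 2 → 3 → 6 → 12 → 15 (five multiplies),
# versus six for binary powering. A sketch, not the PR's power tree.
function pow15(x)
    x2  = x * x       # x^2   (1)
    x3  = x2 * x      # x^3   (2)
    x6  = x3 * x3     # x^6   (3)
    x12 = x6 * x6     # x^12  (4)
    return x12 * x3   # x^15  (5)
end

pow15(2)  # 32768
```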

ex = Union{Expr,Symbol}[] # expressions to compute intermediate powers
if p < 0
x′ = gensym()
push!(ex, :($x′ = inv($x)))
Sponsor Member

@StefanKarpinski Feb 23, 2017

Isn't this likely to be more accurate if the inversion happens at the end?

Member Author

Maybe, for integers, but in that case it's also likelier to overflow. For floating-point values, I don't see why it would be more accurate to put the inv at the end.

@StefanKarpinski
Copy link
Sponsor Member

Ironically, I'm seeing internal_pow not getting inlined on this branch, e.g.:

julia> function ff()
           x = 3^17
           return 2x + 1
       end
ff (generic function with 1 method)

julia> ff()
258280327

julia> @code_llvm ff()

define i64 @julia_ff_67552() #0 !dbg !5 {
top:
  %0 = call i64 @jlsys_internal_pow_53152(i64 3, i8** inttoptr (i64 4532589264 to i8**))
  %1 = shl i64 %0, 1
  %2 = or i64 %1, 1
  ret i64 %2
}

julia> @code_native ff()
	.section	__TEXT,__text,regular,pure_instructions
Filename: REPL[46]
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 2
	movabsq	$internal_pow, %rax
	movabsq	$4532589264, %rsi       ## imm = 0x10E29D2D0
	movl	$3, %edi
	callq	*%rax
Source line: 3
	leaq	1(%rax,%rax), %rax
	popq	%rbp
	retq
	nopw	%cs:(%rax,%rax)

@stevengj
Member Author

@StefanKarpinski, yes, it doesn't inline for large powers (I forget what the threshold is) because the code is long enough to fail the inlining heuristic. It's not clear to me what the desired behavior is, here. e.g. if you have several x^17 expressions, would you want them all to be inlined or would you want them all to call the same function that computes an unrolled x^17?

@StefanKarpinski
Sponsor Member

I suppose it depends – if the whole thing can be boiled down to a constant, then you'd want to inline it, of course, but that's a separate optimization issue from this PR. I tried this patch:

diff --git a/base/intfuncs.jl b/base/intfuncs.jl
index 0585bf22b7..d57bc4d706 100644
--- a/base/intfuncs.jl
+++ b/base/intfuncs.jl
@@ -201,7 +201,7 @@ end
 # To avoid ambiguities for methods that dispatch on the
 # first argument, we dispatch the fallback via internal_pow:
 ^(x, p) = internal_pow(x, p)
-internal_pow{p}(x, ::Type{Val{p}}) = x^p
+@inline internal_pow{p}(x, ::Type{Val{p}}) = x^p

 # This table represents the optimal "power tree"
 # based on Knuth's "TAOCP vol 2: Seminumerical Algorithms",
@@ -305,10 +305,10 @@ inlined_pow(x::Symbol, p::Integer) = inlined_pow(x, Int(p))
 # the unrolled expression for literal p
 # (this also allows integer^-p to give a floating-point result
 #  in a type-stable fashion).
-@generated internal_pow{p}(x::Number, ::Type{Val{p}}) = inlined_pow(:x, p)
+@inline @generated internal_pow{p}(x::Number, ::Type{Val{p}}) = inlined_pow(:x, p)

 # for hardware floating-point types, we already call powi or powf
-internal_pow{p}(x::Union{Float32,Float64}, ::Type{Val{p}}) = x^p
+@inline internal_pow{p}(x::Union{Float32,Float64}, ::Type{Val{p}}) = x^p

Would that be beneficial to ensure that internal_pow is always inlined into x^p?

@StefanKarpinski
Sponsor Member

Now that 0.6 is out with special-cased parsing of integer literal powers and the world has not come to a fiery death, we should consider these function changes (or similar ones) for 0.7/1.0.

@vtjnash added the "needs tests" label (Unit tests are required for this change) on Aug 3, 2017
@Keno
Member

Keno commented Aug 3, 2017

There was a suggestion to make sure that the various @code_ macros correctly recognize this and point at the literals.

@@ -443,9 +443,6 @@ end
^(x::Integer, y::BigInt ) = bigint_pow(BigInt(x), y)
^(x::Bool , y::BigInt ) = Base.power_by_squaring(x, y)

# override default inlining of x^2 and x^3 etc.
^{p}(x::BigInt, ::Type{Val{p}}) = x^p
Sponsor Member

Shouldn't we still call GMP in this case?

@StefanKarpinski
Sponsor Member

There's some disagreement on the triage call about this. @JeffBezanson and @vtjnash feel that most of this should live in a package that people opt into if they want this level of sophistication in their exponentiation. It's also somewhat unclear when such large powers would be useful. I want at least the negative literal exponent part so that x^-2 will "just work"; having dimensional quantities be automatically type stable would also be nice. What is the observable difference here? Just the speed? Wouldn't there also be accuracy changes for floating-point types?
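The dimensional-quantity point can be made concrete with a toy type (hypothetical, written in current where-clause syntax rather than the 0.6 syntax used in this PR): the result's dimension depends on the exponent, so type stability requires the exponent in the type domain.

```julia
# Toy dimensional type: D is the power of the length unit (e.g. meters^D).
struct Quantity{D}
    val::Float64
end

# Literal-power method: with the exponent p as a type parameter, the
# result type Quantity{D*p} is inferable at compile time.
litpow(x::Quantity{D}, ::Val{p}) where {D,p} = Quantity{D * p}(x.val^p)

area = litpow(Quantity{1}(3.0), Val(2))  # Quantity{2}(9.0), i.e. 9 m^2
```

With a runtime exponent, the same operation could return a different Quantity{D} for each input value, which inference cannot express as a single concrete type.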

@stevengj
Member Author

stevengj commented Oct 20, 2017

There would be some changes in the roundoff errors for Complex{Float64} and similar, but it shouldn't be any less accurate than the current method (power-by-squaring). Float64 and Float32 powers are unaffected. The main effect is the speed for literal powers > 3. I agree that maybe that could go into a package, though.

I'm not so worried about dimensional types, since those can and do easily support ^literal already.

The negative-exponent case is somewhat independent; I have split it into a separate PR: #24240

@stevengj
Member Author

We can probably remove the 1.0 milestone from this now that #24240 has merged the negative-power parts, since the remaining parts are effectively just an optimization?

@StefanKarpinski
Sponsor Member

Agree: if we change this it should only be if it is both faster and at least as accurate, which seems like an acceptable 1.x change. It might change the results of someone's code but then so can changing CPUs or numbers of threads or just running threaded code again.

@StefanKarpinski removed this from the 1.0 milestone on Oct 26, 2017
@musm
Contributor

musm commented Dec 15, 2020

Is this something we should revisit ?

@stevengj
Member Author

For now, I pulled this code out into a FastPow.jl package that provides a @fastpow macro to do this transformation.

@stevengj
Member Author

stevengj commented Jul 7, 2021

Note that the https://github.com/JuliaMath/FastPow.jl package now exists to provide a @fastpow macro for enabling optimal addition-chain exponentiation in a block of code.
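For anyone curious what such a macro transformation looks like, here is a toy version (illustrative only, not FastPow.jl's implementation, which uses optimal addition chains; it also assumes the base expression has no side effects, since it may be duplicated):

```julia
# Toy unroller: expand a literal integer power p ≥ 0 into explicit
# multiplies by squaring at macro-expansion time.
function unroll_pow(x, p::Int)
    p < 0 && throw(ArgumentError("this sketch requires p ≥ 0"))
    p == 0 && return :(one($x))
    p == 1 && return x
    if iseven(p)
        half = unroll_pow(x, p ÷ 2)   # expression for x^(p/2)
        t = gensym()                  # temporary to avoid recomputing it
        return :(let $t = $half; $t * $t end)
    else
        return :($x * $(unroll_pow(x, p - 1)))
    end
end

macro unrollpow(x, p::Int)
    esc(unroll_pow(x, p))
end

@unrollpow(2.0, 10)  # 1024.0
```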

@oscardssmith
Member

I think that improvements in ^ (to use an integer power method) and @fastpow together make this obsolete.

Labels: domain:maths (Mathematical functions) · needs tests (Unit tests are required for this change) · performance (Must go faster)

Successfully merging this pull request may close these issues.

Exponentiation by a negative power throws a DomainError