
inline ^ for literal powers of numbers #20637

Closed
wants to merge 3 commits

Conversation

stevengj
Member

Taking advantage of #20530, this PR inlines x^p for literal powers p of numbers where power_by_squaring is normally called, excluding floating-point numbers, which already use an LLVM intrinsic (#19890). This has the advantages discussed in #20527.

(Needs tests, doc updates.)
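For readers unfamiliar with the mechanism from #20530: literal exponents are lowered so that the exponent can be dispatched on as a type, which is what allows each power to be unrolled and type-stable. A minimal sketch (hypothetical `litpow` name, not this PR's code, written in current `Val(p)` syntax):

```julia
# Sketch of literal-power dispatch (hypothetical, not Base code):
# the exponent lives in the type, so each method can be specialized.
litpow(x::Number, ::Val{2}) = x * x
litpow(x::Number, ::Val{3}) = x * x * x
# A negative literal exponent can return a float type-stably, because
# the compiler knows the exponent when it infers the return type:
litpow(x::Integer, ::Val{-2}) = inv(float(x) * float(x))

litpow(3, Val(2))   # 9
litpow(2, Val(-2))  # 0.25
```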

base/gmp.jl Outdated
@@ -416,6 +416,10 @@ function ^(x::BigInt, y::Culong)
ccall((:__gmpz_pow_ui, :libgmp), Void, (Ptr{BigInt}, Ptr{BigInt}, Culong), &z, &x, y)
return z
end
@generated function ^{p}(x::BigInt, ::Type{Val{p}})
p < 0 && return :(inv(x)^p)
Contributor

^(-p)

Member

This really need not be a generated function. Constant folding should be able to take care of this just fine.

Member Author

Would constant folding eliminate the type instability?

base/intfuncs.jl Outdated
# for numbers, inline the power_by_squaring method for literal p.
# (this also allows integer^-p to give a floating-point result
# in a type-stable fashion).
@generated function ^{p}(x::Number, ::Type{Val{p}})
Sponsor Member

This really shouldn't be @generated. It's too likely to cause runtime performance issues with limited type information and potentially other issues with compile-time convergence / termination.

The power_by_squaring definition here is also inaccurate except for ::Integer (LLVM already implements this optimization for those) and Complex{<:Integer}. This definition would likely also be slower for non-bitstypes, such as BigInt.
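For context, power_by_squaring is the generic fallback being compared against throughout this thread; a simplified sketch of the pattern for p ≥ 0 (not Base's exact definition, which also handles negative exponents and other corner cases):

```julia
# Simplified power-by-squaring (a sketch, not Base.power_by_squaring):
function pow_by_squaring(x, p::Integer)
    p < 0 && throw(DomainError(p, "this sketch requires p ≥ 0"))
    r = one(x)
    while p > 0
        isodd(p) && (r *= x)  # multiply in x^(2^k) for each set bit of p
        x *= x                # one squaring per bit of p
        p >>= 1
    end
    return r
end

pow_by_squaring(3, 5)  # 243
```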

Member Author

@stevengj Feb 16, 2017

BigInt overrides this. We are already using power_by_squaring for Complex{Float} and in all other cases where this method is called, so there will be no accuracy regressions. "Inaccurate" seems like an exaggeration, since the roundoff errors grow extremely slowly, as O(sqrt(log(p))) I think.

Can you give an example of where this would cause a runtime performance issue compared to calling power_by_squaring?

Member Author

@stevengj Feb 16, 2017

It seems analogous to saying we should default to sum_kbn because sum(::Array) is "inaccurate." Yes, sum_kbn is slightly more accurate, but only slightly in most cases, and it is vastly slower.

@GlenHertz
Contributor

Should it be: p < 0 && return :(inv(x)^abs(p)) ?

@StefanKarpinski
Sponsor Member

What I had in mind when I described this as an experimental feature was that we don't use it by default but have a LiteralPowers package that can define this sort of thing for people to try out. Other packages like units packages could also define methods for literal powers.

@simonbyrne
Contributor

Does this mean that:

n = -2
y = 2^n

and

y = 2^(-2)

will give different results? If so, I'm not sure I'm so keen about that.

@stevengj
Member Author

@simonbyrne, yes, one case throws an error, as opposed to both cases. This was discussed in #20527.

@stevengj
Member Author

stevengj commented Feb 17, 2017

@StefanKarpinski, if we only use it in one package that hardly anyone uses, then we won't really get experience with whether it causes confusion, so we won't be able to decide whether to really use this in 1.0. If we use it for 2^-2, we'll learn quickly whether ordinary users get upset by this working differently from 2^p.

@stevengj
Member Author

Updated to use optimal addition chains for powers < 256. Also updated so that @fastmath uses this too.

@felixrehren
Contributor

Optimal addition chains are so cool! 😍
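To illustrate why addition chains beat plain power-by-squaring for some exponents, here is a hand-written sketch (not the PR's generated code): x^15 takes six multiplies by the binary method, but only five via the addition chain 1, 2, 3, 6, 12, 15.

```julia
# x^15 via the addition chain 1 → 2 → 3 → 6 → 12 → 15 (five multiplies),
# versus six for binary powering. A sketch, not the PR's power tree.
function pow15(x)
    x2  = x * x       # x^2   (1)
    x3  = x2 * x      # x^3   (2)
    x6  = x3 * x3     # x^6   (3)
    x12 = x6 * x6     # x^12  (4)
    return x12 * x3   # x^15  (5)
end

pow15(2)  # 32768
```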

ex = Union{Expr,Symbol}[] # expressions to compute intermediate powers
if p < 0
x′ = gensym()
push!(ex, :($x′ = inv($x)))
Sponsor Member

@StefanKarpinski Feb 23, 2017

Isn't this likely to be more accurate if the inversion happens at the end?

Member Author

Maybe, for integers, but in that case it's also likelier to overflow. For floating-point values, I don't see why it would be more accurate to put the inv at the end.

@StefanKarpinski
Copy link
Sponsor Member

Ironically, I'm seeing internal_pow not getting inlined on this branch, e.g.:

julia> function ff()
           x = 3^17
           return 2x + 1
       end
ff (generic function with 1 method)

julia> ff()
258280327

julia> @code_llvm ff()

define i64 @julia_ff_67552() #0 !dbg !5 {
top:
  %0 = call i64 @jlsys_internal_pow_53152(i64 3, i8** inttoptr (i64 4532589264 to i8**))
  %1 = shl i64 %0, 1
  %2 = or i64 %1, 1
  ret i64 %2
}

julia> @code_native ff()
	.section	__TEXT,__text,regular,pure_instructions
Filename: REPL[46]
	pushq	%rbp
	movq	%rsp, %rbp
Source line: 2
	movabsq	$internal_pow, %rax
	movabsq	$4532589264, %rsi       ## imm = 0x10E29D2D0
	movl	$3, %edi
	callq	*%rax
Source line: 3
	leaq	1(%rax,%rax), %rax
	popq	%rbp
	retq
	nopw	%cs:(%rax,%rax)

@stevengj
Member Author

@StefanKarpinski, yes, it doesn't inline for large powers (I forget what the threshold is) because the code is long enough to fail the inlining heuristic. It's not clear to me what the desired behavior is, here. e.g. if you have several x^17 expressions, would you want them all to be inlined or would you want them all to call the same function that computes an unrolled x^17?

@StefanKarpinski
Sponsor Member

I suppose it depends – if the whole thing can be boiled down to a constant, then you'd want to inline it, of course, but that's a separate optimization issue from this PR. I tried this patch:

diff --git a/base/intfuncs.jl b/base/intfuncs.jl
index 0585bf22b7..d57bc4d706 100644
--- a/base/intfuncs.jl
+++ b/base/intfuncs.jl
@@ -201,7 +201,7 @@ end
 # To avoid ambiguities for methods that dispatch on the
 # first argument, we dispatch the fallback via internal_pow:
 ^(x, p) = internal_pow(x, p)
-internal_pow{p}(x, ::Type{Val{p}}) = x^p
+@inline internal_pow{p}(x, ::Type{Val{p}}) = x^p

 # This table represents the optimal "power tree"
 # based on Knuth's "TAOCP vol 2: Seminumerical Algorithms",
@@ -305,10 +305,10 @@ inlined_pow(x::Symbol, p::Integer) = inlined_pow(x, Int(p))
 # the unrolled expression for literal p
 # (this also allows integer^-p to give a floating-point result
 #  in a type-stable fashion).
-@generated internal_pow{p}(x::Number, ::Type{Val{p}}) = inlined_pow(:x, p)
+@inline @generated internal_pow{p}(x::Number, ::Type{Val{p}}) = inlined_pow(:x, p)

 # for hardware floating-point types, we already call powi or powf
-internal_pow{p}(x::Union{Float32,Float64}, ::Type{Val{p}}) = x^p
+@inline internal_pow{p}(x::Union{Float32,Float64}, ::Type{Val{p}}) = x^p

Would that be beneficial to ensure that internal_pow is always inlined into x^p?

@StefanKarpinski
Sponsor Member

Now that 0.6 is out with special-cased parsing of integer literal powers and the world has not come to a fiery death, we should consider these function changes (or similar ones) for 0.7/1.0.

@vtjnash added the "needs tests" label (Unit tests are required for this change) on Aug 3, 2017
@Keno
Member

Keno commented Aug 3, 2017

There was a suggestion to make sure that the various @code_ macros correctly recognize this and point at the literals.

@@ -443,9 +443,6 @@ end
^(x::Integer, y::BigInt ) = bigint_pow(BigInt(x), y)
^(x::Bool , y::BigInt ) = Base.power_by_squaring(x, y)

# override default inlining of x^2 and x^3 etc.
^{p}(x::BigInt, ::Type{Val{p}}) = x^p
Sponsor Member

Shouldn't we still call GMP in this case?

@StefanKarpinski
Sponsor Member

There's some disagreement on the triage call about this. @JeffBezanson and @vtjnash feel that most of this should live in a package that people opt into if they want this level of sophistication in their exponentiation. It's also somewhat unclear when such large powers would be useful. I want at least the negative literal exponent part so that x^-2 will "just work"; having dimensional quantities be automatically type stable would also be nice. What is the observable difference here? Just the speed? Wouldn't there also be accuracy changes for floating-point types?
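The dimensional-quantity point can be made concrete with a toy type (hypothetical, written in current where-clause syntax rather than the 0.6 syntax used in this PR): the result's dimension depends on the exponent, so type stability requires the exponent in the type domain.

```julia
# Toy dimensional type: D is the power of the length unit (e.g. meters^D).
struct Quantity{D}
    val::Float64
end

# Literal-power method: with the exponent p as a type parameter, the
# result type Quantity{D*p} is inferable at compile time.
litpow(x::Quantity{D}, ::Val{p}) where {D,p} = Quantity{D * p}(x.val^p)

area = litpow(Quantity{1}(3.0), Val(2))  # Quantity{2}(9.0), i.e. 9 m^2
```

With a runtime exponent, the same operation could return a different Quantity{D} for each input value, which inference cannot express as a single concrete type.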

@stevengj
Member Author

stevengj commented Oct 20, 2017

There would be some changes in the roundoff errors for Complex{Float64} and similar, but it shouldn't be any less accurate than the current method (power-by-squaring). Float64 and Float32 powers are unaffected. The main effect is the speed for literal powers > 3. I agree that maybe that could go into a package, though.

I'm not so worried about dimensional types, since those can and do easily support ^literal already.

The negative-exponent case is somewhat independent; I have split it into a separate PR: #24240

@stevengj
Member Author

We can probably remove the 1.0 milestone from this now that #24240 has merged the negative-power parts, since the remaining parts are effectively just an optimization?

@StefanKarpinski
Sponsor Member

Agree: if we change this it should only be if it is both faster and at least as accurate, which seems like an acceptable 1.x change. It might change the results of someone's code but then so can changing CPUs or numbers of threads or just running threaded code again.

@StefanKarpinski removed this from the 1.0 milestone on Oct 26, 2017
@musm
Contributor

musm commented Dec 15, 2020

Is this something we should revisit ?

@stevengj
Member Author

For now, I pulled this code out into a FastPow.jl package that provides a @fastpow macro to do this transformation.

@stevengj
Member Author

stevengj commented Jul 7, 2021

Note that the https://github.com/JuliaMath/FastPow.jl package now exists to provide a @fastpow macro for enabling optimal addition-chain exponentiation in a block of code.
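For anyone curious what such a macro transformation looks like, here is a toy version (illustrative only, not FastPow.jl's implementation, which uses optimal addition chains; it also assumes the base expression has no side effects, since it may be duplicated):

```julia
# Toy unroller: expand a literal integer power p ≥ 0 into explicit
# multiplies by squaring at macro-expansion time.
function unroll_pow(x, p::Int)
    p < 0 && throw(ArgumentError("this sketch requires p ≥ 0"))
    p == 0 && return :(one($x))
    p == 1 && return x
    if iseven(p)
        half = unroll_pow(x, p ÷ 2)   # expression for x^(p/2)
        t = gensym()                  # temporary to avoid recomputing it
        return :(let $t = $half; $t * $t end)
    else
        return :($x * $(unroll_pow(x, p - 1)))
    end
end

macro unrollpow(x, p::Int)
    esc(unroll_pow(x, p))
end

@unrollpow(2.0, 10)  # 1024.0
```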

@oscardssmith
Member

I think that improvements in ^ (to use an integer power method) and @fastpow together make this obsolete.

Labels: domain:maths (Mathematical functions) · needs tests (Unit tests are required for this change) · performance (Must go faster)

Successfully merging this pull request may close these issues.

Exponentiation by a negative power throws a DomainError