Use powi from x^p #43

Merged: 4 commits, Jan 7, 2019

tkf (Contributor) commented Jan 6, 2019

There was a problem in the dispatch of ^: Vec{4,Float64}(1)^2 was evaluated as Vec{4,Float64}(1)^Vec{4,Float64}(2), so the integer exponent was converted to floating point and passed to the vector pow. This PR fixes it. (Originally found in #41.)
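A toy model of the dispatch rule at stake, as a sketch under assumptions: ToyVec is a hypothetical stand-in for SIMD.jl's Vec, and its three ^ methods only mimic the roles of llvm.pow (vector exponent), scalar promotion, and llvm.powi (integer exponent). Julia calls the most specific applicable method, and Integer <: Real, so an Int exponent reaches the integer path as long as a method for Integer exists.

```julia
# Hypothetical stand-in for SIMD.jl's Vec{N,T}; used only to illustrate dispatch.
struct ToyVec{N,T}
    elts::NTuple{N,T}
end

# Broadcast a scalar into every lane, like Vec{4,Float64}(2).
ToyVec{N,T}(x::Real) where {N,T} = ToyVec{N,T}(ntuple(_ -> T(x), N))

# vector ^ vector: elementwise floating-point power (plays the role of llvm.pow).
Base.:^(a::ToyVec{N,T}, b::ToyVec{N,T}) where {N,T} =
    ToyVec{N,T}(map((x, y) -> x^y, a.elts, b.elts))

# Generic scalar exponent: promote it to a vector, then use the float power above.
Base.:^(a::ToyVec{N,T}, s::Real) where {N,T} = a ^ ToyVec{N,T}(s)

# More specific method for integer exponents: the exponent stays an integer
# (plays the role of llvm.powi).
Base.:^(a::ToyVec{N,T}, n::Integer) where {N,T} =
    ToyVec{N,T}(map(x -> x^n, a.elts))

v = ToyVec{4,Float64}(1.5)
v^2      # Int exponent: dispatches to the Integer method
v^2.0    # Float64 exponent: dispatches to the promoting Real method
```

Base.which can confirm the selection: Base.which(^, (ToyVec{4,Float64}, Int)) returns the Integer method, while a Float64 exponent selects the Real one.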

Before this PR (uses float pow):

julia> @code_llvm Vec{4,Float64}(1)^2

;  @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:1059 within `^'
define void @"julia_^_12563"({ <4 x double> }* noalias nocapture sret, { <4 x double> } addrspace(11)* nocapture nonnull readonly dereferenceable(32), i64) {
top:
; ┌ @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:95 within `Type'
; │┌ @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:97 within `macro expansion'
; ││┌ @ float.jl:60 within `Type'
     %3 = sitofp i64 %2 to double
; ││└
    %4 = insertelement <4 x double> undef, double %3, i32 0
    %5 = shufflevector <4 x double> %4, <4 x double> undef, <4 x i32> zeroinitializer
; └└
;  @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:1059 within `^' @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:985
; ┌ @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:538 within `llvmwrap' @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:538
; │┌ @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:557 within `macro expansion'
; ││┌ @ sysimg.jl:18 within `getproperty'
     %6 = getelementptr inbounds { <4 x double> }, { <4 x double> } addrspace(11)* %1, i64 0, i32 0
; ││└
    %7 = load <4 x double>, <4 x double> addrspace(11)* %6, align 16
    %res.i = call <4 x double> @llvm.pow.v4f64(<4 x double> %7, <4 x double> %5)
; └└
;  @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:1059 within `^'
  %.sroa.0.0..sroa_idx = getelementptr inbounds { <4 x double> }, { <4 x double> }* %0, i64 0, i32 0
  store <4 x double> %res.i, <4 x double>* %.sroa.0.0..sroa_idx, align 32
  ret void
}

After this PR (uses int powi):

julia> @code_llvm Vec{4,Float64}(1)^2

;  @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:1018 within `^'
define void @"julia_^_13198"({ <4 x double> }* noalias nocapture sret, { <4 x double> } addrspace(11)* nocapture nonnull readonly dereferenceable(32), i64) {
top:
; ┌ @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:566 within `llvmwrap' @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:566
; │┌ @ /home/takafumi/.julia/dev/SIMD/src/SIMD.jl:584 within `macro expansion'
; ││┌ @ sysimg.jl:18 within `getproperty'
     %3 = getelementptr inbounds { <4 x double> }, { <4 x double> } addrspace(11)* %1, i64 0, i32 0
; ││└
    %4 = load <4 x double>, <4 x double> addrspace(11)* %3, align 16
    %res.i = call <4 x double> @llvm.powi.v4f64(<4 x double> %4, i64 %2)
; └└
  %.sroa.0.0..sroa_idx = getelementptr inbounds { <4 x double> }, { <4 x double> }* %0, i64 0, i32 0
  store <4 x double> %res.i, <4 x double>* %.sroa.0.0..sroa_idx, align 32
  ret void
}
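The key change between the two listings is the call itself: before the fix, the i64 exponent is converted with sitofp, broadcast across all four lanes with insertelement/shufflevector, and passed to llvm.pow.v4f64; after the fix, llvm.powi.v4f64 receives the integer exponent directly. LLVM's LangRef describes powi as raising the first operand to a positive or negative integer power, with the order of multiplications left unspecified. A minimal Julia sketch of one valid evaluation order, binary exponentiation applied lane by lane to a 4-tuple standing in for <4 x double> (powi_sketch and powi4 are hypothetical names, not LLVM's actual lowering):

```julia
# x^n using only multiplications (binary exponentiation), with a reciprocal for n < 0.
function powi_sketch(x::Float64, n::Integer)
    m = abs(n)            # caveat: abs(typemin(Int)) overflows; ignored in this sketch
    result = 1.0
    base = x
    while m > 0
        if isodd(m)
            result *= base    # fold in the current power of x
        end
        base *= base          # x, x^2, x^4, x^8, ...
        m >>= 1
    end
    return n < 0 ? inv(result) : result
end

# Apply it lane by lane to a 4-tuple that stands in for <4 x double>.
powi4(v::NTuple{4,Float64}, n::Integer) = map(x -> powi_sketch(x, n), v)

powi4((1.0, 1.5, -2.0, 0.5), 2)
```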

codecov-io commented Jan 6, 2019

Codecov Report

Merging #43 into master will increase coverage by 0.15%.
The diff coverage is 89.47%.

@@            Coverage Diff             @@
##           master      #43      +/-   ##
==========================================
+ Coverage   82.96%   83.12%   +0.15%     
==========================================
  Files           1        1              
  Lines         763      782      +19     
==========================================
+ Hits          633      650      +17     
- Misses        130      132       +2

Impacted Files   Coverage Δ
src/SIMD.jl      83.12% <89.47%> (+0.15%) ⬆️

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 0e4d17c...e2fec95.

test/runtests.jl Outdated
@@ -235,7 +235,7 @@ using Test, InteractiveUtils
==, !=, <, <=, >, >=,
+, -, *, /, ^, copysign, flipsign, max, min, rem)
@test op(42, V4F64(v4f64)) === op(V4F64(42), V4F64(v4f64))
@test op(V4F64(v4f64), 42) === op(V4F64(v4f64), V4F64(42))
Contributor Author:

It looks like x^42 was too large for 32-bit machines? https://ci.appveyor.com/project/eschnett/simd-jl/builds/21409891

Owner:

This test fails on 64-bit machines and passes on 32-bit machines.

Contributor Author:

Oops, I misread the table. But still, 4^42 (note: v4f64[end] == -4) is much larger than maxintfloat() so I suppose decreasing the exponent makes sense?

But actually, I don't understand why it passes on 64-bit Linux but not on 64-bit Windows. Is it something we should worry about? Is there an explanation? Maybe Julia/LLVM emits different machine code simply because the CI machines differ (the Travis log says ivybridge and the AppVeyor log says haswell)?

eschnett (Owner) left a comment:

The expression 4^42 is not evaluated. The base is Float64, so the expression is 4.0^42, which does not overflow at all.

I don't know why this would fail on 64-bit Windows.
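A quick check of the arithmetic in this exchange, in plain Julia without SIMD.jl: with a Float64 base, (-4.0)^42 equals 2^84 ≈ 1.93e25. That is far above maxintfloat(Float64) = 2^53 but nowhere near floatmax(Float64) ≈ 1.8e308, and because it is a power of two it is even exactly representable. Only Int arithmetic would be a problem, since there 4^42 silently wraps around.

```julia
y = (-4.0)^42                     # Float64 base, like the test vector's last lane (-4)
@assert y == 2.0^84               # (-4)^42 == 4^42 == 2^84: a power of two, so exact
@assert isfinite(y)               # no overflow: floatmax(Float64) is about 1.8e308
@assert y > maxintfloat(Float64)  # but well above 2^53, past which not every integer is representable

# With Int operands the same expression wraps modulo 2^64 (2^32 on 32-bit):
@assert 4^42 == 0                 # 2^84 is a multiple of 2^64
```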

test/runtests.jl Outdated
@@ -235,7 +235,7 @@ using Test, InteractiveUtils
==, !=, <, <=, >, >=,
+, -, *, /, ^, copysign, flipsign, max, min, rem)
Owner:

The operator ^ should not be in this test, since this test checks type promotion, and type promotion should not happen for ^. There should be a separate test for ^ with integer exponents. The original test can then remain unchanged.

The new test for ^ could compare the result with a result obtained by repeated multiplication.
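A minimal sketch of the kind of test suggested here, under assumptions: an NTuple of Float64 stands in for a V4F64 vector so the sketch runs without SIMD.jl, and repeated_mul is a hypothetical reference helper. In the real test the left-hand side would be the Vec ^ Int operation under test.

```julia
using Test

# Hypothetical reference implementation: x^n as n explicit multiplications.
function repeated_mul(x::T, n::Integer) where {T<:AbstractFloat}
    n >= 0 || throw(ArgumentError("n must be non-negative"))
    r = one(T)
    for _ in 1:n
        r *= x
    end
    return r
end

# Stand-in for a V4F64 test vector (the real test would use SIMD.jl's Vec).
v = (1.5, -2.0, 0.5, -4.0)

@testset "integer exponents agree with repeated multiplication" begin
    for n in 0:8
        lhs = map(x -> x^n, v)                 # operation under test
        rhs = map(x -> repeated_mul(x, n), v)  # reference result
        @test all(lhs .≈ rhs)                  # isapprox tolerates last-bit rounding differences
    end
end
```

Comparing with isapprox rather than === keeps the test robust if the vectorized power multiplies in a different order than the reference loop.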

# `^(::ScalarTypes, v2::Vec)`.
@inline Base.:^(v1::Vec{N,T}, x2::IntegerTypes) where {N,T<:FloatingTypes} =
llvmwrap(Val{:powi}, v1, Int(x2))
@inline Base.:^(v1::Vec{N,T}, x2::Integer) where {N,T<:FloatingTypes} =
Owner:

Why are you using Base.:^ instead of Base.^ here?

Contributor Author:

I just thought it's better since Base.:^ is more explicit. FYI, it looks like the Julia code base prefers Base.:^:

$ git grep 'Base\.^'
$ git grep 'Base\.:^'
base/compiler/ssair/show.jl:^(s::String, i::Int) = Base.:^(s, i)
base/mathconstants.jl:    Base.:^(::Irrational{:ℯ}, x::T) = exp(x)
doc/src/base/math.md:Base.:^(::Number, ::Number)
doc/src/base/strings.md:Base.:^(::AbstractString, ::Integer)
stdlib/LinearAlgebra/docs/src/index.md:Base.:^(::AbstractMatrix, ::Number)
stdlib/LinearAlgebra/docs/src/index.md:Base.:^(::Number, ::AbstractMatrix)
stdlib/LinearAlgebra/src/dense.jl:Base.:^(b::Number, A::AbstractMatrix) = exp!(log(b)*A)
stdlib/LinearAlgebra/src/dense.jl:Base.:^(::Irrational{:ℯ}, A::AbstractMatrix) = exp(A)
test/math.jl:Base.:^(x::Number, y::Float22716) = x^(y.x)

But I don't mind using Base.^. Let me know if I should switch to it.
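For readers unfamiliar with the Base.:^ form: :^ is the operator quoted as a symbol, so Base.:^ names Base's ^ function, and a definition written with it adds a method to that function. A minimal sketch with a hypothetical Meters type:

```julia
# Hypothetical type used only to illustrate the method-definition syntax.
struct Meters
    v::Float64
end

# Adds a method to Base's ^ (the same function the unqualified ^ refers to).
Base.:^(m::Meters, n::Integer) = Meters(m.v^n)

sq = Meters(3.0)^2       # uses the new method
@assert sq.v == 9.0
@assert Base.:^ === ^    # Base.:^ and ^ are the same function object
```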

Owner:

I haven't seen this syntax before. If it works, then it's fine. I'm mostly worried about consistency of style in the source code; people shouldn't wonder why there is ^ in one place and :^ in others. If you prefer :^, then I'd prefer a pull request that changes this everywhere.

Contributor Author:

It looks like this is the only place Base.:^ is used, but there are other similar cases like Base.%; I replaced them all.

tkf (Contributor Author) commented Jan 6, 2019

Thanks for the review. I updated the test (and also rebased).

# Make sure our dispatching rule does not select floating point `pow`.
# See: https://github.com/eschnett/SIMD.jl/pull/43
ir = llvm_ir(^, (V4F64(v4f64), 2))
@test occursin("@llvm.powi.v4f64", ir)
Owner:

Nice!
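The test above relies on an llvm_ir helper whose definition is not part of this excerpt. A minimal sketch of what such a helper could look like, assuming it only needs to capture code_llvm output as a String (the name matches the test, but this body is an assumption), checked here on scalar Float64 addition instead of SIMD.jl:

```julia
using InteractiveUtils  # provides code_llvm; the REPL loads it automatically, scripts do not

# Hypothetical helper: capture the LLVM IR that code_llvm prints for f(args...).
llvm_ir(f, args) = sprint(code_llvm, f, Tuple{map(typeof, args)...})

# Same occursin-style check as the test, applied to Float64 addition:
ir = llvm_ir(+, (1.0, 2.0))
@assert occursin("fadd", ir)
```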

Base.:~(b::$Boolsz) = $Boolsz(~b.int)
Base.:!(b::$Boolsz) = ~b
Base.:&(b1::$Boolsz, b2::$Boolsz) = $Boolsz(b1.int & b2.int)
Base.:|(b1::$Boolsz, b2::$Boolsz) = $Boolsz(b1.int | b2.int)
Contributor Author:

These are inside a comment, so I wasn't sure whether to change them or leave them as-is. Let me know if I should revert them (I'll remove this change from the patch and force-push).

eschnett merged commit 89ca5b4 into eschnett:master Jan 7, 2019
@@ -560,6 +560,33 @@ end
end
end

# Functions taking two arguments, second argument is a scalar
@generated function llvmwrap(::Type{Val{Op}}, v1::Vec{N,T1},
Collaborator:

You should be able to just use ccall("@llvm.powi.v4f64", llvmcall, RT, (AT...,), args...)

Contributor Author:

Is this specific to @llvm.powi? Could all the other llvmwrap methods be implemented with ccall too? Or only those with a single instruction besides the declare?
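The ccall-with-llvmcall pattern suggested in this thread can be sketched with a simpler intrinsic than powi; llvm.floor.f64 is an assumed stand-in here, and the intrinsic name is written without the @ sigil that appears in the IR listings. For a vector intrinsic such as llvm.powi.v4f64, the argument and return types would be NTuple{4,VecElement{Float64}}, which Julia represents as an LLVM <4 x double>; that variant is not shown.

```julia
# Call an LLVM intrinsic directly through ccall's llvmcall calling convention;
# no hand-written IR string is needed for a single intrinsic call.
llvm_floor(x::Float64) = ccall("llvm.floor.f64", llvmcall, Float64, (Float64,), x)

@assert llvm_floor(2.5) == 2.0
@assert llvm_floor(-2.5) == -3.0
```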

4 participants