Constant Folding openlibm functions #9942

Open · jwmerrill opened this issue Jan 27, 2015 · 14 comments
Labels: maths (Mathematical functions), performance (Must go faster)

@jwmerrill

https://groups.google.com/forum/#!topic/julia-users/Jndl9sYwj5Q reports a performance regression in some simple code from a blog post that was meant to illustrate the importance of type stability: http://www.johnmyleswhite.com/notebook/2013/12/06/writing-type-stable-code-in-julia/

The code is

function sumofsins2(n::Integer)  
    r = 0.0  
    for i in 1:n  
        r += sin(3.4)  
    end  
    return r  
end

and the blog post gives the output (as of Julia 0.2.x) of code_llvm(sumofsins2, (Int,)):

define double @julia_sumofsins21068(i64) {  
top:  
  %1 = icmp slt i64 %0, 1, !dbg !5151  
  br i1 %1, label %L2, label %pass, !dbg !5151  

pass:                                             ; preds = %top, %pass  
  %"#s6.04" = phi i64 [ %3, %pass ], [ 1, %top ]  
  %r.03 = phi double [ %2, %pass ], [ 0.000000e+00, %top ]  
  %2 = fadd double %r.03, 0xBFD05AC910FF4C6C, !dbg !5156  
  %3 = add i64 %"#s6.04", 1, !dbg !5156  
  %4 = icmp sgt i64 %3, %0, !dbg !5151  
  br i1 %4, label %L2, label %pass, !dbg !5151  

L2:                                               ; preds = %pass, %top  
  %r.0.lcssa = phi double [ 0.000000e+00, %top ], [ %2, %pass ]  
  ret double %r.0.lcssa, !dbg !5157  
} 

In this IR, the call to sin(3.4) has been constant-folded to the literal 0xBFD05AC910FF4C6C.
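For reference, that bit pattern is the IEEE 754 encoding of sin(3.4); a quick REPL check confirms it (a sketch; output is from one session and the last digit may vary):

julia> reinterpret(Float64, 0xBFD05AC910FF4C6C)
-0.2555411020268312

julia> sin(3.4)
-0.2555411020268312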

As of Julia 0.3.4, the new LLVM code is

define double @"julia_sumofsins2;20064"(i64) {
top:
  %1 = icmp sgt i64 %0, 0, !dbg !841
  br i1 %1, label %L, label %L3, !dbg !841

L:                                                ; preds = %top, %pass
  %r.0 = phi double [ %6, %pass ], [ 0.000000e+00, %top ]
  %"#s119.0" = phi i64 [ %5, %pass ], [ 1, %top ]
  %2 = call double inttoptr (i64 4551878240 to double (double)*)(double 3.400000e+00), !dbg !842
  %3 = fcmp ord double %2, 0.000000e+00, !dbg !842
  br i1 %3, label %pass, label %fail, !dbg !842

fail:                                             ; preds = %L
  %4 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !842, !tbaa %jtbaa_const
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %4, i32 4), !dbg !842
  unreachable, !dbg !842

pass:                                             ; preds = %L
  %5 = add i64 %"#s119.0", 1, !dbg !841
  %6 = fadd double %r.0, %2, !dbg !842
  %7 = icmp eq i64 %"#s119.0", %0, !dbg !842
  br i1 %7, label %L3, label %L, !dbg !842

L3:                                               ; preds = %pass, %top
  %r.1 = phi double [ 0.000000e+00, %top ], [ %6, %pass ]
  ret double %r.1, !dbg !845
}

so it looks like the call to sin is no longer being constant folded.

I haven't checked the generated code on master, but it sounds like the performance regression is still present there.

@simonster (Member)

On 0.2.1:

julia> code_llvm(sin, (Float64,))

define double @julia_sin(double) {
top:
  %1 = call double @sin(double %0), !dbg !3342
  %2 = fcmp ord double %1, 0.000000e+00, !dbg !3342
  %3 = fcmp uno double %0, 0.000000e+00, !dbg !3342
  %4 = or i1 %2, %3, !dbg !3342
  br i1 %4, label %pass, label %fail, !dbg !3342

fail:                                             ; preds = %top
  %5 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !3342
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %5, i32 282), !dbg !3342
  unreachable, !dbg !3342

pass:                                             ; preds = %top
  ret double %1, !dbg !3342
}

On master:

define double @julia_sin_43255(double) {
top:
  %1 = call double inttoptr (i64 4668253248 to double (double)*)(double %0), !dbg !265
  %2 = fcmp ord double %1, 0.000000e+00, !dbg !265
  %3 = fcmp uno double %0, 0.000000e+00, !dbg !265
  %4 = or i1 %2, %3, !dbg !265
  br i1 %4, label %pass, label %fail, !dbg !265

fail:                                             ; preds = %top
  %5 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !265, !tbaa %jtbaa_const
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %5, i32 123), !dbg !265
  unreachable, !dbg !265

pass:                                             ; preds = %top
  ret double %1, !dbg !265
}

I'd guess that for LLVM to constant fold sin, it has to know that it's named sin, and somewhere along the line (precompilation?) we stopped giving it that information.

@vtjnash (Member) commented Jan 27, 2015

Correct. We now force LLVM to use the sin in libopenlibm, rather than giving it the freedom to pick any function named sin.
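For context, a minimal sketch of what that forced binding looks like at the Julia level (a hypothetical simplification; the actual base/math.jl wrappers are generated by a macro over a list of functions):

# Hypothetical simplification: bind sin explicitly to libopenlibm.
# The JIT lowers this ccall to a raw function pointer (the inttoptr
# seen in the IR above), so LLVM never sees a symbol named "sin" and
# its name-based constant folder cannot fire.
openlibm_sin(x::Float64) = ccall((:sin, "libopenlibm"), Float64, (Float64,), x)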

@dhoegh (Contributor) commented Jan 28, 2015

I do not get the performance benefit from constant folding; here is the output of code_llvm(sumofsins2, (Int,)).

julia> code_llvm(sumofsins2, (Int,))

define double @julia_sumofsins2_1172(i64) {
top:
  %1 = icmp sgt i64 %0, 0, !dbg !3561
  br i1 %1, label %L, label %L3, !dbg !3561

L:                                                ; preds = %top, %pass
  %r.0 = phi double [ %6, %pass ], [ 0.000000e+00, %top ]
  %"#s3.0" = phi i64 [ %5, %pass ], [ 1, %top ]
  %2 = call double inttoptr (i64 1752503104 to double (double)*)(double 3.400000e+00), !dbg !3562
  %3 = fcmp ord double %2, 0.000000e+00, !dbg !3562
  br i1 %3, label %pass, label %fail, !dbg !3562

fail:                                             ; preds = %L
  %4 = load %jl_value_t** @jl_domain_exception, align 8, !dbg !3562, !tbaa %jtbaa_const
  call void @jl_throw_with_superfluous_argument(%jl_value_t* %4, i32 4), !dbg !3562
  unreachable, !dbg !3562

pass:                                             ; preds = %L
  %5 = add i64 %"#s3.0", 1, !dbg !3561
  %6 = fadd double %r.0, %2, !dbg !3562
  %7 = icmp eq i64 %"#s3.0", %0, !dbg !3562
  br i1 %7, label %L3, label %L, !dbg !3562

L3:                                               ; preds = %pass, %top
  %r.1 = phi double [ 0.000000e+00, %top ], [ %6, %pass ]
  ret double %r.1, !dbg !3565
}

My version info is:

julia> versioninfo()
Julia Version 0.4.0-dev+2847
Commit fc61385 (2015-01-21 18:34 UTC)
Platform Info:
  System: Windows (x86_64-w64-mingw32)
  CPU: Intel(R) Core(TM) i7-2630QM CPU @ 2.00GHz
  WORD_SIZE: 64
  BLAS: libopenblas (USE64BITINT DYNAMIC_ARCH NO_AFFINITY Sandybridge)
  LAPACK: libopenblas
  LIBM: libopenlibm
  LLVM: libLLVM-3.3

@vtjnash changed the title from "Constant Folding Regression" to "Constant Folding openlibm functions" on Jan 29, 2015
@ihnorton added the performance (Must go faster) label on Mar 24, 2015
@eschnett (Contributor)

LLVM defines intrinsics for many operations (sqrt, sin, etc.). It seems we need to generate these intrinsics instead of libcalls to benefit from LLVM's optimizations. Does this sound reasonable? It looks straightforward but tedious.

@simonster (Member)

We would also need to make the LLVM intrinsics call openlibm instead of the system libm.

@eschnett (Contributor)

Can you remind me why we need to call openlibm? This prevents not only constant folding, but also using hardware instructions if they are available, e.g. vsqrtpd on Intel.

LLVM tries to intercept libcalls to certain well-known functions, such as sqrt in the global namespace. I assume (I couldn't find the logic yet) that it thus also replaces intrinsics with such libcalls when necessary. To ensure it calls openlibm instead of libm, we would need to declare these openlibm functions to LLVM's code generators.

@eschnett (Contributor)

Regarding making LLVM call openlibm instead of libm: If Julia is configured with UNTRUSTED_SYSTEM_LIBM, then it uses openlibm instead of libm -- this includes code generated by LLVM. That is:

  • either we always want to use openlibm for generated code; we can then simply always configure with this option (suitably renamed)
  • or we only want to use openlibm if libm is broken; in this case, the existing configure mechanism should suffice.

What am I missing?

Given this, adding new intrinsics is easy. I added the sin intrinsic for fun -- it's three lines in src/intrinsics.cpp, and the respective change in base/math.jl.
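To illustrate what "generating the intrinsic" means, here is a hypothetical Julia-side sketch using llvmcall (the branch linked in the next comment does the equivalent inside codegen in src/intrinsics.cpp instead):

# Hypothetical sketch: call LLVM's sin intrinsic instead of a libcall.
# Because the optimizer knows @llvm.sin.f64 semantically, it can
# constant-fold the call or lower it to whatever the target provides.
llvm_sin(x::Float64) = Base.llvmcall(
    ("declare double @llvm.sin.f64(double)",
     """
     %y = call double @llvm.sin.f64(double %0)
     ret double %y
     """),
    Float64, Tuple{Float64}, x)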

@eschnett (Contributor)

See https://github.com/eschnett/julia/tree/math-intrinsics for the prototype implementation.

@eschnett (Contributor)

I looked at http://www.johnmyleswhite.com/notebook/2013/12/06/writing-type-stable-code-in-julia/ again with this branch, and there is unfortunately no speedup. LLVM correctly recognizes the sin intrinsic, but it still doesn't constant-fold it. For reference, here is the inner loop as generated on my system:

L78:    vmovsd  %xmm0, -24(%rbp)
Source line: 12
    vmovsd  -32(%rbp), %xmm0
    callq   *%r14
    vucomisd    %xmm0, %xmm0
    jp  L132
    vmovsd  -24(%rbp), %xmm1
    vaddsd  %xmm0, %xmm1, %xmm1
    vmovaps %xmm1, %xmm0
    decq    %rbx
    jne L78

The call (and the domain check around it) is still there. Thus, to enable constant folding, it is likely necessary to address #5234.

@eschnett (Contributor)

Update: No speedup when building against LLVM 3.3 (the default), but with LLVM 3.6, the speedup is restored. Apparently LLVM 3.3 still lacks certain optimizations for sin.

@stevengj (Member) commented Dec 8, 2015

Note that #14324 is also relevant here — this issue is not really specific to libm functions and LLVM intrinsics.

@KristofferC (Member)

Any idea how to make progress on this? This feels a bit sad (on latest master):

f(x) = cos(x) + cos(x)

function f2(x)
    c = cos(x)
    c + c
end

@time for i = 1:10^6 f(2.0) end
#  0.021374 seconds

@time for i = 1:10^6 f2(2.0) end
#  0.010027 seconds

@stevengj (Member) commented Apr 21, 2016

@KristofferC, that particular case would involve CSE (common subexpression elimination) for @pure functions, which is a bit different from the issue here.
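A sketch of that idea (illustration only, not a recommended fix; Base.@pure is an internal and easily misused annotation):

# Sketch only: a @pure wrapper asserts referential transparency, which
# lets inference fold calls whose arguments are compile-time constants.
Base.@pure pure_cos(x::Float64) = cos(x)

g() = pure_cos(2.0) + pure_cos(2.0)   # both calls can fold at compile time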

@oscardssmith (Member)

At this point, we only use libm for expm1, log2, log10, and some rounding functions. log2 and log10 both could probably be moved to pure Julia pretty easily. I'll look into it.
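A quick way to check whether a given function still lowers to a libm call is to inspect its IR:

# If the IR shows a call to an external symbol (or to an inttoptr
# address), the function still goes through libm; a pure-Julia
# implementation shows up as inlined arithmetic instead.
code_llvm(log2, (Float64,))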

@ViralBShah added the maths (Mathematical functions) label on Mar 12, 2022