Allocates when precompiled #35972

Closed
GunnarFarneback opened this issue May 21, 2020 · 17 comments · Fixed by #43990
Labels
compiler:precompilation Precompilation of modules performance Must go faster

Comments

@GunnarFarneback
Contributor

If I make a package M with this content of src/M.jl:

module M
f(x) = x > 0 ? cbrt(x) : 0.0
precompile(f, (Float64,))
end

and run

using M, BenchmarkTools
x1 = fill(0.0, 1000000);
x2 = fill(1.0, 1000000);
@btime M.f.($x1);
@btime M.f.($x2);

the result is

  842.301 μs (2 allocations: 7.63 MiB)
  20.586 ms (2000002 allocations: 38.15 MiB)

Without precompile(f, (Float64,)) the excessive allocations go away.
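A minimal sketch of the same check without BenchmarkTools (assuming the package M above; exact counts depend on the Julia version):

using M
x2 = fill(1.0, 1_000_000)
M.f.(x2)          # first call: compiles the broadcast
@time M.f.(x2)    # broken: ~2M allocations (one box per positive element);
                  # fixed: ~2 allocations for the result array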

@GunnarFarneback
Contributor Author

Minimized from

using Colors, BenchmarkTools
x1 = fill(XYZ{Float32}(0,0,0), 1000, 1000);
x2 = fill(XYZ{Float32}(1,0,0), 1000, 1000);
@btime convert.(Lab, $x1);
@btime convert.(Lab, $x2);

@kimikage
Contributor

kimikage commented May 23, 2020

As mentioned in JuliaGraphics/Colors.jl#425, this problem is more serious than the MWE suggests.

In the MWE, f() is eventually recompiled because it sits in a "shallow" place. In Colors.jl, however, cbrt sits "deep" below the public API (e.g. colordiff and distinguishable_colors), so the problem is difficult to avoid.

This problem seems to have nothing to do with type instability; it occurs even when the concrete type is specified explicitly.

As far as I can tell, exp has the same problem as cbrt. What they have in common is a single method over Union{Float32, Float64}. The problem doesn't occur with log, which has separate log(::Float32) and log(::Float64) methods.
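A hedged way to inspect the method layout mentioned above (output omitted; exact signatures vary across Julia versions):

methods(cbrt, (Float64,))   # resolves to a single cbrt(::Union{Float32, Float64})
methods(exp, (Float64,))    # likewise a shared Union method
methods(log, (Float64,))    # resolves to the Float64-specific method; Float32 has its own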

@kimikage
Contributor

kimikage commented Jun 4, 2021

The difference in how likely the problem is, depending on the function, may come down to differences in inlining cost. If so, this could be a variation of #34055.

@kimikage
Contributor

kimikage commented Jun 8, 2021

IIRC, in typical cases this can be avoided by not running precompile().

However, at least in Julia v1.6.1, there seems to be a case where __precompile__(false) is required.
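A sketch of the two workarounds (the module bodies reuse the hypothetical M from the OP; whether __precompile__(false) is actually needed depends on the case, as noted above):

# Workaround 1: drop the explicit precompile directive.
module M
f(x) = x > 0 ? cbrt(x) : 0.0
# precompile(f, (Float64,))   # omitted: f is inferred fresh at first use
end

# Workaround 2: opt the module out of precompilation entirely.
module M2
__precompile__(false)
f(x) = x > 0 ? cbrt(x) : 0.0
end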

@kimikage
Contributor

There is a difference in whether the trigger is a precompilation or a JIT compilation, but the phenomenon is similar to #35537.

@kimikage
Contributor

FYI, here is an MWE that shows that this is related to inlining.

module Issue35972

lg20(x) = x > 0.0 ? -f20(1.0 - x) : 0.0
lg21(x) = x > 0.0 ? -f21(1.0 - x) : 0.0

# inline-worthy
f20(x) = @evalpoly(x,
    0.0, 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, 1/8, 1/9,
    1/10, 1/11, 1/12, 1/13, 1/14, 1/15, 1/16, 1/17, 1/18, 1/19,
    1/20)

# not inline-worthy
f21(x) = @evalpoly(x,
    0.0, 1/1, 1/2, 1/3, 1/4, 1/5, 1/6, 1/7, 1/8, 1/9,
    1/10, 1/11, 1/12, 1/13, 1/14, 1/15, 1/16, 1/17, 1/18, 1/19,
    1/20, 1/21)

precompile(lg20, (Float64,))
precompile(lg21, (Float64,))

end # module

julia> using Issue35972

julia> x = rand(1_000_000);

julia> Issue35972.lg20.(x);

julia> Issue35972.lg21.(x);

julia> @time Issue35972.lg20.(x);
  0.004678 seconds (4 allocations: 7.629 MiB)

julia> @time Issue35972.lg21.(x);
  0.022517 seconds (2.00 M allocations: 38.147 MiB)

julia> versioninfo()
Julia Version 1.8.0-DEV.333
Commit c12e63fba3 (2021-08-11 02:09 UTC)
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LIBM: libopenlibm
  LLVM: libLLVM-12.0.1 (ORCJIT, tigerlake)
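A quick way to see the inlining difference this MWE relies on (editor's sketch; inlining thresholds vary by Julia version):

using Issue35972
# f20 is small enough to inline: its polynomial appears directly in lg20's IR.
code_typed(Issue35972.lg20, (Float64,))
# f21 crosses the inlining cost threshold: lg21's IR contains an :invoke of f21.
code_typed(Issue35972.lg21, (Float64,))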

@timholy
Member

timholy commented Aug 20, 2021

In the demo in the OP (and every other one I've tried), running the second @btime under @profile and then viewing the results with Profile.print(; C=true) shows quite a few sections containing [unknown stackframe]:

   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 33   @Base/broadcast.jl:670; _broadcast_getindex_evalf
  5╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎  33   @M/src/M.jl:3; f
  2╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   2    /home/tim/src/julia-master/cli/loader_exe.c:14; jl_get_pgcstack_static
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    /home/tim/src/julia-master/src/gc.c:1205; jl_gc_pool_alloc
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    /home/tim/src/julia-master/src/gc.c:880; maybe_collect
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    /home/tim/src/julia-master/src/gc.c:1229; jl_gc_pool_alloc
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    /home/tim/src/julia-master/src/gc.c:1246; jl_gc_pool_alloc
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    /home/tim/src/julia-master/src/gf.c:2250; jl_invoke
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    /home/tim/src/julia-master/src/gf.c:2251; jl_invoke
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   1    /home/tim/src/julia-master/src/gf.c:2252; jl_invoke
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    1    /home/tim/src/julia-master/src/gf.c:2224; _jl_invoke
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎   20   [unknown stackframe]
  2╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    2    @Base/special/cbrt.jl:0; cbrt(x::Float64)
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    2    @Base/special/cbrt.jl:146; cbrt(x::Float64)
   ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎     2    @Base/special/cbrt.jl:68; _approx_cbrt
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/int.jl:87; +
  1╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎    ╎ 1    @Base/int.jl:280; div

Also worth noting: the line number reported for the statement after the [unknown stackframe] is 0.
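To reproduce a profile like the one above (editor's sketch; frame counts and paths will differ by machine and Julia version):

using M, Profile
x2 = fill(1.0, 1_000_000)
M.f.(x2)                      # compile the broadcast first
Profile.clear()
@profile for _ in 1:20
    M.f.(x2)
end
Profile.print(; C = true)     # C = true includes the jl_invoke / GC frames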

@timholy
Member

timholy commented Aug 20, 2021

This is also interesting:

julia> using M, BenchmarkTools, Profile

julia> @code_llvm M.f(3.2)
;  @ /tmp/pkgs/M/src/M.jl:3 within `f`
define double @julia_f_273(double %0) #0 {
top:
  %1 = alloca {}*, align 8
  %gcframe7 = alloca [3 x {}*], align 16
  %gcframe7.sub = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 0
  %2 = bitcast [3 x {}*]* %gcframe7 to i8*
  call void @llvm.memset.p0i8.i32(i8* nonnull align 16 dereferenceable(24) %2, i8 0, i32 24, i1 false)
  %thread_ptr = call i8* asm "movq %fs:0, $0", "=r"() #6
  %ppgcstack_i8 = getelementptr i8, i8* %thread_ptr, i64 -8
  %ppgcstack = bitcast i8* %ppgcstack_i8 to {}****
  %pgcstack = load {}***, {}**** %ppgcstack, align 8
; ┌ @ operators.jl:378 within `>`
; │┌ @ float.jl:490 within `<` @ float.jl:444
    %3 = bitcast [3 x {}*]* %gcframe7 to i64*
    store i64 4, i64* %3, align 16
    %4 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 1
    %5 = bitcast {}** %4 to {}***
    %6 = load {}**, {}*** %pgcstack, align 8
    store {}** %6, {}*** %5, align 8
    %7 = bitcast {}*** %pgcstack to {}***
    store {}** %gcframe7.sub, {}*** %7, align 8
    %8 = fcmp ule double %0, 0.000000e+00
; └└
  br i1 %8, label %L8, label %L6

L6:                                               ; preds = %top
  %ptls_field3 = getelementptr inbounds {}**, {}*** %pgcstack, i64 2305843009213693954
  %9 = bitcast {}*** %ptls_field3 to i8**
  %ptls_load45 = load i8*, i8** %9, align 8
  %10 = call noalias nonnull {}* @jl_gc_pool_alloc(i8* %ptls_load45, i32 1392, i32 16) #2
  %11 = bitcast {}* %10 to i64*
  %12 = getelementptr inbounds i64, i64* %11, i64 -1
  store atomic i64 140704265234768, i64* %12 unordered, align 8
  %13 = bitcast {}* %10 to double*
  store double %0, double* %13, align 8
  %14 = getelementptr inbounds [3 x {}*], [3 x {}*]* %gcframe7, i64 0, i64 2
  store {}* %10, {}** %14, align 16
  store {}* %10, {}** %1, align 8
  %15 = call nonnull {}* @jl_invoke({}* inttoptr (i64 140704310911760 to {}*), {}** nonnull %1, i32 1, {}* inttoptr (i64 140704435100720 to {}*))
  %16 = bitcast {}* %15 to double*
  %17 = load double, double* %16, align 8
  %18 = load {}*, {}** %4, align 8
  %19 = bitcast {}*** %pgcstack to {}**
  store {}* %18, {}** %19, align 8
  ret double %17

L8:                                               ; preds = %top
  %20 = load {}*, {}** %4, align 8
  %21 = bitcast {}*** %pgcstack to {}**
  store {}* %20, {}** %21, align 8
  ret double 0.000000e+00
}

julia> x2 = fill(1.0, 1000000);

julia> @btime M.f.($x2);
  24.999 ms (2000002 allocations: 38.15 MiB)

julia> @code_llvm M.f(3.2)
;  @ /tmp/pkgs/M/src/M.jl:3 within `f`
define double @julia_f_457(double %0) #0 {
top:
; ┌ @ operators.jl:378 within `>`
; │┌ @ float.jl:490 within `<` @ float.jl:444
    %1 = fcmp ule double %0, 0.000000e+00
; └└
  br i1 %1, label %L8, label %L6

L6:                                               ; preds = %top
  %2 = call double @j_cbrt_459(double %0) #0
  ret double %2

L8:                                               ; preds = %top
  ret double 0.000000e+00
}

Changing that first @code_llvm to @code_typed gives the good performance/codegen, regardless of whether one sets optimize=true or optimize=false.
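In other words (editor's sketch of the observation above): forcing inference before the first codegen appears to restore the direct call.

using M
# Running inference first (via code_typed) populates the needed CodeInstance.
code_typed(M.f, (Float64,); optimize = false)
@code_llvm M.f(3.2)   # now expected to show a direct `call @j_cbrt_...`,
                      # not the boxed jl_invoke path shown above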

@timholy
Member

timholy commented Aug 21, 2021

Partial progress: the difference seems to come down to the fact that we get to

julia/src/codegen.cpp

Lines 3696 to 3699 in eb83c4d

if (!handled) {
    Value *r = emit_jlcall(ctx, jlinvoke_func, boxed(ctx, lival), argv, nargs, JLCALL_F2_CC);
    result = mark_julia_type(ctx, r, true, rt);
}
with handled = false when it's loaded from the cache, but with handled = true when we define the module in-session. This is because the CodeInstance isn't cached (it's not internal) and so
jl_value_t *ci = ctx.params->lookup(mi, ctx.world, ctx.world); // TODO: need to use the right pair world here
returns jl_nothing.

@timholy
Member

timholy commented Aug 21, 2021

And more progress. With this diff that puts debugging statements in both codegen and type-inference:

diff --git a/base/compiler/typeinfer.jl b/base/compiler/typeinfer.jl
index a3895c380f..052c0a4baa 100644
--- a/base/compiler/typeinfer.jl
+++ b/base/compiler/typeinfer.jl
@@ -850,9 +850,12 @@ end
 # compute (and cache) an inferred AST and return type
 function typeinf_ext(interp::AbstractInterpreter, mi::MethodInstance)
     method = mi.def::Method
+    println(mi)
     for i = 1:2 # test-and-lock-and-test
         i == 2 && ccall(:jl_typeinf_begin, Cvoid, ())
         code = get(code_cache(interp), mi, nothing)
+        println(i)
+        println(code)
         if code isa CodeInstance
             # see if this code already exists in the cache
             inf = code.inferred
@@ -895,14 +898,22 @@ function typeinf_ext(interp::AbstractInterpreter, mi::MethodInstance)
         end
     end
     if ccall(:jl_get_module_infer, Cint, (Any,), method.module) == 0
+        println("retrieving ", mi)
         return retrieve_code_info(mi)
     end
+    println("inferring ", mi)
     lock_mi_inference(interp, mi)
     frame = InferenceState(InferenceResult(mi), #=cached=#true, interp)
     frame === nothing && return nothing
     typeinf(interp, frame)
     ccall(:jl_typeinf_end, Cvoid, ())
     frame.src.inferred || return nothing
+    print("for ", mi, ", the cache is ")
+    if isdefined(mi, :cache)
+        println(typeof(mi.cache.inferred))
+    else
+        println("not cached")
+    end
     return frame.src
 end
 
diff --git a/src/codegen.cpp b/src/codegen.cpp
index d05080ee42..022d1958bb 100644
--- a/src/codegen.cpp
+++ b/src/codegen.cpp
@@ -3622,6 +3622,7 @@ static jl_cgval_t emit_invoke(jl_codectx_t &ctx, jl_expr_t *ex, jl_value_t *rt)
 
     bool handled = false;
     jl_cgval_t result;
+    jl_(ex);
     if (lival.constant) {
         jl_method_instance_t *mi = (jl_method_instance_t*)lival.constant;
         assert(jl_is_method_instance(mi));
@@ -3643,6 +3644,7 @@ static jl_cgval_t emit_invoke(jl_codectx_t &ctx, jl_expr_t *ex, jl_value_t *rt)
         else {
             jl_value_t *ci = ctx.params->lookup(mi, ctx.world, ctx.world); // TODO: need to use the right pair world here
             jl_code_instance_t *codeinst = (jl_code_instance_t*)ci;
+            jl_printf(JL_STDOUT, "Is the CodeInstance lookup nothing? %d\n", ci == jl_nothing);
             if (ci != jl_nothing && codeinst->invoke != jl_fptr_sparam) { // check if we know we definitely can't handle this specptr
                 if (codeinst->invoke == jl_fptr_const_return) {
                     result = mark_julia_const(codeinst->rettype_const);

with the debugging additions to Core.Compiler not built (hit Ctrl-C after `make` finishes compiling the C code), doing the following:

using Issue35972, MethodAnalysis
cd("/home/tim/src/julia-master/base")
include("compiler/compiler.jl")
x = rand(1_000_000);
map(Issue35972.lg21, x);    # map is less compile-intensive than broadcast

yields the following final section:

...
inferring collect_similar(Array{Float64, 1}, Base.Generator{Array{Float64, 1}, typeof(Issue35972.lg21)}) from collect_similar(Any, Any)
for collect_similar(Array{Float64, 1}, Base.Generator{Array{Float64, 1}, typeof(Issue35972.lg21)}) from collect_similar(Any, Any), the cache is Array{UInt8, 1}
Expr(:invoke, _collect(Array{Float64, 1}, Base.Generator{Array{Float64, 1}, typeof(Issue35972.lg21)}, Base.EltypeUnknown, Base.HasShape{1}) from _collect(Any, Any, Base.EltypeUnknown, Union{Base.HasLength, Base.HasShape{N} where N}), Base._collect, Core.Argument(n=2), Core.Argument(n=3), quote Base.EltypeUnknown() end, quote Base.HasShape{1}() end)
Is the CodeInstance lookup nothing? 0
Expr(:invoke, evalpoly(Float64, Tuple{Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64}) from evalpoly(Any, Tuple), Base.Math.evalpoly, SSAValue(75), (0, 1, 0.5, 0.333333, 0.25, 0.2, 0.166667, 0.142857, 0.125, 0.111111, 0.1, 0.0909091, 0.0833333, 0.0769231, 0.0714286, 0.0666667, 0.0625, 0.0588235, 0.0555556, 0.0526316, 0.05, 0.047619))
Is the CodeInstance lookup nothing? 1
Expr(:invoke, evalpoly(Float64, Tuple{Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64}) from evalpoly(Any, Tuple), Base.Math.evalpoly, SSAValue(24), (0, 1, 0.5, 0.333333, 0.25, 0.2, 0.166667, 0.142857, 0.125, 0.111111, 0.1, 0.0909091, 0.0833333, 0.0769231, 0.0714286, 0.0666667, 0.0625, 0.0588235, 0.0555556, 0.0526316, 0.05, 0.047619))
Is the CodeInstance lookup nothing? 1
evalpoly(Float64, Tuple{Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64}) from evalpoly(Any, Tuple)
1
nothing
2
nothing
inferring evalpoly(Float64, Tuple{Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64}) from evalpoly(Any, Tuple)
for evalpoly(Float64, Tuple{Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64}) from evalpoly(Any, Tuple), the cache is Array{UInt8, 1}

but then next:

julia> mi = methodinstance(evalpoly, (Float64, NTuple{22,Float64}));

julia> mi.cache
Core.CodeInstance(MethodInstance for evalpoly(::Float64, ::NTuple{22, Float64}), #undef, 0x000000000000360b, 0xffffffffffffffff, Float64, #undef, nothing, true, true, Ptr{Nothing} @0x00007f0400263df0, Ptr{Nothing} @0x00007f0400263d50)

The `inferred` field is not cached! And I'm a little surprised that codegen tries twice before executing inference, and then doesn't try again. Whereas if you define the module in-session, the final section looks like this:

inferring collect_similar(Array{Float64, 1}, Base.Generator{Array{Float64, 1}, typeof(Issue35972.lg21)}) from collect_similar(Any, Any)
for collect_similar(Array{Float64, 1}, Base.Generator{Array{Float64, 1}, typeof(Issue35972.lg21)}) from collect_similar(Any, Any), the cache is Array{UInt8, 1}
Expr(:invoke, _collect(Array{Float64, 1}, Base.Generator{Array{Float64, 1}, typeof(Issue35972.lg21)}, Base.EltypeUnknown, Base.HasShape{1}) from _collect(Any, Any, Base.EltypeUnknown, Union{Base.HasLength, Base.HasShape{N} where N}), Base._collect, Core.Argument(n=2), Core.Argument(n=3), quote Base.EltypeUnknown() end, quote Base.HasShape{1}() end)
Is the CodeInstance lookup nothing? 0
Expr(:invoke, evalpoly(Float64, Tuple{Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64}) from evalpoly(Any, Tuple), Base.Math.evalpoly, SSAValue(75), (0, 1, 0.5, 0.333333, 0.25, 0.2, 0.166667, 0.142857, 0.125, 0.111111, 0.1, 0.0909091, 0.0833333, 0.0769231, 0.0714286, 0.0666667, 0.0625, 0.0588235, 0.0555556, 0.0526316, 0.05, 0.047619))
Is the CodeInstance lookup nothing? 0
Expr(:invoke, evalpoly(Float64, Tuple{Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64, Float64}) from evalpoly(Any, Tuple), Base.Math.evalpoly, SSAValue(24), (0, 1, 0.5, 0.333333, 0.25, 0.2, 0.166667, 0.142857, 0.125, 0.111111, 0.1, 0.0909091, 0.0833333, 0.0769231, 0.0714286, 0.0666667, 0.0625, 0.0588235, 0.0555556, 0.0526316, 0.05, 0.047619))
Is the CodeInstance lookup nothing? 0

@timholy
Member

timholy commented Aug 21, 2021

@snoopi_deep turns out to be pretty good for debugging. From a section when you define the module in the current session:

      ├─ InferenceTimingNode: 0.000380/0.005951 on iterate(::Base.Generator{Vector{Float64}, typeof(Main.Issue35972.lg21)}) with 6 direct children
      │  ├─ InferenceTimingNode: 0.000037/0.000037 on getproperty(::Base.Generator{Vector{Float64}, typeof(Main.Issue35972.lg21)}, ::Symbol) with 0 direct children
      │  ├─ InferenceTimingNode: 0.000024/0.000024 on getproperty(::Base.Generator{Vector{Float64}, typeof(Main.Issue35972.lg21)}, iter::Symbol) with 0 direct children
      │  ├─ InferenceTimingNode: 0.000024/0.000024 on getproperty(::Base.Generator{Vector{Float64}, typeof(Main.Issue35972.lg21)}, f::Symbol) with 0 direct children
      │  ├─ InferenceTimingNode: 0.000027/0.000027 on getindex(::Tuple{Float64, Int64}, 1::Int64) with 0 direct children
      │  ├─ InferenceTimingNode: 0.000135/0.005432 on Main.Issue35972.lg21(::Float64) with 1 direct children
      │  │  └─ InferenceTimingNode: 0.001037/0.005297 on Main.Issue35972.f21(::Float64) with 22 direct children
      │  │     ⋮
      │  │     
      │  └─ InferenceTimingNode: 0.000028/0.000028 on getindex(::Tuple{Float64, Int64}, 2::Int64) with 0 direct children

vs when it's loaded from the precompile cache:

│     ├─ InferenceTimingNode: 0.001050/0.001475 on iterate(::Base.Generator{Vector{Float64}, typeof(Issue35972.lg21)}) with 5 direct children
│     │  ├─ InferenceTimingNode: 0.000109/0.000109 on getproperty(::Base.Generator{Vector{Float64}, typeof(Issue35972.lg21)}, ::Symbol) with 0 direct children
│     │  ├─ InferenceTimingNode: 0.000075/0.000075 on getproperty(::Base.Generator{Vector{Float64}, typeof(Issue35972.lg21)}, iter::Symbol) with 0 direct children
│     │  ├─ InferenceTimingNode: 0.000080/0.000080 on getproperty(::Base.Generator{Vector{Float64}, typeof(Issue35972.lg21)}, f::Symbol) with 0 direct children
│     │  ├─ InferenceTimingNode: 0.000083/0.000083 on getindex(::Tuple{Float64, Int64}, 1::Int64) with 0 direct children
│     │  └─ InferenceTimingNode: 0.000079/0.000079 on getindex(::Tuple{Float64, Int64}, 2::Int64) with 0 direct children

Guess: since it already has the MethodInstance it's (inappropriately) not inferring lg21.
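That guess can be probed with MethodAnalysis (a sketch following the pattern used earlier in the thread; the field layout is version-dependent):

using Issue35972, MethodAnalysis
mi = methodinstance(Issue35972.lg21, (Float64,))   # exists after loading the precompile cache
isdefined(mi, :cache) && mi.cache                  # inspect the CodeInstance it left behind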

@timholy
Member

timholy commented Aug 22, 2021

(A briefer version of this was also posted to Slack internals.)

To help decide how to fix this (either (1) run inference before codegen, or (2) cache the needed Base CodeInstances), I tallied the number of CodeInstances in each internal category using this diff:

diff --git a/src/dump.c b/src/dump.c
index 8c9d68f518..ce42448159 100644
--- a/src/dump.c
+++ b/src/dump.c
@@ -505,6 +505,8 @@ enum METHOD_SERIALIZATION_MODE {
     METHOD_EXTERNAL_MT = 2,
 };
 
+static int codeinst_count[3];  // number of external, thunk, and worklist code instances, respectively
+
 static void jl_serialize_value_(jl_serializer_state *s, jl_value_t *v, int as_literal) JL_GC_DISABLED
 {
     if (jl_serialize_generic(s, v)) {
@@ -691,6 +693,7 @@ static void jl_serialize_value_(jl_serializer_state *s, jl_value_t *v, int as_li
             internal = 1;
         else if (module_in_worklist(mi->def.method->module))
             internal = 2;
+        codeinst_count[internal] += 1;
         write_uint8(s->s, internal);
         if (!internal) {
             // also flag this in the backref table as special
@@ -2240,6 +2243,7 @@ JL_DLLEXPORT int jl_save_incremental(const char *fname, jl_array_t *worklist)
         return 1;
     }
     JL_GC_PUSH2(&mod_array, &udeps);
+    codeinst_count[0] = codeinst_count[1] = codeinst_count[2] = 0;
     mod_array = jl_get_loaded_modules();
 
     serializer_worklist = worklist;
@@ -2349,6 +2353,7 @@ JL_DLLEXPORT int jl_save_incremental(const char *fname, jl_array_t *worklist)
     }
     write_int32(&f, 0); // mark the end of the source text
     ios_close(&f);
+    jl_printf(JL_STDOUT, "external: %d; module-thunk: %d; worklist: %d\n", codeinst_count[0], codeinst_count[1], codeinst_count[2]);
     JL_GC_POP();
 
     return 0;

Results: https://gist.github.com/timholy/502a5a545a69f492b6180050a0b745e6 (the packages were chosen as ones that now have substantial precompilation)

@kimikage
Contributor

I don't fully understand precompilation, but empirically it seems to me that inference is context-dependent. If external methods are cached by individual packages, will the behavior change depending on load order?

I agree with the idea of caching more CodeInstances to improve latency. However, I think some kind of restriction, such as "strict" type stability, will be required.

So, IMO, it would be better to achieve (1) first.

@timholy
Member

timholy commented Aug 26, 2021

xref #42016

quinnj added a commit to JuliaData/CSV.jl that referenced this issue Sep 8, 2021
This was from an effort to play around with precompilation some more.
There's a tricky balance where if we precompile too much of `CSV.File`,
then we run into the Base bug where there are Base methods missing and
cause painful allocations via invoke
(JuliaLang/julia#35972). The state of
precompilation for CSV.jl is thus pretty disappointing: precompiling
currently takes ~7.5s (which is fine, this is the number we're fine to
have keep growing), loading CSV.jl precompiled takes ~2.5s (seems a
little high, but not terrible), but the first call to `CSV.File` takes
~6.5s :sad_face:. At this point, I've spent too much time and don't
understand the ins and outs of precompilation interactions with later
runtime code to debug further, but hopefully we can recruit help from
some experts to take CSV.jl precompilation to the next level. As-is,
it's workable, so let's go with this for now and get this darn release
out.
timholy added a commit to JuliaImages/ImageFiltering.jl that referenced this issue Feb 8, 2022
Putting the string interpolation behind a `@noinline` is the key change,
but it doesn't hurt to hoist the range creation.

Of the few remaining allocations, about half will be fixed by
JuliaLang/julia#43990, as they are another
instance of JuliaLang/julia#35972
timholy added a commit that referenced this issue Feb 17, 2022
Prior to this PR, Julia's precompiled `*.ji` files saved just two
categories of code: unspecialized method definitions and
type-specialized code for the methods defined by the package.  Any
novel specializations needed from other code (Base, other packages)
were not saved, and therefore effectively thrown away.

This PR caches all the code---internal or external---called during
package definition that hadn't been previously inferred. This makes
precompilation more intuitive (now it saves all relevant inference
results), and substantially reduces latency for inference-bound
packages.

Closes #42016
Fixes #35972

Issue #35972 arose because codegen got started without re-inferring
some discarded CodeInstances. This forced the compiler to insert a
`jl_invoke`. This PR fixes the issue because needed CodeInstances are
no longer discarded by precompilation.
N5N3 added a commit to N5N3/julia that referenced this issue Feb 20, 2022
commit 88a1cbf
Author: Tim Holy <tim.holy@gmail.com>
Date:   Sat Feb 19 03:14:45 2022 -0600

    Copy over CodeInstances too

commit 48c91b3
Author: Tim Holy <tim.holy@gmail.com>
Date:   Thu Feb 17 14:22:02 2022 -0600

    Exclude MethodInstances that don't link to worklist module

    This should prevent us from serializing too much code.

commit 3241c4c
Author: Tim Holy <tim.holy@gmail.com>
Date:   Thu Feb 17 12:23:12 2022 -0600

    Add invalidation test

commit ead1fd9
Author: Tim Holy <tim.holy@gmail.com>
Date:   Thu Feb 17 10:23:52 2022 -0600

    Fix a failure to invalidate

commit b44a8fc
Author: Tim Holy <tim.holy@gmail.com>
Date:   Thu Jan 27 02:54:47 2022 -0600

    Serialize external CodeInstances

    Prior to this PR, Julia's precompiled `*.ji` files saved just two
    categories of code: unspecialized method definitions and
    type-specialized code for the methods defined by the package.  Any
    novel specializations needed from other code (Base, other packages)
    were not saved, and therefore effectively thrown away.

    This PR caches all the code---internal or external---called during
    package definition that hadn't been previously inferred. This makes
    precompilation more intuitive (now it saves all relevant inference
    results), and substantially reduces latency for inference-bound
    packages.

    Closes JuliaLang#42016
    Fixes JuliaLang#35972

    Issue JuliaLang#35972 arose because codegen got started without re-inferring
    some discarded CodeInstances. This forced the compiler to insert a
    `jl_invoke`. This PR fixes the issue because needed CodeInstances are
    no longer discarded by precompilation.
N5N3 added six further commits to N5N3/julia that referenced this issue between Feb 22 and Feb 24, 2022, each repeating the same commit series as above.
timholy added a commit that referenced this issue Feb 24, 2022
Prior to this PR, Julia's precompiled `*.ji` files saved just two
categories of code: unspecialized method definitions and
type-specialized code for the methods defined by the package.  Any
novel specializations of methods from Base or previously-loaded
packages were not saved, and therefore effectively thrown away.

This PR caches all the code---internal or external---called during
package definition that hadn't been previously inferred, as long
as there is a backedge linking it back to a method owned by
a module being precompiled. (The latter condition ensures it will
actually be called by package methods, and not merely transiently
generated for the purpose of, e.g., metaprogramming or variable
initialization.) This makes precompilation more intuitive (now it
saves all relevant inference results), and substantially reduces
latency for inference-bound packages.

Closes #42016
Fixes #35972

Issue #35972 arose because codegen got started without re-inferring
some discarded CodeInstances. This forced the compiler to insert a
`jl_invoke`. This PR fixes the issue because needed CodeInstances are
no longer discarded by precompilation.
KristofferC pushed a commit that referenced this issue Feb 24, 2022 (same message as above; cherry picked from commit df81bf9)
staticfloat pushed a commit to JuliaCI/julia-buildkite-testing that referenced this issue Mar 2, 2022 (same message as above)
KristofferC pushed a commit that referenced this issue Mar 7, 2022 (same message as above; cherry picked from commit df81bf9)
LilithHafner pushed a commit to LilithHafner/julia that referenced this issue Mar 8, 2022 (same message as above)
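With #43990 merged (Julia ≥ 1.8), the original MWE can be re-checked; a minimal sketch, assuming the package M from the OP:

using M, BenchmarkTools
x2 = fill(1.0, 1_000_000)
@btime M.f.($x2);   # on a fixed Julia, expect ~2 allocations for the result
                    # array rather than the 2,000,002 reported in the OP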
@amilsted
Contributor

amilsted commented Jun 15, 2024

This still seems to happen with FunctionWrappers.

Here's my test package:

module FWAllocs

using PrecompileTools
using FunctionWrappers

# Exercised in the @compile_workload below, so it is covered by precompilation.
function test_precomp(N)
    fw = FunctionWrappers.FunctionWrapper{Float64, Tuple{Float64}}(cos)
    res = 0.0
    for i in 1:N
        res += fw(i*0.1)
    end
    return res
end

# Identical body, but deliberately left out of the compile workload.
function test_no_precomp(N)
    fw = FunctionWrappers.FunctionWrapper{Float64, Tuple{Float64}}(cos)
    res = 0.0
    for i in 1:N
        res += fw(i*0.1)
    end
    return res
end

@setup_workload begin
    @compile_workload begin
        test_precomp(10)
    end
end

end

and here's what happens on Julia 1.10.4:

julia> using FWAllocs

julia> @time FWAllocs.test_precomp(10)
  0.053401 seconds (96.14 k allocations: 5.794 MiB, 99.86% compilation time)
8.177847573818267

julia> @time FWAllocs.test_precomp(10)
  0.000004 seconds (22 allocations: 376 bytes)
8.177847573818267

julia> @time FWAllocs.test_precomp(20)
  0.000011 seconds (42 allocations: 696 bytes)
8.377322108212503

julia> @time FWAllocs.test_no_precomp(20)
  0.000007 seconds (2 allocations: 56 bytes)
8.377322108212503

julia> 

Also, is it odd that test_precomp needs to compile on first use, but test_no_precomp does not?

@kimikage
Contributor

I confirmed that the FunctionWrappers issue is reproduced on nightly too.

julia> versioninfo()
Julia Version 1.12.0-DEV.738
Commit 67c9989e83 (2024-06-15 01:40 UTC)
Build Info:
  Official https://julialang.org/ release
Platform Info:
  OS: Windows (x86_64-w64-mingw32)
  CPU: 8 × 11th Gen Intel(R) Core(TM) i7-1165G7 @ 2.80GHz
  WORD_SIZE: 64
  LLVM: libLLVM-17.0.6 (ORCJIT, tigerlake)
Threads: 1 default, 0 interactive, 1 GC (on 8 virtual cores)

Since the original problem in this issue seems to have been resolved, I think something else is causing this one.
This issue is somewhat old, so it might be better to file it as a new issue.

@kimikage
Contributor

Of course, since FunctionWrappers deals with raw function pointers, the precompiled cache cannot be used as-is. So it is reasonable in itself that recompilation takes place on the first use of test_precomp.
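A hedged illustration of why: a FunctionWrapper stores a session-specific function pointer, which cannot survive a precompile-cache round trip (field names per FunctionWrappers.jl as I understand it):

using FunctionWrappers: FunctionWrapper

fw = FunctionWrapper{Float64, Tuple{Float64}}(cos)
fw(0.5)    # calls through the stored pointer
fw.ptr     # a Ptr{Cvoid} into this session's compiled code; a cached copy
           # from precompilation would be stale, forcing re-derivation on load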
