
introduce @nospecializeinfer macro to tell the compiler to avoid excess inference #41931

Merged: aviatesk merged 2 commits into master on May 23, 2023

Conversation

aviatesk (Sponsor Member) commented Aug 19, 2021

This commit introduces a new compiler annotation called @nospecializeinfer,
which allows us to request the compiler to avoid excessive inference.

TL;DR:

3efba74 adds the `@nospecializeinfer` macro to various `Core.Compiler` functions
and achieves the following sysimage size reduction:

|                                   | this commit | master      | %       |
| --------------------------------- | ----------- | ----------- | ------- |
| `Core.Compiler` compilation (sec) | `66.4551`   | `71.0846`   | `0.935` |
| `corecompiler.jl` (KB)            | `17638080`  | `18407248`  | `0.958` |
| `sys.jl` (KB)                     | `88736432`  | `89361280`  | `0.993` |
| `sys-o.a` (KB)                    | `189484400` | `189907096` | `0.998` |

## `@nospecialize` mechanism

To discuss `@nospecializeinfer`, let's first understand the behavior of
`@nospecialize`.

Its docstring says:

> This is only a hint for the compiler to avoid excess code generation.

and it works by suppressing dispatch on complex runtime occurrences of the
annotated arguments. This can be understood with the example below:

```julia
julia> function call_func_itr(func, itr)
           local r = 0
           r += func(itr[1])
           r += func(itr[2])
           r += func(itr[3])
           r
       end;

julia> _isa = isa; # just for the sake of explanation, global variable to prevent inlining

julia> func_specialize(a) = _isa(a, Function);

julia> func_nospecialize(@nospecialize a) = _isa(a, Function);

julia> dispatchonly = Any[sin, muladd, nothing]; # untyped container can cause excessive runtime dispatch

julia> @code_typed call_func_itr(func_specialize, dispatchonly)
CodeInfo(
1 ─ %1  = π (0, Int64)
│   %2  = Base.arrayref(true, itr, 1)::Any
│   %3  = (func)(%2)::Any
│   %4  = (%1 + %3)::Any
│   %5  = Base.arrayref(true, itr, 2)::Any
│   %6  = (func)(%5)::Any
│   %7  = (%4 + %6)::Any
│   %8  = Base.arrayref(true, itr, 3)::Any
│   %9  = (func)(%8)::Any
│   %10 = (%7 + %9)::Any
└──       return %10
) => Any

julia> @code_typed call_func_itr(func_nospecialize, dispatchonly)
CodeInfo(
1 ─ %1  = π (0, Int64)
│   %2  = Base.arrayref(true, itr, 1)::Any
│   %3  = invoke func(%2::Any)::Any
│   %4  = (%1 + %3)::Any
│   %5  = Base.arrayref(true, itr, 2)::Any
│   %6  = invoke func(%5::Any)::Any
│   %7  = (%4 + %6)::Any
│   %8  = Base.arrayref(true, itr, 3)::Any
│   %9  = invoke func(%8::Any)::Any
│   %10 = (%7 + %9)::Any
└──       return %10
) => Any
```

The calls to `func_specialize` remain `:call` expressions (so they are
dispatched and compiled at runtime), while the calls to `func_nospecialize`
are resolved as `:invoke` expressions. This is because `@nospecialize`
requests the compiler to give up compiling `func_nospecialize` for the
runtime argument types and to use the declared argument types instead,
allowing `call_func_itr(func_nospecialize, dispatchonly)` to avoid runtime
dispatch and the accompanying JIT compilation (i.e. "excess code generation").

The difference is evident when checking specializations:

```julia
julia> call_func_itr(func_specialize, dispatchonly)
2

julia> length(Base.specializations(only(methods(func_specialize))))
3 # w/ runtime dispatch, multiple specializations

julia> call_func_itr(func_nospecialize, dispatchonly)
2

julia> length(Base.specializations(only(methods(func_nospecialize))))
1 # w/o runtime dispatch, the single specialization
```

The problem is that `@nospecialize` influences dispatch only and does not
intervene in inference in any way. So there is still a possibility of
"excess inference" when the compiler sees considerably complex argument
types during inference:

```julia
julia> func_specialize(a) = _isa(a, Function); # redefine func to clear the specializations

julia> @assert length(Base.specializations(only(methods(func_specialize)))) == 0;

julia> func_nospecialize(@nospecialize a) = _isa(a, Function); # redefine func to clear the specializations

julia> @assert length(Base.specializations(only(methods(func_nospecialize)))) == 0;

julia> withinference = tuple(sin, muladd, "foo"); # typed container can cause excessive inference

julia> @time @code_typed call_func_itr(func_specialize, withinference);
  0.000812 seconds (3.77 k allocations: 217.938 KiB, 94.34% compilation time)

julia> length(Base.specializations(only(methods(func_specialize))))
4 # multiple method instances inferred

julia> @time @code_typed call_func_itr(func_nospecialize, withinference);
  0.000753 seconds (3.77 k allocations: 218.047 KiB, 92.42% compilation time)

julia> length(Base.specializations(only(methods(func_nospecialize))))
4 # multiple method instances inferred
```

The purpose of this PR is to implement a mechanism that allows us to avoid
excessive inference, and thereby reduce compilation latency, when inference
encounters considerably complex argument types.

## Design

Here are some ideas to implement the functionality:

  1. make @nospecialize block inference
  2. add nospecializeinfer effect when @nospecialized method is annotated as @noinline
  3. implement as @pure-like boolean annotation to request nospecializeinfer effect on top of @nospecialize
  4. implement as annotation that is orthogonal to @nospecialize

After trying approaches 1 through 3, I decided to submit 3.

### 1. make `@nospecialize` block inference

This is almost the same as what Jameson did in vtjnash@8ab7b6b.
It turned out that this approach performs very badly, because some
`@nospecialize`'d arguments still need inference to perform reasonably.
For example, the following definition of
`getindex(@nospecialize(t::Tuple), i::Int)` would obviously perform very
badly if `@nospecialize` blocked inference, because succeeding optimizations
would lack useful type information:

https://github.com/JuliaLang/julia/blob/12d364e8249a07097a233ce7ea2886002459cc50/base/tuple.jl#L29-L30

```julia
@eval getindex(@nospecialize(t::Tuple), i::Int) = getfield(t, i, $(Expr(:boundscheck)))
@eval getindex(@nospecialize(t::Tuple), i::Real) = getfield(t, convert(Int, i), $(Expr(:boundscheck)))
```
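To see why inference matters here, consider what callers of tuple `getindex` learn from it (a hypothetical REPL sketch; the exact printed type may vary across Julia versions):

```julia
# With inference enabled through the `@nospecialize`'d method, the compiler
# can still derive a precise return type from the concrete tuple type:
julia> Base.code_typed(getindex, (Tuple{Int,Float64}, Int))[1][2]
Union{Float64, Int64}
```

If `@nospecialize` blocked inference outright, callers would only ever see `Any` for such calls, defeating the downstream optimizations mentioned above.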

### 2. add nospecializeinfer effect when `@nospecialize`d method is annotated as `@noinline`

The important observation is that we often use `@nospecialize` even when
we expect inference to forward type and constant information.
Conversely, we may be able to exploit the fact that we usually don't
expect inference to forward information to a callee when we annotate it
with `@noinline` (i.e. when adding `@noinline`, we're usually fine with
disabling inter-procedural optimizations other than resolving dispatch).
So the idea is to enable the inference suppression when a `@nospecialize`'d
method is annotated with `@noinline` too.

It's a reasonable choice and can be implemented efficiently with #41922.
But it seems a bit weird to me to associate the no-infer effect with
`@noinline`, and I also think there may be cases where we want to inline
a method while partly avoiding inference, e.g.:

```julia
# the compiler will always infer with `f::Any`
@noinline function twof(@nospecialize(f), n) # this method body is very simple and should be eligible for inlining
    if occursin('+', string(typeof(f).name.name::Symbol))
        2 + n
    elseif occursin('*', string(typeof(f).name.name::Symbol))
        2n
    else
        zero(n)
    end
end
```
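For concreteness, assuming the definition above, such a method stays cheap to dispatch while its body remains trivially inlineable (a hypothetical REPL sketch):

```julia
julia> twof(+, 3)   # the type name of `+` contains '+'
5

julia> twof(*, 3)   # the type name of `*` contains '*'
6

julia> twof(sin, 3) # neither '+' nor '*', so `zero(n)`
0
```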

### 3. implement as `@pure`-like boolean annotation to request nospecializeinfer effect on top of `@nospecialize`

This is what this commit implements. It basically replaces the previous
`@noinline` flag with a newly-introduced annotation named `@nospecializeinfer`.
It is still associated with `@nospecialize` and only has an effect when
used together with `@nospecialize`, but now it is not tied to `@noinline`,
which helps us reason about the behavior of `@nospecializeinfer` and
experiment with its effect more safely:

```julia
# the compiler will always infer with `f::Any`
Base.@nospecializeinfer function twof(@nospecialize(f), n) # the compiler may or may not inline this method
    if occursin('+', string(typeof(f).name.name::Symbol))
        2 + n
    elseif occursin('*', string(typeof(f).name.name::Symbol))
        2n
    else
        zero(n)
    end
end
```
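On a build including this change, re-running the earlier "typed container" experiment shows the intended effect (a hypothetical sketch; `func_nsi` is a made-up name, and `call_func_itr`/`_isa` are the definitions from above):

```julia
julia> Base.@nospecializeinfer func_nsi(@nospecialize a) = _isa(a, Function);

julia> @code_typed call_func_itr(func_nsi, tuple(sin, muladd, "foo"));

julia> length(Base.specializations(only(methods(func_nsi))))
1 # inference now uses the declared signature, so only `func_nsi(::Any)` is inferred
```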

### 4. implement as annotation that is orthogonal to `@nospecialize`

Actually, we can have @nospecialize and @nospecializeinfer separately, and it
would allow us to configure compilation strategies in a more
fine-grained way.

```julia
function noinfspec(Base.@nospecializeinfer(f), @nospecialize(g))
    ...
end
```

I'm fine with this approach, but at the same time I'm afraid of having too
many annotations of a similar sort (I expect we would end up annotating both
`@nospecializeinfer` and `@nospecialize` under this scheme).


Friendly pings:

  • @timholy do you have examples that might benefit from @nospecializeinfer and could be used for experiments? I'm happy to experiment with anything you have in mind :)
  • @vchuravy @maleadt I remember you said you wanted something like this in the GPU stack?

aviatesk added the compiler:inference (Type inference) and compiler:latency (Compiler latency) labels on Aug 19, 2021
maleadt (Member) commented Aug 19, 2021

> @vchuravy @maleadt I remember you said you wanted something like this in the GPU stack?

The use case was JuliaGPU/GPUCompiler.jl#227, where I tried to avoid re-compilation of the GPU compiler when loading CUDA.jl (despite there being no invalidations). For example, methods like emit_llvm always get recompiled when invoked with a concrete ::CompilerJob{CUDATarget}, and we figured that @nospecialize wasn't powerful enough. I haven't had the time to revisit it though.

Sacha0 (Member) commented Aug 19, 2021

Various folks at RelationalAI have been dreaming of this or similar functionality for a while. Thanks for working on it Shuhei! :)

timholy (Sponsor Member) commented Aug 20, 2021

You probably know this already, but it's worth mentioning that you can get your 4th strategy now by using Base.inferencebarrier in the caller:

```julia
julia> function invokef(f, itr)
           local r = 0
           r += f(itr[1])
           r += f(itr[2])
           r += f(itr[3])
           r
       end;

julia> _isa = isa
isa (built-in function)

julia> f(a) = _isa(a, Function);

julia> g(@nospecialize a) = _isa(a, Function);

julia> gbarrier(@nospecialize a) = g(Base.inferencebarrier(a))
gbarrier (generic function with 1 method)

julia> using MethodAnalysis

julia> dispatchonly = Any[sin, muladd, nothing];

julia> invokef(gbarrier, dispatchonly)
2

julia> methodinstances(g)
1-element Vector{Core.MethodInstance}:
 MethodInstance for g(::Any)

julia> methodinstances(gbarrier)
1-element Vector{Core.MethodInstance}:
 MethodInstance for gbarrier(::Any)

julia> withinference = tuple(sin, muladd, "foo");

julia> invokef(gbarrier, withinference)
2

julia> methodinstances(g)
1-element Vector{Core.MethodInstance}:
 MethodInstance for g(::Any)

julia> methodinstances(gbarrier)
4-element Vector{Core.MethodInstance}:
 MethodInstance for gbarrier(::Any)
 MethodInstance for gbarrier(::typeof(sin))
 MethodInstance for gbarrier(::typeof(muladd))
 MethodInstance for gbarrier(::String)
```

But I think it's quite likely that we'd also want to be able to (easily) enforce this in the callee too. I.e., both are valuable, much as you have done recently for inline annotations.

aviatesk (Sponsor Member, Author) replied:

> But I think it's quite likely that we'd also want to be able to (easily) enforce this in the callee too. I.e., both are valuable, much as you have done recently for inline annotations.

Yeah, after #41328 it would be much easier to add nicer support for call-site `@noinfer`, like:

```julia
let
    t = maybe_complex_type
    r = @noinfer nospecf(t)
    return r
end
```

(same for @aggressive_constantprop)

We can reuse `Base.inferencebarrier` for that case, but one benefit of `@noinfer` over `Base.inferencebarrier` is that it can still respect declared argument types, while `Base.inferencebarrier` always widens them to `Any` (though we can add a type annotation to recover that).
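As an illustration of that difference (a hypothetical sketch; `g`, `callbarrier`, and `callannotated` are made-up names):

```julia
g(@nospecialize(x::Integer)) = x + 1

# `Base.inferencebarrier` widens its argument all the way to `Any`,
# so inference sees a call `g(::Any)` here:
callbarrier(x) = g(Base.inferencebarrier(x))

# a type assertion on the barrier (or a callee-side `@noinfer`, as
# proposed above) keeps the declared `Integer` type visible, so
# inference sees `g(::Integer)` instead:
callannotated(x) = g(Base.inferencebarrier(x)::Integer)
```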

timholy (Sponsor Member) commented Aug 23, 2021

It also just seems to make more sense to handle this with compiler annotations than with runtime manipulations that (presumably, I've not checked) come with a small performance cost. That would be fixable, of course, and is the main reason to use Base.inferencebarrier rather than Ref{Any} directly (since the implementation of the former could change).
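For context, the `Ref{Any}` trick mentioned here is roughly the following (a sketch of the general technique; the actual implementation of `Base.inferencebarrier` may differ and could change):

```julia
# Passing a value through an untyped container makes its type opaque to
# inference: the compiler only knows the result is `Any`.
manual_barrier(@nospecialize x) = Ref{Any}(x)[]
```

Using `Base.inferencebarrier` instead of writing this by hand insulates callers from changes to this implementation detail.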

timholy (Sponsor Member) left a review comment:

LGTM, though others (including yourself!) may have more insight than I about whether you've changed all the necessary places.

In a couple of my suggestions I used a closing triple-backticks-julia because otherwise GitHub got confused about the end of a triple-backticks-suggestion block.

aviatesk (Sponsor Member, Author) replied:

Thanks @timholy for your suggestions!

I've tried some experiments to find a good rule to automatically turn on this "no-inference" logic, but so far I haven't succeeded.
The following rule performs more reasonably than the other rules I tried, but it still has some performance regressions and breaks inference idempotency very easily:

```diff
diff --git a/base/compiler/abstractinterpretation.jl b/base/compiler/abstractinterpretation.jl
index 2045b14d97..b6a091345d 100644
--- a/base/compiler/abstractinterpretation.jl
+++ b/base/compiler/abstractinterpretation.jl
@@ -350,7 +350,7 @@ function abstract_call_method(interp::AbstractInterpreter, method::Method, @nosp
         add_remark!(interp, sv, "Refusing to infer into `depwarn`")
         return MethodCallResult(Any, false, false, nothing)
     end
-    if is_noinfer(method)
+    if should_not_infer(code_cache(interp), inlining_policy(interp), method, sig, sparams)
         sig = get_nospecialize_sig(method, sig, sparams)
     end
     topmost = nothing
@@ -587,7 +587,8 @@ function maybe_get_const_prop_profitable(interp::AbstractInterpreter, result::Me
         end
     end
     force |= allconst
-    if is_noinfer(method)
+    if should_not_infer(code_cache(interp), inlining_policy(interp), method, match.spec_types, match.sparams)
+        return nothing
         mi = specialize_method_noinfer(match; preexisting=!force)
     else
         mi = specialize_method(match; preexisting=!force)
diff --git a/base/compiler/ssair/inlining.jl b/base/compiler/ssair/inlining.jl
index cf36358791..24d0d90cd4 100644
--- a/base/compiler/ssair/inlining.jl
+++ b/base/compiler/ssair/inlining.jl
@@ -808,7 +808,8 @@ function analyze_method!(match::MethodMatch, atypes::Vector{Any},
     end

     # See if there exists a specialization for this method signature
-    if is_noinfer(method)
+    # if is_noinfer(method)
+    if should_not_infer(state.mi_cache, state.policy, method, match.spec_types, match.sparams)
         mi = specialize_method_noinfer(match; preexisting=true)
     else
         mi = specialize_method(match; preexisting=true)
diff --git a/base/compiler/utilities.jl b/base/compiler/utilities.jl
index 8eb9e26d10..6eb03df533 100644
--- a/base/compiler/utilities.jl
+++ b/base/compiler/utilities.jl
@@ -98,6 +98,37 @@ end

 is_nospecialized(method::Method) = method.nospecialize ≠ 0

+# check if this `method` is `@nospecialize`d,
+# and if so further check if inference would be profitable for inlining
+# by peeking at the previously inferred method bodies
+# XXX this check is really not idempotent ...
+function should_not_infer(
+    cache::WorldView, inlining_policy::F,
+    method::Method, @nospecialize(sig), sparams::SimpleVector) where F
+    if is_nospecialized(method)
+        # TODO check if `method` is declared as `@noinline`
+        isdefined(method, :source) && ccall(:jl_ir_flag_inlineable, Bool, (Any,), method.source) && return false
+
+        local cacheexist = false
+        for mi in method.specializations
+            if isa(mi, MethodInstance)
+                code = get(cache, mi, nothing)
+                if isdefined(code, :inferred)
+                    cache_inf = code.inferred
+                    if !(cache_inf === nothing)
+                        if inlining_policy(cache_inf) !== nothing
+                            return false
+                        end
+                    end
+                end
+                cacheexist |= true
+            end
+        end
+        return cacheexist
+    end
+    return false
+end
+
 is_noinfer(method::Method) = method.noinfer && is_nospecialized(method)
 # is_noinfer(method::Method) = is_nospecialized(method) &&  is_declared_noinline(method)
```

timholy (Sponsor Member) commented Aug 23, 2021

Yeah, I don't think you can rely on history, that's a recipe for a lot of head-scratching. Since currently the @nospecialize has to be added manually, I think we should make this one manual too.

In SnoopCompile, I have an experimental teh/auto_pgds branch with the aim of automating annotation of @nospecialize (and we could add @noinfer) which I'm happy to push in case you'd find it interesting. The idea is to (1) collect both inference- and runtime-profiling data, (2) pick Methods that have a poor ratio of runtime-to-inference-time, (3) identify those arguments that contribute most to their type diversity, and then (4) sneakily set the m.nospecialize field for those arguments. Still to go would be (5) to invalidate all the inferred methods and recollect inference- and runtime-profiling data with the new specialization settings, (6) correct any mistakes (anything that blew up in its runtime due to de-specialization) and perhaps iterate until the most favorable balance has been struck, and then (7) write out the results to the source code. We could skip 5&6 and just let the developer do it, of course. An experimental version of this was behind JuliaData/DataFrames.jl#2597.

aviatesk (Sponsor Member, Author) replied:

Yeah, I'm definitely interested. Could you push it? (It's okay not to make a PR if you prefer.)

timholy (Sponsor Member) commented Aug 24, 2021

https://github.com/timholy/SnoopCompile.jl/tree/teh/auto_pgds

A review comment (Sponsor Member) on this hunk in base/compiler/abstractinterpretation.jl:

```diff
@@ -584,7 +587,11 @@ function maybe_get_const_prop_profitable(interp::AbstractInterpreter, result::Me
         end
     end
     force |= allconst
```

Do we need to add a new condition here (to has_nontrivial_const_info for const_prop_argument_heuristic) to check for the case where:

  1. inlining is estimated or declared to be beneficial AND
  2. noinfer is set on the method AND
  3. !(mi.specTypes <: argtypes) # e.g. that match.spec_types was widened by specialize_method_noinfer

So that we will compute an optimized version of the method here, in preparation for faster inlining of that code later?

vtjnash (Sponsor Member) commented Aug 25, 2021

Alright, I guess it is reasonable to separate this from @noinline, though perhaps it should often imply that flag also, unless the user explicitly marked it @inline?

Tokazama (Contributor) commented:

Is this still being actively developed? I think it still would be useful

aviatesk (Sponsor Member, Author) commented:

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier (Collaborator) replied:

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk aviatesk force-pushed the avi/noinfer branch 3 times, most recently from 27bc4d4 to bca7fea Compare April 13, 2023 14:27
aviatesk (Sponsor Member, Author) commented:

@nanosoldier runbenchmarks("inference", vs=":master")

nanosoldier (Collaborator) replied:

Your benchmark job has completed - no performance regressions were detected. A full report can be found here.

@aviatesk aviatesk force-pushed the avi/noinfer branch 2 times, most recently from 56526aa to 3efba74 Compare April 14, 2023 10:23
aviatesk (Sponsor Member, Author) commented:

Okay, I think this PR is ready to go.

3efba74 adds the `@noinfer` macro to various `Core.Compiler` functions
and achieves the following sysimage size reduction:

|                                   | this commit | master      | %       |
| --------------------------------- | ----------- | ----------- | ------- |
| `Core.Compiler` compilation (sec) | `66.4551`   | `71.0846`   | `0.935` |
| `corecompiler.jl` (KB)            | `17638080`  | `18407248`  | `0.958` |
| `sys.jl` (KB)                     | `88736432`  | `89361280`  | `0.993` |
| `sys-o.a` (KB)                    | `189484400` | `189907096` | `0.998` |

And it seems there are no regressions in the "inference" benchmark.

@aviatesk aviatesk requested a review from vtjnash April 14, 2023 10:26
timholy (Sponsor Member) left a review comment:

LGTM! Thanks so much for finishing this.

A review comment on src/julia.h (outdated):

```diff
@@ -286,6 +286,7 @@ typedef struct _jl_code_info_t {
     uint8_t inferred;
     uint8_t propagate_inbounds;
     uint8_t has_fcall;
+    uint8_t noinfer;
```
A reviewer (Sponsor Member) asked:

So is the intent here that we might eventually be able to have both inferred and noinferred compilations of the same method and thus support call-site @noinfer annotations? If so, 👍

aviatesk (Sponsor Member, Author) replied:

Currently we represent boolean properties as uint8_t field, including noinfer, but yes, it's possible to implement call-site @noinfer.

A review comment on this hunk in base/compiler/abstractinterpretation.jl:

```diff
@@ -544,6 +544,10 @@ function abstract_call_method(interp::AbstractInterpreter,
     sigtuple = unwrap_unionall(sig)
     sigtuple isa DataType || return MethodCallResult(Any, false, false, nothing, Effects())

+    if is_noinfer(method)
```
A reviewer (Member) commented:

We should make this controllable by the absint. GPUCompiler may choose to ignore this and force full inference.

cc: @maleadt

aviatesk (Sponsor Member, Author) replied:

I'm fine with implementing a configuration here but maybe we can delay it until it turns out that GPUCompiler (or other external compilation pipeline) really needs it? Since @nospecializeinfer does not prohibit constant prop' and such, I'm still wondering if it can cause unwanted inference regression.

vtjnash (Sponsor Member) commented May 9, 2023

SGTM

I think the name might be confusing though, as @noinfer makes it sound like it won't do any inference (like the inference-barrier annotations we have), but really it just relaxes the heuristics a bit for when to do it. Maybe the name needs to mention that it applies to specific arguments, like @nospecializeinfer? Or @infercompilesig?

aviatesk (Sponsor Member, Author) commented May 9, 2023

> I think the name might be confusing though, as @noinfer makes it sound like it won't do any inference (like the inference-barrier annotations we have), but really it just relaxes the heuristics a bit for when to do it. Maybe the name needs to mention that it applies to specific arguments, like @nospecializeinfer? Or @infercompilesig?

Both @nospecializeinfer and @infercompilesig seem like good options. If I had to choose, I would prefer @nospecializeinfer as it more explicitly reflects the current design, indicating that the annotation is closely related to @nospecialize. If there are no objections, I will proceed with using @nospecializeinfer.

@aviatesk aviatesk force-pushed the avi/noinfer branch 2 times, most recently from 5324f3e to 6fdc9ba Compare May 10, 2023 02:44
@aviatesk aviatesk changed the title introduce @noinfer macro to tell the compiler to avoid excess inference introduce @nospecializeinfer macro to tell the compiler to avoid excess inference May 10, 2023
timholy (Sponsor Member) commented May 11, 2023

I'm good with either name, they are both sensible.

aviatesk and others added 2 commits May 12, 2023 14:16
…cess inference

This commit introduces a new compiler annotation called `@nospecializeinfer`,
which allows us to request the compiler to avoid excessive inference.
(The full commit message repeats the PR description above.)

Co-authored-by: Mosè Giordano <giordano@users.noreply.github.com>
Co-authored-by: Tim Holy <tim.holy@gmail.com>
This commit adds `@nospecializeinfer` macro on various `Core.Compiler`
functions and achieves the following sysimage size reduction:

|                                   | this commit | master      | %       |
| --------------------------------- | ----------- | ----------- | ------- |
| `Core.Compiler` compilation (sec) | `66.4551`   | `71.0846`   | `0.935` |
| `corecompiler.jl` (KB)            | `17638080`  | `18407248`  | `0.958` |
| `sys.jl` (KB)                     | `88736432`  | `89361280`  | `0.993` |
| `sys-o.a` (KB)                    | `189484400` | `189907096` | `0.998` |
@aviatesk aviatesk merged commit f44be79 into master May 23, 2023
@aviatesk aviatesk deleted the avi/noinfer branch May 23, 2023 06:16
timholy (Sponsor Member) commented May 24, 2023

Woot! Thanks so much!
