
Make an inference hot-path slightly faster #44421

Merged
merged 1 commit into from
Mar 3, 2022

Conversation

Keno
Member

@Keno Keno commented Mar 3, 2022

This aims to improve performance of inference slightly by removing
a dynamic dispatch from calls to `widenwrappedconditional`, which
appears in various hot paths and showed up in profiling of inference.

There are two changes here:

1. Improve inlining for calls to functions of the form
   ```
   f(x::Int) = 1
   f(@nospecialize(x::Any)) = 2
   ```
   Previously, we would peel off the `x::Int` case and then
   generate a dynamic dispatch for the `x::Any` case. After
   this change, we directly emit an `:invoke` for the `x::Any`
   case (as well as enabling inlining of it in general);
   see the first sketch after this list.

2. Refactor `widenwrappedconditional` itself to avoid a signature
   with a union in it, since ironically union splitting cannot currently
   deal with that (it can only split unions if they're manifest in the
   call arguments); see the second sketch after this list.
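
To make item 1 concrete, here is a minimal, hypothetical sketch (not the actual code profiled in this PR; `g` and the method bodies are made up for illustration) of the kind of call site affected:

```
f(x::Int) = 1
f(@nospecialize(x::Any)) = 2

# A caller whose argument is only known as `Any`, so the call to `f` gets
# union-split over the two matching methods.
g(@nospecialize(x::Any)) = f(x)

# Inspect the optimized IR for the `Any`-typed call site. Before this change
# the non-Int arm of the split fell back to a dynamic `f(x)` call; after it,
# that arm can be an `:invoke` of (or an inlined copy of) the `@nospecialize`
# method.
code_typed(g, Tuple{Any}; optimize=true)
```

And a hedged illustration of the pattern behind item 2, with made-up names standing in for the compiler's internals (this is not the actual `widenwrappedconditional` code):

```
struct A end
struct B end

# Union in the *method signature*: if the call-site argument is only known
# abstractly, splitting would have to happen on a Union that is not manifest
# in the call arguments, which the optimizer currently cannot do.
handle(x::Union{A,B}) = :wrapped
handle(@nospecialize(x::Any)) = x

# Union-free signatures: the usual `isa`-based union splitting applies and
# each arm can be turned into an `:invoke` or inlined.
handle2(x::A) = :wrapped
handle2(x::B) = :wrapped
handle2(@nospecialize(x::Any)) = x
```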

@aviatesk
Member

aviatesk commented Mar 3, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk aviatesk self-requested a review March 3, 2022 01:33
@Keno
Member Author

Keno commented Mar 3, 2022

I'm gonna look into the optimizer benchmark to see if there's anything interesting, but it's probably just doing real work to process the new case. That's expected.

@Keno
Member Author

Keno commented Mar 3, 2022

Yeah, I took a look and as far as I can tell, the optimizer is just doing more real work to provide that inference speedup. Also, my timing didn't show differences as dramatic as what nanosoldier shows, just a few percent (but directionally aligned with what nanosoldier says).

@Keno
Member Author

Keno commented Mar 3, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@oscardssmith oscardssmith added the compiler:latency label Mar 3, 2022
@Keno Keno merged commit 96d6d86 into master Mar 3, 2022
@Keno Keno deleted the kf/inffaster branch March 3, 2022 23:23
aviatesk added a commit that referenced this pull request Jan 30, 2023
#44421 changed the union-splitting to not generate a
fallback dynamic dispatch call when there is any fully covered call.
But it should still generate it if there is any uncovered call candidate.

fix #48397.
aviatesk added a commit that referenced this pull request Jan 30, 2023
#44421 changed the union-splitting to skip generating an
unnecessary fallback dynamic dispatch call when there is a fully
covered call.
But it turned out that this is only valid when there is a fully
covered call among the matches for every signature that inference split;
it is invalid if there is any union-split signature for which an
uncovered call is found.

Consider the following example:
    # case 1
        # def
        nosplit(::Any) = [...]
        nosplit(::Int) = [...]
        # call
        nosplit(a::Any)
            split1: a::Any ┬ nosplit(a::Int)
                           └ nosplit(a::Any) # fully covers split1

    # case 2
        # def
        convert(::Type{T}, ::T) = T
        # call
        convert(::Type{Union{Bool,Tuple{Int,String}}}, a::Union{Bool,Tuple{Int,Any}})
            split1: a::Bool           ─ convert(::Type{Bool}, ::Bool)                           # fully covers split1
            split2: a::Tuple{Int,Any} ─ convert(::Type{Tuple{Int,String}}, ::Tuple{Int,String}) # does NOT fully cover split2

#44421 allows us to optimize the first case, but
handles the second case wrongly. This commit fixes it up while still
optimizing the first case.

fix #48397.
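
For reference, a hedged, runnable reconstruction of "case 2" above (it mirrors the shape of the example; it is not the exact reproducer from #48397, and `myconvert`/`demo` are made-up names):

```
# Stand-in for the `convert(::Type{T}, ::T)` definition above.
myconvert(::Type{T}, x::T) where {T} = x

function demo(a::Union{Bool,Tuple{Int,Any}})
    # split1 (a::Bool) is fully covered by the matched method, but split2
    # (a::Tuple{Int,Any}) is only matched at Tuple{Int,String}, so the
    # optimizer must keep a fallback dynamic-dispatch call for that arm.
    return myconvert(Union{Bool,Tuple{Int,String}}, a)
end

demo(true)        # Bool arm: fully covered
demo((1, "x"))    # Tuple{Int,String}: handled by the not-fully-covering match
try
    demo((1, 2))  # Tuple{Int,Int}: no method matches; the retained fallback
                  # dynamic dispatch is what raises the MethodError here
catch err
    @assert err isa MethodError
end
```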