
Make an inference hot-path slightly faster #44421

Merged
merged 1 commit into from
Mar 3, 2022

Conversation

Keno
Member

@Keno Keno commented Mar 3, 2022

This aims to improve performance of inference slightly by removing
a dynamic dispatch from calls to `widenwrappedconditional`, which
appears in various hot paths and showed up in profiling of inference.

There are two changes here:

1. Improve inlining for calls to functions of the form
   ```
   f(x::Int) = 1
   f(@nospecialize(x::Any)) = 2
   ```
   Previously, we would peel off the `x::Int` case and then
   generate a dynamic dispatch for the `x::Any` case. After
   this change, we directly emit an `:invoke` for the `x::Any`
   case (as well as enabling inlining of it in general);
   see the first sketch after this list.

2. Refactor `widenwrappedconditional` itself to avoid a signature
   with a union in it, since ironically union splitting cannot currently
   deal with that (it can only split unions if they're manifest in the
   call arguments); see the second sketch after this list.
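
To make item 1 concrete, here is a minimal, hypothetical sketch (not the actual code profiled in this PR; `g` and the method bodies are made up for illustration) of the kind of call site affected:

```
f(x::Int) = 1
f(@nospecialize(x::Any)) = 2

# A caller whose argument is only known as `Any`, so the call to `f` gets
# union-split over the two matching methods.
g(@nospecialize(x::Any)) = f(x)

# Inspect the optimized IR for the `Any`-typed call site. Before this change
# the non-Int arm of the split fell back to a dynamic `f(x)` call; after it,
# that arm can be an `:invoke` of (or an inlined copy of) the `@nospecialize`
# method.
code_typed(g, Tuple{Any}; optimize=true)
```

And a hedged illustration of the pattern behind item 2, with made-up names standing in for the compiler's internals (this is not the actual `widenwrappedconditional` code):

```
struct A end
struct B end

# Union in the *method signature*: if the call-site argument is only known
# abstractly, splitting would have to happen on a Union that is not manifest
# in the call arguments, which the optimizer currently cannot do.
handle(x::Union{A,B}) = :wrapped
handle(@nospecialize(x::Any)) = x

# Union-free signatures: the usual `isa`-based union splitting applies and
# each arm can be turned into an `:invoke` or inlined.
handle2(x::A) = :wrapped
handle2(x::B) = :wrapped
handle2(@nospecialize(x::Any)) = x
```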

@aviatesk
Member

aviatesk commented Mar 3, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@aviatesk aviatesk self-requested a review March 3, 2022 01:33
@Keno
Member Author

Keno commented Mar 3, 2022

I'm gonna look into the optimizer benchmark to see if there's anything interesting, but it's probably just doing real work to process the new case. That's expected.

@Keno
Member Author

Keno commented Mar 3, 2022

Yeah, I took a look and as far as I can tell, the optimizer is just doing more real work to provide that inference speedup. Also, my timing didn't show differences as dramatic as what nanosoldier shows, just a few percent (but directionally aligned with what nanosoldier says).

@Keno
Member Author

Keno commented Mar 3, 2022

@nanosoldier runbenchmarks("inference", vs=":master")

@nanosoldier
Collaborator

Your benchmark job has completed - possible performance regressions were detected. A full report can be found here.

@oscardssmith oscardssmith added the compiler:latency label Mar 3, 2022
@Keno Keno merged commit 96d6d86 into master Mar 3, 2022
@Keno Keno deleted the kf/inffaster branch March 3, 2022 23:23
aviatesk added a commit that referenced this pull request Jan 30, 2023
#44421 changed the union-splitting to not generate a
fallback dynamic dispatch call when there is any fully covered call.
But it should still generate it if there is any uncovered call candidate.

fix #48397.
aviatesk added a commit that referenced this pull request Jan 30, 2023
#44421 changed the union-splitting to skip generating an
unnecessary fallback dynamic dispatch call when there is a fully
covered call.
But it turned out that this is only valid when there is a fully
covered call among the matches for every signature that inference split;
it is invalid if there is any union-split signature for which an
uncovered call is found.

Consider the following example:
    # case 1
        # def
        nosplit(::Any) = [...]
        nosplit(::Int) = [...]
        # call
        nosplit(a::Any)
            split1: a::Any ┬ nosplit(a::Int)
                           └ nosplit(a::Any) # fully covers split1

    # case 2
        # def
        convert(::Type{T}, ::T) = T
        # call
        convert(::Type{Union{Bool,Tuple{Int,String}}}, a::Union{Bool,Tuple{Int,Any}})
            split1: a::Bool           ─ convert(::Type{Bool}, ::Bool)                           # fully covers split1
            split2: a::Tuple{Int,Any} ─ convert(::Type{Tuple{Int,String}}, ::Tuple{Int,String}) # does NOT fully cover split2

#44421 allows us to optimize the first case, but
handles the second case wrongly. This commit fixes it up while still
optimizing the first case.

fix #48397.
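
For reference, a hedged, runnable reconstruction of "case 2" above (it mirrors the shape of the example; it is not the exact reproducer from #48397, and `myconvert`/`demo` are made-up names):

```
# Stand-in for the `convert(::Type{T}, ::T)` definition above.
myconvert(::Type{T}, x::T) where {T} = x

function demo(a::Union{Bool,Tuple{Int,Any}})
    # split1 (a::Bool) is fully covered by the matched method, but split2
    # (a::Tuple{Int,Any}) is only matched at Tuple{Int,String}, so the
    # optimizer must keep a fallback dynamic-dispatch call for that arm.
    return myconvert(Union{Bool,Tuple{Int,String}}, a)
end

demo(true)        # Bool arm: fully covered
demo((1, "x"))    # Tuple{Int,String}: handled by the not-fully-covering match
try
    demo((1, 2))  # Tuple{Int,Int}: no method matches; the retained fallback
                  # dynamic dispatch is what raises the MethodError here
catch err
    @assert err isa MethodError
end
```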