JIT: Devirtualization for generic virtual methods #112596

Open
7 tasks
hez2010 opened this issue Feb 15, 2025 · 3 comments
hez2010 commented Feb 15, 2025

Generic virtual methods

In .NET we have generic virtual methods, that is, virtual methods that have their own method instantiations. For example,

IProcessor p = new MyValueProcessor();
p.Process(42);

public interface IProcessor
{
    void Process<T>(T item);
}

public class MyValueProcessor : IProcessor
{
    public void Process<T>(T item)
    {
        Console.WriteLine(item?.ToString());
    }
}

We made the method in the interface a generic method in order to avoid boxing, but unfortunately today all generic virtual methods go through a virtual function lookup followed by an indirect call:

mov      rdi, rbx                 ; this
mov      rsi, 0x7A468E5A8758      ; Program+IProcessor
mov      rdx, 0x7A468E6F0F88      ; token handle
call     [CORINFO_HELP_VIRTUAL_FUNC_PTR]
mov      rdi, rbx
mov      esi, 42
call     rax

This can make the code even slower than the non-generic version, because today the JIT is not able to devirtualize this kind of indirect call.
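To make the trade-off concrete, here is a minimal sketch contrasting a non-generic interface that takes object with the generic shape used above. The names IObjectProcessor, IGenericProcessor, and BoxingDemo are hypothetical and exist only for this illustration; the methods return the string instead of printing it so the result can be inspected.

```csharp
using System;

// Hypothetical non-generic alternative: a value type argument must be boxed.
public interface IObjectProcessor
{
    string Process(object item);
}

// Same shape as IProcessor above, but returning the string for inspection.
public interface IGenericProcessor
{
    string Process<T>(T item);
}

public sealed class ObjectProcessor : IObjectProcessor
{
    public string Process(object item) => item?.ToString() ?? "";
}

public sealed class GenericProcessor : IGenericProcessor
{
    // With T = int the argument is passed as an int, so no box is allocated for it.
    public string Process<T>(T item) => item?.ToString() ?? "";
}

public static class BoxingDemo
{
    public static string ViaObject()
    {
        IObjectProcessor p = new ObjectProcessor();
        return p.Process(42);   // 42 is boxed; the call is an ordinary interface call
    }

    public static string ViaGeneric()
    {
        IGenericProcessor p = new GenericProcessor();
        return p.Process(42);   // no boxing, but this is a generic virtual method call
    }
}
```

Both paths produce the same string. The difference discussed in this issue is only in how the call is dispatched: the second one is the generic virtual call that currently goes through the helper lookup.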

Devirtualization story

Today we assert that the base method desc must not have a method instantiation, because with today's devirtualization we would end up with a devirtualized method desc whose method instantiation is invalid: the instantiation lives only on the base method desc. To fix this, after we find the exact method table we can create an associated method desc that carries the instantiation information over to the devirtualized method desc as well:

pDevirtMD = pDevirtMD->FindOrCreateAssociatedMethodDesc(
    /* pDefMD */ pDevirtMD,
    /* pExactMT */ pExactMT,
    /* forceBoxedEntryPoint */ pExactMT->IsValueType() && !pDevirtMD->IsStatic(),
    /* methodInst */ pBaseMD->GetMethodInstantiation(),
    /* allowInstParam */ false
);

Note that we need to handle unboxing stubs: for instance methods on struct receivers, we need to force the boxed entry point. And because the method itself is generic, allowInstParam should be false.
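The struct receiver case can be observed from the C# side. In the sketch below, the names ICountingProcessor, CountingProcessor, and StructReceiverDemo are hypothetical. When a struct is called through an interface, the receiver is a boxed copy, while the struct's method body works on the raw struct data. As far as I understand it, this gap is what the unboxing stub bridges, and it is why the boxed entry point is requested above.

```csharp
using System;

public interface ICountingProcessor
{
    void Process<T>(T item);
    int Count { get; }
}

// A struct implementing a generic virtual (interface) method with mutable state.
public struct CountingProcessor : ICountingProcessor
{
    private int _count;
    public int Count => _count;
    public void Process<T>(T item) => _count++;
}

public static class StructReceiverDemo
{
    public static (int Original, int Boxed) Run()
    {
        var s = new CountingProcessor();
        ICountingProcessor p = s;   // boxes a copy of s
        p.Process(42);              // receiver is the boxed copy
        p.Process("hello");
        return (s.Count, p.Count);  // the original struct is untouched
    }
}
```

Run() returns (0, 2): both calls mutate the boxed copy, not the original struct.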

Then, with the devirtualized method desc, we can call it with an InstParam directly, without going through the virtual function pointer lookup. There is one case, however, where the exact method table can end up being a canonical method table; in that case we need to bail.

But if we take a deeper look:

STMT00002 ( 0x005[--] ... ??? )
               [000012] DACXG------                         *  STORE_LCL_VAR long   V02 tmp2
               [000011] --CXG------                         \--*  CALL help long   CORINFO_HELP_VIRTUAL_FUNC_PTR
               [000005] ----------- arg0                       +--*  LCL_VAR   ref    V01 tmp1
               [000009] H---------- arg1                       +--*  CNS_INT(h) long   0x7ffe88447668 class IProcessor
               [000010] H---------- arg2                       \--*  CNS_INT(h) long   0x7ffe884479c0 token

STMT00003 ( ??? ... ??? )
               [000007] --CXG------                         *  CALL ind  void
               [000008] ----------- this                    +--*  LCL_VAR   ref    V01 tmp1
               [000006] ----------- arg1                    +--*  CNS_INT   int    42
               [000013] ----------- calli tgt               \--*  LCL_VAR   long   V02 tmp2

we can see that, although we had all the necessary information, we spilled the ldvirtftn into a local, so that information is lost by the time we reach the indirect call, and we no longer have the method desc we need for devirtualization.

Furthermore, even when we have all the information we need to devirtualize the call, the devirtualized method may be an instantiating stub that requires a runtime lookup. In this case we cannot use the instantiating stub from the WrappedMethodDesc we created earlier with FindOrCreateAssociatedMethodDesc as the InstParam, so we still need to pass the runtime lookup node as the InstParam arg.

The solution to this is to not spill it early, so that we end up with trees like

STMT00002 ( 0x005[--] ... ??? )
               [000007] --CXG------                         *  CALL ind  void
               [000008] ----------- this                    +--*  LCL_VAR   ref    V01 tmp1
               [000006] ----------- arg1                    +--*  CNS_INT   int    42
               [000011] --CXG------ calli tgt               \--*  CALL help long   CORINFO_HELP_VIRTUAL_FUNC_PTR
               [000005] ----------- arg0                       +--*  LCL_VAR   ref    V01 tmp1
               [000009] H---------- arg1                       +--*  CNS_INT(h) long   0x7ffed0cd7478 class IProcessor
               [000010] H---------- arg2                       \--*  CNS_INT(h) long   0x7ffed0cd79c0 token

Then we will have all the necessary information for devirtualization. After devirtualization, we can push the necessary method InstParam to the call. In the above case we don't need a method InstParam, so it will end up as

               [000007] --CXG------                         *  CALL nullcheck void   PrintProcessor:Process[int](int):this
               [000008] ----------- this                    +--*  LCL_VAR   ref    V01 tmp1
               [000006] ----------- arg1                    \--*  CNS_INT   int    42

For cases where we need a method InstParam, it may end up as:

               [000007] --CXG------                         *  CALL nullcheck void   PrintProcessor:Process[System.__Canon](System.__Canon):this
               [000008] ----------- this                    +--*  LCL_VAR   ref    V01 tmp1
               [000012] H---------- gctx                    +--*  CNS_INT(h) long   0x7ffe9e4e7cf8 method PrintProcessor:Process[System.String](System.String):this
               [000006] ----------- arg2                    \--*  CNS_STR   ref   <string constant>

or when a runtime lookup is required (a real-world example):

               [000051] --CXG------                         *  CALL ind  ref
               [000052] ----------- this                    +--*  LCL_VAR   ref    V08 tmp5
               [000050] ----------- arg1                    +--*  LCL_VAR   ref    V06 tmp3
               [000060] --CXG------ calli tgt               \--*  CALL help long   CORINFO_HELP_VIRTUAL_FUNC_PTR
               [000049] ----------- arg0                       +--*  LCL_VAR   ref    V08 tmp5
               [000053] H---------- arg1                       +--*  CNS_INT(h) long   0x7ffe8942a520 class Microsoft.Extensions.Options.OptionsBuilder`1[Microsoft.Extensions.Options.StartupValidatorOptions]
               [000059] ----------- arg2                       \--*  RUNTIMELOOKUP long   0x7ffe8942ba50 method
               [000058] -----------                               \--*  LCL_VAR   long   V09 tmp6

we can devirt it into

               [000051] --CXG------                         *  CALL nullcheck ref    Microsoft.Extensions.Options.OptionsBuilder`1[System.__Canon]:Configure[System.__Canon](System.Action`2[System.__Canon,System.__Canon]):Microsoft.Extensions.Options.OptionsBuilder`1[System.__Canon]:this
               [000052] ----------- this                    +--*  LCL_VAR   ref    V08 tmp5
               [000059] ----------- gctx                    +--*  RUNTIMELOOKUP long   0x7ffe8942ba50 method
               [000058] -----------                         |  \--*  LCL_VAR   long   V09 tmp6
               [000050] ----------- arg2                    \--*  LCL_VAR   ref    V06 tmp3

With this we can devirtualize generic virtual methods, which in turn unblocks inlining, even in cases where a runtime lookup is needed for the method instantiation.
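The three outcomes above (no InstParam, a constant InstParam, and a runtime lookup) roughly correspond to three kinds of call sites in C#. The following is a sketch under assumptions: ITypeNamer, TypeNamer, and CallSiteKinds are hypothetical names, and the comments restate the cases described above rather than guarantee what the JIT emits for this exact code.

```csharp
using System;

public interface ITypeNamer
{
    string NameOf<T>(T item);
}

public sealed class TypeNamer : ITypeNamer
{
    // typeof(T) depends on the method instantiation. For reference type T the
    // method body is shared (the __Canon instantiation), so the actual T has to
    // be supplied through a hidden generic context argument.
    public string NameOf<T>(T item) => typeof(T).Name;
}

public static class CallSiteKinds
{
    // Value type instantiation: NameOf<int> has its own code, no InstParam needed.
    public static string ValueType(ITypeNamer n) => n.NameOf(42);

    // Reference type instantiation known when the caller is compiled: the
    // InstParam can be a constant handle for NameOf<string>.
    public static string KnownReferenceType(ITypeNamer n) => n.NameOf("hello");

    // The caller is itself shared generic code: the instantiation NameOf<T> is
    // only known at run time, so the InstParam comes from a runtime lookup.
    public static string FromSharedCode<T>(ITypeNamer n, T item) where T : class
        => n.NameOf(item);
}
```

For example, CallSiteKinds.FromSharedCode<object>(new TypeNamer(), new object()) returns "Object", while the first two methods return "Int32" and "String".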

However, the JIT backend does not handle an indirect call well when its call address is itself a CALL. That is, if we fail to devirtualize a generic virtual call, we will end up with

STMT00002 ( 0x005[--] ... ??? )
               [000007] --CXG------                         *  CALL ind  void
               [000008] ----------- this                    +--*  LCL_VAR   ref    V01 tmp1
               [000006] ----------- arg1                    +--*  CNS_INT   int    42
               [000011] --CXG------ calli tgt               \--*  CALL help long   CORINFO_HELP_VIRTUAL_FUNC_PTR
               [000005] ----------- arg0                       +--*  LCL_VAR   ref    V01 tmp1
               [000009] H---------- arg1                       +--*  CNS_INT(h) long   0x7ffed0cd7478 class IProcessor
               [000010] H---------- arg2                       \--*  CNS_INT(h) long   0x7ffed0cd79c0 token

which the backend does not handle well.

As a temporary workaround, we can split the call so that the ldvirtftn will first be executed and stored into a local, and the call address can be replaced with the local. But the real fix here is to fix the backend handling so that we don't need to split it at all.

The prototype is done in #112353, and it shows many interesting optimization opportunities across logging, JSON parsing, LINQ/PLINQ, collections, hosting, and dependency injection, all of which are used extensively by all kinds of apps today; see MihuBot/runtime-utils#1004 (the code size regressions are due to more inlining). This work is also a prerequisite for devirtualizing delegates that require a closure (capture locals).

NativeAOT uses a fat pointer for this, so it needs to be handled separately.

Taking the above code as an example,

before:

G_M24375_IG01:  ;; offset=0x0000
       push     rbp
       push     rbx
       push     rax
       lea      rbp, [rsp+0x10]
						;; size=8 bbWeight=1 PerfScore 3.50
G_M24375_IG02:  ;; offset=0x0008
       mov      rdi, 0x7F1A079C8828      ; Program+MyValueProcessor
       call     CORINFO_HELP_NEWSFAST
       mov      rbx, rax
       mov      rdi, rbx
       mov      rsi, 0x7F1A079C8758      ; Program+IProcessor
       mov      rdx, 0x7F1A07B10F88      ; token handle
       call     [CORINFO_HELP_VIRTUAL_FUNC_PTR]
       mov      rdi, rbx
       mov      esi, 42
       call     rax
       xor      eax, eax
						;; size=59 bbWeight=1 PerfScore 9.00
G_M24375_IG03:  ;; offset=0x0043
       add      rsp, 8
       pop      rbx
       pop      rbp
       ret      
						;; size=7 bbWeight=1 PerfScore 2.25

after:

G_M27646_IG01:  ;; offset=0x0000
       sub      rsp, 40
                                                ;; size=4 bbWeight=1 PerfScore 0.25
G_M27646_IG02:  ;; offset=0x0004
       mov      ecx, 42
       call     [System.Number:Int32ToDecStr(int):System.String]
       mov      rcx, rax
       call     [System.Console:WriteLine(System.String)]
       nop
                                                ;; size=21 bbWeight=1 PerfScore 6.75
G_M27646_IG03:  ;; offset=0x0019
       add      rsp, 40
       ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25

Plans

  • 1. Stop spilling ldvirtftn
  • 2. Handle CALL as call address of an indirect call in the backend
    • We may split the call as a temporary workaround before this is done
  • 3. Make changes to the VM to unblock devirtualization for virtual generics
  • 4. Enable late devirt for virtual generics
  • 5. Enable inlining for devirted virtual generics
  • 6. Support NativeAOT and R2R
  • 7. MethodDesc probing for virtual generics in PGO

#112353 covered 1~5, while 2 was handled by the workaround I mentioned before.

cc @dotnet/jit-contrib
cc @jkotas @MichalStrehovsky for review and suggestions on the VM part

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 15, 2025
@dotnet-policy-service dotnet-policy-service bot added the untriaged New issue has not been triaged by the area owner label Feb 15, 2025

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

jkotas commented Feb 18, 2025

@jkotas for review and suggestions on the VM part

I do not see a fundamental problem with this. Changes like this have a high chance of a bug tail, and the bugs tend to involve very complicated generic constructs, so that is something the @dotnet/jit-contrib team needs to be prepared to deal with.

@JulieLeeMSFT

CC @AndyAyersMS, @dotnet/jit-contrib.

@JulieLeeMSFT JulieLeeMSFT removed the untriaged New issue has not been triaged by the area owner label Feb 19, 2025
@JulieLeeMSFT JulieLeeMSFT added this to the 10.0.0 milestone Feb 19, 2025