JIT: Allow forwarding field accesses off of implicit byrefs #80852

Merged
jakobbotsch merged 2 commits into dotnet:main from forward-more-implicit-byrefs
Jan 23, 2023

jakobbotsch
Member

@jakobbotsch jakobbotsch commented Jan 19, 2023

The JIT currently allows forwarding implicit byrefs at their last uses to calls, but only if the full implicit byref is used. This change allows the JIT to forward any such access off of an implicit byref parameter.

For example:

using System.Runtime.CompilerServices;

class Program
{
    public static void Main()
    {
        Foo(default);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Foo(S1 s1)
    {
        Bar(s1.B);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    static void Bar(S2 s)
    {
    }

    private struct S1
    {
        public int A;
        public S2 B;
    }

    private struct S2
    {
        public int C, D, E, F;
    }
}

Codegen before:

; Assembly listing for method Program:Foo(Program+S1)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  6   )   byref  ->  rcx         single-def
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;  V02 tmp1         [V02    ] (  2,  4   )  struct (16) [rsp+28H]   do-not-enreg[XS] addr-exposed "by-value struct argument"
;
; Lcl frame size = 56

G_M4574_IG01:              ;; offset=0000H
       4883EC38             sub      rsp, 56
       C5F877               vzeroupper
                                                ;; size=7 bbWeight=1 PerfScore 1.25
G_M4574_IG02:              ;; offset=0007H
       C5F8104104           vmovups  xmm0, xmmword ptr [rcx+04H]
       C5F811442428         vmovups  xmmword ptr [rsp+28H], xmm0
       488D4C2428           lea      rcx, [rsp+28H]
       FF1553A41C00         call     [Program:Bar(Program+S2)]
       90                   nop
                                                ;; size=23 bbWeight=1 PerfScore 8.75
G_M4574_IG03:              ;; offset=001EH
       4883C438             add      rsp, 56
       C3                   ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 35, prolog size 7, PerfScore 14.75, instruction count 9, allocated bytes for code 35 (MethodHash=a8e6ee21) for method Program:Foo(Program+S1)
; ============================================================

Codegen after:

; Assembly listing for method Program:Foo(Program+S1)
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T00] (  3,  6   )   byref  ->  rcx         single-def
;  V01 OutArgs      [V01    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;
; Lcl frame size = 40

G_M4574_IG01:              ;; offset=0000H
       4883EC28             sub      rsp, 40
                                                ;; size=4 bbWeight=1 PerfScore 0.25
G_M4574_IG02:              ;; offset=0004H
       4883C104             add      rcx, 4
       FF1532A11C00         call     [Program:Bar(Program+S2)]
       90                   nop
                                                ;; size=11 bbWeight=1 PerfScore 3.50
G_M4574_IG03:              ;; offset=000FH
       4883C428             add      rsp, 40
       C3                   ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 20, prolog size 4, PerfScore 7.00, instruction count 6, allocated bytes for code 20 (MethodHash=a8e6ee21) for method Program:Foo(Program+S1)
; ============================================================

(In the latter case, the call to Bar would also become a tail call without the NoInlining attribute.)
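
The "last use" condition matters because, under the Windows x64 convention shown in the listings, a struct such as S2 is passed as a pointer to a copy that the callee is free to overwrite. The following is a minimal sketch of that constraint, not code from the PR: the names LastUseSketch, Mutate, LastUse and UsedAfterCall are hypothetical. It only demonstrates the by-value semantics that any forwarding must preserve. In LastUse the parameter is never read after the call, so forwarding a pointer into it is unobservable. In UsedAfterCall the field is read again, so the callee's write must not be visible and a copy is presumably still required.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class LastUseSketch
{
    public struct S2 { public int C, D, E, F; }
    public struct S1 { public int A; public S2 B; }

    // The callee overwrites its own by-value argument.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Mutate(S2 s)
    {
        s.C = -1;
        return s.C;
    }

    // s1 is not read after the call, so s1.B is at its last use here.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int LastUse(S1 s1) => Mutate(s1.B);

    // s1.B.C is read again after the call, so the write inside Mutate
    // must not be observable through s1.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int UsedAfterCall(S1 s1)
    {
        int fromCallee = Mutate(s1.B);
        return fromCallee + s1.B.C;
    }

    public static void Main()
    {
        var s1 = new S1 { A = 1, B = new S2 { C = 10 } };
        Console.WriteLine(LastUse(s1));       // prints -1
        Console.WriteLine(UsedAfterCall(s1)); // prints 9 (-1 + 10)
        Console.WriteLine(s1.B.C);            // prints 10, caller's value is untouched
    }
}
```

The printed values follow from C# value semantics alone and do not depend on whether the JIT forwards the argument. To see what the JIT actually emits for each method, the listings can be inspected with the DOTNET_JitDisasm environment variable on a runtime that includes this change.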

@ghost ghost assigned jakobbotsch Jan 19, 2023
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jan 19, 2023
@jakobbotsch
Member Author

jakobbotsch commented Jan 19, 2023

cc @dotnet/jit-contrib PTAL @AndyAyersMS

Small set of diffs. The main benefit is that it makes the copy elision work consistently for both implicit byrefs and normal locals.
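
To make the "consistently" point concrete, here is a small sketch (hypothetical names CopyElisionSketch, Produce, Consume, FromLocal and FromParameter, none taken from the PR) that passes the same field access to a call in both shapes: once off a normal local and once off a by-value struct parameter, which is an implicit byref on Windows x64. The assumption, based on the comment above, is that the JIT can elide the copy in both methods after this change. The sketch does not assert any particular codegen.

```csharp
using System;
using System.Runtime.CompilerServices;

public static class CopyElisionSketch
{
    public struct Inner { public int C, D, E, F; }
    public struct Outer { public int A; public Inner B; }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Consume(Inner inner) => inner.C + inner.F;

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static Outer Produce() =>
        new Outer { A = 1, B = new Inner { C = 2, F = 3 } };

    // Field access off a normal local at its last use.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int FromLocal()
    {
        Outer local = Produce();
        return Consume(local.B);
    }

    // Field access off a by-value struct parameter at its last use.
    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int FromParameter(Outer outer) => Consume(outer.B);

    public static void Main()
    {
        Console.WriteLine(FromLocal());              // prints 5
        Console.WriteLine(FromParameter(Produce())); // prints 5
    }
}
```

Comparing the disassembly of FromLocal and FromParameter before and after the change would show whether the stack temporary for the argument disappears in both.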

@jakobbotsch
Member Author

Ping @AndyAyersMS

@jakobbotsch jakobbotsch merged commit 5fda6fd into dotnet:main Jan 23, 2023
@jakobbotsch jakobbotsch deleted the forward-more-implicit-byrefs branch January 23, 2023 16:43
mdh1418 pushed a commit to mdh1418/runtime that referenced this pull request Jan 24, 2023
@ghost ghost locked as resolved and limited conversation to collaborators Feb 22, 2023