Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Fix delegate GDV for loops with multiple cloning options #109407

Merged
merged 1 commit into from
Oct 31, 2024

Conversation

jakobbotsch
Copy link
Member

@jakobbotsch jakobbotsch commented Oct 31, 2024

For loops that we consider candidates for cloning with multiple cloning options there was a logic bug that caused cloning to exit early. This affected even canonical loops over arrays if they also had a delegate GDV cloning option.

Fix #109379

New codegen:

; Assembly listing for method C:AggregateDelegate[int](int[],System.Func`3[int,int,int],int):int (Tier1)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; Tier1 code
; optimized code
; optimized using Dynamic PGO
; rsp based frame
; fully interruptible
; with Dynamic PGO: fgCalledCount is 100
; 0 inlinees with PGO data; 1 single block inlinees; 0 inlinees without PGO data
; Final local variable assignments
;
;  V00 arg0         [V00,T06] (  5,  4.00)     ref  ->  rcx         class-hnd single-def <int[]>
;  V01 arg1         [V01,T05] (  7,  5.66)     ref  ->  rbx         class-hnd single-def <System.Func`3[int,int,int]>
;  V02 arg2         [V02,T03] (  8,337.13)     int  ->   r8
;* V03 loc0         [V03,T10] (  0,  0   )     int  ->  zero-ref
;  V04 OutArgs      [V04    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;  V05 tmp1         [V05,T04] (  5,334.13)     int  ->   r8         "guarded devirt return temp"
;  V06 tmp2         [V06,T00] (  5,668.27)     int  ->  rax         "guarded devirt arg temp"
;* V07 tmp3         [V07    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "guarded devirt this exact temp" <C+<>c>
;  V08 cse0         [V08,T09] (  4,  3.00)     int  ->  rdx         "CSE #01: aggressive"
;  V09 rat0         [V09,T07] (  4,  5.02)   byref  ->  rsi         "Strength reduced derived IV"
;  V10 rat1         [V10,T08] (  4,  5.02)     int  ->  rdi         "Trip count IV"
;  V11 rat2         [V11,T01] (  4,497.18)   byref  ->  rcx         "Strength reduced derived IV"
;  V12 rat3         [V12,T02] (  4,497.18)     int  ->  rdx         "Trip count IV"
;
; Lcl frame size = 32

G_M53109_IG01:  ;; offset=0x0000
       push     rdi
       push     rsi
       push     rbx
       sub      rsp, 32
       mov      rbx, rdx
                                                ;; size=10 bbWeight=1 PerfScore 3.50
G_M53109_IG02:  ;; offset=0x000A
       mov      edx, dword ptr [rcx+0x08]
       test     edx, edx
       jle      SHORT G_M53109_IG05
                                                ;; size=7 bbWeight=1 PerfScore 3.25
G_M53109_IG03:  ;; offset=0x0011
       test     rbx, rbx
       je       SHORT G_M53109_IG07
       mov      rax, 0x7FFA12D60828      ; code for C+<>c:<Main>b__0_0(int,int):int:this
       cmp      qword ptr [rbx+0x18], rax
       jne      SHORT G_M53109_IG07
       add      rcx, 16
       align    [0 bytes for IG04]
                                                ;; size=25 bbWeight=0.99 PerfScore 5.72
G_M53109_IG04:  ;; offset=0x002A
       mov      eax, dword ptr [rcx]
       add      r8d, eax
       add      rcx, 4
       dec      edx
       jne      SHORT G_M53109_IG04
                                                ;; size=13 bbWeight=165.40 PerfScore 620.24
G_M53109_IG05:  ;; offset=0x0037
       mov      eax, r8d
                                                ;; size=3 bbWeight=1.00 PerfScore 0.25
G_M53109_IG06:  ;; offset=0x003A
       add      rsp, 32
       pop      rbx
       pop      rsi
       pop      rdi
       ret
                                                ;; size=8 bbWeight=1.00 PerfScore 2.75
G_M53109_IG07:  ;; offset=0x0042
       lea      rsi, bword ptr [rcx+0x10]
       mov      edi, edx
                                                ;; size=6 bbWeight=0.01 PerfScore 0.01
G_M53109_IG08:  ;; offset=0x0048
       mov      eax, dword ptr [rsi]
       mov      rdx, 0x7FFA12D60828      ; function address
       cmp      qword ptr [rbx+0x18], rdx
       jne      SHORT G_M53109_IG10
       add      r8d, eax
                                                ;; size=21 bbWeight=1.67 PerfScore 10.86
G_M53109_IG09:  ;; offset=0x005D
       add      rsi, 4
       dec      edi
       jne      SHORT G_M53109_IG08
       jmp      SHORT G_M53109_IG05
                                                ;; size=10 bbWeight=1.67 PerfScore 5.85
G_M53109_IG10:  ;; offset=0x0067
       mov      edx, r8d
       mov      r8d, eax
       mov      rcx, gword ptr [rbx+0x08]
       call     [rbx+0x18]System.Func`3[int,int,int]:Invoke(int,int):int:this
       mov      r8d, eax
       jmp      SHORT G_M53109_IG09
                                                ;; size=18 bbWeight=0 PerfScore 0.00

; Total bytes of code 121, prolog size 10, PerfScore 652.42, instruction count 43, allocated bytes for code 121 (MethodHash=0e81308a) for method C:AggregateDelegate[int](int[],System.Func`3[int,int,int],int):int (Tier1)
; ============================================================

For loops that we consider candidates for cloning with multiple cloning
options there was a logic bug that caused cloning to exit early. This
affected even canonical loops over arrays.
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Oct 31, 2024
@hez2010
Copy link
Contributor

hez2010 commented Oct 31, 2024

Can we make it into .NET 9 servicing as well? It seems to be a bug instead of an optimization.

@jakobbotsch
Copy link
Member Author

Can we make it into .NET 9 servicing as well? It seems to be a bug instead of an optimization.

This is unlikely to meet the bar for servicing since it is just a missing optimization and no bad codegen.

@jakobbotsch jakobbotsch marked this pull request as ready for review October 31, 2024 12:11
@jakobbotsch
Copy link
Member Author

cc @dotnet/jit-contrib PTAL @AndyAyersMS

Diffs. For win-x64: 34 affected contexts in benchmarks.run_pgo, and 182 (diffs) + 90 (misses) affected contexts in libraries_tests.run. Sadly we have no aspnet collection at the moment, so can't see how this affects it.

@AndyAyersMS
Copy link
Member

Sadly we have no aspnet collection at the moment, so can't see how this affects it

I'll have one up later today...

@jakobbotsch
Copy link
Member Author

/azp run runtime-coreclr superpmi-diffs

Copy link

Azure Pipelines successfully started running 1 pipeline(s).

@jakobbotsch
Copy link
Member Author

26 hits in the asp.net collection it appears.

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

JIT: Delegate GDV check is not being hoisted outside the loop
3 participants