JIT: Extend escape analysis to account for arrays with non-gcref elements #104906

Open. Wants to merge 78 commits into base: main.

hez2010 (Contributor) commented Jul 15, 2024

Positive case:

var chs = new char[42];
chs[1] = 'a';
Console.WriteLine((int)chs[1] + chs.Length);

Codegen:

; Assembly listing for method ArrayAllocator.Program:Main() (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;* V00 loc0         [V00    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact <short[]>
;  V01 OutArgs      [V01    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct (104) zero-ref    do-not-enreg[SF] "stack allocated array temp"
;* V03 tmp2         [V03    ] (  0,  0   )    long  ->  zero-ref    single-def "V02.[000..008)"
;* V04 tmp3         [V04    ] (  0,  0   )     int  ->  zero-ref    single-def "V02.[008..012)"
;* V05 tmp4         [V05    ] (  0,  0   )   short  ->  zero-ref    "V02.[018..020)"
;
; Lcl frame size = 40

G_M25548_IG01:  ;; offset=0x0000
       sub      rsp, 40
                                                ;; size=4 bbWeight=1 PerfScore 0.25
G_M25548_IG02:  ;; offset=0x0004
       mov      ecx, 84
       call     [System.Console:WriteLine(int)]
       nop
                                                ;; size=12 bbWeight=1 PerfScore 3.50
G_M25548_IG03:  ;; offset=0x0010
       add      rsp, 40
       ret
                                                ;; size=5 bbWeight=1 PerfScore 1.25

; Total bytes of code 21, prolog size 4, PerfScore 5.00, instruction count 6, allocated bytes for code 21 (MethodHash=5b0b9c33) for method ArrayAllocator.Program:Main() (FullOpts)

Negative case:

var chs = new char[42];
chs[1] = 'a';
Console.WriteLine((int)chs[42] + chs.Length);

Codegen:

; Assembly listing for method ArrayAllocator.Program:Main() (FullOpts)
; Emitting BLENDED_CODE for X64 with AVX - Windows
; FullOpts code
; optimized code
; rsp based frame
; partially interruptible
; No PGO data
; Final local variable assignments
;
;* V00 loc0         [V00    ] (  0,  0   )    long  ->  zero-ref    class-hnd exact <short[]>
;  V01 OutArgs      [V01    ] (  1,  1   )  struct (32) [rsp+0x00]  do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1         [V02    ] (  0,  0   )  struct (104) zero-ref    do-not-enreg[SF] "stack allocated array temp"
;  V03 tmp2         [V03,T00] (  1,  0   )   byref  ->  rbx         must-init "dummy temp of must thrown exception"
;* V04 tmp3         [V04    ] (  0,  0   )    long  ->  zero-ref    single-def "V02.[000..008)"
;* V05 tmp4         [V05    ] (  0,  0   )     int  ->  zero-ref    single-def "V02.[008..012)"
;* V06 tmp5         [V06    ] (  0,  0   )   short  ->  zero-ref    single-def "V02.[018..020)"
;
; Lcl frame size = 32

G_M25548_IG01:  ;; offset=0x0000
       push     rbx
       sub      rsp, 32
       xor      ebx, ebx
                                                ;; size=7 bbWeight=0 PerfScore 0.00
G_M25548_IG02:  ;; offset=0x0007
       call     CORINFO_HELP_RNGCHKFAIL
       movsx    rcx, word  ptr [rbx]
       call     [System.Console:WriteLine(int)]
       int3
                                                ;; size=16 bbWeight=0 PerfScore 0.00

; Total bytes of code 23, prolog size 5, PerfScore 0.00, instruction count 7, allocated bytes for code 23 (MethodHash=5b0b9c33) for method ArrayAllocator.Program:Main() (FullOpts)
; ============================================================
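
Both cases above allocate an array that never leaves Main. For contrast, here is a minimal sketch of what "escaping" means for such an array. The class, method, and field names are hypothetical, and whether the JIT actually stack-allocates the first array depends on the runtime build and configuration; the code only illustrates the distinction that escape analysis has to draw.

```csharp
using System;

public static class EscapeSketch
{
    // Hypothetical static field: an array stored here stays reachable after
    // the method returns, so it has to live on the GC heap.
    private static int[] s_published = Array.Empty<int>();

    public static int[] LastPublished => s_published;

    // The array is only read and written locally, so in principle it is a
    // candidate for stack allocation.
    public static int LocalOnly()
    {
        var values = new int[8];
        values[1] = 5;
        return values[1] + values.Length;
    }

    // Same work, but the array is stored to a static field, so it escapes.
    public static int Published()
    {
        var values = new int[8];
        values[1] = 5;
        s_published = values;
        return values[1] + values.Length;
    }
}
```

Both methods return the same value; the only difference is whether a reference to the array outlives the call.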

Benchmark on Mandelbrot:

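| Method     | Job                    | Mean     | Error   | StdDev  | Code Size | Allocated |
|------------|------------------------|----------|---------|---------|-----------|-----------|
| MandelBrot | NoStackAllocationArray | 199.7 us | 1.30 us | 1.22 us | 1,996 B   | 2.49 KB   |
| MandelBrot | StackAllocationArray   | 195.8 us | 1.16 us | 1.08 us | 2,414 B   | 1.14 KB   |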

Diff: https://www.diffchecker.com/bNP4qHdF/

dotnet-issue-labeler bot added the area-CodeGen-coreclr label Jul 15, 2024
dotnet-policy-service bot added the community-contribution label Jul 15, 2024
Member

@AndyAyersMS AndyAyersMS left a comment

For arrays (and also perhaps boxes and ref classes) we ought to have some kind of size limit... possibly similar to the one we use for stackallocs.

We need to be careful we don't allocate a lot of stack for an object that might not be heavily used, as we'll pay per-call prolog zeroing costs.

@hez2010
Contributor Author

hez2010 commented Dec 7, 2024

@AndyAyersMS All tests are now green and this is ready to merge. Please take another look.
@MihuBot

@AndyAyersMS
Member

I have some other changes to escape analysis which are going to conflict, so my plan is to merge those first and then pick this (or something like it) up later. Not sure how long that will take, hopefully not too long.

In the meantime, could you check if your changes to gtFoldExpr and morph resolve #107542, and if so, split those off separately?

Also if you want to peel off the change to always use a temp for newarr we could take that in advance too; it would be nice to see it go in as a zero diff prerequisite.

@AndyAyersMS
Member

@hez2010 can you resolve conflicts? The work I was doing was held up so maybe we can work on this and get it in first.

@hez2010
Contributor Author

hez2010 commented Dec 14, 2024

@MihuBot

@AndyAyersMS
Member

Hopefully #110787 unblocks this.

@AndyAyersMS
Member

@hez2010 given the small number of diffs from MihuBot, it would be good to understand what changes might be needed elsewhere to make this more effective.

I'm guessing the main blocker is lack of inlining, but a quantitative analysis might reveal other things.

@AndyAyersMS
Member

Some interesting diffs from SPMI

@hez2010
Contributor Author

hez2010 commented Dec 19, 2024

One of the regressions comes from Array.ForEach(new int[1], null):

-       sub      rsp, 40
+       sub      rsp, 56
 						;; size=4 bbWeight=0 PerfScore 0.00
 G_M52314_IG02:        ; bbWeight=0, gcrefRegs=0000 {}, byrefRegs=0000 {}, byref
+       vxorps   xmm0, xmm0, xmm0
+       vmovdqu  xmmword ptr [rsp+0x20], xmm0
+       vmovdqu  xmmword ptr [rsp+0x24], xmm0
+       mov      rcx, 0xD1FFAB1E      ; int[]
+       mov      qword ptr [rsp+0x20], rcx
+       mov      dword ptr [rsp+0x28], 1
        mov      ecx, 28
        call     [System.ThrowHelper:ThrowArgumentNullException(int)]
        ; gcr arg pop 0
        int3     

The loop was originally unrolled, but it no longer is. It seems we need to propagate the assertion into loops so that the bound can be replaced by a constant; that is #110501.

@AndyAyersMS
Member

I will dig into some of these tomorrow. Need to look closely at the dumps.

@hez2010
Contributor Author

hez2010 commented Dec 23, 2024

@AndyAyersMS BTW we can mark Array.Copy and SpanHelper.Memmove as non-escaping to see if it can give us more opportunities.

@AndyAyersMS
Member

> @AndyAyersMS BTW we can mark Array.Copy and SpanHelper.Memmove as non-escaping to see if it can give us more opportunities.

If we start passing stack allocated ref classes to callees we also have to fix the GC reporting for those callee arguments to be managed (not object) pointers (and transitively, fix reporting for any place those arguments can propagate, including possibly in the native parts of the runtime). So there is (perhaps considerable) extra work.

@hez2010
Contributor Author

hez2010 commented Dec 24, 2024

@EgorBo and I discussed on Discord that we could probe the size argument using value probing, so that in the future we can do "guarded stack allocation" for arrays whose size is not known at JIT time.

It would effectively replace

Span<int> arr = new int[size];

with

Span<int> arr;
// pseudocode only: Span<int>.Length has no setter, so this shows the intent rather than compilable C#
if (size < 16)
{
    arr = stackalloc int[16];
    arr.Length = size;
}
else
{
    arr = new int[size];
}
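
For comparison, the same guarded pattern can be written by hand in compilable C#. This is only a sketch of the idea, not what the JIT would emit: the class name, the method, and the StackLimit constant are hypothetical, the threshold of 16 simply mirrors the pseudocode above, and slicing stands in for the Length assignment.

```csharp
using System;

public static class GuardedAllocSketch
{
    // Hypothetical threshold, mirroring the 16 used in the pseudocode.
    private const int StackLimit = 16;

    public static int SumOfSquares(int size)
    {
        // Small sizes use a fixed-size stack buffer; larger ones fall back to the heap.
        Span<int> arr = size < StackLimit
            ? stackalloc int[StackLimit]
            : new int[size];

        // Trim the buffer to the requested length instead of assigning Length.
        arr = arr.Slice(0, size);

        for (int i = 0; i < arr.Length; i++)
        {
            arr[i] = i * i;
        }

        int sum = 0;
        foreach (int v in arr)
        {
            sum += v;
        }
        return sum;
    }
}
```

Using a fixed-size stack buffer keeps the frame size constant, at the cost of reserving the full 16 elements even when fewer are needed.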
