[JIT] ARM64 - Temporary fix for ldp/stp optimizations #90534
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
Issue Details
Resolves #85765
With the latest, the code-gen is quite different from what was reported in the issue. Current code-gen:
; Assembly listing for method Program:Main() (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; FullOpts code
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; 0 inlinees with PGO data; 2 single block inlinees; 0 inlinees without PGO data
; invoked as altjit
; Final local variable assignments
;
;* V00 loc0 [V00 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op <S1>
;# V01 OutArgs [V01 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
;* V02 tmp1 [V02 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op "Inline ldloca(s) first use temp" <S1>
;* V03 tmp2 [V03 ] ( 0, 0 ) struct ( 8) zero-ref ld-addr-op "Inline ldloca(s) first use temp" <S0>
;* V04 tmp3 [V04 ] ( 0, 0 ) ubyte -> zero-ref "field V00.F0 (fldOffset=0x0)" P-INDEP
;* V05 tmp4 [V05 ] ( 0, 0 ) bool -> zero-ref single-def "field V00.F1 (fldOffset=0x1)" P-INDEP
;* V06 tmp5 [V06 ] ( 0, 0 ) bool -> zero-ref "field V00.F2 (fldOffset=0x2)" P-INDEP
;* V07 tmp6 [V07 ] ( 0, 0 ) ubyte -> zero-ref single-def "field V02.F0 (fldOffset=0x0)" P-INDEP
;* V08 tmp7 [V08 ] ( 0, 0 ) bool -> zero-ref single-def "field V02.F1 (fldOffset=0x1)" P-INDEP
;* V09 tmp8 [V09 ] ( 0, 0 ) bool -> zero-ref single-def "field V02.F2 (fldOffset=0x2)" P-INDEP
;
; Lcl frame size = 0
G_M27646_IG01:
stp fp, lr, [sp, #-0x10]!
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M27646_IG02:
mov w0, wzr
movz x1, #0xD1FFAB1E // code for System.Console:WriteLine(bool)
movk x1, #0xD1FFAB1E LSL #16
movk x1, #0xD1FFAB1E LSL #32
ldr x1, [x1]
blr x1
;; size=24 bbWeight=1 PerfScore 6.00
G_M27646_IG03:
ldp fp, lr, [sp], #0x10
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 40, prolog size 8, PerfScore 13.50, instruction count 10, allocated bytes for code 40 (MethodHash=cb019401) for method Program:Main() (FullOpts)
|
@jakobbotsch What did you do to determine which PR could repro this? Did you just do a bisect? |
Yes. I keep Core_Roots compiled for all JIT commits so that I can quickly do that. |
That's a lot of Core_Roots :) |
I checked out e62cb64 and did a fresh/clean checked build and the codegen is the same. Maybe I should try to go back further. |
What if you mark |
Marking it as NoInline, I was able to reproduce it, but only on that commit. Latest code-gen is:
; Assembly listing for method Program:Main() (FullOpts)
; Emitting BLENDED_CODE for generic ARM64 - Windows
; FullOpts code
; optimized code
; fp based frame
; partially interruptible
; No PGO data
; invoked as altjit
; Final local variable assignments
;
; V00 loc0 [V00 ] ( 5, 5 ) struct ( 8) [fp+0x18] do-not-enreg[SB] ld-addr-op <S1>
;# V01 OutArgs [V01 ] ( 1, 1 ) struct ( 0) [sp+0x00] do-not-enreg[XS] addr-exposed "OutgoingArgSpace"
; V02 tmp1 [V02,T03] ( 1, 1 ) ubyte -> [fp+0x18] do-not-enreg[] "field V00.F0 (fldOffset=0x0)" P-DEP
; V03 tmp2 [V03,T02] ( 2, 2 ) bool -> [fp+0x19] do-not-enreg[] "field V00.F1 (fldOffset=0x1)" P-DEP
; V04 tmp3 [V04,T00] ( 4, 4 ) bool -> [fp+0x1A] do-not-enreg[] single-def "field V00.F2 (fldOffset=0x2)" P-DEP
; V05 rat0 [V05,T01] ( 2, 4 ) struct ( 8) [fp+0x10] do-not-enreg[SF] "Return value temp for an odd struct return size" <S1>
;
; Lcl frame size = 16
G_M27646_IG01: ;; offset=0x0000
stp fp, lr, [sp, #-0x20]!
mov fp, sp
;; size=8 bbWeight=1 PerfScore 1.50
G_M27646_IG02: ;; offset=0x0008
movz x0, #0xC408 // code for Program:M4():S1
movk x0, #0x5CDC LSL #16
movk x0, #0x7FFD LSL #32
ldr x0, [x0]
blr x0
str w0, [fp, #0x10] // [V05 rat0]
ldrh w0, [fp, #0x10]
strh w0, [fp, #0x18]
ldrb w0, [fp, #0x12]
strb w0, [fp, #0x1A]
ldrb w0, [fp, #0x1A] // [V04 tmp3]
ldrb w1, [fp, #0x19] // [V03 tmp2]
orr w0, w0, w1
strb w0, [fp, #0x1A] // [V04 tmp3]
ldrb w0, [fp, #0x1A] // [V04 tmp3]
movz x1, #0x4CD8 // code for System.Console:WriteLine(bool)
movk x1, #0x5CFF LSL #16
movk x1, #0x7FFD LSL #32
ldr x1, [x1]
blr x1
;; size=80 bbWeight=1 PerfScore 25.50
G_M27646_IG03: ;; offset=0x0058
ldp fp, lr, [sp], #0x20
ret lr
;; size=8 bbWeight=1 PerfScore 2.00
; Total bytes of code 96, prolog size 8, PerfScore 38.60, instruction count 24, allocated bytes for code 96 (MethodHash=cb019401) for method Program:Main() (FullOpts)
; ============================================================ |
This is in the backend, so we should fix it even if it no longer repros with this specific example. #86491 is a change in morph so it did not fix the backend bug. |
Looks like it is, but all I can do is try to fix it based on this commit and hope that it is correct considering there are no other examples that reproduce it in the latest. |
Fixing the backend bug on a commit that is a few months older is just fine. You should be able to test your fix on that commit. I did the same in #90246. |
It doesn't leave me feeling confident knowing that this cannot be reproduced in latest. |
How would you know that the problem cannot be reproduced in |
Here is an example that reproduces the problem on main:
using System;
using System.Runtime.CompilerServices;

public unsafe class Program
{
    public static void Main()
    {
        byte* bytes = stackalloc byte[1024];
        bytes[0x1A] = 1;
        bytes[0x1B] = 2;
        int sum = Foo(bytes);
        Console.WriteLine(sum);
    }

    [MethodImpl(MethodImplOptions.NoInlining)]
    public static int Foo(byte* b)
    {
        return Unsafe.ReadUnaligned<int>(ref b[0x1A]) + Unsafe.ReadUnaligned<int>(ref b[0x1B]);
    }
}
Expected: 515 |
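The expected value follows from little-endian byte order: the 4-byte read at offset 0x1A covers the bytes 01 02 00 00 (0x0201 = 513), and the overlapping read at 0x1B covers 02 00 00 00 (2). The following is a minimal C++ sketch of the same arithmetic, not code from the PR. The helper names `readUnalignedLE32` and `foo` are made up for illustration, and the buffer is assumed to be zero-initialized apart from the two bytes the repro sets.

```cpp
#include <cassert>
#include <cstdint>

// Little-endian 32-bit read at an arbitrary (possibly unaligned) address.
// Stands in for Unsafe.ReadUnaligned<int> on a little-endian target.
static int32_t readUnalignedLE32(const unsigned char* p)
{
    assert(p != nullptr);
    uint32_t v = static_cast<uint32_t>(p[0]) |
                 (static_cast<uint32_t>(p[1]) << 8) |
                 (static_cast<uint32_t>(p[2]) << 16) |
                 (static_cast<uint32_t>(p[3]) << 24);
    return static_cast<int32_t>(v);
}

// Mirrors Foo from the repro: two overlapping 4-byte loads, one byte apart.
static int32_t foo(const unsigned char* b)
{
    return readUnalignedLE32(b + 0x1A) + readUnalignedLE32(b + 0x1B);
}
```

Called on a zeroed 1024-byte buffer with `bytes[0x1A] = 1` and `bytes[0x1B] = 2`, `foo` returns 513 + 2 = 515. The two loads overlap by three bytes, so they are not adjacent 4-byte slots and cannot legitimately be fused into a single paired load.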
That's the tricky part for this problem. I have no idea what I'm looking at as it's new to me. |
Interestingly, that new example outputs 515 on e62cb64.
…h the optimized ldr/str pair
@jakobbotsch I made a quick fix, but I put in a "TODO". The problem is a little complicated, but the issue is that 'imm' and/or 'prevImm' are assumed to be "scaled" when attempting to do a |
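To illustrate the scaled-versus-unscaled hazard described in the comment above, here is a hedged C++ sketch. It is a hypothetical model, not the JIT emitter's code: `LoadImm`, `byteOffset`, `canPairNaive`, and `canPairActual` are invented for this example. It relies only on ARM64 encoding facts: `ldr` with an unsigned offset encodes the immediate scaled by the access size, `ldur` encodes an unscaled byte offset, and `ldp`/`stp` encode a signed 7-bit immediate scaled by the access size.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical model of a load's immediate; not the JIT's real data structure.
struct LoadImm
{
    int64_t value;    // immediate as recorded for the instruction
    bool    isScaled; // true: counts elements (ldr form); false: counts bytes (ldur form)
};

// Converts an immediate to an actual byte offset for an access of `size` bytes.
static int64_t byteOffset(LoadImm imm, int64_t size)
{
    return imm.isScaled ? imm.value * size : imm.value;
}

// Naive check: compares raw immediates as if both were scaled.
// Unscaled offsets 0x1A and 0x1B look "consecutive" here, which is wrong.
static bool canPairNaive(LoadImm prev, LoadImm cur)
{
    return cur.value == prev.value + 1;
}

// Check on actual byte offsets (ascending order only, for brevity).
static bool canPairActual(LoadImm prev, LoadImm cur, int64_t size)
{
    assert(size == 4 || size == 8);
    int64_t lo = byteOffset(prev, size);
    int64_t hi = byteOffset(cur, size);
    // The pair's offset must be a multiple of size and fit a signed 7-bit scaled field.
    bool encodable = (lo % size == 0) && (lo / size >= -64) && (lo / size <= 63);
    return (hi - lo == size) && encodable;
}
```

For two 4-byte loads with unscaled offsets 0x1A and 0x1B, `canPairNaive` reports a match while `canPairActual` rejects the pair, since the accesses are one byte apart rather than four. Working in actual byte offsets keeps the adjacency test independent of how each instruction happens to encode its immediate.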
@dotnet/jit-contrib @BruceForstall @jakobbotsch this is ready, pending CI. |
src/tests/JIT/Regression/JitBlue/Runtime_85765/Runtime_85765.cs
LGTM
We should back-port this to .NET 8.
In the future, I'd like to see the "peephole optimization" code deal in "actual" values, not "encoded" values.
/backport to release/8.0 |
Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/5881464622 |
@TIHan an error occurred while backporting to release/8.0, please check the run log for details! Error: @TIHan is not a repo collaborator, backporting is not allowed. If you're a collaborator please make sure your dotnet team membership visibility is set to Public on https://github.com/orgs/dotnet/people?query=TIHan |
/backport to release/8.0 |
Started backporting to release/8.0: https://github.com/dotnet/runtime/actions/runs/5882707392 |
@TIHan The test added here doesn't build -- seems like CI was red when this PR was merged. |
@TIHan the test added doesn't build: |
Ok, will make a quick PR to fix this. |
I am going to revert this. This could be introducing a number of other problems, since the CI was all red when this was merged. |
This is the same PR as this but with the test fix. My fault for not checking. |
Resolves #85765
With the latest, the code-gen is quite different from what was reported in the issue, and therefore doesn't reproduce. But the issue still exists and is able to be reproduced by a different sample: