Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Arm64] tailcall bad codegen #36911

Closed
BruceForstall opened this issue May 23, 2020 · 6 comments · Fixed by #37192
Closed

[Arm64] tailcall bad codegen #36911

BruceForstall opened this issue May 23, 2020 · 6 comments · Fixed by #37192
Assignees
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@BruceForstall
Copy link
Member

Failure in libraries test System.Private.Uri.Tests:

       public void Ctor_String_String_Int_String_String(string schemeName, string hostName, int port, string pathValue, string extraValue, string expectedScheme, string expectedHost, string expectedPath, string expecte
Query, string expectedFragment)
       {
           var uriBuilder = new UriBuilder(schemeName, hostName, port, pathValue, extraValue);
           VerifyUriBuilder(uriBuilder, expectedScheme, "", "", expectedHost, port, expectedPath, expectedQuery, expectedFragment);
       }

e.g.:

c:\bugs\System.Private.Uri.Functional.Tests\net5.0-Release>"C:\bugs\testhost\dotnet.exe" exec --runtimeconfig System.Private.Uri.Functional.Tests.runtimeconfig.json --depsfile System.Private.Uri.Functional.Tests.deps.json xunit.console.dll System.Private.Uri.Functional.Tests.dll -xml testResults.xml -nologo -nocolor -notrait category=IgnoreForCI -notrait category=OuterLoop -notrait category=failing -parallel none -maxthreads 1 -method "*.Ctor_String_String_Int_String_String"
Discovering: System.Private.Uri.Functional.Tests (method display = ClassAndMethod, method display options = None)
Discovered:  System.Private.Uri.Functional.Tests (found 1 of 382 test case)
Starting:    System.Private.Uri.Functional.Tests (parallel test collections = off, max threads = 1)
System.PrivateUri.Tests.UriBuilderTests.Ctor_String_String_Int_String_String(schemeName: "http", hostName: "host", port: 0, pathValue: "/path", extraValue: "?query#fragment", expectedScheme: "http", expectedHost: "host", expectedPath: "/path", expectedQuery: "?query", expectedFragment: "#fragment") [FAIL]
Assert.Equal() Failure
� (pos 0)
Expected: #fragment
Actual:   /path
� (pos 0)
Stack Trace:
C:\gh\runtime\src\libraries\System.Private.Uri\tests\FunctionalTests\UriBuilderTests.cs(391,0): at System.PrivateUri.Tests.UriBuilderTests.VerifyUriBuilder(UriBuilder uriBuilder, String scheme, String userName, String password, String host, Int32 port, String path, String query, String fragment)
System.PrivateUri.Tests.UriBuilderTests.Ctor_String_String_Int_String_String(schemeName: "HTTP", hostName: "host", port: 20, pathValue: "/path1/path2", extraValue: "?query&query2=value#fragment", expectedScheme: "http", expectedHost: "host", expectedPath: "/path1/path2", expectedQuery: "?query&query2=value", expectedFragment: "#fragment") [FAIL]

JIT generates a tailcall to VerifyUriBuilder. It needs to copy 3rd stack argument arg10 to be the 1st stack argument of the tailcall. It moves the argument on the stack, trashing the stack argument arg8 before it is loaded into a register for the tailcall.

G_M8925_IG03:        ; offs=000064H, size=0034H, bbWeight=1    PerfScore 14.00, nogc, extend

IN000d: 000064                    ldr     x20, [fp,#96] // [V10 arg10]
IN000e: 000068                    str     x20, [fp,#80] // [V08 arg8]
IN000f: 00006C                    movz    x2, #0x3060
IN0010: 000070                    movk    x2, #0xd0a0 LSL #16
IN0011: 000074                    movk    x2, #487 LSL #32
IN0012: 000078                    ldr     x3, [x2]
IN0013: 00007C                    mov     x2, x3
IN0014: 000080                    mov     x0, x26
IN0015: 000084                    mov     x1, x24
IN0016: 000088                    mov     x4, x25
IN0017: 00008C                    mov     w5, w19
IN0018: 000090                    ldr     x6, [fp,#80]  // [V08 arg8]
*************** reading the already trashed expectedPath!
IN0019: 000094                    ldr     x7, [fp,#88]  // [V09 arg9]

G_M8925_IG04:        ; offs=000098H, size=0018H, bbWeight=1    PerfScore 6.00, epilog, nogc, extend

IN0027: 000098                    ldp     x25, x26, [sp,#64]
IN0028: 00009C                    ldp     x23, x24, [sp,#48]
IN0029: 0000A0                    ldp     x21, x22, [sp,#32]
IN002a: 0000A4                    ldp     x19, x20, [sp,#16]
IN002b: 0000A8                    ldp     fp, lr, [sp],#80
IN002c: 0000AC                    b       System.PrivateUri.Tests.UriBuilderTests:VerifyUriBuilder(System.UriBuilder,System.String,System.String,System.String,System.String,int,System.String,System.String,System.String)
@BruceForstall BruceForstall added arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI labels May 23, 2020
@BruceForstall BruceForstall added this to the 5.0 milestone May 23, 2020
@BruceForstall BruceForstall self-assigned this May 23, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label May 23, 2020
@BruceForstall BruceForstall removed the untriaged New issue has not been triaged by the area owner label May 23, 2020
@BruceForstall
Copy link
Member Author

Here is a standalone repro:

using System;
using System.Runtime.CompilerServices;

namespace TestNamespace
{
    public class TestClass
    {
        public static int result;
        const int Pass = 100;
        const int Fail = -1;

        public static void AssertEqual(string a, string b)
        {
            if (a != b)
            {
                Console.WriteLine("Fail: {0} != {1}", a, b);
                result = Fail;
            }
        }

        public static void AssertEqual(int a, int b)
        {
            if (a != b)
            {
                Console.WriteLine("Fail: {0} != {1}", a, b);
                result = Fail;
            }
        }

        public void Ctor_String_String_Int_String_String(string schemeName, string hostName, int port, string pathValue, string extraValue, string expectedScheme, string expectedHost, string expectedPath, string expectedQuery, string expectedFragment)
        {
            var uriBuilder = new UriBuilder(schemeName, hostName, port, pathValue, extraValue);
            VerifyUriBuilder(uriBuilder, expectedScheme, "", "", expectedHost, port, expectedPath, expectedQuery, expectedFragment);
        }

        private static void VerifyUriBuilder(UriBuilder uriBuilder, string scheme, string userName, string password, string host, int port, string path, string query, string fragment)
        {
            AssertEqual(scheme, uriBuilder.Scheme);
            AssertEqual(userName, uriBuilder.UserName);
            AssertEqual(password, uriBuilder.Password);
            AssertEqual(host, uriBuilder.Host);
            AssertEqual(port, uriBuilder.Port);
            AssertEqual(path, uriBuilder.Path);
            AssertEqual(query, uriBuilder.Query);
            AssertEqual(fragment, uriBuilder.Fragment);
        }

        public static int Main()
        {
            result = Pass;

            TestClass tc = new TestClass();
            tc.Ctor_String_String_Int_String_String("http", "host", 0, "/path", "?query#fragment", "http", "host", "/path", "?query", "#fragment");

            if (result == Pass)
            {
                Console.WriteLine("Pass");
            }
            else
            {
                Console.WriteLine("Fail");
            }
            return result;
        }
    }
}

You need to run it with

set COMPlus_TieredCompilation=0
set COMPlus_JitNoInline=1

JitNoInline is needed because the inliner gets in the way of tailcall.

@BruceForstall
Copy link
Member Author

We have the following IR:

N041 (???,???) [000064] ------------                 START_NONGC void   REG NA
N043 (  3,  2) [000019] -----------z        t19 =    LCL_VAR   ref    V10 arg10        u:1 x20 (last use) REG x20 $88
                                                  /--*  t19    ref
N045 (???,???) [000055] ------------              *  PUTARG_STK [+0x00] void   (1 slots) REG NA
N047 (  1,  1) [000029] ------------        t29 =    CNS_INT(h) long   0xECAF3060 "" REG x2 $141
                                                  /--*  t29    long
N049 (  4,  3) [000030] n---G-------        t30 = *  IND       ref    REG x3 <l:$240, c:$18f>
                                                  /--*  t30    ref
N051 (  8,  6) [000042] DA--G-------              *  STORE_LCL_VAR ref    V13 cse0         d:1 x3 REG x3
N053 (  3,  2) [000043] ------------        t43 =    LCL_VAR   ref    V13 cse0         u:1 x3 REG x3 <l:$240, c:$18f>
                                                  /--*  t43    ref
N055 (???,???) [000056] ------------        t56 = *  PUTARG_REG ref    REG x2
N057 (  3,  2) [000045] ------------        t45 =    LCL_VAR   ref    V13 cse0         u:1 x3 (last use) REG x3 <l:$240, c:$18f>
                                                  /--*  t45    ref
N059 (???,???) [000057] ------------        t57 = *  PUTARG_REG ref    REG x3
N061 (  1,  1) [000011] ------------        t11 =    LCL_VAR   ref    V12 tmp1         u:2 x26 (last use) REG x26 $1c0
                                                  /--*  t11    ref
N063 (???,???) [000058] ------------        t58 = *  PUTARG_REG ref    REG x0
N065 (  1,  1) [000012] ------------        t12 =    LCL_VAR   ref    V06 arg6         u:1 x24 (last use) REG x24 $84
                                                  /--*  t12    ref
N067 (???,???) [000059] ------------        t59 = *  PUTARG_REG ref    REG x1
N069 (  1,  1) [000015] ------------        t15 =    LCL_VAR   ref    V07 arg7         u:1 x25 (last use) REG x25 $85
                                                  /--*  t15    ref
N071 (???,???) [000060] ------------        t60 = *  PUTARG_REG ref    REG x4
N073 (  1,  1) [000016] ------------        t16 =    LCL_VAR   int    V03 arg3         u:1 x19 (last use) REG x19 $c0
                                                  /--*  t16    int
N075 (???,???) [000061] ------------        t61 = *  PUTARG_REG int    REG x5
N077 (  3,  2) [000017] -----------z        t17 =    LCL_VAR   ref    V08 arg8         u:1 x6 (last use) REG x6 $86
                                                  /--*  t17    ref
N079 (???,???) [000062] ------------        t62 = *  PUTARG_REG ref    REG x6
N081 (  3,  2) [000018] -----------z        t18 =    LCL_VAR   ref    V09 arg9         u:1 x7 (last use) REG x7 $87
                                                  /--*  t18    ref
N083 (???,???) [000063] ------------        t63 = *  PUTARG_REG ref    REG x7
                                                  /--*  t56    ref    arg2 in x2
                                                  +--*  t57    ref    arg3 in x3
                                                  +--*  t58    ref    arg0 in x0
                                                  +--*  t59    ref    arg1 in x1
                                                  +--*  t60    ref    arg4 in x4
                                                  +--*  t61    int    arg5 in x5
                                                  +--*  t62    ref    arg6 in x6
                                                  +--*  t63    ref    arg7 in x7
N085 ( 44, 31) [000020] --CXG-------              *  CALL      void   TestNamespace.TestClass.VerifyUriBuilder REG NA $VN.Void


; Final local variable assignments
;
;* V00 this         [V00    ] (  0,  0   )     ref  ->  zero-ref    this class-hnd
;  V01 arg1         [V01,T02] (  3,  3   )     ref  ->  x20         class-hnd
;  V02 arg2         [V02,T03] (  3,  3   )     ref  ->  x21         class-hnd
;  V03 arg3         [V03,T00] (  4,  4   )     int  ->  x19
;  V04 arg4         [V04,T04] (  3,  3   )     ref  ->  x22         class-hnd
;  V05 arg5         [V05,T05] (  3,  3   )     ref  ->  x23         class-hnd
;  V06 arg6         [V06,T06] (  3,  3   )     ref  ->  x24         class-hnd
;  V07 arg7         [V07,T07] (  3,  3   )     ref  ->  x25         class-hnd
;  V08 arg8         [V08,T09] (  1,  1   )     ref  ->  [fp+0x50]   class-hnd
;  V09 arg9         [V09,T10] (  1,  1   )     ref  ->  [fp+0x58]   class-hnd
;  V10 arg10        [V10,T11] (  1,  1   )     ref  ->  [fp+0x60]   class-hnd
;# V11 OutArgs      [V11    ] (  1,  1   )  lclBlk ( 0) [sp+0x00]   "OutgoingArgSpace"
;  V12 tmp1         [V12,T01] (  3,  6   )     ref  ->  x26         class-hnd exact "NewObj constructor temp"
;  V13 cse0         [V13,T08] (  3,  3   )     ref  ->   x3         "CSE - aggressive"

Notice that for the tailcall arguments (the fast tailcall to VerifyUriBuilder), the last three are incoming stack arguments to Ctor_String_String_Int_String_String but are unused in the first call (to UriBuilder). The LCL_VAR uses are marked by the register allocator with GTF_SPILLED, so they remain in their incoming stack location until they are used, at which point they are loaded from their incoming stack location to a register and placed in their outgoing argument location for the tailcall. However, when they are used is during setup of the tailcall outgoing arguments, and the single outgoing stack argument is the same location as the incoming arg8. So, incoming arg8 gets trashed when the outgoing stack argument is established, then the unspill from arg8 loads the trashed value.

So, what is the proper way to prevent this? Should LSRA avoid creating unspills of stack arguments in the argument list of outgoing fast tailcalls? (Or more specifically, unspills of stack arguments within the outgoing argument stack region of the fast tailcall?)

Could a situation like this happen on other platforms?

@CarolEidt @erozenfeld ?
cc @jashook

@BruceForstall
Copy link
Member Author

fwiw, I changed the default for COMPlus_FastTailCalls from 1 to 0, and reran all the tests, hoping to work around this problem (at least temporarily). But that actually made my test runs worse, as every dotnet xunit invocation fails with:

System.ArgumentOutOfRangeException: Index was out of range. Must be non-negative and less than the size of the collection. (Parameter 'index')

Coreclr pri-0 tests all pass (trying pri-1 now). Manually setting this back to 1 in the environment makes the problem go away. This variable is only used in Compiler::fgCanFastTailCall to bail out quickly, so it would seem that we might have bad codegen in some case when not doing fast tailcall.

@BruceForstall
Copy link
Member Author

@erozenfeld Fixed a related bug here: dotnet/coreclr#12200.

@BruceForstall
Copy link
Member Author

Possibly related? #31729

@erozenfeld
Copy link
Member

#37192 fixes the standalone repro.

erozenfeld added a commit to erozenfeld/runtime that referenced this issue May 30, 2020
LowerFastTailCall has code that looks for incoming stack arguments
that may be overwritten by outgoing stack arguments and copies
them to a temp. The logic was incorrect for Windows Arm64 because it was
assuming shadow space, which only exists in Windows x64 abi.

Fixes dotnet#31729.
Fixes dotnet#36911.
jkotas pushed a commit that referenced this issue May 30, 2020
LowerFastTailCall has code that looks for incoming stack arguments
that may be overwritten by outgoing stack arguments and copies
them to a temp. The logic was incorrect for Windows Arm64 because it was
assuming shadow space, which only exists in Windows x64 abi.

Fixes #31729.
Fixes #36911.
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
arch-arm64 area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants