Infinite loop in stacklevelsetter #45580

CarolEidt · 2020-12-04T04:07:19Z

Running superpmi with a release clrjit runs into an infinite loop in stacklevelsetter due to bad IR. I’ve traced this back to the importer, when it generates two statements that reference the same CATCH_ARG node. In checked it runs into “Verification failure: F:\git5\runtime\src\coreclr\src\jit\importer.cpp:11980 : stack must be 1 on end of filter, while compiling App.Foo:VargFuncWrapper():App.Foo:this opcode endfilter, IL offset 13”, but doesn't run into the infinite loop. I haven’t determined why the behavior is different between checked and release.

This happens with \\jitrollingbuild\SuperPMI\Windows_NT.x64\tests.pmi.Windows_NT.x64.Release.mch from 10/14/2020, with method context 462164 (superpmi.exe -c 462164 clrjit.dll \SuperPmi\tests.pmi.Windows_NT.x64.Release.mch) from artifacts\bin\coreclr\windows.x64.Checked. (Adjusting as needed for where you've got the mch file.)

The text was updated successfully, but these errors were encountered:

JulieLeeMSFT · 2020-12-06T23:01:34Z

@BruceForstall for loop related issue.

AndyAyersMS · 2020-12-06T23:29:33Z

This is most likely an importer issue.

BruceForstall · 2021-03-19T04:51:13Z

This is the code that is failing:

runtime/src/tests/baseservices/varargs/varargsupport.il

Line 38 in 79ae74f

    
           .method family newslot virtual instance class App.Foo VargFuncWrapper() il managed

BruceForstall · 2021-03-19T05:03:00Z

The two tests here are disabled in issues.targets for Windows x86/x64/arm64.

BruceForstall · 2021-03-19T05:29:34Z

Related: #42673

And the code that invokes it:

#if !defined(OSX_ARM64_ABI)
    // Set stack levels; this information is necessary for x86
    // but on other platforms it is used only in asserts.
    // TODO: do not run it in release on other platforms, see https://github.com/dotnet/runtime/issues/42673.
    StackLevelSetter stackLevelSetter(this);
    stackLevelSetter.Run();
#endif // !OSX_ARM64_ABI

BruceForstall · 2021-03-19T06:48:36Z

We get into an infinite loop because the gtPrev of the first LIR node in BB06 should be nullptr, but is actually a STORE_LCL_VAR:

------------ BB06 [011..014) (throw), preds={} succs={}
// prev points to catch arg below [000004] -- loop
// extra: STORE_LCL_VAR ref 
               [000081] ------------                 IL_OFFSET void   IL offset: 0x11
N001 (  1,  1) [000004] -----O------         t4 =    CATCH_ARG ref
                                                  /--*  t4     ref
N003 (  5,  4) [000039] DA---O------              *  STORE_LCL_VAR ref    V08 tmp6
N002 (  1,  1) [000041] ------------        t41 =    CNS_INT   int    17
                                                  /--*  t41    int
               [000093] ------------        t93 = *  PUTARG_REG int    REG rcx
                                                  /--*  t93    int    arg0 in rcx
N003 ( 15,  7) [000042] --CXG-------              *  CALL help void   HELPER.CORINFO_HELP_VERIFICATION

The corruption is happening in the Rationalizer.

[Edit] Actually, it looks like the corruption happens in the trees before the rationalizer, where the same GT_CATCH_ARG gets in the trees twice.


◢ | bbStmtList | 0x0000014e19f20b90 {m_rootNode=0x0000014e19f20b40 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') ...} ...} | Statement *
  | ▶ m_rootNode | 0x0000014e19f20b40 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') gtCSEnum=0x00 '\0' ...} | GenTree *
  | ▶ m_treeList | 0x0000014e19f1fcd8 {gtOper=GT_CATCH_ARG (0x07 '\a') gtType=TYP_REF (0x0d '\r') gtCSEnum=0x00 '\0' ...} | GenTree *
  | ◢ m_next | 0x0000014e19f20d50 {m_rootNode=0x0000014e19f20d00 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') ...} ...} | Statement *
    | ▶ m_rootNode | 0x0000014e19f20d00 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') gtCSEnum=0x00 '\0' ...} | GenTree *
    | ▶ m_treeList | 0x0000014e19f1fcd8 {gtOper=GT_CATCH_ARG (0x07 '\a') gtType=TYP_REF (0x0d '\r') gtCSEnum=0x00 '\0' ...} | GenTree *
    | ◢ m_next | 0x0000014e19f20ed8 {m_rootNode=0x0000014e19f20e30 {gtOper=GT_CALL (0x62 'b') gtType=TYP_VOID (0x01 '\x1') ...} ...} | Statement *
      | ▶ m_rootNode | 0x0000014e19f20e30 {gtOper=GT_CALL (0x62 'b') gtType=TYP_VOID (0x01 '\x1') gtCSEnum=0x00 '\0' ...} | GenTree *
      | ▶ m_treeList | 0x0000014e19f21e20 {gtOper=GT_ARGPLACE (0x71 'q') gtType=TYP_INT (0x07 '\a') gtCSEnum=0x00 '\0' ...} | GenTree *
      | ▶ m_next | 0x0000000000000000 <NULL> | Statement *
      | ▶ m_prev | 0x0000014e19f20d50 {m_rootNode=0x0000014e19f20d00 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') ...} ...} | Statement *

It should look like:

------------ BB06 [011..014) (throw), preds={} succs={}

***** BB06
STMT00011 (IL 0x011...  ???)
N003 (  5,  4) [000039] -A---O--R---              *  ASG       ref
N002 (  3,  2) [000038] D------N----              +--*  LCL_VAR   ref    V08 tmp6
N001 (  1,  1) [000004] -----O------              \--*  CATCH_ARG ref

***** BB06
STMT00012 (IL   ???...  ???)
N003 ( 15,  7) [000042] --CXG-------              *  CALL help void   HELPER.CORINFO_HELP_VERIFICATION
N002 (  1,  1) [000041] ------------ arg0 in rcx  \--*  CNS_INT   int    17

BruceForstall · 2021-03-19T15:51:16Z

The corruption was created due to the following code in verConvertBBToThrowVerificationException:

#ifdef DEBUG
    // we need this since BeginTreeList asserts otherwise
    impStmtList = impLastStmt = nullptr;
    block->bbFlags &= ~BBF_IMPORTED;

this was only under DEBUG, but is needed for Release as well.

In this case, we had created an ASG(LCL_VAR, CATCH_ARG) in a filter before raising a verification exception (the stack was level 2 when reaching an endfilter). In release, we didn't clear the statement list, and the CATCH_ARG was still on the stack. We spilled side effects, and impSpillSpecialSideEff added another ASG(LCL_VAR, CATCH_ARG), using a new ASG and LCL_VAR, but re-using the same CATCH_ARG object. Much later, in Lower, we convert to LIR form and the gtNext/gtPrev links become primary, and because we re-used a CATCH_ARG node, we ended up with a loop. StackLevelSetter is the first phase that iterates over the gtPrev links, so saw the corruption by an infinite loop.

The problem manifested as an infinite loop during the StackLevelSetter phase in the release build SuperPMI replay of the tests, but also occurs as a normal release build test run of the varargsupport.il test. The issue is we had corrupt LIR gtPrev links, with a cycle. The problem had nothing to do with StackLevelSetter -- it just happened to be the first phase that iterated in reverse over the gtPrev links. The corruption was introduced in the importer, in `verConvertBBToThrowVerificationException`. It required a verification failure in a filter (possibly also catch) clause where the JIT would throw away the currently imported code and convert the block to a call to the verification failure helper. This was a classic case of important, functional code being under `#ifdef DEBUG` that is needed in non-DEBUG as well. The result was we would end up adding an `ASG(LCL_VAR, CATCH_ARG)` to the statement list twice, with the same `CATCH_ARG` node. Fixes dotnet#45580

The problem manifested as an infinite loop during the StackLevelSetter phase in the release build SuperPMI replay of the tests, but also occurs as a normal release build test run of the varargsupport.il test. The issue is we had corrupt LIR gtPrev links, with a cycle. The problem had nothing to do with StackLevelSetter -- it just happened to be the first phase that iterated in reverse over the gtPrev links. The corruption was introduced in the importer, in `verConvertBBToThrowVerificationException`. It required a verification failure in a filter (possibly also catch) clause where the JIT would throw away the currently imported code and convert the block to a call to the verification failure helper. This was a classic case of important, functional code being under `#ifdef DEBUG` that is needed in non-DEBUG as well. The result was we would end up adding an `ASG(LCL_VAR, CATCH_ARG)` to the statement list twice, with the same `CATCH_ARG` node. Fixes #45580

CarolEidt added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 4, 2020

Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 4, 2020

JulieLeeMSFT assigned BruceForstall Dec 6, 2020

JulieLeeMSFT added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Dec 6, 2020

BruceForstall mentioned this issue Mar 20, 2021

Fix release build infinite loop #49943

Merged

ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 20, 2021

BruceForstall closed this as completed in #49943 Mar 21, 2021

ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 21, 2021

JulieLeeMSFT removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 22, 2021

JulieLeeMSFT added this to the 6.0.0 milestone Mar 22, 2021

ghost locked as resolved and limited conversation to collaborators Apr 21, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Infinite loop in stacklevelsetter #45580

Infinite loop in stacklevelsetter #45580

CarolEidt commented Dec 4, 2020

JulieLeeMSFT commented Dec 6, 2020

AndyAyersMS commented Dec 6, 2020

BruceForstall commented Mar 19, 2021

BruceForstall commented Mar 19, 2021 •

edited

Loading

BruceForstall commented Mar 19, 2021

BruceForstall commented Mar 19, 2021 •

edited

Loading

BruceForstall commented Mar 19, 2021

Infinite loop in stacklevelsetter #45580

Infinite loop in stacklevelsetter #45580

Comments

CarolEidt commented Dec 4, 2020

JulieLeeMSFT commented Dec 6, 2020

AndyAyersMS commented Dec 6, 2020

BruceForstall commented Mar 19, 2021

BruceForstall commented Mar 19, 2021 • edited Loading

BruceForstall commented Mar 19, 2021

BruceForstall commented Mar 19, 2021 • edited Loading

BruceForstall commented Mar 19, 2021

BruceForstall commented Mar 19, 2021 •

edited

Loading

BruceForstall commented Mar 19, 2021 •

edited

Loading