Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Infinite loop in stacklevelsetter #45580

Closed
CarolEidt opened this issue Dec 4, 2020 · 7 comments · Fixed by #49943
Closed

Infinite loop in stacklevelsetter #45580

CarolEidt opened this issue Dec 4, 2020 · 7 comments · Fixed by #49943
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@CarolEidt
Copy link
Contributor

Running superpmi with a release clrjit runs into an infinite loop in stacklevelsetter due to bad IR. I’ve traced this back to the importer, when it generates two statements that reference the same CATCH_ARG node. In checked it runs into “Verification failure: F:\git5\runtime\src\coreclr\src\jit\importer.cpp:11980 : stack must be 1 on end of filter, while compiling App.Foo:VargFuncWrapper():App.Foo:this opcode endfilter, IL offset 13”, but doesn't run into the infinite loop. I haven’t determined why the behavior is different between checked and release.

This happens with \\jitrollingbuild\SuperPMI\Windows_NT.x64\tests.pmi.Windows_NT.x64.Release.mch from 10/14/2020, with method context 462164 (superpmi.exe -c 462164 clrjit.dll \SuperPmi\tests.pmi.Windows_NT.x64.Release.mch) from artifacts\bin\coreclr\windows.x64.Checked. (Adjusting as needed for where you've got the mch file.)

@CarolEidt CarolEidt added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 4, 2020
@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the untriaged New issue has not been triaged by the area owner label Dec 4, 2020
@JulieLeeMSFT
Copy link
Member

@BruceForstall for loop related issue.

@JulieLeeMSFT JulieLeeMSFT added needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration and removed untriaged New issue has not been triaged by the area owner labels Dec 6, 2020
@AndyAyersMS
Copy link
Member

This is most likely an importer issue.

@BruceForstall
Copy link
Member

This is the code that is failing:

.method family newslot virtual instance class App.Foo VargFuncWrapper() il managed

@BruceForstall
Copy link
Member

BruceForstall commented Mar 19, 2021

The two tests here are disabled in issues.targets for Windows x86/x64/arm64.

@BruceForstall
Copy link
Member

Related: #42673

And the code that invokes it:

#if !defined(OSX_ARM64_ABI)
    // Set stack levels; this information is necessary for x86
    // but on other platforms it is used only in asserts.
    // TODO: do not run it in release on other platforms, see https://github.com/dotnet/runtime/issues/42673.
    StackLevelSetter stackLevelSetter(this);
    stackLevelSetter.Run();
#endif // !OSX_ARM64_ABI

@BruceForstall
Copy link
Member

BruceForstall commented Mar 19, 2021

We get into an infinite loop because the gtPrev of the first LIR node in BB06 should be nullptr, but is actually a STORE_LCL_VAR:

------------ BB06 [011..014) (throw), preds={} succs={}
// prev points to catch arg below [000004] -- loop
// extra: STORE_LCL_VAR ref 
               [000081] ------------                 IL_OFFSET void   IL offset: 0x11
N001 (  1,  1) [000004] -----O------         t4 =    CATCH_ARG ref
                                                  /--*  t4     ref
N003 (  5,  4) [000039] DA---O------              *  STORE_LCL_VAR ref    V08 tmp6
N002 (  1,  1) [000041] ------------        t41 =    CNS_INT   int    17
                                                  /--*  t41    int
               [000093] ------------        t93 = *  PUTARG_REG int    REG rcx
                                                  /--*  t93    int    arg0 in rcx
N003 ( 15,  7) [000042] --CXG-------              *  CALL help void   HELPER.CORINFO_HELP_VERIFICATION

The corruption is happening in the Rationalizer.

[Edit] Actually, it looks like the corruption happens in the trees before the rationalizer, where the same GT_CATCH_ARG gets in the trees twice.


◢ | bbStmtList | 0x0000014e19f20b90 {m_rootNode=0x0000014e19f20b40 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') ...} ...} | Statement *
  | ▶ m_rootNode | 0x0000014e19f20b40 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') gtCSEnum=0x00 '\0' ...} | GenTree *
  | ▶ m_treeList | 0x0000014e19f1fcd8 {gtOper=GT_CATCH_ARG (0x07 '\a') gtType=TYP_REF (0x0d '\r') gtCSEnum=0x00 '\0' ...} | GenTree *
  | ◢ m_next | 0x0000014e19f20d50 {m_rootNode=0x0000014e19f20d00 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') ...} ...} | Statement *
    | ▶ m_rootNode | 0x0000014e19f20d00 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') gtCSEnum=0x00 '\0' ...} | GenTree *
    | ▶ m_treeList | 0x0000014e19f1fcd8 {gtOper=GT_CATCH_ARG (0x07 '\a') gtType=TYP_REF (0x0d '\r') gtCSEnum=0x00 '\0' ...} | GenTree *
    | ◢ m_next | 0x0000014e19f20ed8 {m_rootNode=0x0000014e19f20e30 {gtOper=GT_CALL (0x62 'b') gtType=TYP_VOID (0x01 '\x1') ...} ...} | Statement *
      | ▶ m_rootNode | 0x0000014e19f20e30 {gtOper=GT_CALL (0x62 'b') gtType=TYP_VOID (0x01 '\x1') gtCSEnum=0x00 '\0' ...} | GenTree *
      | ▶ m_treeList | 0x0000014e19f21e20 {gtOper=GT_ARGPLACE (0x71 'q') gtType=TYP_INT (0x07 '\a') gtCSEnum=0x00 '\0' ...} | GenTree *
      | ▶ m_next | 0x0000000000000000 <NULL> | Statement *
      | ▶ m_prev | 0x0000014e19f20d50 {m_rootNode=0x0000014e19f20d00 {gtOper=GT_ASG (0x45 'E') gtType=TYP_REF (0x0d '\r') ...} ...} | Statement *

It should look like:

------------ BB06 [011..014) (throw), preds={} succs={}

***** BB06
STMT00011 (IL 0x011...  ???)
N003 (  5,  4) [000039] -A---O--R---              *  ASG       ref
N002 (  3,  2) [000038] D------N----              +--*  LCL_VAR   ref    V08 tmp6
N001 (  1,  1) [000004] -----O------              \--*  CATCH_ARG ref

***** BB06
STMT00012 (IL   ???...  ???)
N003 ( 15,  7) [000042] --CXG-------              *  CALL help void   HELPER.CORINFO_HELP_VERIFICATION
N002 (  1,  1) [000041] ------------ arg0 in rcx  \--*  CNS_INT   int    17

@BruceForstall
Copy link
Member

The corruption was created due to the following code in verConvertBBToThrowVerificationException:

#ifdef DEBUG
    // we need this since BeginTreeList asserts otherwise
    impStmtList = impLastStmt = nullptr;
    block->bbFlags &= ~BBF_IMPORTED;

this was only under DEBUG, but is needed for Release as well.

In this case, we had created an ASG(LCL_VAR, CATCH_ARG) in a filter before raising a verification exception (the stack was level 2 when reaching an endfilter). In release, we didn't clear the statement list, and the CATCH_ARG was still on the stack. We spilled side effects, and impSpillSpecialSideEff added another ASG(LCL_VAR, CATCH_ARG), using a new ASG and LCL_VAR, but re-using the same CATCH_ARG object. Much later, in Lower, we convert to LIR form and the gtNext/gtPrev links become primary, and because we re-used a CATCH_ARG node, we ended up with a loop. StackLevelSetter is the first phase that iterates over the gtPrev links, so saw the corruption by an infinite loop.

BruceForstall added a commit to BruceForstall/runtime that referenced this issue Mar 20, 2021
The problem manifested as an infinite loop during the StackLevelSetter phase
in the release build SuperPMI replay of the tests, but also occurs as a
normal release build test run of the varargsupport.il test.

The issue is we had corrupt LIR gtPrev links, with a cycle. The problem had
nothing to do with StackLevelSetter -- it just happened to be the first phase
that iterated in reverse over the gtPrev links.

The corruption was introduced in the importer, in
`verConvertBBToThrowVerificationException`. It required a verification failure
in a filter (possibly also catch) clause where the JIT would throw away the
currently imported code and convert the block to a call to the verification
failure helper.

This was a classic case of important, functional code being under `#ifdef DEBUG`
that is needed in non-DEBUG as well.

The result was we would end up adding an `ASG(LCL_VAR, CATCH_ARG)` to the statement
list twice, with the same `CATCH_ARG` node.

Fixes dotnet#45580
@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Mar 20, 2021
BruceForstall added a commit that referenced this issue Mar 21, 2021
The problem manifested as an infinite loop during the StackLevelSetter phase
in the release build SuperPMI replay of the tests, but also occurs as a
normal release build test run of the varargsupport.il test.

The issue is we had corrupt LIR gtPrev links, with a cycle. The problem had
nothing to do with StackLevelSetter -- it just happened to be the first phase
that iterated in reverse over the gtPrev links.

The corruption was introduced in the importer, in
`verConvertBBToThrowVerificationException`. It required a verification failure
in a filter (possibly also catch) clause where the JIT would throw away the
currently imported code and convert the block to a call to the verification
failure helper.

This was a classic case of important, functional code being under `#ifdef DEBUG`
that is needed in non-DEBUG as well.

The result was we would end up adding an `ASG(LCL_VAR, CATCH_ARG)` to the statement
list twice, with the same `CATCH_ARG` node.

Fixes #45580
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Mar 21, 2021
@JulieLeeMSFT JulieLeeMSFT removed the needs-further-triage Issue has been initially triaged, but needs deeper consideration or reconsideration label Mar 22, 2021
@JulieLeeMSFT JulieLeeMSFT added this to the 6.0.0 milestone Mar 22, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Apr 21, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
Archived in project
Development

Successfully merging a pull request may close this issue.

5 participants