Consider CALLFINALLY block as pred of finally block during assertion props #55674

kunalspathak · 2021-07-14T19:11:20Z

In #160, we started considering only first and last block of try for calculating assertions of finally block. However, we should also consider the assertion out of CALLFINALLY block.

No asmdiffs.

Fixes: #55131

… assertion prop

kunalspathak · 2021-07-14T19:11:36Z

@dotnet/jit-contrib , @sandreenko

BruceForstall · 2021-07-15T00:56:44Z

It seems to me like the issue is that jump threading is creating unexpected flow. It seems like we should prevent it from doing so, instead of fixing the general dataflow algorithm.

In particular, there is an assumption that a handler is only reachable from the try region. The callfinally/always pair on x64 is an "extended" part of that try region: there should be no predecessors of that from outside the try region. Jump threading is breaking this assumption by branching directly from outside of the try region to the callfinally. (btw, how is it doing that? It knows that the same condition is used inside the try region first block?)

@AndyAyersMS

kunalspathak · 2021-07-15T00:59:21Z

I see what you mean. Let me investigate it further.

AndyAyersMS · 2021-07-15T02:27:06Z

Jump threading is already checking that the flow it creates lies entirely within the same EH region. I'd need to look closer to see why that's not a sufficient constraint.

BruceForstall · 2021-07-15T03:35:23Z

I think this is the FEATURE_EH_CALLFINALLY_THUNKS case (for non-duplicated finallys).

kunalspathak · 2021-07-15T15:47:04Z

I think this is the FEATURE_EH_CALLFINALLY_THUNKS case (for non-duplicated finallys).

That is correct. For FEATURE_EH_CALLFINALLY_THUNKS, BBJ_CALLFINALLY is part of the enclosing region of the try, hence jump from BB1 -> BB5 is possible.

runtime/src/coreclr/jit/jiteh.cpp

Lines 4115 to 4118 in bd9d4e0

    
            * Note that on AMD64/ARM64, the BBJ_CALLFINALLY block that calls a finally handler is not 
        
            * within the corresponding 'try' region: it is placed in the corresponding 'try' region's 
        
            * parent (which might be the main function body). This is how it is represented to the VM 
        
            * (with a special "cloned finally" EH table entry).

Relevant portion from JITDUMP:

BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..009)-> BB03 ( cond )                     i hascall gcsafe 
BB02 [0001]  1       BB01                  1       [009..013)                                     i hascall gcsafe 
BB03 [0003]  2  0    BB01,BB02             1       [013..017)-> BB05 ( cond ) T0      try {       keep i try gcsafe 
BB04 [0004]  1  0    BB03                  1       [017..023)                 T0      }           i hascall gcsafe 
BB05 [0010]  2       BB03,BB04             1       [???..???)-> BB08 (callf )                     i internal gcsafe 
BB06 [0011]  1       BB10                  1       [???..???)-> BB07 (ALWAYS)                     i internal gcsafe KEEP 
BB07 [0009]  1       BB06                  1       [031..032)        (return)                     i gcsafe 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
BB08 [0006]  2     0 BB05                  1       [023..026)-> BB10 ( cond )    H0 F finally {   keep i gcsafe flet 
BB09 [0007]  1     0 BB08                  1       [026..030)                    H0               i hascall gcsafe 
BB10 [0008]  2     0 BB08,BB09             1       [030..031)        (finret)    H0   }           i gcsafe 
-----------------------------------------------------------------------------------------------------------------------------------------

Dominator BB01 of BB03 has relop with same liberal VN:
N003 (  3,  3) [000005] J------N----              *  EQ        int    $140
N001 (  1,  1) [000003] ------------              +--*  LCL_VAR   int    V00 loc0         u:2 $c0
N002 (  1,  1) [000004] ------------              \--*  CNS_INT   int    0 $40
 Redundant compare; current relop:
N003 (  3,  3) [000009] J------N----              *  EQ        int    $140
N001 (  1,  1) [000007] ------------              +--*  LCL_VAR   int    V00 loc0         u:2 $c0
N002 (  1,  1) [000008] ------------              \--*  CNS_INT   int    0 $40
Both successors of idom BB01 reach BB03 -- attempting jump threading
BB01 is a true pred
BB02 is an eh constrained pred
Optimizing via jump threading
Jump flow from pred BB01 -> BB03 implies predicate true; we can safely redirect flow to be BB01 -> BB05
Setting edge weights for BB01 -> BB05 to [0 .. 3.402823e+38]

fixing the general dataflow algorithm.

Earlier, we used to check all the preds of a finally block, but #160 optimized it to just restrict it to try's first/last block. I briefly try to see if it eliminates the range check which the PR claimed to do, but I don't see any difference. @sandreenko , is my repro code correct? It's a different issue why range check is not eliminated here.

private static int Issue_55131()
{
    int[] array = new int[100];
    int i = 0;
    int result = 0;
    while (i < 100)
    {
        try
        {
            result += i;
        }
        catch (IndexOutOfRangeException)
        {
            array[i] = i; // < -that bounds check can't be eliminated without that change.
        }
        array[i] = i; // < -that bounds check can't be eliminated without that change.
        i++;
    }
    return result;
}

Assembly code

; Assembly listing for method MiniBench.Program:Issue_55131():int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rbp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;  V00 loc0         [V00,T02] (  3,  8   )     ref  ->  [rbp-10H]   class-hnd exact EH-live single-def
;  V01 loc1         [V01,T00] ( 11, 53   )     int  ->  [rbp-04H]   do-not-enreg[Z] EH-live
;  V02 loc2         [V02,T01] (  4, 10   )     int  ->  [rbp-08H]   do-not-enreg[Z] EH-live
;  V03 OutArgs      [V03    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;* V04 tmp1         [V04    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "impSpillSpecialSideEff"
;  V05 PSPSym       [V05    ] (  1,  1   )    long  ->  [rbp-20H]   do-not-enreg[X] addr-exposed "PSPSym"
;
; Lcl frame size = 64

G_M42278_IG01:
       push     rbp
       sub      rsp, 64
       lea      rbp, [rsp+40H]
       mov      qword ptr [rbp-20H], rsp
                                                ;; bbWeight=1    PerfScore 2.75
G_M42278_IG02:
       mov      rcx, 0xD1FFAB1E
       mov      edx, 100
       call     CORINFO_HELP_NEWARR_1_VC
       mov      gword ptr [rbp-10H], rax
       xor      edx, edx
       mov      dword ptr [rbp-04H], edx
       mov      dword ptr [rbp-08H], edx
                                                ;; bbWeight=1    PerfScore 4.75
G_M42278_IG03:
       mov      edx, dword ptr [rbp-08H]
       add      edx, dword ptr [rbp-04H]
       mov      dword ptr [rbp-08H], edx
                                                ;; bbWeight=4    PerfScore 12.00
G_M42278_IG04:
       cmp      dword ptr [rbp-04H], 100
       jae      SHORT G_M42278_IG07
       mov      eax, dword ptr [rbp-04H]
       movsxd   rax, eax
       mov      edx, dword ptr [rbp-04H]
       mov      rcx, gword ptr [rbp-10H]
       mov      dword ptr [rcx+4*rax+16], edx
       mov      eax, dword ptr [rbp-04H]
       inc      eax
       mov      dword ptr [rbp-04H], eax
       cmp      dword ptr [rbp-04H], 100
       mov      rax, rcx
       jl       SHORT G_M42278_IG03
                                                ;; bbWeight=8    PerfScore 86.00
G_M42278_IG05:
       mov      eax, dword ptr [rbp-08H]
                                                ;; bbWeight=1    PerfScore 1.00
G_M42278_IG06:
       add      rsp, 64
       pop      rbp
       ret
                                                ;; bbWeight=1    PerfScore 1.75
G_M42278_IG07:
       call     CORINFO_HELP_RNGCHKFAIL
       int3
                                                ;; bbWeight=0    PerfScore 0.00
G_M42278_IG08:
       push     rbp
       sub      rsp, 48
       mov      rbp, qword ptr [rcx+32]
       mov      qword ptr [rsp+20H], rbp
       lea      rbp, [rbp+40H]
                                                ;; bbWeight=0    PerfScore 0.00
G_M42278_IG09:
       cmp      dword ptr [rbp-04H], 100
       jae      SHORT G_M42278_IG11
       mov      eax, dword ptr [rbp-04H]
       movsxd   rax, eax
       mov      edx, dword ptr [rbp-04H]
       mov      rcx, gword ptr [rbp-10H]
       mov      dword ptr [rcx+4*rax+16], edx
       lea      rax, G_M42278_IG04
                                                ;; bbWeight=0    PerfScore 0.00
G_M42278_IG10:
       add      rsp, 48
       pop      rbp
       ret
                                                ;; bbWeight=0    PerfScore 0.00
G_M42278_IG11:
       call     CORINFO_HELP_RNGCHKFAIL
       int3
                                                ;; bbWeight=0    PerfScore 0.00

; Total bytes of code 170, prolog size 14, PerfScore 125.25, instruction count 51, allocated bytes for code 170 (MethodHash=b4be5ad9) for method MiniBench.Program:Issue_55131():int

This PR still retains the optimization(?) made in #160 but also fix the correctness issue to consider BBJ_FINALLY as the pred. Let me know if I missed anything.

AndyAyersMS · 2021-07-15T19:13:03Z

We could also try and leverage ehGetCallFinallyBlockRange to block jump threading from introducing a branch into these ranges.

Or perhaps more simply, disable jump threading for the first block of a try (either always, or only under FEATURE_EH_CALLFINALLY_THUNKS).

AndyAyersMS · 2021-07-15T19:27:03Z

(btw, how is it doing that? It knows that the same condition is used inside the try region first block?

Yes, it effectively realizes the try is empty and so all that will happen on entry to the try is a handler invocation, and so that's the flow it sets up.

AndyAyersMS · 2021-07-16T00:18:38Z

I'm fine with this change or with inhibiting jump threading.

I have a slight preference for this change as it seems risky to simply assume flow can't reach a handler if we actually see that it can. If we go the inhibit route we should add suitable assertion checking to make sure nothing else can cause this.

kunalspathak · 2021-07-16T21:07:33Z

I disable the jump threading for first block of try. However, we should continue to keep the changes in dataflow because there might be other codepaths that makes the flowgraph of having a jump to CALLFINALLY. So, if we see it to be there, just consider it during assertion prop calculation.

BruceForstall

We don't need (and shouldn't have) the dataflow changes.

Instead, I've written an assert that should cover preventing the problematic case: #55858

src/coreclr/jit/redundantbranchopts.cpp

kunalspathak · 2021-07-16T22:37:38Z

We don't need (and shouldn't have) the dataflow changes.

I think in that case, I should just add an assert in dataflow to make sure we don't see such flows.

kunalspathak added 2 commits July 14, 2021 11:29

Consider CALLFINALLY block as one of the pred of handler block during…

ec286c8

… assertion prop

Add test case

b9f9d39

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 14, 2021

BruceForstall self-requested a review July 15, 2021 00:58

Do not perform jump threading for first block of try-region

3f009b3

BruceForstall requested changes Jul 16, 2021

View reviewed changes

src/coreclr/jit/redundantbranchopts.cpp Outdated Show resolved Hide resolved

fix bbNum

6d5cedd

revert dataflow changes

0c02acd

BruceForstall mentioned this pull request Jul 17, 2021

Add an assert concerning branch to CALLFINALLY blocks #55858

Merged

BruceForstall approved these changes Jul 17, 2021

View reviewed changes

kunalspathak merged commit 3906d0d into dotnet:main Jul 18, 2021

ManickaP mentioned this pull request Jul 20, 2021

[QUIC] Remove AppContext switch from S.N.Quic #56027

Merged

ghost locked as resolved and limited conversation to collaborators Aug 17, 2021

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Consider CALLFINALLY block as pred of finally block during assertion props #55674

Consider CALLFINALLY block as pred of finally block during assertion props #55674

kunalspathak commented Jul 14, 2021 •

edited

Loading

kunalspathak commented Jul 14, 2021

BruceForstall commented Jul 15, 2021

kunalspathak commented Jul 15, 2021

AndyAyersMS commented Jul 15, 2021

BruceForstall commented Jul 15, 2021

kunalspathak commented Jul 15, 2021 •

edited

Loading

AndyAyersMS commented Jul 15, 2021 •

edited

Loading

AndyAyersMS commented Jul 15, 2021 •

edited

Loading

AndyAyersMS commented Jul 16, 2021

kunalspathak commented Jul 16, 2021

BruceForstall left a comment

kunalspathak commented Jul 16, 2021

Consider CALLFINALLY block as pred of finally block during assertion props #55674

Consider CALLFINALLY block as pred of finally block during assertion props #55674

Conversation

kunalspathak commented Jul 14, 2021 • edited Loading

kunalspathak commented Jul 14, 2021

BruceForstall commented Jul 15, 2021

kunalspathak commented Jul 15, 2021

AndyAyersMS commented Jul 15, 2021

BruceForstall commented Jul 15, 2021

kunalspathak commented Jul 15, 2021 • edited Loading

AndyAyersMS commented Jul 15, 2021 • edited Loading

AndyAyersMS commented Jul 15, 2021 • edited Loading

AndyAyersMS commented Jul 16, 2021

kunalspathak commented Jul 16, 2021

BruceForstall left a comment

Choose a reason for hiding this comment

kunalspathak commented Jul 16, 2021

kunalspathak commented Jul 14, 2021 •

edited

Loading

kunalspathak commented Jul 15, 2021 •

edited

Loading

AndyAyersMS commented Jul 15, 2021 •

edited

Loading

AndyAyersMS commented Jul 15, 2021 •

edited

Loading