Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Consider CALLFINALLY block as pred of finally block during assertion props #55674

Merged
merged 5 commits into from
Jul 18, 2021

Conversation

kunalspathak
Copy link
Member

@kunalspathak kunalspathak commented Jul 14, 2021

In #160, we started considering only first and last block of try for calculating assertions of finally block. However, we should also consider the assertion out of CALLFINALLY block.

No asmdiffs.

Fixes: #55131

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 14, 2021
@kunalspathak
Copy link
Member Author

@dotnet/jit-contrib , @sandreenko

@BruceForstall
Copy link
Member

It seems to me like the issue is that jump threading is creating unexpected flow. It seems like we should prevent it from doing so, instead of fixing the general dataflow algorithm.

In particular, there is an assumption that a handler is only reachable from the try region. The callfinally/always pair on x64 is an "extended" part of that try region: there should be no predecessors of that from outside the try region. Jump threading is breaking this assumption by branching directly from outside of the try region to the callfinally. (btw, how is it doing that? It knows that the same condition is used inside the try region first block?)

@AndyAyersMS

@BruceForstall BruceForstall self-requested a review July 15, 2021 00:58
@kunalspathak
Copy link
Member Author

I see what you mean. Let me investigate it further.

@AndyAyersMS
Copy link
Member

Jump threading is already checking that the flow it creates lies entirely within the same EH region. I'd need to look closer to see why that's not a sufficient constraint.

@BruceForstall
Copy link
Member

I think this is the FEATURE_EH_CALLFINALLY_THUNKS case (for non-duplicated finallys).

@kunalspathak
Copy link
Member Author

kunalspathak commented Jul 15, 2021

I think this is the FEATURE_EH_CALLFINALLY_THUNKS case (for non-duplicated finallys).

That is correct. For FEATURE_EH_CALLFINALLY_THUNKS, BBJ_CALLFINALLY is part of the enclosing region of the try, hence jump from BB1 -> BB5 is possible.

* Note that on AMD64/ARM64, the BBJ_CALLFINALLY block that calls a finally handler is not
* within the corresponding 'try' region: it is placed in the corresponding 'try' region's
* parent (which might be the main function body). This is how it is represented to the VM
* (with a special "cloned finally" EH table entry).

Relevant portion from JITDUMP:

BBnum BBid ref try hnd preds           weight    lp [IL range]     [jump]      [EH region]         [flags]
-----------------------------------------------------------------------------------------------------------------------------------------
BB01 [0000]  1                             1       [000..009)-> BB03 ( cond )                     i hascall gcsafe 
BB02 [0001]  1       BB01                  1       [009..013)                                     i hascall gcsafe 
BB03 [0003]  2  0    BB01,BB02             1       [013..017)-> BB05 ( cond ) T0      try {       keep i try gcsafe 
BB04 [0004]  1  0    BB03                  1       [017..023)                 T0      }           i hascall gcsafe 
BB05 [0010]  2       BB03,BB04             1       [???..???)-> BB08 (callf )                     i internal gcsafe 
BB06 [0011]  1       BB10                  1       [???..???)-> BB07 (ALWAYS)                     i internal gcsafe KEEP 
BB07 [0009]  1       BB06                  1       [031..032)        (return)                     i gcsafe 
+++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++ funclets follow
BB08 [0006]  2     0 BB05                  1       [023..026)-> BB10 ( cond )    H0 F finally {   keep i gcsafe flet 
BB09 [0007]  1     0 BB08                  1       [026..030)                    H0               i hascall gcsafe 
BB10 [0008]  2     0 BB08,BB09             1       [030..031)        (finret)    H0   }           i gcsafe 
-----------------------------------------------------------------------------------------------------------------------------------------

Dominator BB01 of BB03 has relop with same liberal VN:
N003 (  3,  3) [000005] J------N----              *  EQ        int    $140
N001 (  1,  1) [000003] ------------              +--*  LCL_VAR   int    V00 loc0         u:2 $c0
N002 (  1,  1) [000004] ------------              \--*  CNS_INT   int    0 $40
 Redundant compare; current relop:
N003 (  3,  3) [000009] J------N----              *  EQ        int    $140
N001 (  1,  1) [000007] ------------              +--*  LCL_VAR   int    V00 loc0         u:2 $c0
N002 (  1,  1) [000008] ------------              \--*  CNS_INT   int    0 $40
Both successors of idom BB01 reach BB03 -- attempting jump threading
BB01 is a true pred
BB02 is an eh constrained pred
Optimizing via jump threading
Jump flow from pred BB01 -> BB03 implies predicate true; we can safely redirect flow to be BB01 -> BB05
Setting edge weights for BB01 -> BB05 to [0 .. 3.402823e+38]

fixing the general dataflow algorithm.

Earlier, we used to check all the preds of a finally block, but #160 optimized it to just restrict it to try's first/last block. I briefly try to see if it eliminates the range check which the PR claimed to do, but I don't see any difference. @sandreenko , is my repro code correct? It's a different issue why range check is not eliminated here.

private static int Issue_55131()
{
    int[] array = new int[100];
    int i = 0;
    int result = 0;
    while (i < 100)
    {
        try
        {
            result += i;
        }
        catch (IndexOutOfRangeException)
        {
            array[i] = i; // < -that bounds check can't be eliminated without that change.
        }
        array[i] = i; // < -that bounds check can't be eliminated without that change.
        i++;
    }
    return result;
}
Assembly code
; Assembly listing for method MiniBench.Program:Issue_55131():int
; Emitting BLENDED_CODE for X64 CPU with AVX - Windows
; optimized code
; rbp based frame
; fully interruptible
; No PGO data
; Final local variable assignments
;
;  V00 loc0         [V00,T02] (  3,  8   )     ref  ->  [rbp-10H]   class-hnd exact EH-live single-def
;  V01 loc1         [V01,T00] ( 11, 53   )     int  ->  [rbp-04H]   do-not-enreg[Z] EH-live
;  V02 loc2         [V02,T01] (  4, 10   )     int  ->  [rbp-08H]   do-not-enreg[Z] EH-live
;  V03 OutArgs      [V03    ] (  1,  1   )  lclBlk (32) [rsp+00H]   "OutgoingArgSpace"
;* V04 tmp1         [V04    ] (  0,  0   )     ref  ->  zero-ref    class-hnd "impSpillSpecialSideEff"
;  V05 PSPSym       [V05    ] (  1,  1   )    long  ->  [rbp-20H]   do-not-enreg[X] addr-exposed "PSPSym"
;
; Lcl frame size = 64

G_M42278_IG01:
       push     rbp
       sub      rsp, 64
       lea      rbp, [rsp+40H]
       mov      qword ptr [rbp-20H], rsp
                                                ;; bbWeight=1    PerfScore 2.75
G_M42278_IG02:
       mov      rcx, 0xD1FFAB1E
       mov      edx, 100
       call     CORINFO_HELP_NEWARR_1_VC
       mov      gword ptr [rbp-10H], rax
       xor      edx, edx
       mov      dword ptr [rbp-04H], edx
       mov      dword ptr [rbp-08H], edx
                                                ;; bbWeight=1    PerfScore 4.75
G_M42278_IG03:
       mov      edx, dword ptr [rbp-08H]
       add      edx, dword ptr [rbp-04H]
       mov      dword ptr [rbp-08H], edx
                                                ;; bbWeight=4    PerfScore 12.00
G_M42278_IG04:
       cmp      dword ptr [rbp-04H], 100
       jae      SHORT G_M42278_IG07
       mov      eax, dword ptr [rbp-04H]
       movsxd   rax, eax
       mov      edx, dword ptr [rbp-04H]
       mov      rcx, gword ptr [rbp-10H]
       mov      dword ptr [rcx+4*rax+16], edx
       mov      eax, dword ptr [rbp-04H]
       inc      eax
       mov      dword ptr [rbp-04H], eax
       cmp      dword ptr [rbp-04H], 100
       mov      rax, rcx
       jl       SHORT G_M42278_IG03
                                                ;; bbWeight=8    PerfScore 86.00
G_M42278_IG05:
       mov      eax, dword ptr [rbp-08H]
                                                ;; bbWeight=1    PerfScore 1.00
G_M42278_IG06:
       add      rsp, 64
       pop      rbp
       ret
                                                ;; bbWeight=1    PerfScore 1.75
G_M42278_IG07:
       call     CORINFO_HELP_RNGCHKFAIL
       int3
                                                ;; bbWeight=0    PerfScore 0.00
G_M42278_IG08:
       push     rbp
       sub      rsp, 48
       mov      rbp, qword ptr [rcx+32]
       mov      qword ptr [rsp+20H], rbp
       lea      rbp, [rbp+40H]
                                                ;; bbWeight=0    PerfScore 0.00
G_M42278_IG09:
       cmp      dword ptr [rbp-04H], 100
       jae      SHORT G_M42278_IG11
       mov      eax, dword ptr [rbp-04H]
       movsxd   rax, eax
       mov      edx, dword ptr [rbp-04H]
       mov      rcx, gword ptr [rbp-10H]
       mov      dword ptr [rcx+4*rax+16], edx
       lea      rax, G_M42278_IG04
                                                ;; bbWeight=0    PerfScore 0.00
G_M42278_IG10:
       add      rsp, 48
       pop      rbp
       ret
                                                ;; bbWeight=0    PerfScore 0.00
G_M42278_IG11:
       call     CORINFO_HELP_RNGCHKFAIL
       int3
                                                ;; bbWeight=0    PerfScore 0.00

; Total bytes of code 170, prolog size 14, PerfScore 125.25, instruction count 51, allocated bytes for code 170 (MethodHash=b4be5ad9) for method MiniBench.Program:Issue_55131():int

This PR still retains the optimization(?) made in #160 but also fix the correctness issue to consider BBJ_FINALLY as the pred. Let me know if I missed anything.

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Jul 15, 2021

We could also try and leverage ehGetCallFinallyBlockRange to block jump threading from introducing a branch into these ranges.

Or perhaps more simply, disable jump threading for the first block of a try (either always, or only under FEATURE_EH_CALLFINALLY_THUNKS).

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Jul 15, 2021

(btw, how is it doing that? It knows that the same condition is used inside the try region first block?

Yes, it effectively realizes the try is empty and so all that will happen on entry to the try is a handler invocation, and so that's the flow it sets up.

@AndyAyersMS
Copy link
Member

I'm fine with this change or with inhibiting jump threading.

I have a slight preference for this change as it seems risky to simply assume flow can't reach a handler if we actually see that it can. If we go the inhibit route we should add suitable assertion checking to make sure nothing else can cause this.

@kunalspathak
Copy link
Member Author

I disable the jump threading for first block of try. However, we should continue to keep the changes in dataflow because there might be other codepaths that makes the flowgraph of having a jump to CALLFINALLY. So, if we see it to be there, just consider it during assertion prop calculation.

Copy link
Member

@BruceForstall BruceForstall left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need (and shouldn't have) the dataflow changes.

Instead, I've written an assert that should cover preventing the problematic case: #55858

src/coreclr/jit/redundantbranchopts.cpp Outdated Show resolved Hide resolved
@kunalspathak
Copy link
Member Author

We don't need (and shouldn't have) the dataflow changes.

I think in that case, I should just add an assert in dataflow to make sure we don't see such flows.

@kunalspathak kunalspathak merged commit 3906d0d into dotnet:main Jul 18, 2021
@ghost ghost locked as resolved and limited conversation to collaborators Aug 17, 2021
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Invalid assertion prop in finally
3 participants