Examples where CSE does not handle trees marked `GTF_MAKE_CSE` #92170

AndyAyersMS · 2023-09-16T01:55:59Z

Hoisting will mark any trees that it hoists with GTF_MAKE_CSE. Oddly CSE never looks for this flag. I was curious how many of these cases end up not getting hoisted because CSE does not play along.

In asp.net I see around 1850 cases total. Here's some analysis.

Constants (~1400)

These never even become CSE candidates. Hoisting invokes optIsCSEcandidate to screen things, but this generally allows all constants to be considered hoistable. CSE does additional checks and disables constant hoisting except on arm64. We should either stop hosting these constants or make them candidates.

Hoisting a copy of [000510] $2c7 from BB41 into PreHeader BB40 for loop L03 <BB41..BB45>:
N002 (  2, 10) [000510] -----------                         *  CNS_INT   long   -0x1000000000000 $2c7

This hoisted copy placed in PreHeader (BB40):
               [000788] -----------                         *  COMMA     void  
     (  2, 10) [000786] -------H---                         +--*  CNS_INT   long   -0x1000000000000 $2c7
               [000787] -----------                         \--*  NOP       void  

Sorted CSE candidates:
CSE #03, {$1e7, $2  } useCnt=1: [def=1432.123973, use=286.424795, cost=  8      ]
        :: N005 (  8,  5) CSE #03 (def)[000400] --------R--                         *  LSH       int    $1e7
CSE #02, {$1e6, $2  } useCnt=1: [def=1432.123973, use=286.424795, cost=  3      ]
        :: N003 (  3,  3) CSE #02 (def)[000399] -----------                         *  AND       int    $1e6
CSE #01, {$301, $2  } useCnt=4: [def=100.000000, use=250.451620, cost=  3, call]
        :: N003 (  3,  3) CSE #01 (def)[000781] -----O-H---                         *  ADD       byref  $301

Missed MAKE_CSE at [000786] in 410

Hoisted tree is part of a bigger CSE

Hoisting a copy of [000742] $107 from BB37 into PreHeader BB36 for loop L01 <BB37..BB38>:
N005 ( 25,  8) [000742] ---------U-                         *  CAST      long <- ulong <- uint $107
N004 ( 24,  6) [000741] M----------                         \--*  UDIV      int    $44
N002 (  3,  2) [000739] -----------                            +--*  LCL_VAR   int    V41 tmp32        u:1 $45
N003 (  1,  1) [000740] -------N---                            \--*  CNS_INT   int    2 $42

This hoisted copy placed in PreHeader (BB36):
               [001148] -----------                         *  COMMA     void  
     ( 25,  8) [001143] -------H-U-                         +--*  CAST      long <- ulong <- uint $107
     ( 24,  6) [001144] M----------                         |  \--*  UDIV      int    $44
     (  3,  2) [001145] -----------                         |     +--*  LCL_VAR   int    V41 tmp32        u:1 $45
     (  1,  1) [001146] -------N---                         |     \--*  CNS_INT   int    2 $42
               [001147] -----------                         \--*  NOP       void

Here the tree [000742] that inspired hoisting is part of a bigger CSE, and the subtrees are never considered as candidates.

N007 ( 29, 11) CSE #12 (def)[000743] -----------                                  +--*  ADD       long   $30f
N005 ( 25,  8)              [000742] ---------U-                                  |  +--*  CAST      long <- ulong <- uint $107
N004 ( 24,  6)              [000741] M----------                                  |  |  \--*  UDIV      int    $44
N002 (  3,  2)              [000739] -----------                                  |  |     +--*  LCL_VAR   int    V41 tmp32        u:1 $45
N003 (  1,  1)              [000740] -------N---                                  |  |     \--*  CNS_INT   int    2 $42
N006 (  3,  2)              [000738] -----------                                  |  \--*  LCL_VAR   long   V47 tmp38        u:4 $281 

Missed MAKE_CSE at [001143] in 1292

There is perhaps a tension deciding which is better, but it seems like we can actually do both: CSE the subtree,
then CSE the add sitting on top.

ValueNum is constant (~25)

Hoisting a copy of [000194] $102 from BB04 into PreHeader BB03 for loop L00 <BB04..BB04>:
N003 (  2,  3) [000194] ---------U-                         *  CAST      long <- ulong <- uint $102
N002 (  1,  1) [000193] -----------                         \--*  LCL_VAR   int    V03 loc0         u:1 $44

This hoisted copy placed in PreHeader (BB03):
               [000241] -----------                         *  COMMA     void  
     (  2,  3) [000238] -------H-U-                         +--*  CAST      long <- ulong <- uint $102
     (  1,  1) [000239] -----------                         |  \--*  LCL_VAR   int    V03 loc0         u:1 $44
               [000240] -----------                         \--*  NOP       void

The value number for the tree is constant, and CSE defers to Assertion Prop.

This may be pragmatic, as Assertion Prop has the logic to materialize the constant. But it may mean "expensive" constructed constants might not get hoisted or CSEd, so it seems like some rethinking here is warranted.

Hoist of a side-effecting tree (~260)

Hoisting a copy of [000977] $2c6 from BB09 into PreHeader BB08 for loop L01 <BB09..BB10>:
N002 (  2,  2) [000977] ---X-O-----                         *  NULLCHECK byte   $2c6
N001 (  1,  1) [000976] -----------                         \--*  LCL_VAR   ref    V00 loc0         u:1 $80

This hoisted copy placed in PreHeader (BB08):
               [001208] ---X-O-----                         *  COMMA     void  
     (  2,  2) [001205] ---X-O-H---                         +--*  NULLCHECK byte   $2c6
     (  1,  1) [001206] -----------                         |  \--*  LCL_VAR   ref    V00 loc0         u:1 $80
               [001207] -----------                         \--*  NOP       void

Here presumably we don't need to mark the hoisted tree with GTF_MAKE_CSE as there is nothing CSE can do, and the preheader tree is not going to be removed. However if the tree also had interesting and expensive subtrees, we might want to mark those to make sure we CSE them.

The current JIT code marks the entire subtree; I changed that for this experiment. Perhaps instead we could mark all subtrees that qualify under optIsCSEcandidate, or the "side effect free" leader subtrees.

Failed the CSE profitability heuristic

Considering CSE #01 {$140, $2  } [def=83.914511, use=83.925749, cost=  3, call]
CSE Expression : 
N003 (  3,  6) CSE #01 (def)[000064] -----O-H---                         *  ADD       byref  $140
N001 (  1,  1)              [000065] -----O-----                         +--*  LCL_VAR   ref    V00 arg0         u:1 $80
N002 (  1,  4)              [000066] -----------                         \--*  CNS_INT   long   360 Fseq[<unknown field>, 256] $100

Aggressive CSE Promotion (251.754771 >= 200.000000)
cseRefCnt=251.754771, aggressiveRefCnt=200.000000, moderateRefCnt=100.000000
defCnt=83.914511, useCnt=83.925749, cost=3, size=6, LiveAcrossCall
def_cost=1, use_cost=1, extra_no_cost=10, extra_yes_cost=100
CSE cost savings check (261.777248 >= 267.840260) fails
Did Not promote this CSE
Missed MAKE_CSE at [000064] in 408

This may be reasonable. The use and def are in blocks with relative weight 0.84, so we're weighing the benefit of the
CSE versus the apparent cost of burining a callee save (since this is live across call CSE).

On the other hand if the loop iterates a bit more often that PGO data indicates, we will wish we had done the CSE.
This is an example with asymmetric upside/downside and we likely should be more aggressive in such cases.

Other

I only looked at 10 or so methods, so there might be other explanations.

category:cq
theme:cse
skill-level:intermediate
cost:small
impact:small

The text was updated successfully, but these errors were encountered:

ghost · 2023-09-16T01:56:06Z

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Hoisting will mark any trees that hit hoists with GTF_MAKE_CSE. I was curious how many of these cases end up not getting hoisted because CSE does not play along.

In asp.net I see around 1850 cases total. Here's some analysis.

Constants (~1400)

These never even become CSE candidates. Hoisting invokes optIsCSEcandidate to screen things, but this generally allows all constants to be considered hoistable. CSE does additional checks and disables constant hoisting except on arm64. We should either stop hosting these constants or make them candidates.

Hoisting a copy of [000510] $2c7 from BB41 into PreHeader BB40 for loop L03 <BB41..BB45>:
N002 (  2, 10) [000510] -----------                         *  CNS_INT   long   -0x1000000000000 $2c7

This hoisted copy placed in PreHeader (BB40):
               [000788] -----------                         *  COMMA     void  
     (  2, 10) [000786] -------H---                         +--*  CNS_INT   long   -0x1000000000000 $2c7
               [000787] -----------                         \--*  NOP       void  

Sorted CSE candidates:
CSE #03, {$1e7, $2  } useCnt=1: [def=1432.123973, use=286.424795, cost=  8      ]
        :: N005 (  8,  5) CSE #03 (def)[000400] --------R--                         *  LSH       int    $1e7
CSE #02, {$1e6, $2  } useCnt=1: [def=1432.123973, use=286.424795, cost=  3      ]
        :: N003 (  3,  3) CSE #02 (def)[000399] -----------                         *  AND       int    $1e6
CSE #01, {$301, $2  } useCnt=4: [def=100.000000, use=250.451620, cost=  3, call]
        :: N003 (  3,  3) CSE #01 (def)[000781] -----O-H---                         *  ADD       byref  $301

Missed MAKE_CSE at [000786] in 410

Hoisted tree is part of a bigger CSE

Hoisting a copy of [000742] $107 from BB37 into PreHeader BB36 for loop L01 <BB37..BB38>:
N005 ( 25,  8) [000742] ---------U-                         *  CAST      long <- ulong <- uint $107
N004 ( 24,  6) [000741] M----------                         \--*  UDIV      int    $44
N002 (  3,  2) [000739] -----------                            +--*  LCL_VAR   int    V41 tmp32        u:1 $45
N003 (  1,  1) [000740] -------N---                            \--*  CNS_INT   int    2 $42

This hoisted copy placed in PreHeader (BB36):
               [001148] -----------                         *  COMMA     void  
     ( 25,  8) [001143] -------H-U-                         +--*  CAST      long <- ulong <- uint $107
     ( 24,  6) [001144] M----------                         |  \--*  UDIV      int    $44
     (  3,  2) [001145] -----------                         |     +--*  LCL_VAR   int    V41 tmp32        u:1 $45
     (  1,  1) [001146] -------N---                         |     \--*  CNS_INT   int    2 $42
               [001147] -----------                         \--*  NOP       void

Here the tree [000742] that inspired hoisting is part of a bigger CSE, and the subtrees are never considered as candidates.

N007 ( 29, 11) CSE #12 (def)[000743] -----------                                  +--*  ADD       long   $30f
N005 ( 25,  8)              [000742] ---------U-                                  |  +--*  CAST      long <- ulong <- uint $107
N004 ( 24,  6)              [000741] M----------                                  |  |  \--*  UDIV      int    $44
N002 (  3,  2)              [000739] -----------                                  |  |     +--*  LCL_VAR   int    V41 tmp32        u:1 $45
N003 (  1,  1)              [000740] -------N---                                  |  |     \--*  CNS_INT   int    2 $42
N006 (  3,  2)              [000738] -----------                                  |  \--*  LCL_VAR   long   V47 tmp38        u:4 $281 

Missed MAKE_CSE at [001143] in 1292

There is perhaps a tension deciding which is better, but it seems like we can actually do both: CSE the subtree,
then CSE the add sitting on top.

ValueNum is constant (~25)

Hoisting a copy of [000194] $102 from BB04 into PreHeader BB03 for loop L00 <BB04..BB04>:
N003 (  2,  3) [000194] ---------U-                         *  CAST      long <- ulong <- uint $102
N002 (  1,  1) [000193] -----------                         \--*  LCL_VAR   int    V03 loc0         u:1 $44

This hoisted copy placed in PreHeader (BB03):
               [000241] -----------                         *  COMMA     void  
     (  2,  3) [000238] -------H-U-                         +--*  CAST      long <- ulong <- uint $102
     (  1,  1) [000239] -----------                         |  \--*  LCL_VAR   int    V03 loc0         u:1 $44
               [000240] -----------                         \--*  NOP       void

The value number for the tree is constant, and CSE defers to Assertion Prop.

This may be pragmatic, as Assertion Prop has the logic to materialize the constant. But it may mean "expensive" constructed constants might not get hoisted or CSEd, so it seems like some rethinking here is warranted.

Hoist of a side-effecting tree (~260)

Hoisting a copy of [000977] $2c6 from BB09 into PreHeader BB08 for loop L01 <BB09..BB10>:
N002 (  2,  2) [000977] ---X-O-----                         *  NULLCHECK byte   $2c6
N001 (  1,  1) [000976] -----------                         \--*  LCL_VAR   ref    V00 loc0         u:1 $80

This hoisted copy placed in PreHeader (BB08):
               [001208] ---X-O-----                         *  COMMA     void  
     (  2,  2) [001205] ---X-O-H---                         +--*  NULLCHECK byte   $2c6
     (  1,  1) [001206] -----------                         |  \--*  LCL_VAR   ref    V00 loc0         u:1 $80
               [001207] -----------                         \--*  NOP       void

Here presumably we don't need to mark the hoisted tree with GTF_MAKE_CSE as there is nothing CSE can do, and the preheader tree is not going to be removed. However if the tree also had interesting and expensive subtrees, we might want to mark those to make sure we CSE them.

The current JIT code marks the entire subtree; I changed that for this experiment. Perhaps instead we could mark all subtrees that qualify under optIsCSEcandidate, or the "side effect free" leader subtrees.

Failed the CSE profitability heuristic

Considering CSE #01 {$140, $2  } [def=83.914511, use=83.925749, cost=  3, call]
CSE Expression : 
N003 (  3,  6) CSE #01 (def)[000064] -----O-H---                         *  ADD       byref  $140
N001 (  1,  1)              [000065] -----O-----                         +--*  LCL_VAR   ref    V00 arg0         u:1 $80
N002 (  1,  4)              [000066] -----------                         \--*  CNS_INT   long   360 Fseq[<unknown field>, 256] $100

Aggressive CSE Promotion (251.754771 >= 200.000000)
cseRefCnt=251.754771, aggressiveRefCnt=200.000000, moderateRefCnt=100.000000
defCnt=83.914511, useCnt=83.925749, cost=3, size=6, LiveAcrossCall
def_cost=1, use_cost=1, extra_no_cost=10, extra_yes_cost=100
CSE cost savings check (261.777248 >= 267.840260) fails
Did Not promote this CSE
Missed MAKE_CSE at [000064] in 408

This may be reasonable. The use and def are in blocks with relative weight 0.84, so we're weighing the benefit of the
CSE versus the apparent cost of burining a callee save (since this is live across call CSE).

On the other hand if the loop iterates a bit more often that PGO data indicates, we will wish we had done the CSE.
This is an example with asymmetric upside/downside and we likely should be more aggressive in such cases.

Other

I only looked at 10 or so methods, so there might be other explanations.

Author:	AndyAyersMS
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

AndyAyersMS · 2023-09-16T01:57:35Z

FYI @dotnet/jit-contrib

AndyAyersMS · 2024-05-03T21:04:28Z

Will defer working on this until after .NET 9, as we hope the ML heuristics effort can capture this.

jakobbotsch · 2024-08-19T09:10:33Z

#97042 (comment) has another example. IMO we should look at improving hoisting here to stop relying on CSE (and, in my opinion, invariance through VNs).

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 16, 2023

ghost added the untriaged New issue has not been triaged by the area owner label Sep 16, 2023

AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Sep 16, 2023

AndyAyersMS added this to the 9.0.0 milestone Sep 16, 2023

AndyAyersMS mentioned this issue Jan 23, 2024

Ensure loop hoisting recognizes and tracks mask types #97398

Merged

JulieLeeMSFT assigned AndyAyersMS Apr 4, 2024

AndyAyersMS modified the milestones: 9.0.0, Future May 3, 2024

jakobbotsch mentioned this issue Aug 19, 2024

[Perf] Linux/x64: 5 Regressions on 1/11/2024 10:37:40 PM #97042

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Examples where CSE does not handle trees marked `GTF_MAKE_CSE` #92170

Examples where CSE does not handle trees marked `GTF_MAKE_CSE` #92170

AndyAyersMS commented Sep 16, 2023 •

edited

Loading

ghost commented Sep 16, 2023

Constants (~1400)

Hoisted tree is part of a bigger CSE

ValueNum is constant (~25)

Hoist of a side-effecting tree (~260)

Failed the CSE profitability heuristic

Other

AndyAyersMS commented Sep 16, 2023

AndyAyersMS commented May 3, 2024

jakobbotsch commented Aug 19, 2024

Examples where CSE does not handle trees marked GTF_MAKE_CSE #92170

Examples where CSE does not handle trees marked GTF_MAKE_CSE #92170

Comments

AndyAyersMS commented Sep 16, 2023 • edited Loading

Constants (~1400)

Hoisted tree is part of a bigger CSE

ValueNum is constant (~25)

Hoist of a side-effecting tree (~260)

Failed the CSE profitability heuristic

Other

ghost commented Sep 16, 2023

Constants (~1400)

Hoisted tree is part of a bigger CSE

ValueNum is constant (~25)

Hoist of a side-effecting tree (~260)

Failed the CSE profitability heuristic

Other

AndyAyersMS commented Sep 16, 2023

AndyAyersMS commented May 3, 2024

jakobbotsch commented Aug 19, 2024

Examples where CSE does not handle trees marked `GTF_MAKE_CSE` #92170

Examples where CSE does not handle trees marked `GTF_MAKE_CSE` #92170

AndyAyersMS commented Sep 16, 2023 •

edited

Loading