Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Examples where CSE does not handle trees marked GTF_MAKE_CSE #92170

Open
AndyAyersMS opened this issue Sep 16, 2023 · 4 comments
Open

Examples where CSE does not handle trees marked GTF_MAKE_CSE #92170

AndyAyersMS opened this issue Sep 16, 2023 · 4 comments
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Milestone

Comments

@AndyAyersMS
Copy link
Member

AndyAyersMS commented Sep 16, 2023

Hoisting will mark any trees that it hoists with GTF_MAKE_CSE. Oddly CSE never looks for this flag. I was curious how many of these cases end up not getting hoisted because CSE does not play along.

In asp.net I see around 1850 cases total. Here's some analysis.

Constants (~1400)

These never even become CSE candidates. Hoisting invokes optIsCSEcandidate to screen things, but this generally allows all constants to be considered hoistable. CSE does additional checks and disables constant hoisting except on arm64. We should either stop hosting these constants or make them candidates.

Hoisting a copy of [000510] $2c7 from BB41 into PreHeader BB40 for loop L03 <BB41..BB45>:
N002 (  2, 10) [000510] -----------                         *  CNS_INT   long   -0x1000000000000 $2c7

This hoisted copy placed in PreHeader (BB40):
               [000788] -----------                         *  COMMA     void  
     (  2, 10) [000786] -------H---                         +--*  CNS_INT   long   -0x1000000000000 $2c7
               [000787] -----------                         \--*  NOP       void  

Sorted CSE candidates:
CSE #03, {$1e7, $2  } useCnt=1: [def=1432.123973, use=286.424795, cost=  8      ]
        :: N005 (  8,  5) CSE #03 (def)[000400] --------R--                         *  LSH       int    $1e7
CSE #02, {$1e6, $2  } useCnt=1: [def=1432.123973, use=286.424795, cost=  3      ]
        :: N003 (  3,  3) CSE #02 (def)[000399] -----------                         *  AND       int    $1e6
CSE #01, {$301, $2  } useCnt=4: [def=100.000000, use=250.451620, cost=  3, call]
        :: N003 (  3,  3) CSE #01 (def)[000781] -----O-H---                         *  ADD       byref  $301

Missed MAKE_CSE at [000786] in 410

Hoisted tree is part of a bigger CSE

Hoisting a copy of [000742] $107 from BB37 into PreHeader BB36 for loop L01 <BB37..BB38>:
N005 ( 25,  8) [000742] ---------U-                         *  CAST      long <- ulong <- uint $107
N004 ( 24,  6) [000741] M----------                         \--*  UDIV      int    $44
N002 (  3,  2) [000739] -----------                            +--*  LCL_VAR   int    V41 tmp32        u:1 $45
N003 (  1,  1) [000740] -------N---                            \--*  CNS_INT   int    2 $42

This hoisted copy placed in PreHeader (BB36):
               [001148] -----------                         *  COMMA     void  
     ( 25,  8) [001143] -------H-U-                         +--*  CAST      long <- ulong <- uint $107
     ( 24,  6) [001144] M----------                         |  \--*  UDIV      int    $44
     (  3,  2) [001145] -----------                         |     +--*  LCL_VAR   int    V41 tmp32        u:1 $45
     (  1,  1) [001146] -------N---                         |     \--*  CNS_INT   int    2 $42
               [001147] -----------                         \--*  NOP       void  

Here the tree [000742] that inspired hoisting is part of a bigger CSE, and the subtrees are never considered as candidates.

N007 ( 29, 11) CSE #12 (def)[000743] -----------                                  +--*  ADD       long   $30f
N005 ( 25,  8)              [000742] ---------U-                                  |  +--*  CAST      long <- ulong <- uint $107
N004 ( 24,  6)              [000741] M----------                                  |  |  \--*  UDIV      int    $44
N002 (  3,  2)              [000739] -----------                                  |  |     +--*  LCL_VAR   int    V41 tmp32        u:1 $45
N003 (  1,  1)              [000740] -------N---                                  |  |     \--*  CNS_INT   int    2 $42
N006 (  3,  2)              [000738] -----------                                  |  \--*  LCL_VAR   long   V47 tmp38        u:4 $281 

Missed MAKE_CSE at [001143] in 1292

There is perhaps a tension deciding which is better, but it seems like we can actually do both: CSE the subtree,
then CSE the add sitting on top.

ValueNum is constant (~25)

Hoisting a copy of [000194] $102 from BB04 into PreHeader BB03 for loop L00 <BB04..BB04>:
N003 (  2,  3) [000194] ---------U-                         *  CAST      long <- ulong <- uint $102
N002 (  1,  1) [000193] -----------                         \--*  LCL_VAR   int    V03 loc0         u:1 $44

This hoisted copy placed in PreHeader (BB03):
               [000241] -----------                         *  COMMA     void  
     (  2,  3) [000238] -------H-U-                         +--*  CAST      long <- ulong <- uint $102
     (  1,  1) [000239] -----------                         |  \--*  LCL_VAR   int    V03 loc0         u:1 $44
               [000240] -----------                         \--*  NOP       void  

The value number for the tree is constant, and CSE defers to Assertion Prop.

This may be pragmatic, as Assertion Prop has the logic to materialize the constant. But it may mean "expensive" constructed constants might not get hoisted or CSEd, so it seems like some rethinking here is warranted.

Hoist of a side-effecting tree (~260)

Hoisting a copy of [000977] $2c6 from BB09 into PreHeader BB08 for loop L01 <BB09..BB10>:
N002 (  2,  2) [000977] ---X-O-----                         *  NULLCHECK byte   $2c6
N001 (  1,  1) [000976] -----------                         \--*  LCL_VAR   ref    V00 loc0         u:1 $80

This hoisted copy placed in PreHeader (BB08):
               [001208] ---X-O-----                         *  COMMA     void  
     (  2,  2) [001205] ---X-O-H---                         +--*  NULLCHECK byte   $2c6
     (  1,  1) [001206] -----------                         |  \--*  LCL_VAR   ref    V00 loc0         u:1 $80
               [001207] -----------                         \--*  NOP       void  

Here presumably we don't need to mark the hoisted tree with GTF_MAKE_CSE as there is nothing CSE can do, and the preheader tree is not going to be removed. However if the tree also had interesting and expensive subtrees, we might want to mark those to make sure we CSE them.

The current JIT code marks the entire subtree; I changed that for this experiment. Perhaps instead we could mark all subtrees that qualify under optIsCSEcandidate, or the "side effect free" leader subtrees.

Failed the CSE profitability heuristic

Considering CSE #01 {$140, $2  } [def=83.914511, use=83.925749, cost=  3, call]
CSE Expression : 
N003 (  3,  6) CSE #01 (def)[000064] -----O-H---                         *  ADD       byref  $140
N001 (  1,  1)              [000065] -----O-----                         +--*  LCL_VAR   ref    V00 arg0         u:1 $80
N002 (  1,  4)              [000066] -----------                         \--*  CNS_INT   long   360 Fseq[<unknown field>, 256] $100

Aggressive CSE Promotion (251.754771 >= 200.000000)
cseRefCnt=251.754771, aggressiveRefCnt=200.000000, moderateRefCnt=100.000000
defCnt=83.914511, useCnt=83.925749, cost=3, size=6, LiveAcrossCall
def_cost=1, use_cost=1, extra_no_cost=10, extra_yes_cost=100
CSE cost savings check (261.777248 >= 267.840260) fails
Did Not promote this CSE
Missed MAKE_CSE at [000064] in 408

This may be reasonable. The use and def are in blocks with relative weight 0.84, so we're weighing the benefit of the
CSE versus the apparent cost of burining a callee save (since this is live across call CSE).

On the other hand if the loop iterates a bit more often that PGO data indicates, we will wish we had done the CSE.
This is an example with asymmetric upside/downside and we likely should be more aggressive in such cases.

Other

I only looked at 10 or so methods, so there might be other explanations.

category:cq
theme:cse
skill-level:intermediate
cost:small
impact:small

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 16, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Sep 16, 2023
@ghost
Copy link

ghost commented Sep 16, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Hoisting will mark any trees that hit hoists with GTF_MAKE_CSE. I was curious how many of these cases end up not getting hoisted because CSE does not play along.

In asp.net I see around 1850 cases total. Here's some analysis.

Constants (~1400)

These never even become CSE candidates. Hoisting invokes optIsCSEcandidate to screen things, but this generally allows all constants to be considered hoistable. CSE does additional checks and disables constant hoisting except on arm64. We should either stop hosting these constants or make them candidates.

Hoisting a copy of [000510] $2c7 from BB41 into PreHeader BB40 for loop L03 <BB41..BB45>:
N002 (  2, 10) [000510] -----------                         *  CNS_INT   long   -0x1000000000000 $2c7

This hoisted copy placed in PreHeader (BB40):
               [000788] -----------                         *  COMMA     void  
     (  2, 10) [000786] -------H---                         +--*  CNS_INT   long   -0x1000000000000 $2c7
               [000787] -----------                         \--*  NOP       void  

Sorted CSE candidates:
CSE #03, {$1e7, $2  } useCnt=1: [def=1432.123973, use=286.424795, cost=  8      ]
        :: N005 (  8,  5) CSE #03 (def)[000400] --------R--                         *  LSH       int    $1e7
CSE #02, {$1e6, $2  } useCnt=1: [def=1432.123973, use=286.424795, cost=  3      ]
        :: N003 (  3,  3) CSE #02 (def)[000399] -----------                         *  AND       int    $1e6
CSE #01, {$301, $2  } useCnt=4: [def=100.000000, use=250.451620, cost=  3, call]
        :: N003 (  3,  3) CSE #01 (def)[000781] -----O-H---                         *  ADD       byref  $301

Missed MAKE_CSE at [000786] in 410

Hoisted tree is part of a bigger CSE

Hoisting a copy of [000742] $107 from BB37 into PreHeader BB36 for loop L01 <BB37..BB38>:
N005 ( 25,  8) [000742] ---------U-                         *  CAST      long <- ulong <- uint $107
N004 ( 24,  6) [000741] M----------                         \--*  UDIV      int    $44
N002 (  3,  2) [000739] -----------                            +--*  LCL_VAR   int    V41 tmp32        u:1 $45
N003 (  1,  1) [000740] -------N---                            \--*  CNS_INT   int    2 $42

This hoisted copy placed in PreHeader (BB36):
               [001148] -----------                         *  COMMA     void  
     ( 25,  8) [001143] -------H-U-                         +--*  CAST      long <- ulong <- uint $107
     ( 24,  6) [001144] M----------                         |  \--*  UDIV      int    $44
     (  3,  2) [001145] -----------                         |     +--*  LCL_VAR   int    V41 tmp32        u:1 $45
     (  1,  1) [001146] -------N---                         |     \--*  CNS_INT   int    2 $42
               [001147] -----------                         \--*  NOP       void  

Here the tree [000742] that inspired hoisting is part of a bigger CSE, and the subtrees are never considered as candidates.

N007 ( 29, 11) CSE #12 (def)[000743] -----------                                  +--*  ADD       long   $30f
N005 ( 25,  8)              [000742] ---------U-                                  |  +--*  CAST      long <- ulong <- uint $107
N004 ( 24,  6)              [000741] M----------                                  |  |  \--*  UDIV      int    $44
N002 (  3,  2)              [000739] -----------                                  |  |     +--*  LCL_VAR   int    V41 tmp32        u:1 $45
N003 (  1,  1)              [000740] -------N---                                  |  |     \--*  CNS_INT   int    2 $42
N006 (  3,  2)              [000738] -----------                                  |  \--*  LCL_VAR   long   V47 tmp38        u:4 $281 

Missed MAKE_CSE at [001143] in 1292

There is perhaps a tension deciding which is better, but it seems like we can actually do both: CSE the subtree,
then CSE the add sitting on top.

ValueNum is constant (~25)

Hoisting a copy of [000194] $102 from BB04 into PreHeader BB03 for loop L00 <BB04..BB04>:
N003 (  2,  3) [000194] ---------U-                         *  CAST      long <- ulong <- uint $102
N002 (  1,  1) [000193] -----------                         \--*  LCL_VAR   int    V03 loc0         u:1 $44

This hoisted copy placed in PreHeader (BB03):
               [000241] -----------                         *  COMMA     void  
     (  2,  3) [000238] -------H-U-                         +--*  CAST      long <- ulong <- uint $102
     (  1,  1) [000239] -----------                         |  \--*  LCL_VAR   int    V03 loc0         u:1 $44
               [000240] -----------                         \--*  NOP       void  

The value number for the tree is constant, and CSE defers to Assertion Prop.

This may be pragmatic, as Assertion Prop has the logic to materialize the constant. But it may mean "expensive" constructed constants might not get hoisted or CSEd, so it seems like some rethinking here is warranted.

Hoist of a side-effecting tree (~260)

Hoisting a copy of [000977] $2c6 from BB09 into PreHeader BB08 for loop L01 <BB09..BB10>:
N002 (  2,  2) [000977] ---X-O-----                         *  NULLCHECK byte   $2c6
N001 (  1,  1) [000976] -----------                         \--*  LCL_VAR   ref    V00 loc0         u:1 $80

This hoisted copy placed in PreHeader (BB08):
               [001208] ---X-O-----                         *  COMMA     void  
     (  2,  2) [001205] ---X-O-H---                         +--*  NULLCHECK byte   $2c6
     (  1,  1) [001206] -----------                         |  \--*  LCL_VAR   ref    V00 loc0         u:1 $80
               [001207] -----------                         \--*  NOP       void  

Here presumably we don't need to mark the hoisted tree with GTF_MAKE_CSE as there is nothing CSE can do, and the preheader tree is not going to be removed. However if the tree also had interesting and expensive subtrees, we might want to mark those to make sure we CSE them.

The current JIT code marks the entire subtree; I changed that for this experiment. Perhaps instead we could mark all subtrees that qualify under optIsCSEcandidate, or the "side effect free" leader subtrees.

Failed the CSE profitability heuristic

Considering CSE #01 {$140, $2  } [def=83.914511, use=83.925749, cost=  3, call]
CSE Expression : 
N003 (  3,  6) CSE #01 (def)[000064] -----O-H---                         *  ADD       byref  $140
N001 (  1,  1)              [000065] -----O-----                         +--*  LCL_VAR   ref    V00 arg0         u:1 $80
N002 (  1,  4)              [000066] -----------                         \--*  CNS_INT   long   360 Fseq[<unknown field>, 256] $100

Aggressive CSE Promotion (251.754771 >= 200.000000)
cseRefCnt=251.754771, aggressiveRefCnt=200.000000, moderateRefCnt=100.000000
defCnt=83.914511, useCnt=83.925749, cost=3, size=6, LiveAcrossCall
def_cost=1, use_cost=1, extra_no_cost=10, extra_yes_cost=100
CSE cost savings check (261.777248 >= 267.840260) fails
Did Not promote this CSE
Missed MAKE_CSE at [000064] in 408

This may be reasonable. The use and def are in blocks with relative weight 0.84, so we're weighing the benefit of the
CSE versus the apparent cost of burining a callee save (since this is live across call CSE).

On the other hand if the loop iterates a bit more often that PGO data indicates, we will wish we had done the CSE.
This is an example with asymmetric upside/downside and we likely should be more aggressive in such cases.

Other

I only looked at 10 or so methods, so there might be other explanations.

Author: AndyAyersMS
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@AndyAyersMS AndyAyersMS removed the untriaged New issue has not been triaged by the area owner label Sep 16, 2023
@AndyAyersMS AndyAyersMS added this to the 9.0.0 milestone Sep 16, 2023
@AndyAyersMS
Copy link
Member Author

FYI @dotnet/jit-contrib

@AndyAyersMS
Copy link
Member Author

Will defer working on this until after .NET 9, as we hope the ML heuristics effort can capture this.

@jakobbotsch
Copy link
Member

#97042 (comment) has another example. IMO we should look at improving hoisting here to stop relying on CSE (and, in my opinion, invariance through VNs).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

No branches or pull requests

2 participants