Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

JIT: Constant is not propagated through a struct #87072

Closed
EgorBo opened this issue Jun 2, 2023 · 7 comments · Fixed by #102808
Closed

JIT: Constant is not propagated through a struct #87072

EgorBo opened this issue Jun 2, 2023 · 7 comments · Fixed by #102808
Assignees
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:2 Work that is important, but not critical for the release
Milestone

Comments

@EgorBo
Copy link
Member

EgorBo commented Jun 2, 2023

Minimal repro for a CQ issue @stephentoub noticed in #87067: a constant doesn't succesfully propagate through a struct:

struct Awaitable
{
    int Opts;

    Awaitable(bool value)
    {
        Opts = value ? 1 : 2;

        //
        //if (value)
        //    Opts = 1;
        //else
        //    Opts = 2;
    }
}


public static int Test() => new Awaitable(false).Opts;

Codegen for Test:

; Assembly listing for method Awaitable:Test():int
       push     rax
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
       add      rsp, 8
       ret      
; Total bytes of code 21

Now replace that ternary operation with if-else (uncomment it and remove the ternary):

; Method Awaitable:Test():int
       mov      eax, 2
       ret
; Total bytes of code: 6

JitDump diff: https://www.diffchecker.com/ld33PGnr (left - if-esle, right - ternary). So, apparently, we don't enregister the Opts field due to address exposure, @SingleAccretion suggested that when we spill all stack entries to locals at the end of a block, we should not spill constants and local addresses. @AndyAyersMS noted that it might be a big amount of work to do so.

@dotnet/jit-contrib @jakobbotsch

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 2, 2023
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jun 2, 2023
@ghost
Copy link

ghost commented Jun 2, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

Minimal repro for a CQ issue @stephentoub noticed in #87067: a constant doesn't succesfully propagate through a struct:

struct Awaitable
{
    int Opts;

    Awaitable(bool value)
    {
        Opts = value ? 1 : 2;

        //
        //if (value)
        //    Opts = 1;
        //else
        //    Opts = 2;
    }
}


public static int Test() => new Awaitable(false).Opts;

Codegen for Test:

; Assembly listing for method Awaitable:Test():int
       push     rax
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
       add      rsp, 8
       ret      
; Total bytes of code 21

Now replace that ternary operation with if-else (uncomment it and remove the ternary):

; Method Awaitable:Test():int
       mov      eax, 2
       ret
; Total bytes of code: 6

JitDump diff: https://www.diffchecker.com/ld33PGnr (left - if-esle, right - ternary). So, apparently, we don't enregister the Opts field due to address exposure, @SingleAccretion suggested that when we spill all stack entries to locals at the end of a block, we should not spill constants and local addresses. @AndyAyersMS noted that it might be a big amount of work to do so.

@dotnet/jit-contrib @jakobbotsch

Author: EgorBo
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@EgorBo EgorBo removed the untriaged New issue has not been triaged by the area owner label Jun 2, 2023
@EgorBo
Copy link
Member Author

EgorBo commented Jun 2, 2023

Assigning Jakob to triage it (as a struct promotion owner)

@AndyAyersMS
Copy link
Member

when we spill all stack entries to locals at the end of a block, we should not spill constants and local addresses

If the successor block is not a join this might be easier to enable -- the complicated case is one where a block's successor has other predecessors, since all predecessors must produce identical stacks of temps so that any one predecessor's exit state represents all predecessors' exit state.

The rough idea would be something like this: in impImportBlock if all the block's successors have only this block as predecessor, then only spill the expensive trees to temps (and if there is just one successor, maybe don't spill anything). We already compute the right property via multRef but don't use it for anything, so perhaps sometime in the past there was a similar optimization and proved unworkable or got removed.

Also it seems like we are always cloning the exit state trees when importing a block. Sems like we could optimize this to use the trees directly for one successor and only clone for the others (this happens in verInitBBEntryState). If we did that it would save a bit of time and space for cases where we see lots of non-empty stacks at block ends.

@jakobbotsch
Copy link
Member

If the successor block is not a join this might be easier to enable -- the complicated case is one where a block's successor has other predecessors, since all predecessors must produce identical stacks of temps so that any one predecessor's exit state represents all predecessors' exit state.

In this case we manage to fold the join away due to inlining, so it seems conceivable to handle this particular case in that way. However, I think in most natural cases produced by Roslyn (assuming this pattern mainly is produced for ternaries), we are going to see both a split and a join. E.g.:

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Foo(bool b)
{
    S s;
    s.Opts = b ? 1 : 2;
}

public struct S
{
    public int Opts;
}

also ends up with the unfortunate address exposure, and in this case there is both a split and join. Allowing the propagation when there is a unique predecessor would change the IR from:

***** BB01
STMT00001 ( ??? ... 0x003 )
N002 (  7,  6) [000005] DA---------STORE_LCL_VAR byref  V03 tmp1                        // store of IL stack entries for clique BB01 -> BB02, BB03
N001 (  3,  3) [000000] -----------                         └──▌  LCL_ADDR  byref  V01 loc0         [+0]
                                                               ▌    int    V01.Program+S:Opts (offs=0x00) -> V06 tmp4         

***** BB01
STMT00000 ( 0x000[E-] ... 0x003 )
N005 (  7,  8) [000004] -----------JTRUE     void  
N004 (  5,  6) [000003] J------N---                         └──▌  NE        int   
N002 (  3,  4) [000024] -----------                            ├──▌  CAST      int <- bool <- int
N001 (  2,  2) [000001] -----------                            │  └──▌  LCL_VAR   int    V00 arg0         
N003 (  1,  1) [000002] -----------                            └──▌  CNS_INT   int    0

------------ BB02 [005..008) -> BB04 (always), preds={BB01} succs={BB04}

***** BB02
STMT00006 ( ??? ... 0x006 )
N002 (  7,  5) [000020] DA---------STORE_LCL_VAR byref  V04 tmp2         // store 1 of IL stack entries for clique BB02, BB03 -> BB04
N001 (  3,  2) [000007] -----------                         └──▌  LCL_VAR   byref  V03 tmp1         

***** BB02
STMT00007 ( ??? ... ??? )
N002 (  5,  4) [000022] DA---------STORE_LCL_VAR int    V05 tmp3         // store 2 of IL stack entries for clique BB02, BB03 -> BB04
N001 (  1,  1) [000019] -----------                         └──▌  CNS_INT   int    2


------------ BB03 [008..009), preds={BB01} succs={BB04}

***** BB03
STMT00002 ( ??? ... 0x008 )
N002 (  7,  5) [000010] DA---------STORE_LCL_VAR byref  V04 tmp2         // store 1 of IL stack entries for clique BB02, BB03 -> BB04
N001 (  3,  2) [000008] -----------                         └──▌  LCL_VAR   byref  V03 tmp1         

***** BB03
STMT00003 ( ??? ... ??? )
N002 (  5,  4) [000012] DA---------STORE_LCL_VAR int    V05 tmp3         // store 2 of IL stack entries for clique BB02, BB03 -> BB04
N001 (  1,  1) [000009] -----------                         └──▌  CNS_INT   int    1

------------ BB04 [009..00F) (return), preds={BB02,BB03} succs={}

***** BB04
STMT00004 ( ??? ... 0x009 )
N003 ( 10,  7) [000017] -A-XG------STOREIND  int   
N001 (  3,  2) [000014] -----------                         ├──▌  LCL_VAR   byref  V04 tmp2         
N002 (  3,  2) [000015] -----------                         └──▌  LCL_VAR   int    V05 tmp3         

to

***** BB01
STMT00000 ( 0x000[E-] ... 0x003 )
N005 (  7,  8) [000004] -----------JTRUE     void  
N004 (  5,  6) [000003] J------N---                         └──▌  NE        int   
N002 (  3,  4) [000024] -----------                            ├──▌  CAST      int <- bool <- int
N001 (  2,  2) [000001] -----------                            │  └──▌  LCL_VAR   int    V00 arg0         
N003 (  1,  1) [000002] -----------                            └──▌  CNS_INT   int    0

------------ BB02 [005..008) -> BB04 (always), preds={BB01} succs={BB04}

***** BB02
STMT00006 ( ??? ... 0x006 )
N002 (  7,  5) [000020] DA---------STORE_LCL_VAR byref  V04 tmp2         // new store of IL stack entries for clique BB02, BB03 -> BB04
N001 (  3,  3) [000000] -----------                         └──▌  LCL_ADDR  byref  V01 loc0         [+0]
                                                               ▌    int    V01.Program+S:Opts (offs=0x00) -> V06 tmp4        

***** BB02
STMT00007 ( ??? ... ??? )
N002 (  5,  4) [000022] DA---------STORE_LCL_VAR int    V05 tmp3         
N001 (  1,  1) [000019] -----------                         └──▌  CNS_INT   int    2

------------ BB03 [008..009), preds={BB01} succs={BB04}

***** BB03
STMT00002 ( ??? ... 0x008 )
N002 (  7,  5) [000010] DA---------STORE_LCL_VAR byref  V04 tmp2         // new store of IL stack entries for clique BB02, BB03 -> BB04
N001 (  3,  3) [000000] -----------                         └──▌  LCL_ADDR  byref  V01 loc0         [+0]
                                                               ▌    int    V01.Program+S:Opts (offs=0x00) -> V06 tmp4      

***** BB03
STMT00003 ( ??? ... ??? )
N002 (  5,  4) [000012] DA---------STORE_LCL_VAR int    V05 tmp3         
N001 (  1,  1) [000009] -----------                         └──▌  CNS_INT   int    1

------------ BB04 [009..00F) (return), preds={BB02,BB03} succs={}

***** BB04
STMT00004 ( ??? ... 0x009 )
N003 ( 10,  7) [000017] -A-XG------STOREIND  int   
N001 (  3,  2) [000014] -----------                         ├──▌  LCL_VAR   byref  V04 tmp2         
N002 (  3,  2) [000015] -----------                         └──▌  LCL_VAR   int    V05 tmp3         

We could potentially handle this when importing BB04 by checking the predecessor assignments into the spill temps. But we would need to try harder to import blocks in RPO as we don't import these in the right order to do this today (the import order is BB01 -> BB03 -> BB04 -> BB02).

@AndyAyersMS
Copy link
Member

Seems like it would be useful here and in morph (and maybe in more phases) to build a general worklist driven algorithm for processing in RPO. The rough idea would be to first run a DFS to identify a DAG (or adopt some other DAG-identifying convention like all lexically backward edges are back edges) and then set up a priority queue (or similar) on blocks with the priority being the number of unvisited preds.

Then process all the priority 0 nodes. As each node finishes decrement the priority of each successor reached along a non-backedge, until the queue is empty. If the phase trims away an edge because of optimization, then also decrement the successor count (if a non-backedge).

If new blocks are added or flow is altered things get tricker. Not a problem for the importer but morph can do various flow edits so we'd need to be careful.

We would need a priority queue implementation that is not to allocation-happy, but we can likely use a free list for the internal data structures so can keep the allocation proportional to the number of blocks and perhaps amortize that over multiple traversals. Or since we really only care about nodes of priority zero, just keep a block->count map and whenever a block's count gets to zero, move it to the worklist.

@jakobbotsch
Copy link
Member

I have a desire to fix this in 9.0 by introducing some limited flow-sensitive propagation of LCL_ADDR values as part of local morph. It would fix this issue and it would improve many cases where inlining can end up spilling LCL_ADDR nodes (e.g. #69254, many of the Matrix4x4 and Matrix3x2 methods).

@AndyAyersMS
Copy link
Member

This could also be handled via tail duplication, basically move (and duplicate) the storeind backwards into its preds, based on the intersection of the variables it reads and the constants available for those locals in the preds. I am playing around with this idea in some other contexts...

@jakobbotsch jakobbotsch added the Priority:2 Work that is important, but not critical for the release label May 3, 2024
jakobbotsch added a commit to jakobbotsch/runtime that referenced this issue May 29, 2024
This changes local morph to run in RPO when optimizations are enabled.
It adds infrastructure to track and propagate LCL_ADDR values assigned
to locals (or struct fields) during local morph. This allows us to avoid
address exposure in cases where the destination local does not actually
end up escaping in any way.

Example:
```csharp
public struct Awaitable
{
    public int Opts;

    public Awaitable(bool value)
    {
        Opts = value ? 1 : 2;
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
public static int Test() => new Awaitable(false).Opts;
```

Before:
```asm
G_M59043_IG01:  ;; offset=0x0000
       push     rax
						;; size=1 bbWeight=1 PerfScore 1.00

G_M59043_IG02:  ;; offset=0x0001
       xor      eax, eax
       mov      dword ptr [rsp], eax
       mov      dword ptr [rsp], 2
       mov      eax, dword ptr [rsp]
						;; size=15 bbWeight=1 PerfScore 3.25

G_M59043_IG03:  ;; offset=0x0010
       add      rsp, 8
       ret
						;; size=5 bbWeight=1 PerfScore 1.25
; Total bytes of code: 21

```

After:
```asm
G_M59043_IG02:  ;; offset=0x0000
       mov      eax, 2
						;; size=5 bbWeight=1 PerfScore 0.25

G_M59043_IG03:  ;; offset=0x0005
       ret
```

Propagating the addresses works much like local assertion prop in morph
does. Proving that the locations that were stored to do not escape
afterwards is done with a simplistic approach: we check globally that no
reads of the location exists, and if so, we replace the `LCL_ADDR`
stored to them by a constant 0. We leave it up to liveness to clean up
the stores themselves.

This could be more sophisticated, but in practice this handles the
reported cases just fine.

If we were able to remove any `LCL_ADDR` in this way then we run an
additional pass over the locals of the IR to compute the final set of
exposed locals.

Fix dotnet#87072
Fix dotnet#102273
Fix dotnet#102518

This is still not sufficient to handle dotnet#69254. To handle that case we
need to handle transferring assertions for struct copies, and also
handle proving that specific struct fields containing local addresses do
not escape. It is probably doable, but for now I will leave it as future
work.
@github-actions github-actions bot locked and limited conversation to collaborators Jul 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI Priority:2 Work that is important, but not critical for the release
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants