JIT: Clean up liveness #103809

jakobbotsch · 2024-06-21T11:15:25Z

We have a DFS tree available in both early liveness and SSA's liveness, and we can use it to make the data flow cheaper by running in a post-order over the DFS tree. This allows us to propagate the maximal amount of knowledge in each iteration and also to stop the data flow early when there is no cycle in the DFS tree.
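To make the idea concrete, here is a minimal sketch of backward liveness driven by a post-order, not the JIT's actual implementation. The `Block` struct, the 64-bit mask for locals, and the `ComputeLiveness` name are all assumptions made for illustration. Because liveness flows from successors to predecessors, visiting blocks in post-order means a block's successors have normally been processed first, so a graph without cycles converges in a single pass.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified block: each tracked local is one bit in a mask.
struct Block
{
    uint64_t use = 0;       // locals read before any write in the block
    uint64_t def = 0;       // locals written in the block
    uint64_t liveIn = 0;
    uint64_t liveOut = 0;
    std::vector<int> succs; // indices of successor blocks
};

// postOrder lists reachable block indices in DFS post-order. hasCycle says
// whether the DFS saw a back edge. Returns the number of passes performed.
int ComputeLiveness(std::vector<Block>& blocks, const std::vector<int>& postOrder, bool hasCycle)
{
    int passes = 0;
    bool changed;
    do
    {
        changed = false;
        passes++;
        for (int index : postOrder)
        {
            assert(index >= 0 && index < (int)blocks.size());
            Block& block = blocks[index];

            // Live-out is the union of the successors' live-in sets.
            uint64_t liveOut = 0;
            for (int succ : block.succs)
            {
                liveOut |= blocks[succ].liveIn;
            }

            // Standard backward transfer: in = use | (out & ~def).
            uint64_t liveIn = block.use | (liveOut & ~block.def);
            if ((liveIn != block.liveIn) || (liveOut != block.liveOut))
            {
                changed = true;
            }
            block.liveIn  = liveIn;
            block.liveOut = liveOut;
        }
        // Without a cycle every successor was final before its predecessors
        // were visited, so there is nothing left to propagate.
    } while (changed && hasCycle);
    return passes;
}
```

For a diamond-shaped graph with no loops, passing `hasCycle = false` lets the loop exit after one pass instead of running a second pass just to observe that nothing changed.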

We do not have the DFS tree available in lowering where we also call liveness. However, lowering was already iterating all blocks to remove dead blocks; switch this to fgDfsBlocksAndRemove to remove dead blocks and compute the DFS tree in one go, and remove the old code doing this.
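As a rough illustration of doing both jobs in one walk, the sketch below runs a DFS from the entry block, records the post-order, notes whether a back edge was seen, and reports the blocks that were never visited. It is not the code of `fgDfsBlocksAndRemove`; the `DfsResult` type, the `DfsAndFindUnreachable` name, and the adjacency-list representation are assumptions for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

struct DfsResult
{
    std::vector<int> postOrder;   // reachable blocks, in post-order
    std::vector<int> unreachable; // blocks the walk never visited
    bool hasCycle = false;        // true if a back edge was seen
};

// succs[i] lists the successors of block i; block 0 is assumed to be the entry.
DfsResult DfsAndFindUnreachable(const std::vector<std::vector<int>>& succs)
{
    enum State { Unvisited, OnStack, Done };
    DfsResult result;
    if (succs.empty())
    {
        return result;
    }

    std::vector<State> state(succs.size(), Unvisited);
    std::vector<std::pair<int, std::size_t>> stack; // (block, next successor index)
    state[0] = OnStack;
    stack.push_back({0, 0});

    while (!stack.empty())
    {
        int block = stack.back().first;
        std::size_t next = stack.back().second;
        if (next < succs[block].size())
        {
            stack.back().second = next + 1;
            int succ = succs[block][next];
            assert(succ >= 0 && succ < (int)succs.size());
            if (state[succ] == Unvisited)
            {
                state[succ] = OnStack;
                stack.push_back({succ, 0});
            }
            else if (state[succ] == OnStack)
            {
                result.hasCycle = true; // edge to an ancestor: a back edge
            }
        }
        else
        {
            state[block] = Done;
            result.postOrder.push_back(block);
            stack.pop_back();
        }
    }

    // Anything the walk did not reach is dead and can be removed by the caller.
    for (std::size_t i = 0; i < succs.size(); i++)
    {
        if (state[i] == Unvisited)
        {
            result.unreachable.push_back((int)i);
        }
    }
    return result;
}
```

A caller could delete the blocks listed in `unreachable` and hand `postOrder` and `hasCycle` straight to the liveness pass, which avoids a separate sweep over all blocks.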

Additionally there was a bunch of logic in liveness to consider debug scopes for debug codegen. I'm not sure what this code is doing; we do not run liveness in debug codegen, so seems like it can just be removed.

jakobbotsch · 2024-06-21T11:50:07Z

We have a problem with switching lowering's liveness to use the DFS tree: we do not model throw helpers properly in the flow graph, which means that they are considered unreachable. With this PR that means they do not get `this` marked as live when `lvaKeepAliveAndReportThis` is true, or the generic context marked as live when `lvaReportParamTypeArg` is true. This would not be a problem if they were truly unreachable, but we end up materializing edges to them in the backend, making this a problem.

One possibility is to do another pass over the throw helper blocks in liveness and then mark the volatile vars in them. I'm not a big fan of that. Ideally VisitAllSuccs would return the throw helpers for the blocks that may get edges to them, but I'm not sure how easy that is to do.

cc @AndyAyersMS, any suggestions here?

jakobbotsch · 2024-06-21T12:37:29Z

> One possibility is to do another pass over the throw helper blocks in liveness and then mark the volatile vars in them. I'm not a big fan of that.

Pushed a commit that does this, but it is not sufficient; these blocks also need to consider their EH successor's vars as live.
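A minimal sketch of what such a fix-up pass needs to compute, under the assumption that locals are bits in a mask: a block the DFS never reached gets the keep-alive locals plus whatever is live into its EH successors. The names `VarSet`, `KeepAliveSet`, and `UnmodeledBlockLiveIn` are hypothetical and do not correspond to functions in the JIT.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using VarSet = uint64_t; // one bit per tracked local (illustrative only)

// Hypothetical helper: locals that must stay live everywhere, such as `this`
// or the generic context, depending on the two flags.
VarSet KeepAliveSet(bool keepAliveThis, VarSet thisBit, bool reportTypeArg, VarSet typeCtxBit)
{
    assert((thisBit & typeCtxBit) == 0); // the two locals are distinct
    VarSet result = 0;
    if (keepAliveThis)
    {
        result |= thisBit;
    }
    if (reportTypeArg)
    {
        result |= typeCtxBit;
    }
    return result;
}

// Hypothetical fix-up for a block the DFS never reached (a throw helper):
// start from the keep-alive set and add the live-in sets of its EH successors.
VarSet UnmodeledBlockLiveIn(VarSet keepAlive, const std::vector<VarSet>& ehSuccLiveIns)
{
    VarSet live = keepAlive;
    for (VarSet succLiveIn : ehSuccLiveIns)
    {
        live |= succLiveIn;
    }
    return live;
}
```

This only patches the live sets after the fact; it does not make the flow visible to anything else that relies on the flow graph.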

Even doing it this hacky way is a correctness issue; not modelling it means we won't build proper SSA that takes this kind of flow into account. I'll see if I can create an example.

jakobbotsch · 2024-06-21T12:47:04Z

> Even doing it this hacky way is a correctness issue; not modelling it means we won't build proper SSA that takes this kind of flow into account. I'll see if I can create an example.

Hmm, this will probably be ok since we should add the necessary PHIs as part of the predecessors that will be in the try region.

jakobbotsch · 2024-06-21T20:08:39Z

Diffs seem to be because removing dead blocks before the first liveness has some benefits in some cases; particularly if it results in liveness no longer marking a local as EH-live because we removed a dead try region.

There are also cases where it allows fgUpdateFlowGraph to make more changes, e.g. by removing some blocks that now allow more compaction to take place. Before this change we wouldn't get rid of the blocks until after fgUpdateFlowGraph and would end up with uncompacted blocks; this can result in more native <-> IL mappings. This was the cause of one 0 size diff I checked.

AndyAyersMS · 2024-06-21T20:32:35Z

> cc @AndyAyersMS, any suggestions here?

Not really... I wonder sometimes if we should just make the throw helper flow explicit earlier and optimize it away when not needed.

jakobbotsch · 2024-06-21T23:17:38Z

> Not really... I wonder sometimes if we should just make the throw helper flow explicit earlier and optimize it away when not needed.

I think an improvement on the situation would be to at least create the BasicBlocks themselves as late as possible so that it can be that code's responsibility to get all their state (like their liveness) right, and so that we minimize the time we have these non-modelled basic blocks around.

jakobbotsch · 2024-06-24T10:55:28Z

cc @dotnet/jit-contrib PTAL @AndyAyersMS

Diffs. They come from removing unreachable blocks before the liveness pass, see above. Some nice TP improvements.

AndyAyersMS · 2024-06-24T15:17:00Z

> Additionally there was a bunch of logic in liveness to consider debug scopes for debug codegen. I'm not sure what this code is doing; we do not run liveness in debug codegen, so seems like it can just be removed.

Once upon a time we tracked lifetimes in debug code instead of reporting all gc refs as untracked. This changed in dotnet/coreclr#9869. Then dotnet/coreclr#12665 simplified minopts liveness accordingly.

You should remove JitMinOptsTrackGCrefs and related code; there is no going back.

AndyAyersMS

Changes LGTM, but I recommend you do a bit more cleanup and remove the option to enable tracked GC reporting in minopts.

jakobbotsch · 2024-06-24T20:18:23Z

Hmm, looks like there's a bunch of new 0-sized diffs from removing JitMinOptsTrackGCrefs (it was defaulted to 1 before outside xarch). Need to look into what those diffs are -- presumably something in GC info given that it had a use in gcMakeRegPtrTable.

jakobbotsch · 2024-06-24T21:41:53Z

The diff in the jitdump from an example (****** START compiling System.Numerics.BitOperations:IsPow2(ulong):ubyte):

-Defining 2 call sites:
-    Offset 0x2c, size 4.
-    Offset 0x50, size 4.
+Defining 0 call sites:

Looks like it's coming from here:

runtime/src/coreclr/jit/gcencode.cpp, lines 4497 to 4500 in 71ab8f1:

    if (noTrackedGCSlots && regMask == 0)
    {
        // No live GC refs in regs at the call -> don't record the call.
    }

That is, it seems with noTrackedGCSlots we sometimes do not even bother recording in the GC info that there are some call sites. Before this PR this bool is true in MinOpts compilations, but only when compiling for x86 or x64. After this PR it is true for MinOpts compilations for all targets.
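The behavior described above can be sketched as two small predicates. This is a simplification for illustration only: the real flag in the JIT may depend on more than MinOpts and the target, and the function names, the `xarchOnly` parameter, and the 32-bit register mask type are assumptions rather than the JIT's own definitions.

```cpp
#include <cassert>
#include <cstdint>

// Assumed simplification of how the flag is derived. With xarchOnly == true
// only x86/x64 MinOpts compilations set it; with xarchOnly == false every
// MinOpts compilation does, regardless of target.
bool NoTrackedGCSlots(bool minOpts, bool targetIsXarch, bool xarchOnly)
{
    return minOpts && (targetIsXarch || !xarchOnly);
}

// Mirrors the quoted condition: when there are no tracked GC slots and no GC
// refs live in registers at the call, the call site is not recorded.
bool ShouldRecordCallSite(bool noTrackedGCSlots, uint32_t liveGCRegMask)
{
    return !(noTrackedGCSlots && (liveGCRegMask == 0));
}
```

Under these assumptions, an arm64 MinOpts method with no GC refs in registers at a call records the call site when `xarchOnly` is true and skips it when `xarchOnly` is false, which would match the jitdump going from 2 call sites to 0.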

I checked on benchmarks.run_pgo.windows.arm64 which has 49901 MinOpts contexts (out of ~103k total contexts). In 34188 of those the optimization kicks in, and overall it reduces the total GC info size for those MinOpts contexts from 1303717 bytes to 1034336 bytes. The total size of the GC info including the optimized contexts goes from 5091149 bytes to 4821768 bytes.
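For reference, the arithmetic behind these numbers can be checked with a few lines. The constant and function names below are made up for the example; the byte counts and context counts are the ones quoted above.

```cpp
#include <cassert>
#include <cstdint>

// Figures quoted in the comment above.
constexpr int64_t kMinOptsContexts      = 49901;
constexpr int64_t kContextsWithSavings  = 34188;
constexpr int64_t kMinOptsGcInfoBefore  = 1303717;
constexpr int64_t kMinOptsGcInfoAfter   = 1034336;
constexpr int64_t kTotalGcInfoBefore    = 5091149;
constexpr int64_t kTotalGcInfoAfter     = 4821768;

// Bytes saved between two measurements.
constexpr int64_t Saved(int64_t before, int64_t after)
{
    return before - after;
}

// part as a percentage of whole.
double Percent(int64_t part, int64_t whole)
{
    assert(whole != 0);
    return 100.0 * (double)part / (double)whole;
}
```

Both differences come out to 269381 bytes, so the whole saving is in the MinOpts contexts. That is roughly 20.7% of the MinOpts GC info and roughly 5.3% of the total GC info, with the optimization applying to about 68.5% of the MinOpts contexts.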

The downside is that the new logic to allow suspension on return addresses does not kick in if we do not record the calls in the GC info. Regardless I don't think this PR should be changing this behavior here, so I am going to change it back to what it was (with the difference between xarch and other platforms), but definitely something to reconsider separately.

cc @VSadov @jkotas

jkotas · 2024-06-24T21:55:42Z

> the difference between xarch and other platforms

This came from https://github.com/dotnet/coreclr/pull/9869/files#diff-081cd7404db539556560f8746ba2f1928ec14424fd64192aba02316bedac4183R219 . I do not see a reason for having platform differences in GC encoding like this one. I think we should enable the x64 path everywhere.

I think it is fine to have worse GC suspension behavior with MinOpts. MinOpts means suboptimal performance, and GC suspension tail latency is part of the bundle.

jakobbotsch · 2024-06-24T21:57:38Z

> > the difference between xarch and other platforms
>
> This came from https://github.com/dotnet/coreclr/pull/9869/files#diff-081cd7404db539556560f8746ba2f1928ec14424fd64192aba02316bedac4183R219 . I do not see a reason for having platform differences in GC encoding like this one. I think we should enable the x64 path everywhere.
>
> I think it is fine to have worse GC suspension behavior with MinOpts. MinOpts means suboptimal performance, and GC suspension tail latency is part of the bundle.

Sounds good to me, I'll submit a separate PR to enable the optimization globally once this one makes it in.

VSadov · 2024-06-24T21:58:54Z

> The downside is that the new logic to allow suspension on return addresses does not kick in if we do not record the calls in the GC info.

It is something to think about. Thanks for pointing at this!

MinOpts is generally not a problem with suspension latencies. I think that is because the code is not very tight. If we do not inline much there should be lots of opportunity for hijacking as well. If some call sites are not recorded and thus we lose some interruptibility points it might not be a big deal.

Not having calls recorded could be inconvenient for some other scenarios, like putting hijacking on the same plan as asynchronous suspension. It would still be doable, but without call sites recorded it will be a bit of a leap of faith - "please suspend me at the caller location, believe me the location is GC safe, even though there is no callsite reported for it...".
Maybe it just means that some asserts will not be as precise as they could be.

dotnet-policy-service bot assigned jakobbotsch Jun 21, 2024

Fix comment

802560a

Remove BasicBlock::bbScope, handle throw helper blocks separately

f9143b4

Fix keep alive set

81c9db2

build-analysis bot mentioned this pull request Jun 21, 2024

The Operation will be canceled. The next steps may not contain expected logs. dotnet/dnceng#3008

Open

3 tasks

jakobbotsch added 2 commits June 21, 2024 21:56

Do dump in lexical order

fab44bd

Merge branch 'main' of github.com:dotnet/runtime into liveness-dfs-tree

9c9d485

jakobbotsch marked this pull request as ready for review June 24, 2024 10:55

jakobbotsch requested a review from AndyAyersMS June 24, 2024 10:55

This was referenced Jun 24, 2024

Build failure: Static graph-based restore failed with exit code .* but did not log an error. #103526

Open

Build failure: Static graph-based restore failed with exit code .* but did not log an error. dotnet/dnceng#3139

Closed

AndyAyersMS reviewed Jun 24, 2024

Remove more stuff

90bc634

AndyAyersMS approved these changes Jun 24, 2024

jakobbotsch mentioned this pull request Jun 24, 2024

JIT differences in reporting call sites in GC info #103917

Open

Revert change in where GC info optimization is enabled

9e1a6c3

build-analysis bot mentioned this pull request Jun 25, 2024

Test failure: GC\\Scenarios\\FinalizeTimeout\\FinalizeTimeout\\FinalizeTimeout.cmd #103874

Closed

jakobbotsch merged commit 3bcc947 into dotnet:main Jun 25, 2024
105 of 107 checks passed

jakobbotsch deleted the liveness-dfs-tree branch June 25, 2024 08:40

This was referenced Jun 25, 2024

JIT: Fix formatting after liveness change #103954

Merged

JIT: Check for call interference when placing PUTARG nodes #103301

Merged

This was referenced Jul 5, 2024

JIT: Try more optimal fixpoint computation strategies for liveness #86527

Closed

JIT: Propagate physical promotion liveness in post-order #104554

Merged

jakobbotsch mentioned this pull request Jul 23, 2024

x86: Assertion failed '!"Too many unreachable block removal loops"' during 'Global local var liveness' #70786

Closed

github-actions bot locked and limited conversation to collaborators Jul 26, 2024
