JIT: Have lowering set up IR for post-indexed addressing and make strength reduced IV updates amenable to post-indexed addressing #105185
Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress
Azure Pipelines successfully started running 2 pipeline(s).
e907ed1 to 9598fdd
Ran jitstress in https://dev.azure.com/dnceng-public/public/_build/results?buildId=749010&view=results and libraries-jitstress in https://dev.azure.com/dnceng-public/public/_build/results?buildId=749011&view=results. jitstress failures were #105186. libraries-jitstress failures were #105092, #105189, #102370.
cc @dotnet/jit-contrib PTAL @AndyAyersMS Diffs. Some cool diffs:
Interesting diff in windows arm benchmarks pgo:
+4 (+0.09%) : 7422.dasm - System.Text.RegularExpressions.RegexPrefixAnalyzer:<FindPrefixes>g__FindPrefixesCore|0_1(System.Text.RegularExpressions.RegexNode,System.Collections.Generic.List`1[System.Text.StringBuilder],ubyte):ubyte (Tier0-FullOpts)
@@ -1600,10 +1600,12 @@ G_M12455_IG128: ; bbWeight=0.05, gcrefRegs=180000 {x19 x20}, byrefRegs=C0
;; size=4 bbWeight=0.05 PerfScore 0.02
G_M12455_IG129: ; bbWeight=0.49, gcrefRegs=180000 {x19 x20}, byrefRegs=C00000 {x22 x23}, byref, isz
ldrh w1, [x23, x21]
- stp wzr, w1, [fp, #0x50] // [V16 loc13], [V15 loc12]
+ str w1, [fp, #0x54] // [V15 loc12]
+ add x21, x21, #2
+ str wzr, [fp, #0x50] // [V16 loc13]
int maxCount = min(m_blockIndirs.Height(), POST_INDEXED_ADDRESSING_MAX_DISTANCE / 2);
for (int i = 0; i < maxCount; i++)
{
    SavedIndir& prev = m_blockIndirs.TopRef(i);
Would it be more efficient to start checking with the last indir instead of the first?
This does start with the last indir (since it is using TopRef instead of BottomRef).
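To illustrate the indexing convention behind this answer, here is a minimal sketch. `MiniStack` and `VisitOrder` are hypothetical stand-ins, not the JIT's actual types; the only assumption carried over from the snippet above is that `TopRef(0)` refers to the most recently pushed entry, so a bounded scan over `TopRef(i)` visits the newest entries first.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for the JIT's stack type; only the indexing convention matters.
template <typename T>
class MiniStack
{
    std::vector<T> m_items;

public:
    void Push(const T& item) { m_items.push_back(item); }
    int  Height() const { return static_cast<int>(m_items.size()); }

    // TopRef(0) is the most recently pushed element.
    T& TopRef(int index = 0) { return m_items[m_items.size() - 1 - static_cast<std::size_t>(index)]; }

    // BottomRef(0) is the oldest element.
    T& BottomRef(int index = 0) { return m_items[static_cast<std::size_t>(index)]; }
};

// Records the order in which a bounded TopRef scan visits the entries.
inline std::vector<int> VisitOrder(MiniStack<int>& stack, int maxDistance)
{
    std::vector<int> visited;
    int maxCount = std::min(stack.Height(), maxDistance);
    for (int i = 0; i < maxCount; i++)
    {
        visited.push_back(stack.TopRef(i)); // newest first
    }
    return visited;
}
```

Pushing 1 through 5 and scanning with a limit of 3 visits 5, 4, 3 in that order, which is the "start with the last indir" behavior described above.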
assert((prevIndir->gtLIRFlags & LIR::Flags::Mark) == 0);
m_scratchSideEffects.Clear();

for (GenTree* cur = prevIndir->gtNext; cur != store; cur = cur->gtNext)
I wonder if this could be cheaper if you computed two side effect sets and then checked for interference. But it probably doesn't make much difference.
Hmm, possibly -- although it would be a bit less precise than what's here since not all nodes that are part of store's dataflow necessarily happen after all the nodes we are moving.
This adds a transformation in lowering that tries to set up the IR to be amenable to post-indexed addressing in the backend. It does so by looking for RMW additions/subtractions of a local that was also recently used as the address to an indirection.
…sing On arm64 have strength reduction try to insert IV updates after the last use if that last use is a legal insertion point. This often allows the backend to use post-indexed addressing to combine the use with the IV update.
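As a rough source-level analogy for why the placement of the IV update matters, the sketch below contrasts two loop shapes. The function names are hypothetical, and whether a given compiler actually emits a post-indexed load (for example `ldrh w1, [x0], #2` on arm64) for the second form is an assumption, not something this code guarantees.

```cpp
#include <cstddef>
#include <cstdint>

// Index form: the address is base + offset, and the offset update is a
// separate step that may end up far from the load.
inline std::uint32_t SumIndexed(const std::uint16_t* base, std::size_t count)
{
    std::uint32_t sum    = 0;
    std::size_t   offset = 0; // plays the role of the induction variable
    for (std::size_t i = 0; i < count; i++)
    {
        sum += base[offset];
        offset += 1;
    }
    return sum;
}

// Pointer form: the load is immediately followed by the pointer bump, which is
// the shape a backend can fold into a single post-indexed load.
inline std::uint32_t SumPostIncrement(const std::uint16_t* p, std::size_t count)
{
    std::uint32_t sum = 0;
    for (; count != 0; count--)
    {
        sum += *p++; // use, then update, with nothing in between
    }
    return sum;
}
```

Both functions compute the same result; only the distance between the use and the update differs.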
9598fdd to c5cc900
We end up with this IR after strength reduction:
***** BB135 [0056]
STMT00207 ( 0x2B1[E-] ... 0x2BB )
N007 ( 10, 7) [000666] DA-XGO----- ▌ STORE_LCL_VAR int V15 loc12 d:1 $d1a
N006 ( 10, 7) [000664] ---XGO-N--- └──▌ COMMA ushort <l:$b0d, c:$b0c>
N001 ( 0, 0) [000657] ----------- ├──▌ NOP void
N005 ( 10, 7) [003041] ---XGO----- └──▌ IND ushort <l:$b0a, c:$b0b>
N004 ( 7, 5) [000663] ----GO-N--- └──▌ ADD byref $c12
N002 ( 3, 2) [000662] ----------- ├──▌ LCL_VAR byref V165 tmp129 u:2 $c11
N003 ( 3, 2) [003994] ----------- └──▌ LCL_VAR long V247 rat2
***** BB135 [0056]
STMT00720 ( ??? ... ??? )
N004 ( 9, 8) [003993] DA--------- ▌ STORE_LCL_VAR long V247 rat2
N003 ( 5, 5) [003992] ----------- └──▌ ADD long
N001 ( 3, 2) [003991] ----------- ├──▌ LCL_VAR long V247 rat2
N002 ( 1, 2) [003989] ----------- └──▌ CNS_INT long 2 $34d
***** BB135 [0056]
STMT00208 ( 0x2BD[E-] ... 0x2BE )
N002 ( 1, 3) [000668] DA--------- ▌ STORE_LCL_VAR int V16 loc13 d:1 $VN.Void
N001 ( 1, 2) [000667] ----------- └──▌ CNS_INT int 0 $c0
Lowering does not try to make the indirection
Strength reduction doesn't do much (any) sanity checking of whether we actually expect to be able to use post-indexed addressing after moving the IV update. That would require us to check that the use is of a supported pattern. But I figure that complication is unnecessary since the exact place where we update the IV shouldn't matter much here -- it is live throughout the loop anyway. It might even be better for scheduling purposes to update it as soon as possible after that last use.
This adds a transformation in lowering that tries to set up the IR to be
amenable to post-indexed addressing in the backend. It does so by
looking for RMW additions/subtractions of a local that was also recently
used as the address to an indirection, and making them adjacent.
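The idea can be sketched on a toy linear IR. Everything below (`Op`, `Node`, `MakeAdjacent`) is hypothetical and far simpler than the JIT's LIR; in particular, the interference check here is an assumed simplification that only looks at whether an intervening node mentions the same local.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Toy linear IR; these types are illustrative and are not the JIT's GenTree/LIR.
enum class Op { Load, AddImm, Other };

struct Node
{
    Op  op;
    int local; // address local for Load, updated local for AddImm,
               // a referenced local for Other (-1 if none)
};

// Given the index of an RMW update `local = local + imm`, look back at most
// maxDistance nodes for a load that uses the same local as its address.
// If no node in between mentions that local, move the update to sit directly
// after the load and return true.
inline bool MakeAdjacent(std::vector<Node>& ir, std::size_t updateIdx, std::size_t maxDistance)
{
    if (updateIdx >= ir.size() || ir[updateIdx].op != Op::AddImm)
        return false;

    const int local = ir[updateIdx].local;
    for (std::size_t dist = 1; dist <= maxDistance && dist <= updateIdx; dist++)
    {
        std::size_t i = updateIdx - dist;
        if (ir[i].local != local)
            continue; // unrelated node, keep looking further back
        if (ir[i].op != Op::Load)
            return false; // another use or def of the local interferes

        auto first  = ir.begin() + static_cast<std::ptrdiff_t>(i + 1);
        auto middle = ir.begin() + static_cast<std::ptrdiff_t>(updateIdx);
        std::rotate(first, middle, middle + 1); // update now follows the load
        return true;
    }
    return false;
}
```

For the sequence load-via-V1, unrelated node, unrelated node, update-of-V1, the update ends up immediately after the load; if an intervening node reads V1, the sequence is left unchanged.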
Additionally, have strength reduction try to insert IV updates after the last
use if that last use is a legal insertion point. This allows the lowering transformation
to kick in.
For a simple loop:
this results in:
The .NET 8 vs .NET 9 codegen diff for this loop becomes: