-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: limited version of forward substitution for some relops #61023
Conversation
Add a new optimization to redundant branch opts that looks at prior statements in the same block for redundant relop trees. If one is found, we check to see if it can be forward-substituted down to the terminal jump tree. And if it can we duplicate the computation down at the jump. This removes many of the cases we see in our generated code where we materialize a boolean value in a register and then immedately test that register to see if it is true/false, and then use that second test to drive branching -- instead we now use the initial test logic to drive the jump and so the boolean value only exists in the flags.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsAdd a new optimization to redundant branch opts that looks at prior statements This removes many of the cases we see in our generated code where we materialize
|
cc @dotnet/jit-contrib This gets rid of many cases like the following ;; Assembly listing for method Microsoft.CodeAnalysis.CSharp.SyntaxFacts:IsKeywordKind(ushort):bool
;; before
G_M47757_IG03: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz
cmp ecx, 0x20C0
setle al
movzx rax, al
test eax, eax
je SHORT G_M47757_IG05
mov eax, 1
;; after
G_M47757_IG03: ; gcrefRegs=00000000 {}, byrefRegs=00000000 {}, byref, isz
cmp ecx, 0x20C0
jg SHORT G_M47757_IG05
mov eax, 1 and some "reverse" cases like ;; Assembly listing for method System.Runtime.CompilerServices.DefaultInterpolatedStringHandler:AppendFormatted(double,System.String):this
;; before
G_M58045_IG06: ; gcrefRegs=00000080 {rdi}, byrefRegs=00000040 {rsi}, byref, isz
movsd qword ptr [rsp+50H], xmm6
lea rcx, bword ptr [rsi+24]
; byrRegs +[rcx]
mov ebx, dword ptr [rsi+16]
mov ebp, dword ptr [rcx+8]
cmp ebx, ebp
ja G_M58045_IG17
mov r14, bword ptr [rcx]
; byrRegs +[r14]
sub ebp, ebx
mov ecx, ebp
; byrRegs -[rcx]
not ecx
shr ecx, 31
mov rdx, qword ptr [(reloc)]
mov rdx, gword ptr [rdx]
; gcrRegs +[rdx]
mov rax, rdx
; gcrRegs +[rax]
test ecx, ecx
jne SHORT G_M58045_IG08
;; after
G_M58045_IG06: ; gcrefRegs=00000080 {rdi}, byrefRegs=00000040 {rsi}, byref, isz
movsd qword ptr [rsp+50H], xmm6
lea rcx, bword ptr [rsi+24]
; byrRegs +[rcx]
mov ebx, dword ptr [rsi+16]
mov ebp, dword ptr [rcx+8]
cmp ebx, ebp
ja G_M58045_IG17
mov r14, bword ptr [rcx]
; byrRegs +[r14]
sub ebp, ebx
mov rcx, qword ptr [(reloc)]
; byrRegs -[rcx]
mov rdx, gword ptr [rcx]
; gcrRegs +[rdx]
mov rcx, rdx
; gcrRegs +[rcx]
test ebp, ebp
jge SHORT G_M58045_IG08 aspnet.run.windows.x64.checked.mch:
Detail diffs
benchmarks.run.windows.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.windows.x64.checked.mch:
Detail diffs
libraries.crossgen2.windows.x64.checked.mch:
Detail diffs
libraries.pmi.windows.x64.checked.mch:
Detail diffs
libraries_tests.pmi.windows.x64.checked.mch:
Detail diffs
I looked at a few of the bigger regressions and they all look like changes in RA spilling. I used Kunal's asm screener from jitutils to check roughly how many cases of (setcc; movzx; test) remain and there are still some where the tree to substitute is too complex for this code to reason about (eg an array bounds check or call). Eg there were 700 or so in libraries pmi. Many of those can be safely forward substituted but it requires knowing for instance that an SSA def has a single use, which we don't know right now. In some ways this change is crying out for a general code motion safety checcker -- that is a utility that given a tree X and a desired location Y, tells us if it is safe/correct to move X to Y. Note I currently duplicate the tree, I don't move it (but we still need to check that the copy will evaluate to the same thing at its new location). I use heuristics to try ensure that if I copy the tree the original will end up being dead, and this seems to largely work out. I think this optimization will also allow us to remove most of the |
// | ||
substituteTree->gtFlags |= (GTF_DONT_CSE | GTF_RELOP_JMP_USED); | ||
|
||
gtReplaceTree(stmt, tree, substituteTree); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, gtReplaceTree
was dead code before this change and as such is sort of in the process of being deleted (in #59912)...
It appears here we have the use edge, so maybe just replace the operand and resequence the statement explicitly? It should be a bit cheaper I think.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe just replace the operand and resequence the statement explicitly?
Sure.
/azp run runtime-coreclr outerloop |
Azure Pipelines successfully started running 1 pipeline(s). |
Looks like some issues in Regex -- going to run outerloop to see if any simpler tests get tripped up. |
Interference check was broken, leading to an invalid fwd sub that broke regexp tests...
|
if ((prevTree->gtFlags & GTF_SIDE_EFFECT) != (prevTree->gtFlags & (GTF_EXCEPT | GTF_ASG))) | ||
{ | ||
JITDUMP(" -- prev tree has side effects\n"); | ||
break; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It appears we only keep track of locals assigned at the root.
Do we need to check (prevTreeRHS->gtFlags & GTF_ASG) == 0
to guard against nested definitions?
@dotnet/jit-contrib ping |
if (!domIsSameRelop && !domIsRevRelop) | ||
{ | ||
JITDUMP(" -- prev tree VN is not related\n"); | ||
continue; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't understand how this works. Doesn't this effectively make the interference check below useless? E.g.
Foo(3, 4);
void Foo(int a, int b)
{
bool foo = a < b;
a = 15;
if (foo)
Console.WriteLine(a);
else
Console.WriteLine(b);
}
15 does not have the same VN as a < b
, so the assignment to a
will be skipped in the interference check and the program will always print 4
when it should print 15
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah, we should be adding a
to the interfering set here before continuing the search.
break; | ||
} | ||
|
||
// If the VN of RHS is the VN of the current tree, or is "related", consider foward sub. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: typo: foward
@@ -691,6 +692,307 @@ bool Compiler::optJumpThread(BasicBlock* const block, BasicBlock* const domBlock | |||
return true; | |||
} | |||
|
|||
//------------------------------------------------------------------------ | |||
// optRedundantRelop: see if the value of tree is redundant given earlier | |||
// relops in this block. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It might be useful if you give an example here of the type of transformations done
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added one -- was that what you had in mind?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's great! Much more detailed even than what I imagined :-)
No impact to SPMI from the various edits above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably should run JitStress job before merging
/azp run runtime-coreclr jitstress |
Azure Pipelines successfully started running 1 pipeline(s). |
CI seems to be having issues restoring packages
|
Might be worth to run a fuzzer on this as well? |
/azp run Fuzzlyn |
Azure Pipelines successfully started running 1 pipeline(s). |
I don't see anything that looks related in the Fuzzlyn runs. |
Are the Fuzzlyn runs supposed to come back green, or are there "known failures"? |
There are several known failures. Looks like there are a couple of new unknown ones I haven't seen before in this run as well, but I'm pretty sure those aren't related to this PR (one of them is an ARM32 NYI, the other one related to #61037). |
Jitstress now green after rerunning a few build tasks that failed with CI issues. Per Jakob, Fuzzlyn failures seem unrelated. |
windowsx86 improvements - dotnet/perf-autofiling-issues#2212 |
windows/x64 improvements - dotnet/perf-autofiling-issues#2224 |
Add a new optimization to redundant branch opts that looks at prior statements
in the same block for redundant relop trees. If one is found, we check to see
if it can be forward-substituted down to the terminal jump tree. And if it can
we duplicate the computation down at the jump.
This removes many of the cases we see in our generated code where we materialize
a boolean value in a register and then immedately test that register to see if
it is true/false, and then use that second test to drive branching -- instead we
now use the initial test logic to drive the jump and so the boolean value only
exists in the flags.