-
Notifications
You must be signed in to change notification settings - Fork 2.7k
Improve const return block placement #13903
Conversation
@AndyAyersMS / @briansull PTAL This sample diff from Bytemark shows what most of the diffs look like -- note the eliminated diff --git "a/.\\base\\parse_arg.dasm" "b/.\\diff\\parse_arg.dasm"
index 81e8b85..c0d148d 100644
--- "a/.\\base\\parse_arg.dasm"
+++ "b/.\\diff\\parse_arg.dasm"
@@ -30,7 +30,7 @@ G_M15108_IG02:
cmp dword ptr [rcx+8], 0
jbe SHORT G_M15108_IG12
cmp word ptr [rcx+12], 45
- jne SHORT G_M15108_IG07
+ jne SHORT G_M15108_IG08
G_M15108_IG03:
lea r11, [(reloc)]
cmp dword ptr [rcx], ecx
@@ -40,9 +40,9 @@ G_M15108_IG03:
jbe SHORT G_M15108_IG12
movzx rdx, word ptr [rcx+14]
cmp edx, 63
- je SHORT G_M15108_IG07
+ je SHORT G_M15108_IG08
cmp edx, 67
- jne SHORT G_M15108_IG07
+ jne SHORT G_M15108_IG08
G_M15108_IG04:
mov edx, 2
lea r11, [(reloc)]
@@ -55,22 +55,19 @@ G_M15108_IG05:
mov rcx, rax
G_M15108_IG06:
call [ByteMark:read_comfile(ref)]
- jmp SHORT G_M15108_IG09
-G_M15108_IG06B:
- mov eax, -1
- jmp SHORT G_M15108_IG11
+ xor eax, eax
G_M15108_IG07:
- mov eax, -1
-G_M15108_IG08:
lea rsp, [rbp]
pop rbp
ret
+G_M15108_IG08:
+ mov eax, -1
G_M15108_IG09:
- xor eax, eax
-G_M15108_IG10:
lea rsp, [rbp]
pop rbp
ret
+G_M15108_IG10:
+ mov eax, -1
G_M15108_IG11:
lea rsp, [rbp]
pop rbp
@@ -89,9 +86,9 @@ G_M15108_IG14:
mov rcx, gword ptr [rcx]
mov rdx, gword ptr [rbp-10H]
call [Console:WriteLine(ref,ref)]
- lea rax, G_M15108_IG06B
+ lea rax, G_M15108_IG10
G_M15108_IG15:
add rsp, 48
pop rbp
ret
-; Total bytes of code 196, prolog size 17 for method ByteMark:parse_arg(ref):int:this
+; Total bytes of code 192, prolog size 17 for method ByteMark:parse_arg(ref):int:this The same issue shows up in one of the motivating cases from https://github.com/dotnet/coreclr/issues/13466#issuecomment-323831444, and this fixes it. jit-diff results (win x64 release --frameworks --tests):
The "regression" in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Changes look good.
While you're looking at this code, can you double-check what happens in minopts/debug modes? I'm not sure what the right behavior is there, come to think of it...
The code to merge down to no more than four returns originally came from a correctness restriction on x86, so it's been firing unconditonally. Before #13792, therefore, the behavior under minopts/debug was the same as when optimizing -- to merge to one return if we have sync block etc., or if we have more than four returns. After #13792 and continuing with this change as currently coded up, the behavior under minopts/debug is still the same as when optimizing -- merge to one return if we have sync block etc., otherwise merge down to no more than four returns, segregating constants where possible. Arguably the minopts/debug behavior should be more like the previous behavior -- in the case that there are more than four returns, shrink down to one, regardless of whether any are constant... do you think I should make that change? |
Not sure what the right behavior should be. Also this is probably a separable issue. Offhand I'd say that in minopts/debug we should always produce one return, but it is not clear if that always gives faster jit TP / smaller code size. Also it's at odds with minopts trying to do minimal "risky" work. I suppose you could go back and see if there is any minopts TP/size impact from your original change and then if there is we can figure out what the appropriate behavior should be. |
Tweak a few things so that generated constant return blocks get laid out more optimally: - Don't set BBF_DONT_REMOVE on them; it's ok to move them around, and if all references to them get dropped, it's fine to eliminate them. - Insert them immediately after their lexically-last predecessor when generating them; this increases the likelihood of using fallthrough rather than gotos to target them in the face of fgReorderBlocks' reticence to reorder code that we don't have IBC data for and that hasn't been marked rare. - Make fgReorderBlocks slightly more aggressive for a pattern that now shows up somewhat routinely for returning compound conditionals from predicate functions.
20ac1a9
to
5deb018
Compare
Hm, you raise some good points, I'm not so sure myself.
Good suggestion.
Yes. Created #13915 to track. |
Remove some `goto`s that were added to work around undesirable jit layout (#9692, fixed in dotnet#13314) and epilog factoring (improved in dotnet#13792 and dotnet#13903), which are no longer needed. Resolves #13466.
Tweak a few things so that generated constant return blocks get laid out
more optimally:
if all references to them get dropped, it's fine to eliminate them.
generating them; this increases the likelihood of using fallthrough
rather than gotos to target them in the face of fgReorderBlocks'
reticence to reorder code that we don't have IBC data for and that
hasn't been marked rare.
shows up somewhat routinely for returning compound conditionals from
predicate functions.