Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Improve const return block placement #13903

Merged
merged 1 commit into from
Sep 13, 2017

Conversation

JosephTremoulet
Copy link

Tweak a few things so that generated constant return blocks get laid out
more optimally:

  • Don't set BBF_DONT_REMOVE on them; it's ok to move them around, and
    if all references to them get dropped, it's fine to eliminate them.
  • Insert them immediately after their lexically-last predecessor when
    generating them; this increases the likelihood of using fallthrough
    rather than gotos to target them in the face of fgReorderBlocks'
    reticence to reorder code that we don't have IBC data for and that
    hasn't been marked rare.
  • Make fgReorderBlocks slightly more aggressive for a pattern that now
    shows up somewhat routinely for returning compound conditionals from
    predicate functions.

@JosephTremoulet
Copy link
Author

@AndyAyersMS / @briansull PTAL
/cc @dotnet/jit-contrib

This sample diff from Bytemark shows what most of the diffs look like -- note the eliminated jmps:

diff --git "a/.\\base\\parse_arg.dasm" "b/.\\diff\\parse_arg.dasm"
index 81e8b85..c0d148d 100644
--- "a/.\\base\\parse_arg.dasm"
+++ "b/.\\diff\\parse_arg.dasm"
@@ -30,7 +30,7 @@ G_M15108_IG02:
        cmp      dword ptr [rcx+8], 0
        jbe      SHORT G_M15108_IG12
        cmp      word  ptr [rcx+12], 45
-       jne      SHORT G_M15108_IG07
+       jne      SHORT G_M15108_IG08
 G_M15108_IG03:
        lea      r11, [(reloc)]
        cmp      dword ptr [rcx], ecx
@@ -40,9 +40,9 @@ G_M15108_IG03:
        jbe      SHORT G_M15108_IG12
        movzx    rdx, word  ptr [rcx+14]
        cmp      edx, 63
-       je       SHORT G_M15108_IG07
+       je       SHORT G_M15108_IG08
        cmp      edx, 67
-       jne      SHORT G_M15108_IG07
+       jne      SHORT G_M15108_IG08
 G_M15108_IG04:
        mov      edx, 2
        lea      r11, [(reloc)]
@@ -55,22 +55,19 @@ G_M15108_IG05:
        mov      rcx, rax
 G_M15108_IG06:
        call     [ByteMark:read_comfile(ref)]
-       jmp      SHORT G_M15108_IG09
-G_M15108_IG06B:
-       mov      eax, -1
-       jmp      SHORT G_M15108_IG11
+       xor      eax, eax
 G_M15108_IG07:
-       mov      eax, -1
-G_M15108_IG08:
        lea      rsp, [rbp]
        pop      rbp
        ret      
+G_M15108_IG08:
+       mov      eax, -1
 G_M15108_IG09:
-       xor      eax, eax
-G_M15108_IG10:
        lea      rsp, [rbp]
        pop      rbp
        ret      
+G_M15108_IG10:
+       mov      eax, -1
 G_M15108_IG11:
        lea      rsp, [rbp]
        pop      rbp
@@ -89,9 +86,9 @@ G_M15108_IG14:
        mov      rcx, gword ptr [rcx]
        mov      rdx, gword ptr [rbp-10H]
        call     [Console:WriteLine(ref,ref)]
-       lea      rax, G_M15108_IG06B
+       lea      rax, G_M15108_IG10
 G_M15108_IG15:
        add      rsp, 48
        pop      rbp
        ret      
-; Total bytes of code 196, prolog size 17 for method ByteMark:parse_arg(ref):int:this
+; Total bytes of code 192, prolog size 17 for method ByteMark:parse_arg(ref):int:this

The same issue shows up in one of the motivating cases from https://github.com/dotnet/coreclr/issues/13466#issuecomment-323831444, and this fixes it.

jit-diff results (win x64 release --frameworks --tests):

Summary:
(Note: Lower is better)

Total bytes of diff: -6096 (-0.01 % of base)
    diff is an improvement.

Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes

Top file regressions by size (bytes):
           6 : System.Linq.Parallel.dasm (0.00 % of base)
           1 : JIT\Regression\VS-ia64-JIT\V1.2-M02\b14364\b14364\b14364.dasm (0.01 % of base)

Top file improvements by size (bytes):
       -1200 : System.Private.CoreLib.dasm (-0.04 % of base)
        -893 : Microsoft.CodeAnalysis.CSharp.dasm (-0.04 % of base)
        -240 : Microsoft.CSharp.dasm (-0.08 % of base)
        -220 : JIT\Methodical\fp\exgen\10w5d_cs_ro\10w5d_cs_ro.dasm (-0.02 % of base)
        -214 : Microsoft.CodeAnalysis.dasm (-0.03 % of base)

227 total files with size differences (225 improved, 2 regressed), 1814 unchanged.

Top method regessions by size (bytes):
         215 : Microsoft.CodeAnalysis.dasm - OneToOneUnicodeComparer:Equals(ref,ref):bool:this
         157 : Microsoft.CSharp.dasm - ExpressionBinder:bindUserDefinedConversion(ref,ref,ref,bool,byref,bool):bool:this
          33 : Microsoft.CodeAnalysis.CSharp.dasm - MemberSignatureComparer:HaveSameParameterTypes(struct,ref,struct,ref,bool,bool,bool):bool
          30 : Microsoft.CodeAnalysis.CSharp.dasm - ConsistentSymbolOrder:Compare(ref,ref):int:this
          29 : System.Private.CoreLib.dasm - DateTimeParse:ProcessDateTimeSuffix(byref,byref,byref):bool

Top method improvements by size (bytes):
        -351 : System.Private.CoreLib.dasm - Comparer`1:System.Collections.IComparer.Compare(ref,ref):int:this (39 methods)
        -228 : Microsoft.CodeAnalysis.dasm - OneToOneUnicodeComparer:Compare(ref,ref):int:this
         -71 : System.Private.CoreLib.dasm - ValueTuple`8:System.Collections.IStructuralComparable.CompareTo(ref,ref):int:this (15 methods)
         -58 : Microsoft.CodeAnalysis.CSharp.dasm - MethodSymbolExtensions:IsRuntimeFinalizer(ref,bool):bool
         -42 : Microsoft.CodeAnalysis.CSharp.dasm - SyntaxFacts:IsNamedArgumentName(ref):bool

1507 total methods with size differences (1301 improved, 206 regressed), 201280 unchanged.

The "regression" in OneToOneUnicodeComparer:Equals is just an instance of loop cloning kicking in.

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good.

While you're looking at this code, can you double-check what happens in minopts/debug modes? I'm not sure what the right behavior is there, come to think of it...

@JosephTremoulet
Copy link
Author

While you're looking at this code, can you double-check what happens in minopts/debug modes? I'm not sure what the right behavior is there, come to think of it...

The code to merge down to no more than four returns originally came from a correctness restriction on x86, so it's been firing unconditonally. Before #13792, therefore, the behavior under minopts/debug was the same as when optimizing -- to merge to one return if we have sync block etc., or if we have more than four returns. After #13792 and continuing with this change as currently coded up, the behavior under minopts/debug is still the same as when optimizing -- merge to one return if we have sync block etc., otherwise merge down to no more than four returns, segregating constants where possible.

Arguably the minopts/debug behavior should be more like the previous behavior -- in the case that there are more than four returns, shrink down to one, regardless of whether any are constant... do you think I should make that change?

@AndyAyersMS
Copy link
Member

Not sure what the right behavior should be. Also this is probably a separable issue.

Offhand I'd say that in minopts/debug we should always produce one return, but it is not clear if that always gives faster jit TP / smaller code size. Also it's at odds with minopts trying to do minimal "risky" work.

I suppose you could go back and see if there is any minopts TP/size impact from your original change and then if there is we can figure out what the appropriate behavior should be.

Tweak a few things so that generated constant return blocks get laid out
more optimally:

 - Don't set BBF_DONT_REMOVE on them; it's ok to move them around, and
   if all references to them get dropped, it's fine to eliminate them.
 - Insert them immediately after their lexically-last predecessor when
   generating them; this increases the likelihood of using fallthrough
   rather than gotos to target them in the face of fgReorderBlocks'
   reticence to reorder code that we don't have IBC data for and that
   hasn't been marked rare.
 - Make fgReorderBlocks slightly more aggressive for a pattern that now
   shows up somewhat routinely for returning compound conditionals from
   predicate functions.
@JosephTremoulet
Copy link
Author

Not sure what the right behavior should be

Hm, you raise some good points, I'm not so sure myself.

I suppose you could go back and see if there is any minopts TP/size impact from your original change and then if there is we can figure out what the appropriate behavior should be.

Good suggestion.

Also this is probably a separable issue.

Yes. Created #13915 to track.

@JosephTremoulet JosephTremoulet merged commit 1d1018a into dotnet:master Sep 13, 2017
JosephTremoulet added a commit to JosephTremoulet/coreclr that referenced this pull request Sep 13, 2017
Remove some `goto`s that were added  to work around undesirable jit
layout (#9692, fixed in dotnet#13314) and epilog factoring (improved in
 dotnet#13792 and dotnet#13903), which are no longer needed.

Resolves #13466.
jkotas pushed a commit that referenced this pull request Sep 14, 2017
Remove some `goto`s that were added  to work around undesirable jit
layout (#9692, fixed in #13314) and epilog factoring (improved in
 #13792 and #13903), which are no longer needed.

Resolves #13466.
@JosephTremoulet JosephTremoulet deleted the PlaceReturns branch September 14, 2017 13:57
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants