Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Increase max loops optimized by RyuJIT from 16 to 64. #55614

Merged
merged 1 commit into from
Jul 15, 2021

Conversation

BruceForstall
Copy link
Member

16 seems remarkably small. 64 was not scientifically chosen, but does cover more
cases.

Note that this number is used to allocate the statically-sized loop table, as well as for memory allocation for value
numbering, so there is some overhead to increasing it.

A few microbenchmarks that have diffs show benefit, including 9% for MulMatrix

Method Job Toolchain Mean Error StdDev Median Min Max Ratio
LLoops Job-JXEMSM \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe 559.9 ms 8.20 ms 7.67 ms 556.4 ms 550.6 ms 576.0 ms 1.00
LLoops Job-MUOLTV \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe 552.3 ms 5.84 ms 5.46 ms 552.0 ms 542.4 ms 561.1 ms 0.99
MulMatrix Job-JXEMSM \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe 369.7 ms 4.01 ms 3.56 ms 369.9 ms 364.6 ms 376.9 ms 1.00
MulMatrix Job-MUOLTV \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe 338.1 ms 2.69 ms 2.51 ms 337.7 ms 332.2 ms 341.9 ms 0.91
Puzzle Job-JXEMSM \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe 403.4 ms 6.93 ms 6.48 ms 402.3 ms 394.8 ms 412.5 ms 1.00
Puzzle Job-MUOLTV \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe 394.9 ms 4.16 ms 3.68 ms 395.5 ms 388.2 ms 401.8 ms 0.98

spmi diffs:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 6264
Total bytes of diff: 6285
Total bytes of delta: 21 (0.34% of base)
Total relative delta: 0.00
    diff is a regression.
    relative diff is a regression.
Detail diffs

Top file regressions (bytes):
          21 : 42451.dasm (0.34% of base)

1 total files with Code Size differences (0 improved, 1 regressed), 0 unchanged.

Top method regressions (bytes):
          21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel

Top method regressions (percentages):
          21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel

1 total methods with Code Size differences (0 improved, 1 regressed), 0 unchanged.



Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 39243
Total bytes of diff: 39940
Total bytes of delta: 697 (1.78% of base)
Total relative delta: 0.16
    diff is a regression.
    relative diff is a regression.
Detail diffs

Top file regressions (bytes):
         536 : 25723.dasm (18.29% of base)
         185 : 26877.dasm (2.21% of base)
          66 : 16212.dasm (1.39% of base)
          16 : 16196.dasm (0.36% of base)
          13 : 13322.dasm (0.27% of base)
           9 : 15596.dasm (0.16% of base)

Top file improvements (bytes):
        -125 : 27270.dasm (-6.93% of base)
          -3 : 13993.dasm (-0.05% of base)

8 total files with Code Size differences (2 improved, 6 regressed), 0 unchanged.

Top method regressions (bytes):
         536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int)
          16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel
          13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner)
           9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel

Top method improvements (bytes):
        -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int)

Top method regressions (percentages):
         536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int)
          16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel
          13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner)
           9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel

Top method improvements (percentages):
        -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int)

8 total methods with Code Size differences (2 improved, 6 regressed), 0 unchanged.



Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 142908
Total bytes of diff: 143387
Total bytes of delta: 479 (0.34% of base)
Total relative delta: 0.42
    diff is a regression.
    relative diff is a regression.
Detail diffs

Top file regressions (bytes):
         536 : 248723.dasm (18.29% of base)
         390 : 220441.dasm (14.45% of base)
         390 : 220391.dasm (14.61% of base)
         150 : 239253.dasm (1.78% of base)
          71 : 234803.dasm (1.72% of base)
          16 : 225588.dasm (1.00% of base)
          16 : 225590.dasm (1.00% of base)
           5 : 225285.dasm (0.26% of base)

Top file improvements (bytes):
        -359 : 215690.dasm (-0.99% of base)
        -320 : 215701.dasm (-1.16% of base)
        -128 : 215723.dasm (-0.73% of base)
        -128 : 215666.dasm (-0.62% of base)
        -125 : 239280.dasm (-6.93% of base)
         -29 : 216754.dasm (-0.33% of base)
          -3 : 225316.dasm (-0.15% of base)
          -3 : 225313.dasm (-0.15% of base)

16 total files with Code Size differences (8 improved, 8 regressed), 0 unchanged.

Top method regressions (bytes):
         536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         390 (14.45% of base) : 220441.dasm - VectorTest:Main():int
         390 (14.61% of base) : 220391.dasm - VectorTest:Main():int
         150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int
          16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
           5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method improvements (bytes):
        -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
         -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method regressions (percentages):
         536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         390 (14.61% of base) : 220391.dasm - VectorTest:Main():int
         390 (14.45% of base) : 220441.dasm - VectorTest:Main():int
         150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int
          16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
           5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method improvements (percentages):
        -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
        -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
         -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

16 total methods with Code Size differences (8 improved, 8 regressed), 0 unchanged.



Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 15351
Total bytes of diff: 15448
Total bytes of delta: 97 (0.63% of base)
Total relative delta: 0.08
    diff is a regression.
    relative diff is a regression.
Detail diffs

Top file regressions (bytes):
          71 : 63973.dasm (7.63% of base)
          26 : 106039.dasm (0.19% of base)

2 total files with Code Size differences (0 improved, 2 regressed), 1 unchanged.

Top method regressions (bytes):
          71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method regressions (percentages):
          71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

2 total methods with Code Size differences (0 improved, 2 regressed), 1 unchanged.



Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 22540
Total bytes of diff: 22576
Total bytes of delta: 36 (0.16% of base)
Total relative delta: 0.01
    diff is a regression.
    relative diff is a regression.
Detail diffs

Top file regressions (bytes):
          23 : 104600.dasm (0.15% of base)
          14 : 43143.dasm (0.47% of base)

Top file improvements (bytes):
          -1 : 35267.dasm (-0.03% of base)

3 total files with Code Size differences (1 improved, 2 regressed), 0 unchanged.

Top method regressions (bytes):
          23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
          14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this

Top method improvements (bytes):
          -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this

Top method regressions (percentages):
          14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this
          23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method improvements (percentages):
          -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this

3 total methods with Code Size differences (1 improved, 2 regressed), 0 unchanged.



Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 53782
Total bytes of diff: 53837
Total bytes of delta: 55 (0.10% of base)
Total relative delta: 0.00
    diff is a regression.
    relative diff is a regression.
Detail diffs

Top file regressions (bytes):
          28 : 28889.dasm (0.16% of base)
          27 : 28887.dasm (0.15% of base)

2 total files with Code Size differences (0 improved, 2 regressed), 2 unchanged.

Top method regressions (bytes):
          28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int
          27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int

Top method regressions (percentages):
          28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int
          27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int

2 total methods with Code Size differences (0 improved, 2 regressed), 2 unchanged.


16 seems remarkably small. Note that this number is used to allocate the
statically-sized loop table, as well as for memory allocation for value
numbering, so there is some overhead to increasing it.

A few microbenchmarks that have diffs show benefit, including 9% for MulMatrix

| Method |        Job |                                                                         Toolchain |     Mean |   Error |  StdDev |   Median |      Min |      Max | Ratio |
|------- |----------- |---------------------------------------------------------------------------------- |---------:|--------:|--------:|---------:|---------:|---------:|------:|
| LLoops | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 559.9 ms | 8.20 ms | 7.67 ms | 556.4 ms | 550.6 ms | 576.0 ms |  1.00 |
| LLoops | Job-MUOLTV |  \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 552.3 ms | 5.84 ms | 5.46 ms | 552.0 ms | 542.4 ms | 561.1 ms |  0.99 |
| MulMatrix | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 369.7 ms | 4.01 ms | 3.56 ms | 369.9 ms | 364.6 ms | 376.9 ms |  1.00 |
| MulMatrix | Job-MUOLTV |  \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 338.1 ms | 2.69 ms | 2.51 ms | 337.7 ms | 332.2 ms | 341.9 ms |  0.91 |
| Puzzle | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 403.4 ms | 6.93 ms | 6.48 ms | 402.3 ms | 394.8 ms | 412.5 ms |  1.00 |
| Puzzle | Job-MUOLTV |  \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 394.9 ms | 4.16 ms | 3.68 ms | 395.5 ms | 388.2 ms | 401.8 ms |  0.98 |

spmi diffs:

```

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 6264
Total bytes of diff: 6285
Total bytes of delta: 21 (0.34% of base)
Total relative delta: 0.00
    diff is a regression.
    relative diff is a regression.
```
<details>

<summary>Detail diffs</summary>

```

Top file regressions (bytes):
          21 : 42451.dasm (0.34% of base)

1 total files with Code Size differences (0 improved, 1 regressed), 0 unchanged.

Top method regressions (bytes):
          21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel

Top method regressions (percentages):
          21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel

1 total methods with Code Size differences (0 improved, 1 regressed), 0 unchanged.

```

</details>

--------------------------------------------------------------------------------

```

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 39243
Total bytes of diff: 39940
Total bytes of delta: 697 (1.78% of base)
Total relative delta: 0.16
    diff is a regression.
    relative diff is a regression.
```
<details>

<summary>Detail diffs</summary>

```

Top file regressions (bytes):
         536 : 25723.dasm (18.29% of base)
         185 : 26877.dasm (2.21% of base)
          66 : 16212.dasm (1.39% of base)
          16 : 16196.dasm (0.36% of base)
          13 : 13322.dasm (0.27% of base)
           9 : 15596.dasm (0.16% of base)

Top file improvements (bytes):
        -125 : 27270.dasm (-6.93% of base)
          -3 : 13993.dasm (-0.05% of base)

8 total files with Code Size differences (2 improved, 6 regressed), 0 unchanged.

Top method regressions (bytes):
         536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int)
          16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel
          13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner)
           9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel

Top method improvements (bytes):
        -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int)

Top method regressions (percentages):
         536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int)
          16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel
          13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner)
           9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel

Top method improvements (percentages):
        -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int)

8 total methods with Code Size differences (2 improved, 6 regressed), 0 unchanged.

```

</details>

--------------------------------------------------------------------------------

```

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 142908
Total bytes of diff: 143387
Total bytes of delta: 479 (0.34% of base)
Total relative delta: 0.42
    diff is a regression.
    relative diff is a regression.
```
<details>

<summary>Detail diffs</summary>

```

Top file regressions (bytes):
         536 : 248723.dasm (18.29% of base)
         390 : 220441.dasm (14.45% of base)
         390 : 220391.dasm (14.61% of base)
         150 : 239253.dasm (1.78% of base)
          71 : 234803.dasm (1.72% of base)
          16 : 225588.dasm (1.00% of base)
          16 : 225590.dasm (1.00% of base)
           5 : 225285.dasm (0.26% of base)

Top file improvements (bytes):
        -359 : 215690.dasm (-0.99% of base)
        -320 : 215701.dasm (-1.16% of base)
        -128 : 215723.dasm (-0.73% of base)
        -128 : 215666.dasm (-0.62% of base)
        -125 : 239280.dasm (-6.93% of base)
         -29 : 216754.dasm (-0.33% of base)
          -3 : 225316.dasm (-0.15% of base)
          -3 : 225313.dasm (-0.15% of base)

16 total files with Code Size differences (8 improved, 8 regressed), 0 unchanged.

Top method regressions (bytes):
         536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         390 (14.45% of base) : 220441.dasm - VectorTest:Main():int
         390 (14.61% of base) : 220391.dasm - VectorTest:Main():int
         150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int
          16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
           5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method improvements (bytes):
        -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
         -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method regressions (percentages):
         536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         390 (14.61% of base) : 220391.dasm - VectorTest:Main():int
         390 (14.45% of base) : 220441.dasm - VectorTest:Main():int
         150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int
          16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
           5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method improvements (percentages):
        -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
        -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
         -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

16 total methods with Code Size differences (8 improved, 8 regressed), 0 unchanged.

```

</details>

--------------------------------------------------------------------------------

```

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 15351
Total bytes of diff: 15448
Total bytes of delta: 97 (0.63% of base)
Total relative delta: 0.08
    diff is a regression.
    relative diff is a regression.
```
<details>

<summary>Detail diffs</summary>

```

Top file regressions (bytes):
          71 : 63973.dasm (7.63% of base)
          26 : 106039.dasm (0.19% of base)

2 total files with Code Size differences (0 improved, 2 regressed), 1 unchanged.

Top method regressions (bytes):
          71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method regressions (percentages):
          71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

2 total methods with Code Size differences (0 improved, 2 regressed), 1 unchanged.

```

</details>

--------------------------------------------------------------------------------

```

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 22540
Total bytes of diff: 22576
Total bytes of delta: 36 (0.16% of base)
Total relative delta: 0.01
    diff is a regression.
    relative diff is a regression.
```
<details>

<summary>Detail diffs</summary>

```

Top file regressions (bytes):
          23 : 104600.dasm (0.15% of base)
          14 : 43143.dasm (0.47% of base)

Top file improvements (bytes):
          -1 : 35267.dasm (-0.03% of base)

3 total files with Code Size differences (1 improved, 2 regressed), 0 unchanged.

Top method regressions (bytes):
          23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
          14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this

Top method improvements (bytes):
          -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this

Top method regressions (percentages):
          14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this
          23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method improvements (percentages):
          -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this

3 total methods with Code Size differences (1 improved, 2 regressed), 0 unchanged.

```

</details>

--------------------------------------------------------------------------------

```

Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 53782
Total bytes of diff: 53837
Total bytes of delta: 55 (0.10% of base)
Total relative delta: 0.00
    diff is a regression.
    relative diff is a regression.
```
<details>

<summary>Detail diffs</summary>

```

Top file regressions (bytes):
          28 : 28889.dasm (0.16% of base)
          27 : 28887.dasm (0.15% of base)

2 total files with Code Size differences (0 improved, 2 regressed), 2 unchanged.

Top method regressions (bytes):
          28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int
          27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int

Top method regressions (percentages):
          28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int
          27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int

2 total methods with Code Size differences (0 improved, 2 regressed), 2 unchanged.

```

</details>

--------------------------------------------------------------------------------
@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jul 14, 2021
@BruceForstall
Copy link
Member Author

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr outerloop

@azure-pipelines
Copy link

Azure Pipelines successfully started running 3 pipeline(s).

@BruceForstall
Copy link
Member Author

@AndyAyersMS @dotnet/jit-contrib PTAL

@BruceForstall
Copy link
Member Author

BruceForstall commented Jul 14, 2021

If anyone has an opinion on how the "best" number should be chosen, I'd be happy to hear it. Making it dynamic perhaps would be an option as well: not have a maximum. Not sure if there are algorithms that would need to be reconsidered to avoid bad behavior for large numbers.

@kunalspathak
Copy link
Member

Curious - does choosing 32 not showing enough wins? Can we expose a COMPlus_MaxLoops or something that will help us experiment more? Is there a TP impact?

@AndyAyersMS
Copy link
Member

Does 64 cover all the SPMI cases...?

This might be a place where a more extensive SPMI collection would prove valuable.

@kunalspathak
Copy link
Member

This might be a place where a more extensive SPMI collection would prove valuable.

Could you elaborate on that?

@AndyAyersMS
Copy link
Member

This might be a place where a more extensive SPMI collection would prove valuable.

Could you elaborate on that?

I'm referring to some of the internal collections we had at one point, over much larger amounts of code.

@BruceForstall
Copy link
Member Author

I enabled COUNT_LOOPS and ran it across a merged spmi mega-collection (and bumped the number of histogram buckets), and get:

---------------------------------------------------
Loop stats
---------------------------------------------------
Total number of methods with loops is 64366
Total number of              loops is 92730
Maximum number of loops per method is   192
# of methods overflowing nat loop table is    49
Total number of 'unnatural' loops is 135600
# of methods overflowing unnat loop limit is     0
Total number of loops with an         iterator is 32376
Total number of loops with a simple   iterator is 32376
Total number of loops with a constant iterator is  5515
--------------------------------------------------
Loop count frequency table:
--------------------------------------------------
     <=          0 ===>   11043 count ( 14% of total)
      1 ..       1 ===>   48707 count ( 79% of total)
      2 ..       2 ===>   10489 count ( 93% of total)
      3 ..       3 ===>    2954 count ( 97% of total)
      4 ..       4 ===>     932 count ( 98% of total)
      5 ..       5 ===>     496 count ( 98% of total)
      6 ..       6 ===>     268 count ( 99% of total)
      7 ..       7 ===>     118 count ( 99% of total)
      8 ..       8 ===>      95 count ( 99% of total)
      9 ..       9 ===>      63 count ( 99% of total)
     10 ..      10 ===>      50 count ( 99% of total)
     11 ..      11 ===>      44 count ( 99% of total)
     12 ..      12 ===>      19 count ( 99% of total)
     13 ..      13 ===>      23 count ( 99% of total)
     14 ..      14 ===>      34 count ( 99% of total)
     15 ..      15 ===>      12 count ( 99% of total)
     16 ..      16 ===>      13 count ( 99% of total)
     17 ..      17 ===>       2 count ( 99% of total)
     18 ..      18 ===>       9 count ( 99% of total)
     19 ..      19 ===>       1 count ( 99% of total)
     20 ..      20 ===>       6 count ( 99% of total)
     21 ..      21 ===>       1 count ( 99% of total)
     22 ..      22 ===>       1 count ( 99% of total)
     23 ..      23 ===>       0 count ( 99% of total)
     24 ..      24 ===>       1 count ( 99% of total)
     25 ..      25 ===>       4 count ( 99% of total)
     26 ..      26 ===>       1 count ( 99% of total)
     27 ..      27 ===>       0 count ( 99% of total)
     28 ..      27 ===>       0 count ( 99% of total)
     28 ..      29 ===>       6 count ( 99% of total)
     30 ..      30 ===>       2 count ( 99% of total)
     31 ..      31 ===>       0 count ( 99% of total)
     32 ..      32 ===>       0 count ( 99% of total)
     33 ..      33 ===>       0 count ( 99% of total)
     34 ..      34 ===>       0 count ( 99% of total)
     35 ..      35 ===>       1 count ( 99% of total)
     36 ..      36 ===>       2 count ( 99% of total)
     37 ..      37 ===>       0 count ( 99% of total)
     38 ..      38 ===>       0 count ( 99% of total)
     39 ..      39 ===>       0 count ( 99% of total)
     40 ..      40 ===>       0 count ( 99% of total)
     41 ..      41 ===>       0 count ( 99% of total)
     42 ..      42 ===>       0 count ( 99% of total)
     43 ..      43 ===>       0 count ( 99% of total)
     44 ..      44 ===>       0 count ( 99% of total)
     45 ..      45 ===>       2 count ( 99% of total)
     46 ..      46 ===>       2 count ( 99% of total)
     47 ..      47 ===>       0 count ( 99% of total)
     48 ..      48 ===>       1 count ( 99% of total)
     49 ..      49 ===>       0 count ( 99% of total)
     50 ..      50 ===>       0 count ( 99% of total)
     51 ..      51 ===>       0 count ( 99% of total)
     52 ..      52 ===>       0 count ( 99% of total)
     53 ..      53 ===>       0 count ( 99% of total)
     54 ..      54 ===>       0 count ( 99% of total)
     55 ..      55 ===>       0 count ( 99% of total)
     56 ..      56 ===>       0 count ( 99% of total)
     57 ..      57 ===>       0 count ( 99% of total)
     58 ..      58 ===>       0 count ( 99% of total)
     59 ..      59 ===>       1 count (100% of total)
     60 ..      60 ===>       0 count (100% of total)
      >         60 ===>       6 count (100% of total)
--------------------------------------------------
Loop exit count frequency table:
--------------------------------------------------
     <=          0 ===>     126 count (  0% of total)
      1 ..       1 ===>   57237 count ( 64% of total)
      2 ..       2 ===>   21195 count ( 87% of total)
      3 ..       3 ===>    5175 count ( 93% of total)
      4 ..       4 ===>    2758 count ( 96% of total)
      5 ..       5 ===>    2164 count ( 99% of total)
      6 ..       6 ===>     892 count (100% of total)
      >          6 ===>    3183 count (103% of total)
--------------------------------------------------

So, sticking with powers of 2, 32 only misses ~9 functions with > 32 loops. Even 16 max loops per function hits 99% of all loops (no surprise). Of course there's some crazy outlier: 192 loops in (at least) one function.

The stats for just the benchmarks is:

---------------------------------------------------
Loop stats
---------------------------------------------------
Total number of methods with loops is  2679
Total number of              loops is  4326
Maximum number of loops per method is    61
# of methods overflowing nat loop table is    10
Total number of 'unnatural' loops is  4822
# of methods overflowing unnat loop limit is     0
Total number of loops with an         iterator is  1534
Total number of loops with a simple   iterator is  1534
Total number of loops with a constant iterator is   398
--------------------------------------------------
Loop count frequency table:
--------------------------------------------------
     <=          0 ===>     213 count (  7% of total)
      1 ..       1 ===>    1984 count ( 75% of total)
      2 ..       2 ===>     415 count ( 90% of total)
      3 ..       3 ===>     144 count ( 95% of total)
      4 ..       4 ===>      59 count ( 97% of total)
      5 ..       5 ===>      29 count ( 98% of total)
      6 ..       6 ===>      13 count ( 98% of total)
      7 ..       7 ===>       5 count ( 98% of total)
      8 ..       8 ===>       1 count ( 99% of total)
      9 ..       9 ===>       2 count ( 99% of total)
     10 ..      10 ===>       1 count ( 99% of total)
     11 ..      11 ===>       1 count ( 99% of total)
     12 ..      12 ===>       2 count ( 99% of total)
     13 ..      13 ===>       5 count ( 99% of total)
     14 ..      14 ===>       5 count ( 99% of total)
     15 ..      15 ===>       3 count ( 99% of total)
     16 ..      16 ===>       0 count ( 99% of total)
     17 ..      17 ===>       0 count ( 99% of total)
     18 ..      18 ===>       2 count ( 99% of total)
     19 ..      19 ===>       1 count ( 99% of total)
     20 ..      20 ===>       1 count ( 99% of total)
     21 ..      21 ===>       0 count ( 99% of total)
     22 ..      22 ===>       0 count ( 99% of total)
     23 ..      23 ===>       0 count ( 99% of total)
     24 ..      24 ===>       0 count ( 99% of total)
     25 ..      25 ===>       0 count ( 99% of total)
     26 ..      26 ===>       0 count ( 99% of total)
     27 ..      27 ===>       0 count ( 99% of total)
     28 ..      27 ===>       0 count ( 99% of total)
     28 ..      29 ===>       1 count ( 99% of total)
     30 ..      30 ===>       1 count ( 99% of total)
     31 ..      31 ===>       0 count ( 99% of total)
     32 ..      32 ===>       0 count ( 99% of total)
     33 ..      33 ===>       0 count ( 99% of total)
     34 ..      34 ===>       0 count ( 99% of total)
     35 ..      35 ===>       0 count ( 99% of total)
     36 ..      36 ===>       1 count ( 99% of total)
     37 ..      37 ===>       0 count ( 99% of total)
     38 ..      38 ===>       0 count ( 99% of total)
     39 ..      39 ===>       0 count ( 99% of total)
     40 ..      40 ===>       0 count ( 99% of total)
     41 ..      41 ===>       0 count ( 99% of total)
     42 ..      42 ===>       0 count ( 99% of total)
     43 ..      43 ===>       0 count ( 99% of total)
     44 ..      44 ===>       0 count ( 99% of total)
     45 ..      45 ===>       1 count ( 99% of total)
     46 ..      46 ===>       0 count ( 99% of total)
     47 ..      47 ===>       0 count ( 99% of total)
     48 ..      48 ===>       0 count ( 99% of total)
     49 ..      49 ===>       0 count ( 99% of total)
     50 ..      50 ===>       0 count ( 99% of total)
     51 ..      51 ===>       0 count ( 99% of total)
     52 ..      52 ===>       0 count ( 99% of total)
     53 ..      53 ===>       0 count ( 99% of total)
     54 ..      54 ===>       0 count ( 99% of total)
     55 ..      55 ===>       0 count ( 99% of total)
     56 ..      56 ===>       0 count ( 99% of total)
     57 ..      57 ===>       0 count ( 99% of total)
     58 ..      58 ===>       0 count ( 99% of total)
     59 ..      59 ===>       1 count (100% of total)
     60 ..      60 ===>       0 count (100% of total)
      >         60 ===>       1 count (100% of total)
--------------------------------------------------
Loop exit count frequency table:
--------------------------------------------------
     <=          0 ===>       0 count (  0% of total)
      1 ..       1 ===>    2426 count ( 58% of total)
      2 ..       2 ===>     995 count ( 82% of total)
      3 ..       3 ===>     330 count ( 90% of total)
      4 ..       4 ===>     237 count ( 96% of total)
      5 ..       5 ===>      96 count ( 98% of total)
      6 ..       6 ===>      68 count (100% of total)
      >          6 ===>     174 count (104% of total)
--------------------------------------------------

Unfortunately, the spmi collections, especially the benchmarks collection, have a lot of "MISSING" data currently, so it's not clear how that's skewing the data.

@BruceForstall
Copy link
Member Author

BruceForstall commented Jul 15, 2021

Interestingly, switching from 64 to 32 max loops still gives MulMatrix 9% improvement, but puzzle regresses by 6%!

(puzzle has 45 loops in its main function)

@BruceForstall
Copy link
Member Author

As for throughput: a PIN spmi run with max loops = 64 shows no statistically significant TP difference.

@BruceForstall
Copy link
Member Author

Test failures are all infra or known issues

Copy link
Member

@AndyAyersMS AndyAyersMS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants