Increase max loops optimized by RyuJIT from 16 to 64. #55614

BruceForstall · 2021-07-14T01:08:38Z

A limit of 16 seems remarkably small. 64 was not scientifically chosen, but it does cover more cases.

Note that this number is used to allocate the statically-sized loop table, as well as for memory allocation for value
numbering, so there is some overhead to increasing it.
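
As a rough illustration of that trade-off, here is a minimal Python sketch of a fixed-capacity loop table. It is not the JIT's C++ code: `LoopTable`, `try_record`, `overflowed`, and `simulate` are hypothetical names, and the only things taken from the description above are that storage is sized up front by the maximum and that loops beyond the maximum are not recorded for optimization.

```python
class LoopTable:
    """Toy model of a loop table whose storage is reserved up front."""

    def __init__(self, max_loops):
        self.max_loops = max_loops
        # All slots are allocated immediately, so a larger maximum costs
        # memory even for methods that contain only one or two loops.
        self.slots = [None] * max_loops
        self.count = 0
        self.overflowed = 0  # loops that did not fit in the table

    def try_record(self, loop_name):
        """Record a loop if a slot is free; otherwise count it as overflow."""
        if self.count == self.max_loops:
            self.overflowed += 1
            return False
        self.slots[self.count] = loop_name
        self.count += 1
        return True


def simulate(num_loops, max_loops):
    """Return (recorded, overflowed) for a method containing num_loops loops."""
    table = LoopTable(max_loops)
    for i in range(num_loops):
        table.try_record(f"loop{i}")
    return table.count, table.overflowed


if __name__ == "__main__":
    # A hypothetical method with 20 loops under the old and new limits.
    for limit in (16, 64):
        print(limit, simulate(20, limit))
```

In this toy model, a method with 20 loops loses 4 of them at a limit of 16 and none at 64, while every method pays for 64 slots instead of 16.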

A few microbenchmarks that have diffs show a benefit, including 9% for MulMatrix:

| Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio |
|--------|-----|-----------|-----:|------:|-------:|-------:|----:|----:|------:|
| LLoops | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 559.9 ms | 8.20 ms | 7.67 ms | 556.4 ms | 550.6 ms | 576.0 ms | 1.00 |
| LLoops | Job-MUOLTV | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 552.3 ms | 5.84 ms | 5.46 ms | 552.0 ms | 542.4 ms | 561.1 ms | 0.99 |
| MulMatrix | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 369.7 ms | 4.01 ms | 3.56 ms | 369.9 ms | 364.6 ms | 376.9 ms | 1.00 |
| MulMatrix | Job-MUOLTV | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 338.1 ms | 2.69 ms | 2.51 ms | 337.7 ms | 332.2 ms | 341.9 ms | 0.91 |
| Puzzle | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 403.4 ms | 6.93 ms | 6.48 ms | 402.3 ms | 394.8 ms | 412.5 ms | 1.00 |
| Puzzle | Job-MUOLTV | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 394.9 ms | 4.16 ms | 3.68 ms | 395.5 ms | 388.2 ms | 401.8 ms | 0.98 |
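
The Ratio column can be reproduced from the Mean column. The short Python sketch below assumes the row with Ratio 1.00 is the baseline for each benchmark; the helper names `ratio` and `improvement_percent` are made up for illustration. Unrounded, the MulMatrix ratio of 0.91 corresponds to a reduction in mean time of about 8.5%, which rounds to the 9% quoted above.

```python
# Mean times in milliseconds, copied from the benchmark table.
# Each entry is (baseline mean, comparison mean); the baseline is the
# row whose Ratio is 1.00.
means_ms = {
    "LLoops": (559.9, 552.3),
    "MulMatrix": (369.7, 338.1),
    "Puzzle": (403.4, 394.9),
}


def ratio(base, diff):
    """Comparison mean divided by baseline mean (lower is better)."""
    return diff / base


def improvement_percent(base, diff):
    """Percentage reduction in mean time relative to the baseline."""
    return (1.0 - diff / base) * 100.0


if __name__ == "__main__":
    for name, (base, diff) in means_ms.items():
        print(f"{name}: ratio {ratio(base, diff):.2f}, "
              f"{improvement_percent(base, diff):.1f}% lower mean time")
```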

spmi diffs:


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 6264
Total bytes of diff: 6285
Total bytes of delta: 21 (0.34% of base)
Total relative delta: 0.00
    diff is a regression.
    relative diff is a regression.

Detail diffs


Top file regressions (bytes):
          21 : 42451.dasm (0.34% of base)

1 total files with Code Size differences (0 improved, 1 regressed), 0 unchanged.

Top method regressions (bytes):
          21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel

Top method regressions (percentages):
          21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel

1 total methods with Code Size differences (0 improved, 1 regressed), 0 unchanged.


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 39243
Total bytes of diff: 39940
Total bytes of delta: 697 (1.78% of base)
Total relative delta: 0.16
    diff is a regression.
    relative diff is a regression.

Detail diffs


Top file regressions (bytes):
         536 : 25723.dasm (18.29% of base)
         185 : 26877.dasm (2.21% of base)
          66 : 16212.dasm (1.39% of base)
          16 : 16196.dasm (0.36% of base)
          13 : 13322.dasm (0.27% of base)
           9 : 15596.dasm (0.16% of base)

Top file improvements (bytes):
        -125 : 27270.dasm (-6.93% of base)
          -3 : 13993.dasm (-0.05% of base)

8 total files with Code Size differences (2 improved, 6 regressed), 0 unchanged.

Top method regressions (bytes):
         536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int)
          16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel
          13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner)
           9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel

Top method improvements (bytes):
        -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int)

Top method regressions (percentages):
         536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int)
          16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel
          13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner)
           9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel

Top method improvements (percentages):
        -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
          -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int)

8 total methods with Code Size differences (2 improved, 6 regressed), 0 unchanged.


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 142908
Total bytes of diff: 143387
Total bytes of delta: 479 (0.34% of base)
Total relative delta: 0.42
    diff is a regression.
    relative diff is a regression.

Detail diffs


Top file regressions (bytes):
         536 : 248723.dasm (18.29% of base)
         390 : 220441.dasm (14.45% of base)
         390 : 220391.dasm (14.61% of base)
         150 : 239253.dasm (1.78% of base)
          71 : 234803.dasm (1.72% of base)
          16 : 225588.dasm (1.00% of base)
          16 : 225590.dasm (1.00% of base)
           5 : 225285.dasm (0.26% of base)

Top file improvements (bytes):
        -359 : 215690.dasm (-0.99% of base)
        -320 : 215701.dasm (-1.16% of base)
        -128 : 215723.dasm (-0.73% of base)
        -128 : 215666.dasm (-0.62% of base)
        -125 : 239280.dasm (-6.93% of base)
         -29 : 216754.dasm (-0.33% of base)
          -3 : 225316.dasm (-0.15% of base)
          -3 : 225313.dasm (-0.15% of base)

16 total files with Code Size differences (8 improved, 8 regressed), 0 unchanged.

Top method regressions (bytes):
         536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         390 (14.45% of base) : 220441.dasm - VectorTest:Main():int
         390 (14.61% of base) : 220391.dasm - VectorTest:Main():int
         150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int
          16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
           5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method improvements (bytes):
        -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
         -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method regressions (percentages):
         536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][])
         390 (14.61% of base) : 220391.dasm - VectorTest:Main():int
         390 (14.45% of base) : 220441.dasm - VectorTest:Main():int
         150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this
          71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int
          16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
           5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

Top method improvements (percentages):
        -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this
        -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
        -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
         -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int
          -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int

16 total methods with Code Size differences (8 improved, 8 regressed), 0 unchanged.


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 15351
Total bytes of diff: 15448
Total bytes of delta: 97 (0.63% of base)
Total relative delta: 0.08
    diff is a regression.
    relative diff is a regression.

Detail diffs


Top file regressions (bytes):
          71 : 63973.dasm (7.63% of base)
          26 : 106039.dasm (0.19% of base)

2 total files with Code Size differences (0 improved, 2 regressed), 1 unchanged.

Top method regressions (bytes):
          71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method regressions (percentages):
          71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this
          26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

2 total methods with Code Size differences (0 improved, 2 regressed), 1 unchanged.


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 22540
Total bytes of diff: 22576
Total bytes of delta: 36 (0.16% of base)
Total relative delta: 0.01
    diff is a regression.
    relative diff is a regression.

Detail diffs


Top file regressions (bytes):
          23 : 104600.dasm (0.15% of base)
          14 : 43143.dasm (0.47% of base)

Top file improvements (bytes):
          -1 : 35267.dasm (-0.03% of base)

3 total files with Code Size differences (1 improved, 2 regressed), 0 unchanged.

Top method regressions (bytes):
          23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this
          14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this

Top method improvements (bytes):
          -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this

Top method regressions (percentages):
          14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this
          23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this

Top method improvements (percentages):
          -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this

3 total methods with Code Size differences (1 improved, 2 regressed), 0 unchanged.


Summary of Code Size diffs:
(Lower is better)

Total bytes of base: 53782
Total bytes of diff: 53837
Total bytes of delta: 55 (0.10% of base)
Total relative delta: 0.00
    diff is a regression.
    relative diff is a regression.

Detail diffs


Top file regressions (bytes):
          28 : 28889.dasm (0.16% of base)
          27 : 28887.dasm (0.15% of base)

2 total files with Code Size differences (0 improved, 2 regressed), 2 unchanged.

Top method regressions (bytes):
          28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int
          27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int

Top method regressions (percentages):
          28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int
          27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int

2 total methods with Code Size differences (0 improved, 2 regressed), 2 unchanged.
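
For a single overall figure, the six summaries can be combined. The Python sketch below recomputes each "delta (% of base)" from the reported base and diff byte totals and then sums them; the helper name `delta_percent` is an assumption for illustration, and the collections are left unnamed because the summaries do not identify them. Taken together, the six summaries add 1385 bytes on a base of 280088 bytes, about 0.49%.

```python
# (base bytes, diff bytes) for the six code size summaries, in order.
summaries = [
    (6264, 6285),
    (39243, 39940),
    (142908, 143387),
    (15351, 15448),
    (22540, 22576),
    (53782, 53837),
]


def delta_percent(base, diff):
    """Code size change as a percentage of the base size."""
    return (diff - base) / base * 100.0


total_base = sum(base for base, _ in summaries)
total_diff = sum(diff for _, diff in summaries)

if __name__ == "__main__":
    for base, diff in summaries:
        print(f"{base:>7} -> {diff:>7}: {diff - base:+5d} bytes "
              f"({delta_percent(base, diff):.2f}% of base)")
    print(f"overall: {total_diff - total_base:+d} bytes "
          f"({delta_percent(total_base, total_diff):.2f}% of base)")
```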


BruceForstall · 2021-07-14T01:08:54Z

/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr outerloop

azure-pipelines · 2021-07-14T01:09:30Z

Azure Pipelines successfully started running 3 pipeline(s).

BruceForstall · 2021-07-14T01:09:38Z

@AndyAyersMS @dotnet/jit-contrib PTAL

BruceForstall · 2021-07-14T01:10:33Z

If anyone has an opinion on how the "best" number should be chosen, I'd be happy to hear it. Making it dynamic, with no maximum at all, would perhaps be an option as well. I'm not sure whether there are algorithms that would need to be reconsidered to avoid bad behavior for large loop counts.

kunalspathak · 2021-07-14T01:46:42Z

Curious - does choosing 32 not show enough wins? Can we expose a COMPlus_MaxLoops or something that will help us experiment more? Is there a TP impact?

AndyAyersMS · 2021-07-14T01:51:09Z

Does 64 cover all the SPMI cases...?

This might be a place where a more extensive SPMI collection would prove valuable.

kunalspathak · 2021-07-14T01:52:49Z

> This might be a place where a more extensive SPMI collection would prove valuable.

Could you elaborate on that?

AndyAyersMS · 2021-07-14T01:56:18Z

> > This might be a place where a more extensive SPMI collection would prove valuable.
>
> Could you elaborate on that?

I'm referring to some of the internal collections we had at one point, over much larger amounts of code.

BruceForstall · 2021-07-14T23:58:33Z

I enabled COUNT_LOOPS and ran it across a merged spmi mega-collection (and bumped the number of histogram buckets), and get:

---------------------------------------------------
Loop stats
---------------------------------------------------
Total number of methods with loops is 64366
Total number of              loops is 92730
Maximum number of loops per method is   192
# of methods overflowing nat loop table is    49
Total number of 'unnatural' loops is 135600
# of methods overflowing unnat loop limit is     0
Total number of loops with an         iterator is 32376
Total number of loops with a simple   iterator is 32376
Total number of loops with a constant iterator is  5515
--------------------------------------------------
Loop count frequency table:
--------------------------------------------------
     <=          0 ===>   11043 count ( 14% of total)
      1 ..       1 ===>   48707 count ( 79% of total)
      2 ..       2 ===>   10489 count ( 93% of total)
      3 ..       3 ===>    2954 count ( 97% of total)
      4 ..       4 ===>     932 count ( 98% of total)
      5 ..       5 ===>     496 count ( 98% of total)
      6 ..       6 ===>     268 count ( 99% of total)
      7 ..       7 ===>     118 count ( 99% of total)
      8 ..       8 ===>      95 count ( 99% of total)
      9 ..       9 ===>      63 count ( 99% of total)
     10 ..      10 ===>      50 count ( 99% of total)
     11 ..      11 ===>      44 count ( 99% of total)
     12 ..      12 ===>      19 count ( 99% of total)
     13 ..      13 ===>      23 count ( 99% of total)
     14 ..      14 ===>      34 count ( 99% of total)
     15 ..      15 ===>      12 count ( 99% of total)
     16 ..      16 ===>      13 count ( 99% of total)
     17 ..      17 ===>       2 count ( 99% of total)
     18 ..      18 ===>       9 count ( 99% of total)
     19 ..      19 ===>       1 count ( 99% of total)
     20 ..      20 ===>       6 count ( 99% of total)
     21 ..      21 ===>       1 count ( 99% of total)
     22 ..      22 ===>       1 count ( 99% of total)
     23 ..      23 ===>       0 count ( 99% of total)
     24 ..      24 ===>       1 count ( 99% of total)
     25 ..      25 ===>       4 count ( 99% of total)
     26 ..      26 ===>       1 count ( 99% of total)
     27 ..      27 ===>       0 count ( 99% of total)
     28 ..      27 ===>       0 count ( 99% of total)
     28 ..      29 ===>       6 count ( 99% of total)
     30 ..      30 ===>       2 count ( 99% of total)
     31 ..      31 ===>       0 count ( 99% of total)
     32 ..      32 ===>       0 count ( 99% of total)
     33 ..      33 ===>       0 count ( 99% of total)
     34 ..      34 ===>       0 count ( 99% of total)
     35 ..      35 ===>       1 count ( 99% of total)
     36 ..      36 ===>       2 count ( 99% of total)
     37 ..      37 ===>       0 count ( 99% of total)
     38 ..      38 ===>       0 count ( 99% of total)
     39 ..      39 ===>       0 count ( 99% of total)
     40 ..      40 ===>       0 count ( 99% of total)
     41 ..      41 ===>       0 count ( 99% of total)
     42 ..      42 ===>       0 count ( 99% of total)
     43 ..      43 ===>       0 count ( 99% of total)
     44 ..      44 ===>       0 count ( 99% of total)
     45 ..      45 ===>       2 count ( 99% of total)
     46 ..      46 ===>       2 count ( 99% of total)
     47 ..      47 ===>       0 count ( 99% of total)
     48 ..      48 ===>       1 count ( 99% of total)
     49 ..      49 ===>       0 count ( 99% of total)
     50 ..      50 ===>       0 count ( 99% of total)
     51 ..      51 ===>       0 count ( 99% of total)
     52 ..      52 ===>       0 count ( 99% of total)
     53 ..      53 ===>       0 count ( 99% of total)
     54 ..      54 ===>       0 count ( 99% of total)
     55 ..      55 ===>       0 count ( 99% of total)
     56 ..      56 ===>       0 count ( 99% of total)
     57 ..      57 ===>       0 count ( 99% of total)
     58 ..      58 ===>       0 count ( 99% of total)
     59 ..      59 ===>       1 count (100% of total)
     60 ..      60 ===>       0 count (100% of total)
      >         60 ===>       6 count (100% of total)
--------------------------------------------------
Loop exit count frequency table:
--------------------------------------------------
     <=          0 ===>     126 count (  0% of total)
      1 ..       1 ===>   57237 count ( 64% of total)
      2 ..       2 ===>   21195 count ( 87% of total)
      3 ..       3 ===>    5175 count ( 93% of total)
      4 ..       4 ===>    2758 count ( 96% of total)
      5 ..       5 ===>    2164 count ( 99% of total)
      6 ..       6 ===>     892 count (100% of total)
      >          6 ===>    3183 count (103% of total)
--------------------------------------------------

So, sticking with powers of 2, a limit of 32 misses only ~9 functions in the 33..60 range, plus the 6 in the > 60 bucket. Even 16 max loops per function covers 99% of all methods (no surprise). Of course there's some crazy outlier: 192 loops in (at least) one function.
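As a cross-check, the overflow counts can be recomputed from the frequency table above. This is only a sketch: the dictionary holds the non-zero buckets above 16 loops copied by hand from the table, and the names `BUCKETS_ABOVE_16`, `OVER_60`, and `methods_over` are made up for illustration, not part of the JIT. For a limit of 16 it reproduces the reported 49 overflowing methods; for a limit of 32 it gives 15, that is, 9 methods in the 33..60 range plus the 6 in the open-ended "> 60" bucket.

```python
# Non-zero buckets above 16 loops, copied from the mega-collection
# "Loop count frequency table". Key = bucket upper bound, value = method count.
# The "28 .. 29" bucket is keyed by 29; the open-ended "> 60" bucket is separate.
BUCKETS_ABOVE_16 = {
    17: 2, 18: 9, 19: 1, 20: 6, 21: 1, 22: 1, 24: 1, 25: 4, 26: 1,
    29: 6, 30: 2, 35: 1, 36: 2, 45: 2, 46: 2, 48: 1, 59: 1,
}
OVER_60 = 6  # methods in the "> 60" bucket; their exact loop counts are unknown


def methods_over(limit):
    """Count methods with more than `limit` loops.

    Only meaningful for 16 <= limit <= 60, and ambiguous at limit == 28
    because the table merges 28 and 29 into one bucket.
    """
    return sum(n for k, n in BUCKETS_ABOVE_16.items() if k > limit) + OVER_60


for limit in (16, 32):
    print(f"limit {limit}: {methods_over(limit)} methods overflow")
```

The table alone cannot say how many methods a limit of 64 would miss, since the "> 60" bucket does not separate 61..64 from larger counts; at most 6 methods would still overflow.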

The stats for just the benchmarks are:

---------------------------------------------------
Loop stats
---------------------------------------------------
Total number of methods with loops is  2679
Total number of              loops is  4326
Maximum number of loops per method is    61
# of methods overflowing nat loop table is    10
Total number of 'unnatural' loops is  4822
# of methods overflowing unnat loop limit is     0
Total number of loops with an         iterator is  1534
Total number of loops with a simple   iterator is  1534
Total number of loops with a constant iterator is   398
--------------------------------------------------
Loop count frequency table:
--------------------------------------------------
     <=          0 ===>     213 count (  7% of total)
      1 ..       1 ===>    1984 count ( 75% of total)
      2 ..       2 ===>     415 count ( 90% of total)
      3 ..       3 ===>     144 count ( 95% of total)
      4 ..       4 ===>      59 count ( 97% of total)
      5 ..       5 ===>      29 count ( 98% of total)
      6 ..       6 ===>      13 count ( 98% of total)
      7 ..       7 ===>       5 count ( 98% of total)
      8 ..       8 ===>       1 count ( 99% of total)
      9 ..       9 ===>       2 count ( 99% of total)
     10 ..      10 ===>       1 count ( 99% of total)
     11 ..      11 ===>       1 count ( 99% of total)
     12 ..      12 ===>       2 count ( 99% of total)
     13 ..      13 ===>       5 count ( 99% of total)
     14 ..      14 ===>       5 count ( 99% of total)
     15 ..      15 ===>       3 count ( 99% of total)
     16 ..      16 ===>       0 count ( 99% of total)
     17 ..      17 ===>       0 count ( 99% of total)
     18 ..      18 ===>       2 count ( 99% of total)
     19 ..      19 ===>       1 count ( 99% of total)
     20 ..      20 ===>       1 count ( 99% of total)
     21 ..      21 ===>       0 count ( 99% of total)
     22 ..      22 ===>       0 count ( 99% of total)
     23 ..      23 ===>       0 count ( 99% of total)
     24 ..      24 ===>       0 count ( 99% of total)
     25 ..      25 ===>       0 count ( 99% of total)
     26 ..      26 ===>       0 count ( 99% of total)
     27 ..      27 ===>       0 count ( 99% of total)
     28 ..      27 ===>       0 count ( 99% of total)
     28 ..      29 ===>       1 count ( 99% of total)
     30 ..      30 ===>       1 count ( 99% of total)
     31 ..      31 ===>       0 count ( 99% of total)
     32 ..      32 ===>       0 count ( 99% of total)
     33 ..      33 ===>       0 count ( 99% of total)
     34 ..      34 ===>       0 count ( 99% of total)
     35 ..      35 ===>       0 count ( 99% of total)
     36 ..      36 ===>       1 count ( 99% of total)
     37 ..      37 ===>       0 count ( 99% of total)
     38 ..      38 ===>       0 count ( 99% of total)
     39 ..      39 ===>       0 count ( 99% of total)
     40 ..      40 ===>       0 count ( 99% of total)
     41 ..      41 ===>       0 count ( 99% of total)
     42 ..      42 ===>       0 count ( 99% of total)
     43 ..      43 ===>       0 count ( 99% of total)
     44 ..      44 ===>       0 count ( 99% of total)
     45 ..      45 ===>       1 count ( 99% of total)
     46 ..      46 ===>       0 count ( 99% of total)
     47 ..      47 ===>       0 count ( 99% of total)
     48 ..      48 ===>       0 count ( 99% of total)
     49 ..      49 ===>       0 count ( 99% of total)
     50 ..      50 ===>       0 count ( 99% of total)
     51 ..      51 ===>       0 count ( 99% of total)
     52 ..      52 ===>       0 count ( 99% of total)
     53 ..      53 ===>       0 count ( 99% of total)
     54 ..      54 ===>       0 count ( 99% of total)
     55 ..      55 ===>       0 count ( 99% of total)
     56 ..      56 ===>       0 count ( 99% of total)
     57 ..      57 ===>       0 count ( 99% of total)
     58 ..      58 ===>       0 count ( 99% of total)
     59 ..      59 ===>       1 count (100% of total)
     60 ..      60 ===>       0 count (100% of total)
      >         60 ===>       1 count (100% of total)
--------------------------------------------------
Loop exit count frequency table:
--------------------------------------------------
     <=          0 ===>       0 count (  0% of total)
      1 ..       1 ===>    2426 count ( 58% of total)
      2 ..       2 ===>     995 count ( 82% of total)
      3 ..       3 ===>     330 count ( 90% of total)
      4 ..       4 ===>     237 count ( 96% of total)
      5 ..       5 ===>      96 count ( 98% of total)
      6 ..       6 ===>      68 count (100% of total)
      >          6 ===>     174 count (104% of total)
--------------------------------------------------

Unfortunately, the spmi collections, especially the benchmarks collection, have a lot of "MISSING" data currently, so it's not clear how that's skewing the data.

BruceForstall · 2021-07-15T01:23:02Z

Interestingly, switching from 64 to 32 max loops still gives MulMatrix a 9% improvement, but Puzzle regresses by 6%!

(Puzzle has 45 loops in its main function)
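To see which limits cover a 45-loop method like Puzzle's, here is a rough sketch over the benchmarks-only frequency table from the earlier comment. The bucket values are copied by hand from that table, and all names (`BENCH_BUCKETS`, `bench_overflow`, and so on) are hypothetical. It only counts which methods exceed a given limit; it does not model what the JIT does with a method once the loop table is full, and it does not explain the size of the regression.

```python
# Non-zero benchmark buckets above 16 loops (key = bucket upper bound,
# value = method count), copied from the benchmarks frequency table.
BENCH_BUCKETS = {18: 2, 19: 1, 20: 1, 29: 1, 30: 1, 36: 1, 45: 1, 59: 1}
BENCH_OVER_60 = 1     # the single method in the "> 60" bucket
BENCH_MAX_LOOPS = 61  # "Maximum number of loops per method" for the benchmarks
PUZZLE_LOOPS = 45     # loop count of Puzzle's main function, per the comment


def bench_overflow(limit):
    """Count benchmark methods with more than `limit` loops (limit >= 16)."""
    over = sum(n for k, n in BENCH_BUCKETS.items() if k > limit)
    # The "> 60" bucket holds one method, and the reported maximum is 61,
    # so that method overflows only when the limit is below 61.
    if BENCH_MAX_LOOPS > limit:
        over += BENCH_OVER_60
    return over


for limit in (16, 32, 64):
    fits = PUZZLE_LOOPS <= limit
    print(f"limit {limit}: {bench_overflow(limit)} methods overflow, "
          f"Puzzle fits: {fits}")
```

With these numbers, 10 benchmark methods overflow at 16 (matching the reported count), 4 at 32, and none at 64; the 45-loop method fits only under the limit of 64.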

BruceForstall · 2021-07-15T01:37:14Z

As for throughput: a PIN spmi run with max loops = 64 shows no statistically significant TP difference.

BruceForstall · 2021-07-15T17:43:57Z

Test failures are all infra or known issues.

AndyAyersMS

LGTM

dotnet-issue-labeler bot added the area-CodeGen-coreclr label (CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI) Jul 14, 2021

BruceForstall requested a review from AndyAyersMS July 14, 2021 01:09

AndyAyersMS approved these changes Jul 15, 2021


BruceForstall merged commit 25686d5 into dotnet:main Jul 15, 2021

BruceForstall deleted the IncreaseMaxLoops branch July 15, 2021 17:52

ManickaP mentioned this pull request Jul 20, 2021

[QUIC] Remove AppContext switch from S.N.Quic #56027

Merged

This was referenced Jul 22, 2021

[Perf] Changes at 7/14/2021 8:40:34 PM DrewScoggins/performance-2#7615

Open

[Perf] Changes at 7/15/2021 9:01:47 AM DrewScoggins/performance-2#7617

Open

JulieLeeMSFT mentioned this pull request Aug 10, 2021

What's new in .NET 6 Preview 7 dotnet/core#6444

Closed

ghost locked as resolved and limited conversation to collaborators Aug 14, 2021
