-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Increase max loops optimized by RyuJIT from 16 to 64. #55614
Conversation
16 seems remarkably small. Note that this number is used to allocate the statically-sized loop table, as well as for memory allocation for value numbering, so there is some overhead to increasing it. A few microbenchmarks that have diffs show benefit, including 9% for MulMatrix | Method | Job | Toolchain | Mean | Error | StdDev | Median | Min | Max | Ratio | |------- |----------- |---------------------------------------------------------------------------------- |---------:|--------:|--------:|---------:|---------:|---------:|------:| | LLoops | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 559.9 ms | 8.20 ms | 7.67 ms | 556.4 ms | 550.6 ms | 576.0 ms | 1.00 | | LLoops | Job-MUOLTV | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 552.3 ms | 5.84 ms | 5.46 ms | 552.0 ms | 542.4 ms | 561.1 ms | 0.99 | | MulMatrix | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 369.7 ms | 4.01 ms | 3.56 ms | 369.9 ms | 364.6 ms | 376.9 ms | 1.00 | | MulMatrix | Job-MUOLTV | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 338.1 ms | 2.69 ms | 2.51 ms | 337.7 ms | 332.2 ms | 341.9 ms | 0.91 | | Puzzle | Job-JXEMSM | \runtime2\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 403.4 ms | 6.93 ms | 6.48 ms | 402.3 ms | 394.8 ms | 412.5 ms | 1.00 | | Puzzle | Job-MUOLTV | \runtime\artifacts\tests\coreclr\windows.x64.Checked\Tests\Core_Root\CoreRun.exe | 394.9 ms | 4.16 ms | 3.68 ms | 395.5 ms | 388.2 ms | 401.8 ms | 0.98 | spmi diffs: ``` Summary of Code Size diffs: (Lower is better) Total bytes of base: 6264 Total bytes of diff: 6285 Total bytes of delta: 21 (0.34% of base) Total relative delta: 0.00 diff is a regression. relative diff is a regression. ``` <details> <summary>Detail diffs</summary> ``` Top file regressions (bytes): 21 : 42451.dasm (0.34% of base) 1 total files with Code Size differences (0 improved, 1 regressed), 0 unchanged. Top method regressions (bytes): 21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel Top method regressions (percentages): 21 ( 0.34% of base) : 42451.dasm - RelationalModel:Create(IModel,IRelationalAnnotationProvider):IRelationalModel 1 total methods with Code Size differences (0 improved, 1 regressed), 0 unchanged. ``` </details> -------------------------------------------------------------------------------- ``` Summary of Code Size diffs: (Lower is better) Total bytes of base: 39243 Total bytes of diff: 39940 Total bytes of delta: 697 (1.78% of base) Total relative delta: 0.16 diff is a regression. relative diff is a regression. ``` <details> <summary>Detail diffs</summary> ``` Top file regressions (bytes): 536 : 25723.dasm (18.29% of base) 185 : 26877.dasm (2.21% of base) 66 : 16212.dasm (1.39% of base) 16 : 16196.dasm (0.36% of base) 13 : 13322.dasm (0.27% of base) 9 : 15596.dasm (0.16% of base) Top file improvements (bytes): -125 : 27270.dasm (-6.93% of base) -3 : 13993.dasm (-0.05% of base) 8 total files with Code Size differences (2 improved, 6 regressed), 0 unchanged. Top method regressions (bytes): 536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][]) 185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this 66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int) 16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel 13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner) 9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel Top method improvements (bytes): -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int) Top method regressions (percentages): 536 (18.29% of base) : 25723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][]) 185 ( 2.21% of base) : 26877.dasm - Benchstone.BenchF.LLoops:Main1(int):this 66 ( 1.39% of base) : 16212.dasm - Jil.Deserialize.Methods:SkipWithLeadChar(System.IO.TextReader,int) 16 ( 0.36% of base) : 16196.dasm - DynamicClass:_DynamicMethod9(System.IO.TextReader,int):MicroBenchmarks.Serializers.MyEventsListerViewModel 13 ( 0.27% of base) : 13322.dasm - DynamicClass:Regex1_Go(System.Text.RegularExpressions.RegexRunner) 9 ( 0.16% of base) : 15596.dasm - DynamicClass:_DynamicMethod9(byref,int):MicroBenchmarks.Serializers.MyEventsListerViewModel Top method improvements (percentages): -125 (-6.93% of base) : 27270.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this -3 (-0.05% of base) : 13993.dasm - Jil.Deserialize.Methods:SkipWithLeadCharThunkReader(byref,int) 8 total methods with Code Size differences (2 improved, 6 regressed), 0 unchanged. ``` </details> -------------------------------------------------------------------------------- ``` Summary of Code Size diffs: (Lower is better) Total bytes of base: 142908 Total bytes of diff: 143387 Total bytes of delta: 479 (0.34% of base) Total relative delta: 0.42 diff is a regression. relative diff is a regression. ``` <details> <summary>Detail diffs</summary> ``` Top file regressions (bytes): 536 : 248723.dasm (18.29% of base) 390 : 220441.dasm (14.45% of base) 390 : 220391.dasm (14.61% of base) 150 : 239253.dasm (1.78% of base) 71 : 234803.dasm (1.72% of base) 16 : 225588.dasm (1.00% of base) 16 : 225590.dasm (1.00% of base) 5 : 225285.dasm (0.26% of base) Top file improvements (bytes): -359 : 215690.dasm (-0.99% of base) -320 : 215701.dasm (-1.16% of base) -128 : 215723.dasm (-0.73% of base) -128 : 215666.dasm (-0.62% of base) -125 : 239280.dasm (-6.93% of base) -29 : 216754.dasm (-0.33% of base) -3 : 225316.dasm (-0.15% of base) -3 : 225313.dasm (-0.15% of base) 16 total files with Code Size differences (8 improved, 8 regressed), 0 unchanged. Top method regressions (bytes): 536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][]) 390 (14.45% of base) : 220441.dasm - VectorTest:Main():int 390 (14.61% of base) : 220391.dasm - VectorTest:Main():int 150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this 71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int 16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int 16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int 5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int Top method improvements (bytes): -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int Top method regressions (percentages): 536 (18.29% of base) : 248723.dasm - Benchstone.BenchI.MulMatrix:Inner(System.Int32[][],System.Int32[][],System.Int32[][]) 390 (14.61% of base) : 220391.dasm - VectorTest:Main():int 390 (14.45% of base) : 220441.dasm - VectorTest:Main():int 150 ( 1.78% of base) : 239253.dasm - Benchstone.BenchF.LLoops:Main1(int):this 71 ( 1.72% of base) : 234803.dasm - SmallLoop1:Main():int 16 ( 1.00% of base) : 225588.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int 16 ( 1.00% of base) : 225590.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int 5 ( 0.26% of base) : 225285.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int Top method improvements (percentages): -125 (-6.93% of base) : 239280.dasm - Benchstone.BenchI.Puzzle:DoIt():bool:this -320 (-1.16% of base) : 215701.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -359 (-0.99% of base) : 215690.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -128 (-0.73% of base) : 215723.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -128 (-0.62% of base) : 215666.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -29 (-0.33% of base) : 216754.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -3 (-0.15% of base) : 225316.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int -3 (-0.15% of base) : 225313.dasm - IntelHardwareIntrinsicTest.Program:Main(System.String[]):int 16 total methods with Code Size differences (8 improved, 8 regressed), 0 unchanged. ``` </details> -------------------------------------------------------------------------------- ``` Summary of Code Size diffs: (Lower is better) Total bytes of base: 15351 Total bytes of diff: 15448 Total bytes of delta: 97 (0.63% of base) Total relative delta: 0.08 diff is a regression. relative diff is a regression. ``` <details> <summary>Detail diffs</summary> ``` Top file regressions (bytes): 71 : 63973.dasm (7.63% of base) 26 : 106039.dasm (0.19% of base) 2 total files with Code Size differences (0 improved, 2 regressed), 1 unchanged. Top method regressions (bytes): 71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this 26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this Top method regressions (percentages): 71 ( 7.63% of base) : 63973.dasm - Microsoft.Diagnostics.Tracing.Parsers.Symbol.FileVersionTraceData:ToXml(System.Text.StringBuilder):System.Text.StringBuilder:this 26 ( 0.19% of base) : 106039.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this 2 total methods with Code Size differences (0 improved, 2 regressed), 1 unchanged. ``` </details> -------------------------------------------------------------------------------- ``` Summary of Code Size diffs: (Lower is better) Total bytes of base: 22540 Total bytes of diff: 22576 Total bytes of delta: 36 (0.16% of base) Total relative delta: 0.01 diff is a regression. relative diff is a regression. ``` <details> <summary>Detail diffs</summary> ``` Top file regressions (bytes): 23 : 104600.dasm (0.15% of base) 14 : 43143.dasm (0.47% of base) Top file improvements (bytes): -1 : 35267.dasm (-0.03% of base) 3 total files with Code Size differences (1 improved, 2 regressed), 0 unchanged. Top method regressions (bytes): 23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this 14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this Top method improvements (bytes): -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this Top method regressions (percentages): 14 ( 0.47% of base) : 43143.dasm - Microsoft.CodeAnalysis.CSharp.Symbols.SourceMemberContainerTypeSymbol:ForceComplete(Microsoft.CodeAnalysis.SourceLocation,System.Threading.CancellationToken):this 23 ( 0.15% of base) : 104600.dasm - Microsoft.VisualBasic.CompilerServices.VBBinder:BindToMethod(int,System.Reflection.MethodBase[],byref,System.Reflection.ParameterModifier[],System.Globalization.CultureInfo,System.String[],byref):System.Reflection.MethodBase:this Top method improvements (percentages): -1 (-0.03% of base) : 35267.dasm - Microsoft.CodeAnalysis.CSharp.Syntax.InternalSyntax.LanguageParser:ParseNamespaceBody(byref,byref,byref,ushort):this 3 total methods with Code Size differences (1 improved, 2 regressed), 0 unchanged. ``` </details> -------------------------------------------------------------------------------- ``` Summary of Code Size diffs: (Lower is better) Total bytes of base: 53782 Total bytes of diff: 53837 Total bytes of delta: 55 (0.10% of base) Total relative delta: 0.00 diff is a regression. relative diff is a regression. ``` <details> <summary>Detail diffs</summary> ``` Top file regressions (bytes): 28 : 28889.dasm (0.16% of base) 27 : 28887.dasm (0.15% of base) 2 total files with Code Size differences (0 improved, 2 regressed), 2 unchanged. Top method regressions (bytes): 28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int 27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int Top method regressions (percentages): 28 ( 0.16% of base) : 28889.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach003.freach003.Test:MainMethod():int 27 ( 0.15% of base) : 28887.dasm - ManagedTests.DynamicCSharp.Conformance.dynamic.statements.freach.freach004.freach004.Test:MainMethod():int 2 total methods with Code Size differences (0 improved, 2 regressed), 2 unchanged. ``` </details> --------------------------------------------------------------------------------
/azp run runtime-coreclr jitstress, runtime-coreclr libraries-jitstress, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
@AndyAyersMS @dotnet/jit-contrib PTAL |
If anyone has an opinion on how the "best" number should be chosen, I'd be happy to hear it. Making it dynamic perhaps would be an option as well: not have a maximum. Not sure if there are algorithms that would need to be reconsidered to avoid bad behavior for large numbers. |
Curious - does choosing 32 not showing enough wins? Can we expose a |
Does 64 cover all the SPMI cases...? This might be a place where a more extensive SPMI collection would prove valuable. |
Could you elaborate on that? |
I'm referring to some of the internal collections we had at one point, over much larger amounts of code. |
I enabled
So, sticking with powers of 2, 32 only misses ~9 functions with > 32 loops. Even 16 max loops per function hits 99% of all loops (no surprise). Of course there's some crazy outlier: 192 loops in (at least) one function. The stats for just the benchmarks is:
Unfortunately, the spmi collections, especially the benchmarks collection, have a lot of "MISSING" data currently, so it's not clear how that's skewing the data. |
Interestingly, switching from 64 to 32 max loops still gives MulMatrix 9% improvement, but puzzle regresses by 6%! (puzzle has 45 loops in its main function) |
As for throughput: a PIN spmi run with max loops = 64 shows no statistically significant TP difference. |
Test failures are all infra or known issues |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
16 seems remarkably small. 64 was not scientifically chosen, but does cover more
cases.
Note that this number is used to allocate the statically-sized loop table, as well as for memory allocation for value
numbering, so there is some overhead to increasing it.
A few microbenchmarks that have diffs show benefit, including 9% for MulMatrix
spmi diffs:
Detail diffs
Detail diffs
Detail diffs
Detail diffs
Detail diffs
Detail diffs