-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Potential performance regression in OrdinalIgnoreCase string comparision #59087
Comments
Tagging subscribers to this area: @tarekgh, @safern Issue DetailsOriginally detected by the bot in DrewScoggins/performance-2#4549 but not reported in dotnet/runtime (cc @DrewScoggins) Repro: git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter System.Globalization.Tests.StringEquality.Compare_Same_Upper A quick look at the historical data and zooming: It might have been caused by PGO (cc @AndyAyersMS @kunalspathak):
|
Standalone repro: // Licensed to the .NET Foundation under one or more agreements.
// The .NET Foundation licenses this file to you under the MIT license.
// See the LICENSE file in the project root for more information.
using System.Collections.Generic;
using System.IO;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;
using BenchmarkDotNet.Configs;
using BenchmarkDotNet.Environments;
using BenchmarkDotNet.Jobs;
namespace System.Globalization.Tests
{
[Config(typeof(ConfigWithCustomEnvVars))]
[DisassemblyDiagnoser(maxDepth: 6, exportDiff: true)]
public class StringEquality
{
private class ConfigWithCustomEnvVars : ManualConfig
{
private const string JitNoInline = "COMPlus_JitNoInline";
public ConfigWithCustomEnvVars()
{
AddJob(Job.Default.WithRuntime(CoreRuntime.Core60).WithId("Default"));
AddJob(Job.Default.WithRuntime(CoreRuntime.Core60)
.WithEnvironmentVariables(new EnvironmentVariable("DOTNET_JitDisablePgo", "1"))
.WithId("No PGO"));
}
}
private string _value, _same, _sameUpper, _diffAtFirstChar;
public static IEnumerable<(CultureInfo CultureInfo, CompareOptions CompareOptions)> GetOptions()
{
yield return (new CultureInfo("en-US"), CompareOptions.OrdinalIgnoreCase);
}
[ParamsSource(nameof(GetOptions))]
public (CultureInfo CultureInfo, CompareOptions CompareOptions) Options;
[Params(1024)] // single execution path = single test case
public int Count;
[GlobalSetup]
public void Setup()
{
// we are using part of Alice's Adventures in Wonderland text as test data
char[] characters = File.ReadAllText(@"path\to\alice29.txt").Take(Count).ToArray();
_value = new string(characters);
_same = new string(characters);
_sameUpper = _same.ToUpper();
char[] copy = characters.ToArray();
copy[0] = (char)(copy[0] + 1);
_diffAtFirstChar = new string(copy);
}
[Benchmark] // the most work to do for IgnoreCase: every char needs to be compared and uppercased
public int Compare_Same_Upper() => Options.CultureInfo.CompareInfo.Compare(_value, _sameUpper, Options.CompareOptions);
public static void Main() => BenchmarkRunner.Run<StringEquality>();
}
} Asmdiff (NoPGO is on the left): https://www.diffchecker.com/2guAUS5i |
The most interesting part is that FullPGO is still slower than NoPGO mode cc @AndyAyersMS (I've not investigated why yet):
|
Another benchmark that got most likely affected by this change: But in this particular case only Unix-like systems got affected: System.Globalization.Tests.StringEquality.Compare_DifferentFirstChar(Count: 1024, Options: (en-US, Ordinal))
|
One more Unix-specific benchmark that regressed then: System.Buffers.Text.Tests.Utf8FormatterTests.FormatterInt64(value: 12345)
|
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsOriginally detected by the bot in DrewScoggins/performance-2#4549 but not reported in dotnet/runtime (cc @DrewScoggins) System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count: 1024, Options: (en-US, OrdinalIgnoreCase))
Repro: git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f net5.0 net6.0 --filter System.Globalization.Tests.StringEquality.Compare_Same_Upper A quick look at the historical data and zooming: It might have been caused by PGO (cc @AndyAyersMS @kunalspathak):
|
Looking at the first of these... will update with details. |
There seem to be several issues in First is that the PGO training data does not see many cases of case-insensitive comparison. So in the compound expression runtime/src/libraries/System.Private.CoreLib/src/System/Globalization/Ordinal.cs Lines 28 to 29 in 1b14c94
The first clause is true with likelihood 0.97. This means that the CSE In the benchmark however this code path is taken quite often and so PGO imposes an extra cost. One possible fix here is to broaden the set of inputs we see during PGO training, though it's also possible the likelihoods we see now are realistic. We should arguably revisit the CSE costing heuristics as it does not seem like doing this extra CSE would actually cause any issues. Second is that PGO data leads to poor layout decisions. The jit's block layout algorithm is locally greedy and this leads to globally sub-optimal layouts. One such example happens early in the method where there is a control flow diamond; with PGO There is work anticipated here in .NET 7 but no easy fix in the meantime. Third is that the early block reordering done by the JIT interferes with loop recognition, and the JIT mistakenly thinks there are two loops in the method (instead of one multi-exit loop). While concerning, it doesn't seem to cause problems here as the loops are not currently optimizable but the fact that the JIT does the reordering so early is indicative of two shortcomings: (1) loop recognition is too pattern sensitive; (2) optimizing block order should generally wait until later in the phase pipeline. This is also something we've seen elsewhere and may try and address in .NET 7.
|
I can repro the
The other cases to (longer values) to format also show small regressions. |
Can repro Using The 3 key methods are So regression seems to be coming from changes in
Note in both cases we fail to do some of the aggressive inlines because of budget checks. We should revisit this pattern of aggressive inlines, and or figure out how to accommodate this as part of fixing #41692. PGO/NoPGO have very similar inlines, save that with NoPGO we do one more inline which is likely not a factor here:
It is going to be difficult to pin down what exactly is leading to the perf loss here, given the size of the method, the substantial differences in generated code, and the lack of good tooling on unix, but I'll dig in and see if I can uncover anything. |
For the key inlinee
As a result we consider many of the blocks to be rarely executed.
This likely explains the perf degradation. This can be addressed in variety of ways (none of which are easily addressable in .NET 6).
|
I don't see anything here we can address at this point in .NET 6, so am going to move this out of the 6.0 milestone. |
A few more notes -- as with the case @EgorBo noted above
Looking at full pgo, we see it helps the Default
No PGO
Full PGO
I wonder how much of this might be that BDN doesn't run the Tier0 code sufficient times or something similar...? Playing around with Full PGO +
however that trick doesn't help the
|
Usually BDN invokes the code enough times for it to get promoted, but sometimes the background thread used by Tiered JIT does not get a chance to "kick in" and promote things to Tier 1. |
Performance of Runtime commit range for the perf drop was 489b034...250fda5 which doesn't show anything relevant. Perf repo commit range was dotnet/performance@759f8b0...bff6fec which also doesn't show anything relevant. So guessing |
All the other benchmarks other than More recently there seems to have been a more or less sustained regression in early May 2022 with #68869 where we no longer dip below 1200. This was undone at the end of May via #70144 but we did not completely recover the old perf. There does not seem to be any good explanation for the recent spike up to 1800 and then drop back down. |
Drilling into The late block reordering scrambles the loop body, interposing non-loop blocks. This sort of reordering is risky unless you have very high confidence in your profile. This is something I hope we can improve on in .NET 8. |
Seems like this might not have been spotted by auto filing, going to double-check. A it was dotnet/perf-autofiling-issues#12928 that we evidently never triaged. |
Originally detected by the bot in DrewScoggins/performance-2#4549 but not reported in dotnet/runtime (cc @DrewScoggins)
System.Globalization.Tests.StringEquality.Compare_Same_Upper(Count: 1024, Options: (en-US, OrdinalIgnoreCase))
Repro:
A quick look at the historical data
and zooming:
It might have been caused by PGO (cc @AndyAyersMS @kunalspathak):
f64246c...25f1800
category:performance
theme:benchmarks
The text was updated successfully, but these errors were encountered: