[Perf] Regression in System.Collections.CtorFromCollection<String>.ConcurrentQueue #56017
Comments
Tagging subscribers to this area: @eiriktsarpalis

Issue Details

Run Information

Regressions in System.Collections.CtorFromCollection&lt;String&gt;

Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Collections.CtorFromCollection<String>*'

Payloads

Histogram
System.Collections.CtorFromCollection<String>.LinkedList(Size: 512)
System.Collections.CtorFromCollection<String>.ConcurrentQueue(Size: 512)

Docs
Profiling workflow for dotnet/runtime repository
There were no changes to S.C.C in this interval, and nothing jumps out in the other libraries' history (I would assume this test doesn't use much else). This is labeled x64, meaning it didn't regress on other architectures. Falling back to codegen, I see these:
The test loops over 512 strings, adding each to a queue segment. There is no contention, so I'm not sure there is much nested looping, though. @BruceForstall is it possible something like #55299 could have affected this?
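For reference, a simplified sketch of what that constructor path roughly does (an approximation for discussion only, not the actual CoreLib source):

```csharp
using System.Collections.Generic;

// Rough approximation of ConcurrentQueue<T>(IEnumerable<T>) for discussion only;
// the real implementation lives in System.Collections.Concurrent in dotnet/runtime.
public sealed class ConcurrentQueueSketch<T>
{
    private T[] _segment = new T[32]; // segments grow as items are added
    private int _count;

    public ConcurrentQueueSketch(IEnumerable<T> collection)
    {
        // The benchmark passes a 512-element string collection, so this loop runs 512 times,
        // appending each item to the current segment; there is no contention during construction.
        foreach (T item in collection)
            Enqueue(item);
    }

    private void Enqueue(T item)
    {
        if (_count == _segment.Length)
        {
            // The real type links in a new, larger segment; here we just grow the array.
            var bigger = new T[_segment.Length * 2];
            _segment.CopyTo(bigger, 0);
            _segment = bigger;
        }
        _segment[_count++] = item;
    }
}
```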
#55299 could be the cause if (1) there's a tight, commonly-executed loop, (2) the JIT's loop cloning kicked in (this happens for loops that look like "for (int i = 0; i < n; i++)") and there's an array access using the loop variable, e.g., "a[i]", where it knows it can get rid of the array bounds check, and (3) there's some other conditional branch in the loop. Then, it's possible that a change in code alignment causes the Intel jcc erratum to kick in. #54345 is another possibility. cc @kunalspathak
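To make those three conditions concrete, here is a hypothetical loop (not taken from the benchmark) with the shape that triggers cloning plus an extra branch:

```csharp
// Hypothetical example of the loop shape described above; not code from the benchmark itself.
public static class LoopCloningExample
{
    public static int CountNonEmpty(string[] a, int n)
    {
        int count = 0;
        // (1)+(2): a tight "for (int i = 0; i < n; i++)" loop with an a[i] access. The JIT can clone it
        // into a fast path guarded by "n <= a.Length" (bounds check removed) and a slow path that keeps it.
        for (int i = 0; i < n; i++)
        {
            // (3): another conditional branch inside the loop body.
            if (!string.IsNullOrEmpty(a[i]))
                count++;
        }
        // With several branches in a hot loop, a small shift in code alignment can push a
        // (macro-fused) conditional jump across a 32-byte boundary and hit the Intel JCC erratum's slow path.
        return count;
    }
}
```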
I reverted each of the below commits and generated disassembly of the entire benchmark run, but didn't see anything that stands out:
VTune points at [output elided]. Further digging: I tried looking at the disassembly difference before and after the LSRA changes in #54345, and the only thing that stands out is that, before the LSRA change, we would spill the value into [elided].
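One way to generate that kind of disassembly locally (a sketch only; the lab and the investigation above may have used different tooling) is BenchmarkDotNet's DisassemblyDiagnoser on a standalone repro of the constructor call, run once against the baseline build and once against the compare build:

```csharp
using System.Collections.Concurrent;
using System.Linq;
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;

// Hypothetical standalone repro: run it against the baseline and compare CoreCLR builds
// (e.g. via --corerun) and diff the exported assembly listings.
[DisassemblyDiagnoser(maxDepth: 2)] // dumps the JIT-generated asm for the benchmark and its callees
public class ConcurrentQueueCtorDisasm
{
    private readonly string[] _collection =
        Enumerable.Range(0, 512).Select(i => i.ToString()).ToArray();

    [Benchmark]
    public ConcurrentQueue<string> Ctor() => new ConcurrentQueue<string>(_collection);
}

public class Program
{
    public static void Main() => BenchmarkRunner.Run<ConcurrentQueueCtorDisasm>();
}
```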
I'm not able to discern a regression running this particular benchmark on my machine. Here are the results of a few runs:

0733681 (baseline)
[results elided]

405da67 (compare)
[results elided]

Running statistical tests comparing the two commits with a 3% threshold also shows no perf regression.
I will try and repro this on the lab machines. Maybe this is an architectural thing that we only hit on older Intel hardware.
Should we close or move to 7.0.0?
Tagging subscribers to this area: @JulieLeeMSFT

Issue Details

Run Information

Regressions in System.Collections.CtorFromCollection&lt;String&gt;

Repro
git clone https://github.com/dotnet/performance.git
py .\performance\scripts\benchmarks_ci.py -f netcoreapp5.0 --filter 'System.Collections.CtorFromCollection<String>*' 

Payloads

Histogram
System.Collections.CtorFromCollection<String>.LinkedList(Size: 512)
System.Collections.CtorFromCollection<String>.ConcurrentQueue(Size: 512)

Docs
Profiling workflow for dotnet/runtime repository
@DrewScoggins have you reproduced this regression on newer hardware?
I will look at this right now on my local machine.
This was collected on my machine:
[results elided]
Ok, it looks like the originally reported baseline commit was in fact after the regression was first recorded. This might explain why I wasn't seeing any regressions on my machine. I'd need to run a bisect on the correct commit range, but at this point I find it unlikely we'd be able to squeeze in a fix this late. Looking at the older history for that particular benchmark, performance seems to be a net improvement compared to .NET 5, so I would suggest punting this to 7.0.0:
Using a wider commit range 890fde7...405da67 still does not show a regression on my machine. The benchmark results tend to fluctuate on the order of 500 ns across runs, and this is typically the only source of variation when comparing builds. More importantly, statistical testing with a 1% threshold does not indicate any performance regression either:

cd src/benchmarks/micro
dotnet run -c release -f net6.0 --corerun \
  <path to baseline commit build> \
  <path to compare commit build> \
  --filter 'System.Collections.CtorFromCollection<String>.ConcurrentQueue' \
  --statisticalTest 1%

Getting the following results:
Sample results:
@DrewScoggins given that you're measuring a consistent regression on your hardware, would it be possible to run a git-bisect?
Let's investigate the perf regression in .NET 7.
Maybe moving this instruction broke store-to-load forwarding? It could also be a combination of that and the data dependency of the spill being closer now, so the CPU might stall on it. In any case, I think changing the spilling logic is just as likely to introduce a bunch more regressions, and it is not something we should be doing at this point in the release cycle. Looking at the benchmark data, performance also seems to be improved in .NET 7 compared to .NET 6, so I will just close this.
Run Information
Regressions in System.Collections.CtorFromCollection<String>
Historical Data in Reporting System
Repro
Payloads
Baseline
Compare
Histogram
System.Collections.CtorFromCollection<String>.LinkedList(Size: 512)
System.Collections.CtorFromCollection<String>.ConcurrentQueue(Size: 512)
Docs
Profiling workflow for dotnet/runtime repository
Benchmarking workflow for dotnet/runtime repository