RyuJIT: Tight loop performance regression #71628
Comments
Besides the codegen, you should write the method like this:

    public int Sum()
    {
        if (Interlocked.Read(ref _state) == 0)
        {
            return 0;
        }

        int sum = 0;
        // Copy the field into a local: the JIT can then prove that
        // i < array.Length covers every access and drop the range checks.
        int[] array = _array;
        for (int i = 0; i < array.Length; ++i)
            sum += array[i];
        return sum;
    }

It's faster then, and the codegen will be as expected. So the key points are:
This is an epic LSRA problem: it inserts resolution moves in the middle of the loop, where the spill/reload happens. The code generated for the version gfoidl suggested is much better, but regardless, the LSRA problem should be fixed in the runtime.
@gfoidl, the example I used here is a heavily reduced form of the real code (it still reproduces the issue). The workaround is not something to worry about (I have several for my real problem). Thanks for the array-trick hint, though. I've seen it before but had happily forgotten it. Those range checks are especially annoying to see in my case, because I have the array as
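A related way to let the JIT drop those range checks, not mentioned in the thread and shown here only as an illustration, is to loop over a span and use its `Length` as the loop bound. A minimal sketch; the `SpanSum` class name is made up:

```csharp
using System;

// Hypothetical helper, not part of the original issue.
public static class SpanSum
{
    // The loop bound is the span's own Length, which lets the JIT prove
    // every index is in range and typically omit per-element bounds checks.
    public static int Sum(ReadOnlySpan<int> values)
    {
        int sum = 0;
        for (int i = 0; i < values.Length; i++)
            sum += values[i];
        return sum;
    }
}
```

Calling `SpanSum.Sum(_array)` works because `int[]` converts implicitly to `ReadOnlySpan<int>`. This targets the range checks only; it is not a fix for the register-allocation problem this issue is about.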
This likely won't get time during .NET 8, but I will mark it as Pri3 for .NET 8.
This falls into the "resolution phase" category of LSRA issues noted in #47194.
Description
Consider the following method containing a tight loop:
The method is then modified by adding a short check before the loop, as follows:
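The original listings for the two methods were not preserved in this copy of the issue. As a hypothetical sketch only, assuming an `int[] _array` field and a `long _state` field like the ones in gfoidl's suggestion, the baseline and modified shapes could look like this (the class and method names are invented for illustration and are not the reporter's code):

```csharp
using System.Threading;

// Hypothetical holder type; the reporter's real class is not shown in the issue.
public sealed class LoopHolder
{
    private int[] _array;
    private long _state;

    public LoopHolder(int[] array, long state)
    {
        _array = array;
        _state = state;
    }

    // Baseline shape: a tight loop summing the array field.
    public int SumPlain()
    {
        int sum = 0;
        for (int i = 0; i < _array.Length; ++i)
            sum += _array[i];
        return sum;
    }

    // Modified shape: the same loop preceded by a cheap early-exit check.
    public int SumWithCheck()
    {
        if (Interlocked.Read(ref _state) == 0)
            return 0;

        int sum = 0;
        for (int i = 0; i < _array.Length; ++i)
            sum += _array[i];
        return sum;
    }
}
```

With a non-zero `_state`, both methods do the same work and return the same sum, which is what makes a large timing difference between them surprising.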
It is reasonable to expect (given a sufficiently long array) that the added check should not affect the method's performance dramatically, right?
But it does:
The modified method becomes ~3x slower, which suggests that the loop itself is slower.
Analysis
Indeed, it is. Inspecting the IL reveals no difference (the loop bodies are identical in both cases). However, the code RyuJIT generates for the loop is different.
In the first case, the sum is accumulated in the eax register:
Whereas in the second case, it is accumulated in the r9d register, which is loaded from and stored to the stack on every iteration of the loop:
The second case also has one additional jump. See the complete generated code at the end.
Regression
As the benchmark results below show, this is actually a regression:
Both methods performed on par in .NET Framework 4.8 and .NET Core 2.2.
Generated code
.NET 6.0.6 (6.0.622.26707), X64 RyuJIT
category:cq
theme:register-allocator
skill-level:expert
cost:large
impact:large