-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Skip CSE for non-hoistable floats #57438
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsFixes #57087 regression. if we decide not to hoist a FP value - we better also skip CSE for it. Example (Linux, where we don't have callee-saved XMMs): public static void Testik()
{
for (int i = 0; i < 100; i++)
{
Console.WriteLine(3.14f);
Console.WriteLine(3.14f);
}
} Codegen diff: https://www.diffchecker.com/gHTKTLAf @AndyAyersMS proposed a better fix, but it's more complicated and unlikely to make into .NET 6.0. Also, since SuperPMI.py's diff is quite small and mostly an improvement (including benchmarks.run.Linux.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.Linux.x64.checked.mch:
Detail diffs
libraries.crossgen2.Linux.x64.checked.mch:
Detail diffs
|
Burgers.Test0 (#57087) diff: https://www.diffchecker.com/HAYcIrcR Size regression example: https://www.diffchecker.com/NiaNkVA9 (loop alignment) |
It looks like CSE already tries to account for this: runtime/src/coreclr/jit/optcse.cpp Lines 2602 to 2616 in 225acfe
Any idea why it's not enough? |
Wow, thanks! Didn't know about runtime/src/coreclr/jit/optcse.cpp Line 2612 in 225acfe
1200 < 100 )
|
New diffs: benchmarks.run.Linux.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.Linux.x64.checked.mch:
Detail diffs
libraries.pmi.Linux.x64.checked.mch:
Detail diffs
libraries.crossgen2.Linux.x64.checked.mch:
Detail diffs
libraries_tests.pmi.Linux.x64.checked.mch:
Detail diffs
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. Do the diffs generally look good? Can you post perfscore diffs?
Mostly, regressions don't look good in some cases, e.g.: public static void Testik(double real, double imaginary)
{
var complex = new Complex(real, imaginary);
Complex result = -complex;
VerifyRealImaginaryProperties(result, -complex.Real, -complex.Imaginary);
result = Complex.Negate(complex);
VerifyRealImaginaryProperties(result, -complex.Real, -complex.Imaginary);
} codegen diff: https://www.diffchecker.com/sM5lRuLM
So the heuristics have to be tuned more... |
Seems like there ought to be some cost element -- we likely only want to block cross-call CSEs for things that are cheap to recompute, like constants. |
CSE will estimate the use/def cost based on |
Seems like accurate costing is tricky. Uses that happen before a call are cheaper than the ones that happen after a call, but we only count the total number. We know there's at least one of each, I suppose. |
src/coreclr/jit/optcse.cpp
Outdated
if (candidate->Expr()->IsCnsFltOrDbl() && (CNT_CALLEE_SAVED_FLOAT == 0)) | ||
{ | ||
// We should do CSE for fp constants in case of LiveAcrossCall only when absolutely necessary | ||
// on ABIs without callee-saved registers. | ||
cse_use_cost += 2; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am ok with this more surgical fix for 6.0, but if we decide to go this route then you should open an issue about revisiting the float heuristics for .NET 7. It does not really make sense to adjust the use cost of a CSE based on the contents of the tree that is being CSE'd, I think.
Why uses after calls are more expensive? We're going to spill a value before a call and then load it to a reg again, so all uses after the call are going to use that reg making it as cheap as before-call uses, am I correct? (ah, well if that reg will be available) So I found the least painful workaround - I disable CSE for constants if there are 4 or less uses on SysV-like ABIs when we cross calls. jit-diff: benchmarks.run.Linux.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.Linux.x64.checked.mch:
Detail diffs
libraries.pmi.Linux.x64.checked.mch:
Detail diffs
libraries.crossgen2.Linux.x64.checked.mch:
Detail diffs
libraries_tests.pmi.Linux.x64.checked.mch:
Detail diffs
Overall it's mostly an improvement, but there are some regressions (all previous jit-diffs were even worse): https://www.diffchecker.com/ow2qn9Su @jakobbotsch @AndyAyersMS PTAL once again please |
The CSE heuristics aren't going to get every case. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This fix looks good to me
I realize that, but cases like this probably suggest that the heuristic can be adjusted (as it was now). |
…e-for-non-hoistable
Failures:
|
Fixes #57087 regression (I checked the actual benchmark after this PR).
if we decide not to hoist a FP value - we better also skip CSE for it. Example (Linux, where we don't have callee-saved XMMs):
Codegen diff: https://www.diffchecker.com/gHTKTLAf
@AndyAyersMS proposed a better fix, but it's more complicated and unlikely to make into .NET 6.0. Also, since SuperPMI.py's diff is quite small and mostly an improvement (including
Burgers.Test0
) I decided to give it a try.benchmarks.run.Linux.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.Linux.x64.checked.mch:
Detail diffs
libraries.crossgen2.Linux.x64.checked.mch:
Detail diffs