-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: PGO-based block reordering interferes with loop recognition #67318
Comments
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsWe currently rearrange the flow graph based on profile weights before we perform loop recognition. Since this recognition is lexical, rearrangement can impair loop recognition. Here's a simple example: ;; COMPlus_TieredPGO=1
;; COMPlus_TC_QuickJitForLoops=0
;; (optionally use release runtime, as checked runtimes now change tiering parameters to cause tiering up sooner)
using System;
using System.Runtime.CompilerServices;
using System.Threading;
interface I
{
public int F(int x, int y);
}
class Add : I
{
int I.F(int x, int y) => x + y;
}
class Mul : I
{
int I.F(int x, int y) => x * y;
}
class X
{
[MethodImpl(MethodImplOptions.NoInlining)]
public static int MapF(I m, int[] xs, int[] ys, int from, int to)
{
int r = 0;
for (int i = from; i < to; i++)
{
r += m.F(xs[i], ys[i]);
}
return r;
}
public static int Main()
{
int[] xs = new int[] { 1, 2, 3, 4 };
int[] ys = new int[] { 4, 3, 2, 1 };
I m = new Add();
I mm = new Mul();
int r = 0;
// comment these two out to create a heavily biased GDV
r += MapF(mm, xs, ys, 0, 3);
r -= MapF(mm, xs, ys, 0, 3);
for (int i = 0; i < 30; i++)
{
r += MapF(m, xs, ys, 0, 3);
Thread.Sleep(15);
}
Thread.Sleep(50);
for (int i = 0; i < 70; i++)
{
r += MapF(m, xs, ys, 0, 3);
}
return r / 20;
}
} As is we produce the flow graph on the left; with the two Because of this reordering, loop recognition arrives at very different results, even though the flow graphs are isomorphic:
We actually see a high volume of monomorphic call sites with PGO, meaning we will quite often be seeing profiles like the one at right where the GDV test is highly biased and the residual indirect call site is cold. Seems like we should look into deferring block reordering until after optimization.
|
In principle all we have to do here is move
So, it's going to take a bit of work to tease apart what is an essential constrant or to make the implicit more explicit. Initial results from a CQ/Code Size standpoint look very promising, so this is worth pushing on. I am wondering if we can get out of the strong dependence on |
Also, some subset of #66998 is likely explained by the above. |
Note we can't (currently) update loop recognition to handle non-contiguous loop bodies because it uses the block extent in various ways for correctness, eg runtime/src/coreclr/jit/optimizer.cpp Lines 5596 to 5635 in daec9dc
|
Also see #59087 (comment). I am going to retitle this issue since it's more broadly about block reordering because of PGO interfering with loop optimization, and not restricted to GDV. |
The JIT currently will aggressively reorder the flow graph before running its loop recognition phases. When there is PGO data this sometimes perturbs the block order so that loops are no longer recognized, and we miss out on some loop optimizations. This change defers most block reordering until after the JIT has gone through the optimization phases. There is still a limited form of flow cleanup done early on. There is also a compensating change in loop recognition in one place where it was relying on adjacent blocks being combined. Fixes dotnet#67318.
…69878) The JIT currently will aggressively reorder the flow graph before running its loop recognition phases. When there is PGO data this sometimes perturbs the block order so that loops are no longer recognized, and we miss out on some loop optimizations. This change defers most block reordering until after the JIT has gone through the optimization phases. There is still a limited form of flow cleanup done early on. There is also a compensating change in loop recognition in one place where it was relying on adjacent blocks being combined. Fixes #67318.
We currently rearrange the flow graph based on profile weights before we perform loop recognition. Since this recognition is lexical, rearrangement can impair loop recognition.
Here's a simple example:
As is we produce the flow graph on the left; with the two
MapF(mm, ...)
calls commented out the graph on the right:Because of this reordering, loop recognition arrives at very different results, even though the flow graphs are isomorphic:
We actually see a high volume of monomorphic call sites with PGO, meaning we will quite often be seeing profiles like the one at right where the GDV test is highly biased and the residual indirect call site is cold.
Seems like we should look into deferring block reordering until after optimization.
cc @BruceForstall @EgorBo
The text was updated successfully, but these errors were encountered: