-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
JIT: simple forward substitution pass #63720
JIT: simple forward substitution pass #63720
Conversation
Extend ref counting done by local morph so that we can determine single-def single-use locals. Add a phase that runs just after local morph that will attempt to forward single-def single-use local defs to uses when they are in adjacent statements. Fix or work around issues uncovered elsewhere: * `gtFoldExprCompare` might fold "identical" volatile subtrees * `fgGetStubAddrArg` cannot handle complex trees * some simd/hw operations can lose struct handles * some calls cannot handle struct local args Addresses dotnet#6973 and related issues. Still sorting through exactly which ones are fixed, so list below may need revising. Fixes dotnet#48605. Fixes dotnet#51599. Fixes dotnet#55472. Improves some but not all cases in dotnet#12280 and dotnet#62064. Does not fix dotnet#33002, dotnet#47082, or dotnet#63116; these require handling multiple uses or bypassing statements.
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsExtend ref counting done by local morph so that we can determine Add a phase that runs just after local morph that will attempt to Fix or work around issues uncovered elsewhere:
Addresses #6973 and related issues. Still sorting through exactly Fixes #48605. Improves some but not all cases in #12280 and #62064. Does not fix #33002, #47082, or #63116; these require handling multiple
|
cc @dotnet/jit-contrib Large numbers of diffs expected. A few large regressions where we now decide to clone loops. Some small regressions where we now report generics context where we didn't previously (possibly a bug fix, the exact requirements here are a bit squishy). Lots of improvements. Will post some examples once this gets a bit further along. |
Looks like various issues cropping up during crossgen for non-windows non-x64 builds. Hopefully minor. |
// if the next tree can't change the value of fwdSubNode or be adversly impacted by fwdSubNode effects | ||
// | ||
const bool fwdSubNodeInvariant = ((fwdSubNode->gtFlags & GTF_ALL_EFFECT) == 0); | ||
const bool nextTreeIsPureUpToUse = ((fsv.GetFlags() & (GTF_EXCEPT | GTF_GLOB_REF | GTF_CALL)) == 0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It feels very risky to rely on GTF_GLOB_REF
before (or, really, after, since we don't have the checker enabled for it) morph. For example:
static int* _lclAddr;
[MethodImpl(MethodImplOptions.NoInlining)]
static int Problem(int a, int b)
{
[MethodImpl(MethodImplOptions.NoInlining)]
static void CaptureLcl(int* lclAddr)
{
_lclAddr = lclAddr;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static int MutateLcl()
{
(*_lclAddr)++;
return 0;
}
CaptureLcl(&a);
b = a + a;
return MutateLcl() + b; // We will forward-substitute 'b' because "a + a" won't have GTF_GLOB_REF on it.
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the example. Seems like we'll need to walk over the tree to substitute and compute the actual set of effects flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Added some extra flags scanning.
* For x86 we can't forward sub `GT_LCLHEAP`. * For arm64 we need to take more care with multi-reg struct types.
Going to set this aside for a while to focus on OSR. Looks like:
@EgorBo feel free to look into any of the above. |
JIT/Directed/debugging/debuginfo/tester is failing everywhere as we're evidently removing statements with debug info. Wonder if we should be merging debug info somehow. |
Latest diff summary for windows x64 (since CI runs are still unable to provide data, likely because of the huge number of impacted methods): aspnet.run.windows.x64.checked.mch:
Detail diffs
benchmarks.run.windows.x64.checked.mch:
Detail diffs
coreclr_tests.pmi.windows.x64.checked.mch:
Detail diffs
libraries.crossgen2.windows.x64.checked.mch:
Detail diffs
libraries.pmi.windows.x64.checked.mch:
Detail diffs
libraries_tests.pmi.windows.x64.checked.mch:
Detail diffs
|
As noted earlier, the big regressions (eg |
From my impression some regressions are just missing opportunities for CSE that was supposed to "un-substitute" back where needed. Example: public class Strings
{
public string s1 { get; }
public string s2 { get; }
public string s3 { get; }
}
public class Program
{
public string Test(Strings size)
=> string.Concat(size.s1, size.s2, size.s3);
} G_M19485_IG02:
mov rcx, gword ptr [rdx+8]
- mov rax, gword ptr [rdx+16]
- mov r8, gword ptr [rdx+24]
- mov rdx, rax
+ mov gword ptr [rsp+10H], rdx
+ mov rdx, gword ptr [rdx+16]
+ mov r8, gword ptr [rsp+10H]
+ mov r8, gword ptr [r8+24]
G_M19485_IG03:
jmp System.String:Concat(System.String,System.String,System.String):System.String
-; Total bytes of code: 20
+; Total bytes of code: 27 |
Or (in this case) poor modelling of ABI constraints, LSRA preferencing, or poor choices for or arg evaluation order... @jakobbotsch was noting our current arg order sorting seems to be unhelpful. |
Might need to avoid forward sub of an implicit byref's promoted fields.... |
The debug info tests assume that mappings will be placed on statements containing calls. We should probably think about what the best way forward is -- perhaps prefer using the earliest offset? If we determine current behavior is ok the tests can be changed to have multiple uses to turn off the forward sub. |
I am not sure the NativeAOT failure is related, well it is related but during reducing a repro I came up with one that fails even on Main jit (if I did everything correctly): using System;
using System.Diagnostics;
using System.Runtime.CompilerServices;
using System.Text;
public class Test
{
public static void Main()
{
try
{
throw new Exception();
}
catch when (FilterWithStackTrace())
{
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
static bool FilterWithStackTrace()
{
Consume(new StackTrace(0, true).ToString());
return true;
}
[MethodImpl(MethodImplOptions.NoInlining)]
private static void Consume(string s)
{
}
} @MichalStrehovsky could you please check this repro? UPD: nvm, I am now able to repro it only under this PR, investigating... |
NativeAOT's failed test was a difficult one since it only reproduced on app start and not during AOT compilation... NgenDump diff: https://www.diffchecker.com/mbzS9LLk Is this a correct substitution: |
It looks like we are reordering a call that modifies local0 and a read of local0. Seems like we need to watch for address-exposed uses when computing the effects of the statement we're trying to substitute into. Can you see if this diff fixes things? @@ -145,6 +145,15 @@ public:
m_parentNode = parent;
}
}
+
+ // Uses of address-exposed locals are modelled as global refs.
+ //
+ LclVarDsc* const varDsc = m_compiler->lvaGetDesc(lclNum);
+
+ if (varDsc->IsAddressExposed())
+ {
+ m_accumulatedFlags |= GTF_GLOB_REF;
+ }
}
m_accumulatedFlags |= (node->gtFlags & GTF_GLOB_EFFECT);
@@ -215,8 +224,8 @@ public:
if (node->OperIs(GT_LCL_VAR))
{
- unsigned const lclNum = node->AsLclVarCommon()->GetLclNum();
- LclVarDsc* varDsc = m_compiler->lvaGetDesc(lclNum);
+ unsigned const lclNum = node->AsLclVarCommon()->GetLclNum();
+ LclVarDsc* const varDsc = m_compiler->lvaGetDesc(lclNum);
if (varDsc->IsAddressExposed())
{ |
/azp run Fuzzlyn, Antigen |
Azure Pipelines successfully started running 2 pipeline(s). |
I think the Fuzzlyn and Antigen issues are known issues. And the runtime CI failures likewise. So, I'm going to merge. |
@AndyAyersMS I do see 2 Fuzzlyn examples in the latest run that look related. // Generated by Fuzzlyn v1.5 on 2022-02-03 20:06:22
// Run on X86 Windows
// Seed: 4993495812901676847
// Reduced from 57.3 KiB to 0.6 KiB in 00:00:48
// Debug: Outputs 1
// Release: Outputs 4294967295
public struct S1
{
public uint F4;
public long F7;
public S1(uint f4): this()
{
F4 = f4;
}
public ref bool M1(uint arg0, bool arg1)
{
this.F4 = (uint)-this.F4;
System.Console.WriteLine(arg0);
return ref Program.s_4;
}
}
public struct S2
{
public S1 F3;
public S2(S1 f3): this()
{
F3 = f3;
}
}
public class Program
{
public static S1[] s_2 = new S1[]{new S1(0)};
public static bool s_4;
public static void Main()
{
S2 vr0 = new S2(new S1(1));
s_2[0].M1(vr0.F3.F4, vr0.F3.M1(0, false));
}
} // Generated by Fuzzlyn v1.5 on 2022-02-03 20:11:11
// Run on X86 Windows
// Seed: 3146441340761756415
// Reduced from 127.7 KiB to 0.7 KiB in 00:03:37
// Debug: Throws 'System.NullReferenceException'
// Release: Runs successfully
public struct S0
{
public int F3;
public uint F6;
}
public class C1
{
public S1 M31(C1 arg0)
{
return new S1(new C1(), new S0());
}
}
public struct S1
{
public C1 F4;
public S0 F7;
public S1(C1 f4, S0 f7): this()
{
}
public C1 M26(int arg0)
{
if (Program.s_4)
{
this.F4 = new C1();
}
return new C1();
}
}
public class Program
{
public static bool s_4 = true;
public static long s_20;
public static void Main()
{
S1 vr8 = default(S1);
var vr9 = vr8.F7.F3;
S1 vr10 = vr8.F4.M31(vr8.M26(vr9));
System.Console.WriteLine(s_20);
}
} |
The first one needs a little massaging to repro as a console app: // Generated by Fuzzlyn v1.5 on 2022-02-03 20:06:22
// Run on X86 Windows
// Seed: 4993495812901676847
// Reduced from 57.3 KiB to 0.6 KiB in 00:00:48
// Debug: Outputs 1
// Release: Outputs 4294967295
using System;
public struct S1
{
public uint F4;
public long F7;
public S1(uint f4): this()
{
F4 = f4;
}
public ref bool M1(uint arg0, bool arg1)
{
this.F4 = (uint)-this.F4;
Program.s_rt.WriteLine(arg0);
return ref Program.s_4;
}
}
public struct S2
{
public S1 F3;
public S2(S1 f3): this()
{
F3 = f3;
}
}
public class Program
{
public static IRT s_rt;
public static S1[] s_2 = new S1[]{new S1(0)};
public static bool s_4;
public static void Main()
{
s_rt = new C();
S2 vr0 = new S2(new S1(1));
s_2[0].M1(vr0.F3.F4, vr0.F3.M1(0, false));
}
}
public interface IRT
{
void WriteLine<T>(T value);
}
public class C : IRT
{
public void WriteLine<T>(T value)
{
Console.WriteLine(value);
}
} The second repros out of the box. |
For LIR we verify that we can really consider locals to be used at their user by having a checker that looks for interfering stores to the same locals. However, in some cases we may have "interfering" GT_LCL_FLD/GT_STORE_LCL_FLD, in the sense that they work on the same local but on a disjoint range of bytes. Add support to validate this. This fixes dotnet#57919 which made the fuzzer jobs very noisy and made it easy to miss actual new examples (e.g. dotnet#63720 was just merged even though there were real examples found there). Fix dotnet#57919
For LIR we verify that we can really consider locals to be used at their user by having a checker that looks for interfering stores to the same locals. However, in some cases we may have "interfering" GT_LCL_FLD/GT_STORE_LCL_FLD, in the sense that they work on the same local but on a disjoint range of bytes. Add support to validate this. This fixes #57919 which made the fuzzer jobs very noisy and made it easy to miss actual new examples (e.g. #63720 was just merged even though there were real examples found there). Fix #57919
Some failures from this weekend's Fuzzlyn run. All of these repro in main (179ffff) in win-x86 and not on win-x64. // Generated by Fuzzlyn v1.5 on 2022-02-06 17:06:01
// Run on X86 Windows
// Seed: 7854701479455767355
// Reduced from 192.2 KiB to 0.6 KiB in 00:03:51
// Debug: Outputs 0
// Release: Outputs 1
public struct S0
{
public uint F0;
public ulong F2;
public int F3;
public ushort F6;
public int F8;
public int[] M35(int arg0, int[] arg1)
{
this.F3 = arg1[0];
Program.s_rt.WriteLine(arg0);
return arg1;
}
}
public class Program
{
public static IRT s_rt;
public static int[][] s_28 = new int[][]{new int[]{1}};
public static void Main()
{
s_rt = new C();
var vr3 = new S0();
var vr6 = vr3.F3;
var vr13 = s_28[0];
var vr12 = vr3.M35(0, vr13);
vr3.M35(vr6, vr12);
}
}
public interface IRT
{
void WriteLine<T>(T value);
}
public class C : IRT
{
public void WriteLine<T>(T value)
{
System.Console.WriteLine(value);
}
} // Generated by Fuzzlyn v1.5 on 2022-02-06 18:06:07
// Run on X86 Windows
// Seed: 525178552114891112
// Reduced from 142.0 KiB to 0.9 KiB in 00:06:35
// Debug: Throws 'System.NullReferenceException'
// Release: Runs successfully
public interface I0
{
}
public class C0
{
public ushort F0;
}
public struct S0 : I0
{
public byte F0;
public bool F1;
public long F3;
public byte F4;
public C0 F5;
public S0(long f3, C0 f5): this()
{
F5 = f5;
}
public C1 M24(ref bool arg0)
{
this = new S0(0, new C0());
return Program.s_18;
}
}
public class C1
{
}
public class Program
{
public static IRT s_rt;
public static C1 s_18;
public static void Main()
{
s_rt = new C();
var vr4 = new S0(0, new C0());
I0 vr7 = vr4;
bool vr8 = default(bool);
S0 vr9 = default(S0);
var vr10 = vr9.F5;
var vr11 = vr9.M24(ref vr8);
M8(vr10, ref vr7, vr11);
}
public static void M8(C0 argThis, ref I0 arg0, C1 arg1)
{
s_rt.WriteLine(argThis.F0);
}
}
public interface IRT
{
void WriteLine<T>(T value);
}
public class C : IRT
{
public void WriteLine<T>(T value)
{
System.Console.WriteLine(value);
}
} // Generated by Fuzzlyn v1.5 on 2022-02-06 16:34:36
// Run on X86 Windows
// Seed: 15345211054104247945
// Reduced from 56.7 KiB to 1.1 KiB in 00:00:55
// Debug: Outputs 0
// Release: Outputs -1
public struct S0
{
public ulong F1;
public int F2;
public byte F3;
public short F5;
public long F6;
public void M4(int arg0, sbyte[] arg1)
{
Program.s_rt.WriteLine(arg0);
}
public sbyte[] M6(ref int arg0, bool arg1, ref S0 arg2, ref byte arg3)
{
var vr0 = Program.s_5;
Program.M7(arg2, vr0, ref Program.s_6, ref Program.s_10, arg0);
var vr1 = arg2.F6;
Program.M8(arg2, vr1);
arg0 = this.F2--;
return new sbyte[]{1};
}
}
public class Program
{
public static IRT s_rt;
public static bool s_2;
public static S0 s_4;
public static uint[] s_5;
public static byte s_6;
public static S0[] s_10;
public static S0[, ] s_20 = new S0[, ]{{new S0()}};
public static void Main()
{
s_rt = new C();
S0 vr2 = default(S0);
new S0().M4(vr2.F2, vr2.M6(ref s_4.F2, s_2, ref s_20[0, 0], ref s_4.F3));
}
public static void M8(S0 arg0, long arg1)
{
}
public static void M7(S0 argThis, uint[] arg0, ref byte arg1, ref S0[] arg2, int arg3)
{
}
}
public interface IRT
{
void WriteLine<T>(T value);
}
public class C : IRT
{
public void WriteLine<T>(T value)
{
System.Console.WriteLine(value);
}
} |
@jakobbotsch - do you think they are related to the forward substitution? Could you open GH issues instead for better tracking? |
@kunalspathak Yes I expect so. Opened #64904. I don't really want to open two more yet since it seems very likely they may have the same cause. |
arm64 improvements: dotnet/perf-autofiling-issues#3438 dotnet/perf-autofiling-issues#3434 and dotnet/perf-autofiling-issues#3433 |
Extend ref counting done by local morph so that we can determine
single-def single-use locals.
Add a phase that runs just after local morph that will attempt to
forward single-def single-use local defs to uses when they are in
adjacent statements.
Fix or work around issues uncovered elsewhere:
gtFoldExprCompare
might fold "identical" volatile subtreesfgGetStubAddrArg
cannot handle complex treesAddresses #6973 and related issues. Still sorting through exactly
which ones are fixed, so list below may need revising.
Fixes #48605.
Fixes #51599.
Fixes #55472.
Improves some but not all cases in #12880 and #62604.
Does not address #33002, #47082, or #63116; these require handling multiple
uses or bypassing statements.