-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Memory containment for FIELD_LIST
operands on x86
#65803
Memory containment for FIELD_LIST
operands on x86
#65803
Conversation
Tagging subscribers to this area: @JulieLeeMSFT Issue DetailsEnables broader containment for FIELD_LIST operands on x86. The primary benefits come from containing loads that long decomposition has left (e. g. - mov ecx, dword ptr [ebp+08H]
- mov edx, dword ptr [ebp+0CH]
- push edx
- push ecx
+ push dword ptr [ebp+0CH]
+ push dword ptr [ebp+08H] Additionally, there is some code simplification enabled by the use of We're expecting some some good x86 (only) diffs.
|
4e01bc3
to
b62428c
Compare
Windows ARM64 build failures are #65818. @dotnet/jit-contrib - some x86 CQ improvements to unblock further improvements. |
b62428c
to
9b93e40
Compare
614d1e7
to
d383575
Compare
Some nice diffs here. Looking through the diffs, it's easy to see where we could do better (not related to this change), e.g.:
|
I found an interesting diff in spmi libraries_tests,
diff:
Can you explain this? Looks like the code is pre-existing but perhaps wasn't executed before:
Minimally, this needs to be:
but it seems like this should be an assert. Also, it's worrisome that the GC info is changed with your changes. (also saw this in |
One of your changes does retyping, so we end up with diffs like:
Why do we want this? It looks like it doesn't hurt, but...? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A few comments
The premise is that we'd like to use I had two revisions of this change where that logic was flawed in one way or another, failing to deal properly with As you note, we may end up not using
Yes, these were the diffs that worried me too... The cause is that the old
What kind of assert do you have in mind? |
So are there cases where previously we would have something like:
and now we have something like:
? Assuming the target was a 4-byte slot?
If I add But it also looks like there are two cases:
why does the second zero here get a GC update but the first goes through the code path that doesn't? |
I don't believe so, the optimization is for keeping parity with the previous implementation that did this effectively implicitly: switch (fieldNode->OperGet())
{
case GT_LCL_VAR:
// Notably this doesn't handle the FIELD<short>(fieldNode<byte>) case correctly,
// but that was ok because only very specific promotion-produced IR shapes were expected here.
inst_TT(INS_push, fieldNode, 0, 0, emitActualTypeSize(fieldNode->TypeGet())); // Note the "emitActualTypeSize".
break; There are basically three nodes we can see on this path due to the aforementioned decomposition pessimization (that I also would like to remove at some point):
If we are looking at the same SPMI diff for IN006f: 00018C push 0
IN0070: 00018E push 0
+UNDONE: record GCref push [cns]
IN0071: 000190 push 0
+ ; gcr arg push 4
IN0072: 000192 mov ecx, gword ptr [ebp-5CH]
; gcrRegs +[ecx]
; GC ptr vars -{V41 V45 V61 V100}
IN0073: 000195 call System.IO.Strategies.BufferedFileStreamStrategy:WriteSpan()
; gcrRegs -[ecx]
- ; gcr arg pop 1
+ ; gcr arg pop 2
02 F0 BF ...| 017C reg EAX becoming dead
0E BF 8A ...| 0182 reg ECX becoming dead
F0 BF 8A ...| 018C push ptr 1 (iptr)
-F0 49 C8 ...| 0195 reg ECX becoming live
+A6 0D D0 ...| 0192 push ptr 4
+4B D0 4B ...| 0195 reg ECX becoming live
0D 4B F2 ...| 019A reg ECX becoming dead
-C8 F2 45 ...| 019A pop 1 ptrs
+D0 F2 45 ...| 019A pop 2 ptrs
4B 45 F1 ...| 019D reg ECX becoming live
F2 45 05 ...| 01BA reg EAX becoming live
F1 05 40 ...| 01CF reg EAX becoming dead
My understanding of the GC reporting for all |
Maybe your new inst_TT should strip the GC info the same way the old code was (implicitly) doing it, and assert that a GC constant is null when doing so. e.g.,
or, if you think it's ok to generate this new GC info, then replace:
which seems unnecessary, with an assert:
Comments? |
I would like to understand - what's the reason for expecting only [MethodImpl(MethodImplOptions.NoInlining)]
private static void Problem(ref int addrSrc)
{
var addr = MemoryMarshal.CreateSpan(ref addrSrc, 1);
ref int constAddr = ref *(int*)100; // Just a placeholder, it could be address of a real (pinned) GC object
if (Unsafe.AreSame(ref MemoryMarshal.GetReference(addr), ref constAddr))
{
CallForSpan(addr);
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
static void CallForSpan<T>(Span<T> value) { } |
I was just being conservative -- I didn't expect anything else. How could you get a non-null constant byref? I would think your example would be illegal. I would think it would have to be pinned memory if it was in the GC heap, at which point it could just be an unmanaged pointer, but it couldn't be constant anyway. |
Well, we are sort of defining that for ourselves at this point, but in general the compiler does expect to see non-null It is true that we don't have that many sources of them right now, but it also the case that with the
Ahh, but it could. Say it was a pointer into an array allocated with We also have the curious case of collectible assemblies which can be kept alive by RVAs-turned-constant-byrefs pointing into them. |
Ok, you've convinced me there are weird cases, so we don't need to assert that the constant is null. But it seems it's still safe to remove the GC reporting on it, since obviously the constant can't be updated by the GC. |
It cannot be updated, but it still influences what things are reported as roots. Modified example from earlierprivate static void Main()
{
Problem(ref *_objAddr);
}
[ModuleInitializer]
public static void Init() => RuntimeHelpers.RunClassConstructor(typeof(Program).TypeHandle);
[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
private static int Problem(ref int addrSrc)
{
var addr = MemoryMarshal.CreateSpan(ref addrSrc, 1);
ref int constAddr = ref *_objAddr;
if (Unsafe.AreSame(ref MemoryMarshal.GetReference(addr), ref constAddr))
{
CallForProblemWithPushes(addr, RvrsRt());
}
int i = 0;
for (; i < 10; i++) { }
return i;
}
private static readonly int* _objAddr = ConstructObjAddrForPushes();
private static object _objRt;
private static WeakReference _objFlg;
private static int* ConstructObjAddrForPushes()
{
var obj = GC.AllocateArray<int>(1, pinned: true);
fixed (int* addr = obj)
{
_objRt = obj;
_objFlg = new WeakReference(obj);
return addr;
}
}
[MethodImpl(MethodImplOptions.NoInlining)]
private static long RvrsRt()
{
Console.WriteLine(_objFlg.IsAlive);
_objRt = null;
return 0;
}
[MethodImpl(MethodImplOptions.NoInlining)]
static ref int CallForProblemWithPushes(Span<int> objAddr, long use)
{
Console.WriteLine(_objFlg.IsAlive);
return ref objAddr[0];
} Running under UNDONE: record GCref push [cns]
True
True If we strip the GC-ness in // Did we push a GC ref value?
if (id->idGCref())
{
#ifdef DEBUG
+ id->idGCref(GCT_NONE);
printf("UNDONE: record GCref push [cns]\n");
#endif
} We instead get a VM assert about a corrupt ref:
(Presumably, when it walks the stack inside This is "sort of" expected, you can get similar "GC holes" in other ways, because the compiler doesn't model shortening of lifetimes as a side-effect (indeed: constant propagation of byrefs that lead to this in the first place would be illegal if we modelled shortening of lifetimes as a side-effect). That said, it does not feel right to me to "take advantage" of this being (implicitly) allowed in codegen/emitter specifically that should ideally work in such a way that "lifetimes in" are the same as "lifetimes out". |
a) It is clearer and cheaper to directly use the emitter anyway. b) "inst_RV_TT" will be made XARCH-only in an upcoming change.
a) Make them XARCH-only. b) Rewrite them using "OperandDesc".
Enable broader containment for FIELD_LIST operands on x86. The primary benefits come from containing loads that long decomposition has left (e. g. LCL_FLDs): ```diff - mov ecx, dword ptr [ebp+08H] - mov edx, dword ptr [ebp+0CH] - push edx - push ecx + push dword ptr [ebp+0CH] + push dword ptr [ebp+08H] ```
Insert an assert instead about what we expect and handle GCInfo-wise.
19ab275
to
d3d5326
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for all the explanations and updates.
Another windows/x86 regression in #68433 |
Enables broader containment for
FIELD_LIST
operands on x86.The primary benefits come from containing loads that long decomposition has left (e. g.
LCL_FLD
s):Additionally, there is some code simplification enabled by the use of
OperandDesc
in twoinst_*
functions with this change.Some some good x86(-only) diffs. A few small regressions because of some worse RA choices.
Note: I think a couple stress runs for this change are warranted, the logic here is not straightforward.