Memory containment for `FIELD_LIST` operands on x86 #65803

SingleAccretion · 2022-02-23T20:19:15Z

Enables broader containment for FIELD_LIST operands on x86.

The primary benefits come from containing loads that long decomposition has left (e. g. LCL_FLDs):

-       mov      ecx, dword ptr [ebp+08H]
-       mov      edx, dword ptr [ebp+0CH]
-       push     edx
-       push     ecx
+       push     dword ptr [ebp+0CH]
+       push     dword ptr [ebp+08H]

Additionally, there is some code simplification enabled by the use of OperandDesc in two inst_* functions with this change.

Some some good x86(-only) diffs. A few small regressions because of some worse RA choices.

Note: I think a couple stress runs for this change are warranted, the logic here is not straightforward.

ghost · 2022-02-23T20:19:22Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

Enables broader containment for FIELD_LIST operands on x86.

The primary benefits come from containing loads that long decomposition has left (e. g. LCL_FLDs):

-       mov      ecx, dword ptr [ebp+08H]
-       mov      edx, dword ptr [ebp+0CH]
-       push     edx
-       push     ecx
+       push     dword ptr [ebp+0CH]
+       push     dword ptr [ebp+08H]

Additionally, there is some code simplification enabled by the use of OperandDesc in two inst_* functions with this change.

We're expecting some some good x86 (only) diffs.

Author:	SingleAccretion
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

SingleAccretion · 2022-02-24T16:10:47Z

Windows ARM64 build failures are #65818.

@dotnet/jit-contrib - some x86 CQ improvements to unblock further improvements.

BruceForstall · 2022-03-22T18:11:53Z

Some nice diffs here. Looking through the diffs, it's easy to see where we could do better (not related to this change), e.g.:

       mov      dword ptr [ebp-24H], edx
       mov      dword ptr [ebp-20H], eax
       push     dword ptr [ebp-20H]                 // could just use "push eax"
       push     dword ptr [ebp-24H]                 // could just use "push edx"

BruceForstall · 2022-03-22T18:16:14Z

I found an interesting diff in spmi libraries_tests, System.Tests.TimeZoneInfoTests:NJulianRuleTest(System.String,int,int,bool):
base:

G_M59070_IG13:        ; gcrefRegs=00000048 {ebx esi}, byrefRegs=00000000 {}, byref
       push     dword ptr [ebp-38H]
       push     bword ptr [ebp-64H]
       ; byr arg push 1
       push     0
       push     0
       push     0
       mov      ecx, gword ptr [ebp-5CH]
       ; gcrRegs +[ecx]
       ; GC ptr vars -{V41 V45 V61 V100}
       call     [System.IO.Strategies.BufferedFileStreamStrategy:WriteSpan()]
       ; gcrRegs -[ecx]
       ; gcr arg pop 1

diff:

G_M59070_IG13:        ; gcrefRegs=00000048 {ebx esi}, byrefRegs=00000000 {}, byref
       push     dword ptr [ebp-38H]
       push     bword ptr [ebp-64H]
       ; byr arg push 1
       push     0
       push     0
UNDONE: record GCref push [cns]
       push     0
       ; gcr arg push 4
       mov      ecx, gword ptr [ebp-5CH]
       ; gcrRegs +[ecx]
       ; GC ptr vars -{V41 V45 V61 V100}
       call     [System.IO.Strategies.BufferedFileStreamStrategy:WriteSpan()]
       ; gcrRegs -[ecx]
       ; gcr arg pop 2

Can you explain this?

Looks like the code is pre-existing but perhaps wasn't executed before:

            // Did we push a GC ref value?
            if (id->idGCref())
            {
#ifdef DEBUG
                printf("UNDONE: record GCref push [cns]\n");
#endif
            }

Minimally, this needs to be:

            // Did we push a GC ref value?
            if (id->idGCref())
            {
                JITDUMP("UNDONE: record GCref push [cns]\n");
            }

but it seems like this should be an assert.

Also, it's worrisome that the GC info is changed with your changes.

(also saw this in System.Tests.TimeZoneInfoTests:IsDaylightSavingTime_CatamarcaMultiYearDaylightSavings())

BruceForstall · 2022-03-22T18:19:08Z

One of your changes does retyping, so we end up with diffs like:

;  V17 tmp11        [V17,T04] (  3, 32   )    bool  ->  [ebp-2CH]   spill-single-def V11.value(offs=0x01) P-INDEP "field V11.value (fldOffset=0x1)"
...
       mov      cl, byte  ptr [ebp-2CH]
       mov      byte  ptr [esp+01H], cl
=>
       mov      ecx, dword ptr [ebp-2CH]
       mov      byte  ptr [esp+01H], cl

Why do we want this? It looks like it doesn't hurt, but...?

BruceForstall

A few comments

src/coreclr/jit/emitxarch.cpp

SingleAccretion · 2022-03-22T19:30:02Z

Why do we want this? It looks like it doesn't hurt, but...?

The premise is that we'd like to use push for varTypeIsSmall (non-candidate) LCL_VAR/LCL_FLD nodes. Retyping achieves that, and makes the logic for when that is legal centralized (LSRA and codegen can use a simple varTypeIsI test to determine whether push can be used).

I had two revisions of this change where that logic was flawed in one way or another, failing to deal properly with IND byte or FIELD<short>(LCL_FLD<byte>), and that's the code I came to after fixing the issues.

As you note, we may end up not using push after all, and then it's sort of pointless to retype (perhaps it could avoid partial register issues on some CPUs), but it should indeed not hurt things.

Can you explain this?

Yes, these were the diffs that worried me too... The cause is that the old inst_TT, when passed a GC-typed constant tree (here, null), called inst_IV, which in turn called GetEmitter()->emitIns_I(ins, EA_PTRSIZE, val);, losing GC-ness along the way. So with this change we're generating more (arguably redundant) GC info, but it should be strictly "safer". Perhaps we could retype nulls to nints in codegen for such situations, but that's a separate issue.

but it seems like this should be an assert.

What kind of assert do you have in mind?

BruceForstall · 2022-03-22T22:44:54Z

The premise is that we'd like to use push for varTypeIsSmall (non-candidate) LCL_VAR/LCL_FLD nodes.

So are there cases where previously we would have something like:

       sub       esp, 4
       mov      cl, byte  ptr [ebp-2CH]
       mov      byte  ptr [esp], cl

and now we have something like:

       push     dword ptr [ebp-2CH]

? Assuming the target was a 4-byte slot?

What kind of assert do you have in mind?

If I add assert(!id->idGCref()); it never hits in the baseline. This indicates to me that this code never expected GC refs, and we need to do something, if this is really a GC ref.

But it also looks like there are two cases:

       push     0
UNDONE: record GCref push [cns]
       push     0
       ; gcr arg push 4

why does the second zero here get a GC update but the first goes through the code path that doesn't?

SingleAccretion · 2022-03-22T23:36:56Z

So are there cases where previously we would have something like ... and now we have something like

I don't believe so, the optimization is for keeping parity with the previous implementation that did this effectively implicitly:

switch (fieldNode->OperGet())
{
    case GT_LCL_VAR:
        // Notably this doesn't handle the FIELD<short>(fieldNode<byte>) case correctly,
        // but that was ok because only very specific promotion-produced IR shapes were expected here.
        inst_TT(INS_push, fieldNode, 0, 0, emitActualTypeSize(fieldNode->TypeGet())); // Note the "emitActualTypeSize".
        break;

There are basically three nodes we can see on this path due to the aforementioned decomposition pessimization (that I also would like to remove at some point): LCL_VAR<field type> | CNS_INT<INT> | <anything INT-sized>, the scenario that was regressing without the optimization is a promoted small field of an address-exposed struct.

why does the second zero here get a GC update but the first goes through the code path that doesn't?

If we are looking at the same SPMI diff for NJulianRuleTest, then the diff is that these are for the same argument, I believe:

 IN006f: 00018C push     0
 IN0070: 00018E push     0
+UNDONE: record GCref push [cns]
 IN0071: 000190 push     0
+       ; gcr arg push 4
 IN0072: 000192 mov      ecx, gword ptr [ebp-5CH]
        ; gcrRegs +[ecx]
        ; GC ptr vars -{V41 V45 V61 V100}
 IN0073: 000195 call     System.IO.Strategies.BufferedFileStreamStrategy:WriteSpan()
        ; gcrRegs -[ecx]
-       ; gcr arg pop 1
+       ; gcr arg pop 2

 02    F0 BF ...| 017C        reg EAX becoming dead
 0E    BF 8A ...| 0182        reg ECX becoming dead
 F0 BF 8A    ...| 018C        push ptr  1 (iptr)
-F0 49    C8 ...| 0195        reg ECX becoming live
+A6    0D D0 ...| 0192        push ptr  4
+4B    D0 4B ...| 0195        reg ECX becoming live
 0D    4B F2 ...| 019A        reg ECX becoming dead
-C8    F2 45 ...| 019A        pop  1 ptrs
+D0    F2 45 ...| 019A        pop  2 ptrs
 4B    45 F1 ...| 019D        reg ECX becoming live
 F2 45    05 ...| 01BA        reg EAX becoming live
 F1 05    40 ...| 01CF        reg EAX becoming dead

This indicates to me that this code never expected GC refs, and we need to do something, if this is really a GC ref.

My understanding of the GC reporting for all push/pops is that it already happens via the emitOutputInstr -> emitStackPush -> emitStackPushLargeStk -> gcRegPtrAllocDsc call chain.

BruceForstall · 2022-03-23T17:50:47Z

Maybe your new inst_TT should strip the GC info the same way the old code was (implicitly) doing it, and assert that a GC constant is null when doing so.

e.g.,

        case OperandKind::Imm:
        {
            // The only gc/byref allowed as an immediate is zero (null). Strip the GC-ness when generating the code.
            emitAttr attr = op1Desc.GetEmitAttrForImmediate(size);
            assert(!EA_IS_GCREF_OR_BYREF(attr) || (!EA_IS_RELOC(attr) && (op1Desc.GetImmediate() == 0)));
            emit->emitIns_I(ins, EA_SIZE(attr), op1Desc.GetImmediate());
            break;
        }

or, if you think it's ok to generate this new GC info, then replace:

            // Did we push a GC ref value?
            if (id->idGCref())
            {
#ifdef DEBUG
                printf("UNDONE: record GCref push [cns]\n");
#endif
            }

which seems unnecessary, with an assert:

            // We only expect zero gc/byref (non-relocatable) constant.
            assert(!id->idGCref() || (!id->idIsCnsReloc() && (val == 0)));

Comments?

SingleAccretion · 2022-03-23T22:16:07Z

Comments?

I would like to understand - what's the reason for expecting only nulls here? I think we can get (and should handle) arbitrary constants, byrefs especially. E. g. here is a construction that would hit the val != 0 case (there could be others):

[MethodImpl(MethodImplOptions.NoInlining)]
private static void Problem(ref int addrSrc)
{
    var addr = MemoryMarshal.CreateSpan(ref addrSrc, 1);
    ref int constAddr = ref *(int*)100;  // Just a placeholder, it could be address of a real (pinned) GC object

    if (Unsafe.AreSame(ref MemoryMarshal.GetReference(addr), ref constAddr))
    {
        CallForSpan(addr);
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
static void CallForSpan<T>(Span<T> value) { }

BruceForstall · 2022-03-23T22:33:08Z

I would like to understand - what's the reason for expecting only nulls here?

I was just being conservative -- I didn't expect anything else. How could you get a non-null constant byref? I would think your example would be illegal. I would think it would have to be pinned memory if it was in the GC heap, at which point it could just be an unmanaged pointer, but it couldn't be constant anyway.

SingleAccretion · 2022-03-23T23:02:45Z

How could you get a non-null constant byref?

Well, we are sort of defining that for ourselves at this point, but in general the compiler does expect to see non-null byrefs (VN handles them at the very least), and we had bugs caused by "only null byrefs exist"-like assumptions.

It is true that we don't have that many sources of them right now, but it also the case that with the Unsafe.NullRef and derivatives I don't think we can really say that it's illegal at this point.

I would think it would have to be pinned memory if it was in the GC heap, at which point it could just be an unmanaged pointer, but it couldn't be constant anyway.

Ahh, but it could. Say it was a pointer into an array allocated with GC.AllocateArray(length, pinned: true), that was then saved into a static readonly, which would in turn be imported as a constant. Or that the creator of the code was actually using Reflection.Emit and "burned-in" the address of an object.

We also have the curious case of collectible assemblies which can be kept alive by RVAs-turned-constant-byrefs pointing into them.

BruceForstall · 2022-03-23T23:11:30Z

Ok, you've convinced me there are weird cases, so we don't need to assert that the constant is null.

But it seems it's still safe to remove the GC reporting on it, since obviously the constant can't be updated by the GC.

SingleAccretion · 2022-03-24T15:25:51Z

But it seems it's still safe to remove the GC reporting on it, since obviously the constant can't be updated by the GC.

It cannot be updated, but it still influences what things are reported as roots.

Modified example from earlier

private static void Main()
{
    Problem(ref *_objAddr);
}

[ModuleInitializer]
public static void Init() => RuntimeHelpers.RunClassConstructor(typeof(Program).TypeHandle);

[MethodImpl(MethodImplOptions.NoInlining | MethodImplOptions.AggressiveOptimization)]
private static int Problem(ref int addrSrc)
{
    var addr = MemoryMarshal.CreateSpan(ref addrSrc, 1);
    ref int constAddr = ref *_objAddr;

    if (Unsafe.AreSame(ref MemoryMarshal.GetReference(addr), ref constAddr))
    {
        CallForProblemWithPushes(addr, RvrsRt());
    }

    int i = 0;
    for (; i < 10; i++) { }

    return i;
}

private static readonly int* _objAddr = ConstructObjAddrForPushes();
private static object _objRt;
private static WeakReference _objFlg;

private static int* ConstructObjAddrForPushes()
{
    var obj = GC.AllocateArray<int>(1, pinned: true);
    fixed (int* addr = obj)
    {
        _objRt = obj;
        _objFlg = new WeakReference(obj);
        return addr;
    }
}

[MethodImpl(MethodImplOptions.NoInlining)]
private static long RvrsRt()
{
    Console.WriteLine(_objFlg.IsAlive);
    _objRt = null;
    return 0;
}

[MethodImpl(MethodImplOptions.NoInlining)]
static ref int CallForProblemWithPushes(Span<int> objAddr, long use)
{
    Console.WriteLine(_objFlg.IsAlive);
    return ref objAddr[0];
}

Running under GCStress=0xC, TieredCompilation=0, with this change we will see:

UNDONE: record GCref push [cns]
True
True

If we strip the GC-ness in emitOutputIV:

             // Did we push a GC ref value?
             if (id->idGCref())
             {
 #ifdef DEBUG
+                id->idGCref(GCT_NONE);
                 printf("UNDONE: record GCref push [cns]\n");
 #endif
            }

We instead get a VM assert about a corrupt ref:

UNDONE: record GCref push [cns]
True

Assert failure(PID 6552 [0x00001998], Thread: 25516 [0x63ac]): SanityCheck()

CORECLR! MethodTable::Validate + 0x31 (0x5d836300)
CORECLR! Object::ValidateInner + 0x134 (0x5d840ea4)
CORECLR! Object::Validate + 0xAF (0x5d840d1f)
CORECLR! WKS::GCHeap::Promote + 0x77 (0x5dbc3fd7)
CORECLR! ByRefPointerOffsetsReporter::Report + 0x1C (0x5d876b60)
CORECLR! ByRefPointerOffsetsReporter::Find + 0x65 (0x5d870715)
CORECLR! ByRefPointerOffsetsReporter::Find + 0xDC (0x5d87078c)
CORECLR! ReportPointersFromValueType + 0x3B (0x5d876ba0)
CORECLR! ReportPointersFromValueTypeArg + 0x36 (0x5d876c5b)
CORECLR! MetaSig::GcScanRoots + 0x263 (0x5d870a75)
    File: C:\Users\Accretion\source\dotnet\runtime\src\coreclr\vm\methodtable.cpp Line: 8046
    Image: C:\Users\Accretion\source\dotnet\build\CustomCoreRoot_x86\corerun_x86.exe

(Presumably, when it walks the stack inside CallForProblemWithPushes)

This is "sort of" expected, you can get similar "GC holes" in other ways, because the compiler doesn't model shortening of lifetimes as a side-effect (indeed: constant propagation of byrefs that lead to this in the first place would be illegal if we modelled shortening of lifetimes as a side-effect).

That said, it does not feel right to me to "take advantage" of this being (implicitly) allowed in codegen/emitter specifically that should ideally work in such a way that "lifetimes in" are the same as "lifetimes out".

a) It is clearer and cheaper to directly use the emitter anyway. b) "inst_RV_TT" will be made XARCH-only in an upcoming change.

a) Make them XARCH-only. b) Rewrite them using "OperandDesc".

Enable broader containment for FIELD_LIST operands on x86. The primary benefits come from containing loads that long decomposition has left (e. g. LCL_FLDs): ```diff - mov ecx, dword ptr [ebp+08H] - mov edx, dword ptr [ebp+0CH] - push edx - push ecx + push dword ptr [ebp+0CH] + push dword ptr [ebp+08H] ```

Insert an assert instead about what we expect and handle GCInfo-wise.

BruceForstall

Thanks for all the explanations and updates.

kunalspathak · 2022-04-22T18:57:11Z

Another windows/x86 regression in #68433

dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 23, 2022

ghost added the community-contribution Indicates that the PR has been added by a community member label Feb 23, 2022

SingleAccretion force-pushed the PutArgFldList-Containment branch 2 times, most recently from 4e01bc3 to b62428c Compare February 23, 2022 21:22

SingleAccretion closed this Feb 24, 2022

SingleAccretion reopened this Feb 24, 2022

SingleAccretion closed this Feb 24, 2022

SingleAccretion reopened this Feb 24, 2022

SingleAccretion marked this pull request as ready for review February 24, 2022 16:10

SingleAccretion force-pushed the PutArgFldList-Containment branch from b62428c to 9b93e40 Compare February 25, 2022 20:56

SingleAccretion closed this Feb 26, 2022

SingleAccretion reopened this Feb 26, 2022

JulieLeeMSFT assigned SingleAccretion Mar 4, 2022

JulieLeeMSFT added this to the 7.0.0 milestone Mar 4, 2022

SingleAccretion force-pushed the PutArgFldList-Containment branch 2 times, most recently from 614d1e7 to d383575 Compare March 7, 2022 21:59

JulieLeeMSFT requested a review from BruceForstall March 21, 2022 17:39

JulieLeeMSFT assigned BruceForstall Mar 21, 2022

BruceForstall requested changes Mar 22, 2022

View reviewed changes

src/coreclr/jit/emitxarch.cpp Show resolved Hide resolved

ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels Mar 22, 2022

SingleAccretion mentioned this pull request Mar 24, 2022

[LoongArch64] coreclr-jit directory #62843

Merged

SingleAccretion added 5 commits March 30, 2022 02:12

Remove "inst_RV_TT" usages from non-XARCH code

01a61a1

a) It is clearer and cheaper to directly use the emitter anyway. b) "inst_RV_TT" will be made XARCH-only in an upcoming change.

Refactor "inst_TT" and "inst_RV_TT"

726892f

a) Make them XARCH-only. b) Rewrite them using "OperandDesc".

emitIns_A: add function header

ed38ad8

Remove the "UNDONE: ..." printing

d3d5326

Insert an assert instead about what we expect and handle GCInfo-wise.

SingleAccretion force-pushed the PutArgFldList-Containment branch from 19ab275 to d3d5326 Compare March 30, 2022 20:41

BruceForstall approved these changes Mar 31, 2022

View reviewed changes

BruceForstall merged commit e402d8e into dotnet:main Mar 31, 2022

SingleAccretion deleted the PutArgFldList-Containment branch March 31, 2022 21:11

kunalspathak mentioned this pull request Apr 22, 2022

Regressions in System.Numerics.Tests.Constructor #68410

Closed

SingleAccretion mentioned this pull request Apr 23, 2022

[Perf] Changes at 4/3/2022 6:13:34 PM #68433

Closed

JulieLeeMSFT mentioned this pull request Apr 29, 2022

What's new in .NET 7 Preview 4 [WIP] dotnet/core#7378

Closed

SingleAccretion mentioned this pull request May 13, 2022

Understanding codegen behavior around ref and struct locals #69321

Closed

ghost locked as resolved and limited conversation to collaborators May 23, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Memory containment for `FIELD_LIST` operands on x86 #65803

Memory containment for `FIELD_LIST` operands on x86 #65803

SingleAccretion commented Feb 23, 2022 •

edited

Loading

ghost commented Feb 23, 2022

SingleAccretion commented Feb 24, 2022

BruceForstall commented Mar 22, 2022

BruceForstall commented Mar 22, 2022 •

edited

Loading

BruceForstall commented Mar 22, 2022

BruceForstall left a comment

SingleAccretion commented Mar 22, 2022

BruceForstall commented Mar 22, 2022

SingleAccretion commented Mar 22, 2022

BruceForstall commented Mar 23, 2022

SingleAccretion commented Mar 23, 2022

BruceForstall commented Mar 23, 2022 •

edited

Loading

SingleAccretion commented Mar 23, 2022 •

edited

Loading

BruceForstall commented Mar 23, 2022

SingleAccretion commented Mar 24, 2022

BruceForstall left a comment

kunalspathak commented Apr 22, 2022

Memory containment for FIELD_LIST operands on x86 #65803

Memory containment for FIELD_LIST operands on x86 #65803

Conversation

SingleAccretion commented Feb 23, 2022 • edited Loading

ghost commented Feb 23, 2022

SingleAccretion commented Feb 24, 2022

BruceForstall commented Mar 22, 2022

BruceForstall commented Mar 22, 2022 • edited Loading

BruceForstall commented Mar 22, 2022

BruceForstall left a comment

Choose a reason for hiding this comment

SingleAccretion commented Mar 22, 2022

BruceForstall commented Mar 22, 2022

SingleAccretion commented Mar 22, 2022

BruceForstall commented Mar 23, 2022

SingleAccretion commented Mar 23, 2022

BruceForstall commented Mar 23, 2022 • edited Loading

SingleAccretion commented Mar 23, 2022 • edited Loading

BruceForstall commented Mar 23, 2022

SingleAccretion commented Mar 24, 2022

BruceForstall left a comment

Choose a reason for hiding this comment

kunalspathak commented Apr 22, 2022

Memory containment for `FIELD_LIST` operands on x86 #65803

Memory containment for `FIELD_LIST` operands on x86 #65803

SingleAccretion commented Feb 23, 2022 •

edited

Loading

BruceForstall commented Mar 22, 2022 •

edited

Loading

BruceForstall commented Mar 23, 2022 •

edited

Loading

SingleAccretion commented Mar 23, 2022 •

edited

Loading