Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow StoreLclVar src to be IND/FLD. #59315

Merged
merged 4 commits into from
May 23, 2022

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Sep 19, 2021

This change allows STORE_LCL_VAR(IND/LCL_FLD) to stay as is in lowering without transforming them into STORE_OBJ(ADDR(LCL_VAR), IND/LCL_FLD) and allows the local to be enregistered.

You can see the changes with COMPLUS_JitEnregStats=1, for example, for benchmarks.run.windows.x64.checked.mch:

-total number of locals: 346736, number of enregistered: 309780, notEnreg: 36956, ratio: 0.89
-total number of struct locals: 10920, number of enregistered: 86, notEnreg: 10834, ratio: 0.01
+total number of locals: 346736, number of enregistered: 309849, notEnreg: 36887, ratio: 0.89
+total number of struct locals: 10920, number of enregistered: 155, notEnreg: 10765, ratio: 0.01

-m_blockOp 449, ratio: 0.01
+m_blockOp 379, ratio: 0.01
+m_lclAddrNode 1, ratio: 0.00

there are some good diffs:

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.crossgen2.windows.x64.checked.mch
 Total bytes of delta: -18966 (-0.83% of base)
 696 total files with Code Size differences (615 improved, 81 regressed), 4834 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\coreclr_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -26884 (-0.78% of base)
 374 total files with Code Size differences (356 improved, 18 regressed), 2652 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\benchmarks.run.windows.x64.checked.mch
 Total bytes of delta: -1084 (-2.59% of base)
 31 total files with Code Size differences (29 improved, 2 regressed), 7 unchanged.

There are regressions due to SIMD register upper save/restore. Imagine that we write and read a local once, but between these 2 points we have N calls, it will cause 10 UpperVectoreStore/Restore to be generated. This is an LSRA issue, it should be able to see that SIMD lclVar is rarely used but has to be saved/restored often so it is not profitable to allocate it to a register.
I have discussed this with @kunalspathak and he showed me that there are a few places already where we make such decisions:

#if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
else if (currentInterval->isUpperVector)
{
// This is a save or restore of the upper half of a large vector lclVar.
Interval* lclVarInterval = currentInterval->relatedInterval;
assert(lclVarInterval->isLocalVar);
if (refType == RefTypeUpperVectorSave)
{
if ((lclVarInterval->physReg == REG_NA) ||
(lclVarInterval->isPartiallySpilled && (currentInterval->physReg == REG_STK)))
{
allocate = false;
}
else
{
lclVarInterval->isPartiallySpilled = true;
}
}
else if (refType == RefTypeUpperVectorRestore)
{
assert(currentInterval->isUpperVector);
if (lclVarInterval->isPartiallySpilled)
{
lclVarInterval->isPartiallySpilled = false;
}
else
{
allocate = false;
}
}
}
else if (refType == RefTypeUpperVectorSave)
{
assert(!currentInterval->isLocalVar);
// Note that this case looks a lot like the case below, but in this case we need to spill
// at the previous RefPosition.
// We may want to consider allocating two callee-save registers for this case, but it happens rarely
// enough that it may not warrant the additional complexity.
if (assignedRegister != REG_NA)
{
unassignPhysReg(getRegisterRecord(assignedRegister), currentInterval->firstRefPosition);
INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
}
currentRefPosition->registerAssignment = RBM_NONE;
continue;
}
#endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE

#if FEATURE_PARTIAL_SIMD_CALLEE_SAVE && defined(TARGET_XARCH)
// We can also avoid allocating a register (in fact we don't want to) if we have
// an UpperVectorRestore on xarch where the value is on the stack.
if ((currentRefPosition->refType == RefTypeUpperVectorRestore) && (currentInterval->physReg == REG_NA))
{
assert(currentRefPosition->regOptional);
allocate = false;
}
#endif

The worst is:

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

The regressions are not in benchmarks/asp so I think we can create an issue about it but take the change as is and plan the fix for 7.0.

Full diffs
  Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\aspnet.run.windows.x64.checked.mch
 Total bytes of delta: 18 (0.00% of base)
 31 total files with Code Size differences (17 improved, 14 regressed), 261 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\benchmarks.run.windows.x64.checked.mch
 Total bytes of delta: -1084 (-2.59% of base)
 31 total files with Code Size differences (29 improved, 2 regressed), 7 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\coreclr_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -26884 (-0.78% of base)
 374 total files with Code Size differences (356 improved, 18 regressed), 2652 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.crossgen2.windows.x64.checked.mch
 Total bytes of delta: -18966 (-0.83% of base)
 696 total files with Code Size differences (615 improved, 81 regressed), 4834 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.pmi.windows.x64.checked.mch
 Total bytes of delta: -7902 (-0.62% of base)
 406 total files with Code Size differences (369 improved, 37 regressed), 3466 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -9172 (-1.28% of base)
 565 total files with Code Size differences (454 improved, 111 regressed), 174 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch
 Total bytes of delta: -9328 (-0.15% of base)
 282 total files with Code Size differences (223 improved, 59 regressed), 752 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.crossgen2.Linux.arm64.checked.mch
 Total bytes of delta: 26804 (1.06% of base)
 3112 total files with Code Size differences (731 improved, 2381 regressed), 970 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.pmi.Linux.arm64.checked.mch
 Total bytes of delta: 36544 (3.81% of base)
 2464 total files with Code Size differences (85 improved, 2379 regressed), 103 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries_tests.pmi.Linux.arm64.checked.mch
 Total bytes of delta: -1332 (-0.35% of base)
 235 total files with Code Size differences (192 improved, 43 regressed), 111 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\benchmarks.run.Linux.x64.checked.mch
 Total bytes of delta: -400 (-1.48% of base)
 27 total files with Code Size differences (26 improved, 1 regressed), 3 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\coreclr_tests.pmi.Linux.x64.checked.mch
 Total bytes of delta: -1749 (-0.06% of base)
 147 total files with Code Size differences (147 improved, 0 regressed), 2632 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries.crossgen2.Linux.x64.checked.mch
 Total bytes of delta: -9922 (-0.60% of base)
 270 total files with Code Size differences (254 improved, 16 regressed), 2788 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries.pmi.Linux.x64.checked.mch
 Total bytes of delta: -3411 (-0.49% of base)
 145 total files with Code Size differences (140 improved, 5 regressed), 1331 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries_tests.pmi.Linux.x64.checked.mch
 Total bytes of delta: -4927 (-1.87% of base)
 233 total files with Code Size differences (218 improved, 15 regressed), 39 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\benchmarks.run.windows.arm64.checked.mch
 Total bytes of delta: -532 (-0.99% of base)
 20 total files with Code Size differences (19 improved, 1 regressed), 26 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\coreclr_tests.pmi.windows.arm64.checked.mch
 Total bytes of delta: -9484 (-0.15% of base)
 291 total files with Code Size differences (231 improved, 60 regressed), 751 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.crossgen2.windows.arm64.checked.mch
 Total bytes of delta: 26372 (1.02% of base)
 3173 total files with Code Size differences (774 improved, 2399 regressed), 978 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries_tests.pmi.windows.arm64.checked.mch
 Total bytes of delta: -1060 (-0.20% of base)
 414 total files with Code Size differences (277 improved, 137 regressed), 133 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\benchmarks.run.windows.x86.checked.mch
 Total bytes of delta: -701 (-2.86% of base)
 20 total files with Code Size differences (20 improved, 0 regressed), 12 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\coreclr_tests.pmi.windows.x86.checked.mch
 Total bytes of delta: -2992 (-0.14% of base)
 404 total files with Code Size differences (404 improved, 0 regressed), 193 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries.crossgen2.windows.x86.checked.mch
 Total bytes of delta: -25537 (-1.59% of base)
 2593 total files with Code Size differences (2361 improved, 232 regressed), 1493 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries.pmi.windows.x86.checked.mch
 Total bytes of delta: -4008 (-0.79% of base)
 116 total files with Code Size differences (114 improved, 2 regressed), 1379 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries_tests.pmi.windows.x86.checked.mch
 Total bytes of delta: -10404 (-1.24% of base)
 315 total files with Code Size differences (313 improved, 2 regressed), 432 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\coreclr_tests.pmi.Linux.arm.checked.mch
 Total bytes of delta: -90 (-0.00% of base)
 14 total files with Code Size differences (14 improved, 0 regressed), 372 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries.crossgen2.Linux.arm.checked.mch
 Total bytes of delta: -1518 (-0.15% of base)
 424 total files with Code Size differences (418 improved, 6 regressed), 1069 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries.pmi.Linux.arm.checked.mch
 Total bytes of delta: -4 (-0.00% of base)
 6 total files with Code Size differences (4 improved, 2 regressed), 44 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries_tests.pmi.Linux.arm.checked.mch
 Total bytes of delta: -222 (-0.32% of base)
 52 total files with Code Size differences (52 improved, 0 regressed), 29 unchanged.

Diff example
-; Total bytes of code 35, prolog size 7, PerfScore 16.65, instruction count 9
+; Total bytes of code 15, prolog size 3, PerfScore 9.95, instruction count 5

-G_M9563_IG01:        ; func=00, offs=000000H, size=0007H
+G_M9563_IG01:        ; func=00, offs=000000H, size=0003H

-sub      rsp, 24
-vzeroupper 
+vzeroupper 

-G_M9563_IG02:        ; offs=000007H, size=0017H
+G_M9563_IG02:        ; offs=000003H, size=000BH

-vmovupd  xmm0, xmmword ptr [rcx]
-vmovupd  xmmword ptr [V03 rsp+08H], xmm0
-vmovupd  xmm0, xmmword ptr [V03 rsp+08H]
-vmovupd  xmmword ptr [rdx], xmm0
-mov      rax, rdx
+vmovupd  xmm0, xmmword ptr [rcx]
+vmovupd  xmmword ptr [rdx], xmm0
+mov      rax, rdx

-G_M9563_IG03:        ; offs=00001EH, size=0005H
+G_M9563_IG03:        ; offs=00000EH, size=0001H

-add      rsp, 24
ret

// It usually happens when the real type is a small struct.
loadIns = INS_mov;
}
emit->emitInsLoadInd(loadIns, emitTypeSize(tree), tree->GetRegNum(), tree);
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This logic allows us to generate STORE_LCL_VAR struct<1> V01(IND struct(addr) as

mov al, [addr]
mov  [V11], al

instead of

movsx rax, [addr]
mov  [V11], al

The solution with additional flag does not look very nice but it was better than alternatives that I have tried:

  1. support contained IND under STORE_LCL_VAR: it required too many changes in LSRA/codegen/emitter and it was hard to make it work for all cases and it created many new checks for this scenario;
  2. type the IND as int so we do mov rax, [addr] instead of mov al, [addr]. This could be a solution without any additional diffs but I am not sure if this transformation is correct.
    We currently use it for nullchecks already since https://github.com/dotnet/runtime/pull/37991/files so NULLCHECK byte addr is currently transformed into NULLCHECK int addr but if the addr is the last byte on a last readable page won't it cause issues? Maybe we should reconsider this transformation (in
    ind->gtType = TYP_INT;
    )
    could somebody please confirm if it can't cause issues?

image

@sandreenko
Copy link
Contributor Author

PTAL @kunalspathak @BruceForstall. Could you please also trigger JitStress/JitStressRegs and tag jitcontrib?

@danmoseley danmoseley added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 19, 2021
@ghost
Copy link

ghost commented Sep 19, 2021

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This change allows STORE_LCL_VAR(IND/LCL_FLD) to stay as is in lowering without transforming them into STORE_OBJ(ADDR(LCL_VAR), IND/LCL_FLD) and allows the local to be enregistered.

You can see the changes with COMPLUS_JitEnregStats=1, for example, for benchmarks.run.windows.x64.checked.mch:

-total number of locals: 346736, number of enregistered: 309780, notEnreg: 36956, ratio: 0.89
-total number of struct locals: 10920, number of enregistered: 86, notEnreg: 10834, ratio: 0.01
+total number of locals: 346736, number of enregistered: 309849, notEnreg: 36887, ratio: 0.89
+total number of struct locals: 10920, number of enregistered: 155, notEnreg: 10765, ratio: 0.01

-m_blockOp 449, ratio: 0.01
+m_blockOp 379, ratio: 0.01
+m_lclAddrNode 1, ratio: 0.00

there are some good diffs:

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.crossgen2.windows.x64.checked.mch
 Total bytes of delta: -18966 (-0.83% of base)
 696 total files with Code Size differences (615 improved, 81 regressed), 4834 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\coreclr_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -26884 (-0.78% of base)
 374 total files with Code Size differences (356 improved, 18 regressed), 2652 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\benchmarks.run.windows.x64.checked.mch
 Total bytes of delta: -1084 (-2.59% of base)
 31 total files with Code Size differences (29 improved, 2 regressed), 7 unchanged.

There are regressions due to SIMD register upper save/restore. Imagine that we write and read a local once, but between these 2 points we have N calls, it will cause 10 UpperVectoreStore/Restore to be generated. This is an LSRA issue, it should be able to see that SIMD lclVar is rarely used but has to be saved/restored often so it is not profitable to allocate it to a register.
I have discussed this with @kunalspathak and he showed me that there are a few places already where we make such decisions:

#if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
else if (currentInterval->isUpperVector)
{
// This is a save or restore of the upper half of a large vector lclVar.
Interval* lclVarInterval = currentInterval->relatedInterval;
assert(lclVarInterval->isLocalVar);
if (refType == RefTypeUpperVectorSave)
{
if ((lclVarInterval->physReg == REG_NA) ||
(lclVarInterval->isPartiallySpilled && (currentInterval->physReg == REG_STK)))
{
allocate = false;
}
else
{
lclVarInterval->isPartiallySpilled = true;
}
}
else if (refType == RefTypeUpperVectorRestore)
{
assert(currentInterval->isUpperVector);
if (lclVarInterval->isPartiallySpilled)
{
lclVarInterval->isPartiallySpilled = false;
}
else
{
allocate = false;
}
}
}
else if (refType == RefTypeUpperVectorSave)
{
assert(!currentInterval->isLocalVar);
// Note that this case looks a lot like the case below, but in this case we need to spill
// at the previous RefPosition.
// We may want to consider allocating two callee-save registers for this case, but it happens rarely
// enough that it may not warrant the additional complexity.
if (assignedRegister != REG_NA)
{
unassignPhysReg(getRegisterRecord(assignedRegister), currentInterval->firstRefPosition);
INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
}
currentRefPosition->registerAssignment = RBM_NONE;
continue;
}
#endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lsra.cpp#L5351-L5359

The worst is:

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

The regressions are not in benchmarks/asp so I think we can create an issue about it but take the change as is and plan the fix for 7.0.

Full diffs
  Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\aspnet.run.windows.x64.checked.mch
 Total bytes of delta: 18 (0.00% of base)
 31 total files with Code Size differences (17 improved, 14 regressed), 261 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\benchmarks.run.windows.x64.checked.mch
 Total bytes of delta: -1084 (-2.59% of base)
 31 total files with Code Size differences (29 improved, 2 regressed), 7 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\coreclr_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -26884 (-0.78% of base)
 374 total files with Code Size differences (356 improved, 18 regressed), 2652 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.crossgen2.windows.x64.checked.mch
 Total bytes of delta: -18966 (-0.83% of base)
 696 total files with Code Size differences (615 improved, 81 regressed), 4834 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.pmi.windows.x64.checked.mch
 Total bytes of delta: -7902 (-0.62% of base)
 406 total files with Code Size differences (369 improved, 37 regressed), 3466 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -9172 (-1.28% of base)
 565 total files with Code Size differences (454 improved, 111 regressed), 174 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch
 Total bytes of delta: -9328 (-0.15% of base)
 282 total files with Code Size differences (223 improved, 59 regressed), 752 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.crossgen2.Linux.arm64.checked.mch
 Total bytes of delta: 26804 (1.06% of base)
 3112 total files with Code Size differences (731 improved, 2381 regressed), 970 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.pmi.Linux.arm64.checked.mch
 Total bytes of delta: 36544 (3.81% of base)
 2464 total files with Code Size differences (85 improved, 2379 regressed), 103 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries_tests.pmi.Linux.arm64.checked.mch
 Total bytes of delta: -1332 (-0.35% of base)
 235 total files with Code Size differences (192 improved, 43 regressed), 111 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\benchmarks.run.Linux.x64.checked.mch
 Total bytes of delta: -400 (-1.48% of base)
 27 total files with Code Size differences (26 improved, 1 regressed), 3 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\coreclr_tests.pmi.Linux.x64.checked.mch
 Total bytes of delta: -1749 (-0.06% of base)
 147 total files with Code Size differences (147 improved, 0 regressed), 2632 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries.crossgen2.Linux.x64.checked.mch
 Total bytes of delta: -9922 (-0.60% of base)
 270 total files with Code Size differences (254 improved, 16 regressed), 2788 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries.pmi.Linux.x64.checked.mch
 Total bytes of delta: -3411 (-0.49% of base)
 145 total files with Code Size differences (140 improved, 5 regressed), 1331 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries_tests.pmi.Linux.x64.checked.mch
 Total bytes of delta: -4927 (-1.87% of base)
 233 total files with Code Size differences (218 improved, 15 regressed), 39 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\benchmarks.run.windows.arm64.checked.mch
 Total bytes of delta: -532 (-0.99% of base)
 20 total files with Code Size differences (19 improved, 1 regressed), 26 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\coreclr_tests.pmi.windows.arm64.checked.mch
 Total bytes of delta: -9484 (-0.15% of base)
 291 total files with Code Size differences (231 improved, 60 regressed), 751 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.crossgen2.windows.arm64.checked.mch
 Total bytes of delta: 26372 (1.02% of base)
 3173 total files with Code Size differences (774 improved, 2399 regressed), 978 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries_tests.pmi.windows.arm64.checked.mch
 Total bytes of delta: -1060 (-0.20% of base)
 414 total files with Code Size differences (277 improved, 137 regressed), 133 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\benchmarks.run.windows.x86.checked.mch
 Total bytes of delta: -701 (-2.86% of base)
 20 total files with Code Size differences (20 improved, 0 regressed), 12 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\coreclr_tests.pmi.windows.x86.checked.mch
 Total bytes of delta: -2992 (-0.14% of base)
 404 total files with Code Size differences (404 improved, 0 regressed), 193 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries.crossgen2.windows.x86.checked.mch
 Total bytes of delta: -25537 (-1.59% of base)
 2593 total files with Code Size differences (2361 improved, 232 regressed), 1493 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries.pmi.windows.x86.checked.mch
 Total bytes of delta: -4008 (-0.79% of base)
 116 total files with Code Size differences (114 improved, 2 regressed), 1379 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries_tests.pmi.windows.x86.checked.mch
 Total bytes of delta: -10404 (-1.24% of base)
 315 total files with Code Size differences (313 improved, 2 regressed), 432 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\coreclr_tests.pmi.Linux.arm.checked.mch
 Total bytes of delta: -90 (-0.00% of base)
 14 total files with Code Size differences (14 improved, 0 regressed), 372 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries.crossgen2.Linux.arm.checked.mch
 Total bytes of delta: -1518 (-0.15% of base)
 424 total files with Code Size differences (418 improved, 6 regressed), 1069 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries.pmi.Linux.arm.checked.mch
 Total bytes of delta: -4 (-0.00% of base)
 6 total files with Code Size differences (4 improved, 2 regressed), 44 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries_tests.pmi.Linux.arm.checked.mch
 Total bytes of delta: -222 (-0.32% of base)
 52 total files with Code Size differences (52 improved, 0 regressed), 29 unchanged.

Diff example
-; Total bytes of code 35, prolog size 7, PerfScore 16.65, instruction count 9
+; Total bytes of code 15, prolog size 3, PerfScore 9.95, instruction count 5

-G_M9563_IG01:        ; func=00, offs=000000H, size=0007H
+G_M9563_IG01:        ; func=00, offs=000000H, size=0003H

-sub      rsp, 24
-vzeroupper 
+vzeroupper 

-G_M9563_IG02:        ; offs=000007H, size=0017H
+G_M9563_IG02:        ; offs=000003H, size=000BH

-vmovupd  xmm0, xmmword ptr [rcx]
-vmovupd  xmmword ptr [V03 rsp+08H], xmm0
-vmovupd  xmm0, xmmword ptr [V03 rsp+08H]
-vmovupd  xmmword ptr [rdx], xmm0
-mov      rax, rdx
+vmovupd  xmm0, xmmword ptr [rcx]
+vmovupd  xmmword ptr [rdx], xmm0
+mov      rax, rdx

-G_M9563_IG03:        ; offs=00001EH, size=0005H
+G_M9563_IG03:        ; offs=00000EH, size=0001H

-add      rsp, 24
ret
Author: sandreenko
Assignees: -
Labels:

area-CodeGen-coreclr

Milestone: -

@kunalspathak
Copy link
Member

@dotnet/jit-contrib

@jakobbotsch
Copy link
Member

type the IND as int so we do mov rax, [addr] instead of mov al, [addr]. This could be a solution without any additional diffs but I am not sure if this transformation is correct.
We currently use it for nullchecks already since https://github.com/dotnet/runtime/pull/37991/files so NULLCHECK byte addr is currently transformed into NULLCHECK int addr but if the addr is the last byte on a last readable page won't it cause issues?

Related: #58874

However do we ever see null-check nodes on addresses that aren't pointer aligned?

@jakobbotsch
Copy link
Member

/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@sandreenko
Copy link
Contributor Author

However do we ever see null-check nodes on addresses that aren't pointer aligned?

I don't see why we can't do something like:

    static void ByteNullCheck(ref byte bytePointer)
    {
        byte v = 0;
        v = bytePointer;

    }

if roslyn does not optimize it JIT will see:
ASG(v, INDIR(bytePointer)) and will do DeadStoreElimination and replace IND(bytePointer) with NULLCHECK(bytePointer)

I have tried it and it seems working:

    [ 0]   3 (0x003) ldarg.0
    [ 1]   4 (0x004) ldind.u1
    [ 1]   5 (0x005) stloc.0

STMT00003 (IL 0x003...  ???)
               [000008] -A-XG-------              *  ASG       int
               [000007] D------N----              +--*  LCL_VAR   int    V01 loc0
               [000006] *--XG-------              \--*  IND       ubyte
               [000005] ------------                 \--*  LCL_VAR   byref  V00 arg0

    [ 0]   6 (0x006) ret

STMT00004 (IL 0x006...  ???)
               [000009] ------------              *  RETURN    void

looks like we need to stop changing type in TransformUnusedIndirection and in LowerRetStruct and use the flag that I add in this PR to avoid longer instructions on XARCH.
There is a problem though, TransformUnusedIndirection changes types to int also because in some cases the original type is STRUCT and we don't know the real size of it, the user carries this information. We will need to add an additional argument to TransformUnusedIndirection(..., size).

@sandreenko sandreenko marked this pull request as ready for review September 21, 2021 00:36
@sandreenko
Copy link
Contributor Author

The failures look unrelated, @kunalspathak could I ask you to open an issue about upper save/restore cause you can add labels and know the area better?

@kunalspathak
Copy link
Member

jitstressregs/jitstress failures related to #58481

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While the changes seems good to me, the arm64 regression is not small to ignore.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.crossgen2.Linux.arm64.checked.mch
 Total bytes of delta: 26804 (1.06% of base)
 3112 total files with Code Size differences (731 improved, 2381 regressed), 970 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.pmi.Linux.arm64.checked.mch
 Total bytes of delta: 36544 (3.81% of base)
 2464 total files with Code Size differences (85 improved, 2379 regressed), 103 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.crossgen2.windows.arm64.checked.mch
 Total bytes of delta: 26372 (1.02% of base)
 3173 total files with Code Size differences (774 improved, 2399 regressed), 978 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

Do you mind sharing the perf score numbers along couple of diff examples? If the arm64 regressions are related to SIMD save/restore, the sensible thing to do would be to fix it first before taking this change.

@sandreenko
Copy link
Contributor Author

Do you mind sharing the perf score numbers along couple of diff examples? If the arm64 regressions are related to SIMD save/restore, the sensible thing to do would be to fix it first before taking this change.

Here are the full diffs including PerfScore and an example of JitDump diff, search for "Active intervals at end of allocation:" in the diff to see the new refPositions
spmi.diffs.IndLclFld.txt
base.txt
diff.txt
.

@kunalspathak
Copy link
Member

Ping to myself.

@JulieLeeMSFT
Copy link
Member

Ping @kunalspathak.

@kunalspathak
Copy link
Member

Hoping to look at it this week.

@kunalspathak
Copy link
Member

@sandreenko - do you mind rebasing your changes to get the fresh test result? We also have asmdiff pipeline now which will help us see the diffs across various collections/platforms.

@sandreenko
Copy link
Contributor Author

@sandreenko - do you mind rebasing your changes to get the fresh test result? We also have asmdiff pipeline now which will help us see the diffs across various collections/platforms.

Sure, unfortunately, I have just moved to NC and my desktop is still on its way, once I get it and assemble it I will update the diff.

@sandreenko sandreenko force-pushed the enregSupportIndAndLclFldV2 branch 2 times, most recently from 532d2f6 to a653110 Compare January 24, 2022 06:52
@sandreenko
Copy link
Contributor Author

@sandreenko - do you mind rebasing your changes to get the fresh test result? We also have asmdiff pipeline now which will help us see the diffs across various collections/platforms.

Done. It looks like SuperPmi has unrelated failures right now so I could not repeat the testing.

@JulieLeeMSFT JulieLeeMSFT added this to the 7.0.0 milestone Jan 25, 2022
@kunalspathak
Copy link
Member

@sandreenko - I want to look into your PR and understand the regressions. Unfortunately, I am busy with #64130 with various errors. Once I wrap that up, I will look into this PR. Thanks!

@kunalspathak
Copy link
Member

Ping to myself.

@sandreenko
Copy link
Contributor Author

I am back now, I am planning to change this PR to x64/arm32/x86 so there are no regressions. We will enable arm64 when storeuppervector issue is fixed.

@ghost ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 25, 2022
@sandreenko sandreenko force-pushed the enregSupportIndAndLclFldV2 branch from a653110 to 6897d87 Compare April 30, 2022 00:31
@sandreenko
Copy link
Contributor Author

sandreenko commented Apr 30, 2022

@kunalspathak the diff was updated to exclude arm64.

New diffs:

[20:27:52] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.aspnet.run.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.aspnet.run.windows.x64.checked.1\diff
 [20:28:02] Total bytes of delta: 233 (0.00% of base)
 [20:28:03] 38 total files with Code Size differences (14 improved, 24 regressed), 445 unchanged.

[20:28:09] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.x64.checked.1\diff
 [20:28:10] Total bytes of delta: -968 (-0.01% of base)
 [20:28:10] 32 total files with Code Size differences (27 improved, 5 regressed), 7 unchanged.

[20:29:12] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.x64.checked.1\diff
 [20:30:33] Total bytes of delta: -14898 (-0.01% of base)
 [20:30:37] 214 total files with Code Size differences (186 improved, 28 regressed), 3007 unchanged.

[20:30:53] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.x64.checked.1\diff
 [20:32:30] Total bytes of delta: -31151 (-0.09% of base)
 [20:32:34] 670 total files with Code Size differences (558 improved, 112 regressed), 3522 unchanged.

[20:32:58] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.x64.checked.1\diff
 [20:34:26] Total bytes of delta: -26584 (-0.05% of base)
 [20:34:30] 424 total files with Code Size differences (320 improved, 104 regressed), 3458 unchanged.

[20:35:22] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.x64.checked.1\diff
 [20:35:43] Total bytes of delta: -11866 (-0.01% of base)
 [20:35:45] 722 total files with Code Size differences (515 improved, 207 regressed), 193 unchanged.

[20:36:48] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.arm64.checked.1\diff
 [20:36:51] Total bytes of delta: -504 (-0.00% of base)
 [20:36:52] 100 total files with Code Size differences (100 improved, 0 regressed), 36 unchanged.

[20:37:00] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.arm64.checked.1\diff
 [20:37:02] Total bytes of delta: -76 (-0.00% of base)
 [20:37:03] 15 total files with Code Size differences (15 improved, 0 regressed), 77 unchanged.

[20:37:25] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.arm64.checked.1\diff
 [20:37:28] Total bytes of delta: -36 (-0.00% of base)
 [20:37:29] 7 total files with Code Size differences (7 improved, 0 regressed), 106 unchanged.

[20:38:19] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.arm64.checked.1\diff
 [20:38:22] Total bytes of delta: -300 (-0.00% of base)
 [20:38:23] 52 total files with Code Size differences (52 improved, 0 regressed), 117 unchanged.

[20:38:29] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.Linux.x64.checked.1\diff
 [20:38:42] Total bytes of delta: -567 (-0.00% of base)
 [20:38:43] 29 total files with Code Size differences (26 improved, 3 regressed), 590 unchanged.

[20:39:39] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.x64.checked.1\diff
 [20:40:52] Total bytes of delta: -3391 (-0.00% of base)
 [20:40:56] 180 total files with Code Size differences (169 improved, 11 regressed), 2998 unchanged.

[20:41:01] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.x64.checked.1\diff
 [20:41:03] Total bytes of delta: -2877 (-0.03% of base)
 [20:41:03] 65 total files with Code Size differences (64 improved, 1 regressed), 34 unchanged.

[20:41:26] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.x64.checked.1\diff
 [20:41:58] Total bytes of delta: -1894 (-0.00% of base)
 [20:42:00] 145 total files with Code Size differences (74 improved, 71 regressed), 1339 unchanged.

[20:42:49] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.x64.checked.1\diff
 [20:42:59] Total bytes of delta: -8360 (-0.01% of base)
 [20:43:00] 315 total files with Code Size differences (275 improved, 40 regressed), 99 unchanged.

[20:43:06] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.arm64.checked.1\diff
 [20:43:07] Total bytes of delta: 0 (0.00% of base)
 [20:43:07] 0 total files with Code Size differences (0 improved, 0 regressed), 23 unchanged.

[20:44:07] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.arm64.checked.1\diff
 [20:44:09] Total bytes of delta: -504 (-0.00% of base)
 [20:44:10] 100 total files with Code Size differences (100 improved, 0 regressed), 36 unchanged.

[20:44:25] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.arm64.checked.1\diff
 [20:44:27] Total bytes of delta: -144 (-0.00% of base)
 [20:44:28] 28 total files with Code Size differences (28 improved, 0 regressed), 88 unchanged.

[20:44:51] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.arm64.checked.1\diff
 [20:44:54] Total bytes of delta: -36 (-0.00% of base)
 [20:44:54] 7 total files with Code Size differences (7 improved, 0 regressed), 113 unchanged.

[20:45:48] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.arm64.checked.1\diff
 [20:45:52] Total bytes of delta: -300 (-0.00% of base)
 [20:45:53] 52 total files with Code Size differences (52 improved, 0 regressed), 126 unchanged.

[20:46:01] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.x86.checked.1\diff
 [20:46:02] Total bytes of delta: -667 (-0.01% of base)
 [20:46:03] 22 total files with Code Size differences (19 improved, 3 regressed), 7 unchanged.

[20:47:10] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.x86.checked.1\diff
 [20:47:57] Total bytes of delta: -4096 (-0.00% of base)
 [20:47:58] 296 total files with Code Size differences (290 improved, 6 regressed), 361 unchanged.

[20:48:17] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.x86.checked.1\diff
 [20:49:19] Total bytes of delta: -1744 (-0.01% of base)
 [20:49:22] 140 total files with Code Size differences (103 improved, 37 regressed), 1454 unchanged.

[20:49:56] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.x86.checked.1\diff
 [20:50:57] Total bytes of delta: -3330 (-0.01% of base)
 [20:50:59] 184 total files with Code Size differences (114 improved, 70 regressed), 1359 unchanged.

[20:52:16] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.x86.checked.1\diff
 [20:52:51] Total bytes of delta: -12715 (-0.01% of base)
 [20:52:53] 390 total files with Code Size differences (362 improved, 28 regressed), 473 unchanged.

[20:54:15] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.arm.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.arm.checked.1\diff
 [20:56:08] Total bytes of delta: -528 (-0.00% of base)
 [20:56:10] 167 total files with Code Size differences (158 improved, 9 regressed), 360 unchanged.

[20:56:24] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.arm.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.arm.checked.1\diff
 [20:56:29] Total bytes of delta: -6 (-0.00% of base)
 [20:56:29] 61 total files with Code Size differences (34 improved, 27 regressed), 31 unchanged.

[20:57:02] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.arm.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.arm.checked.1\diff
 [20:57:06] Total bytes of delta: 288 (0.00% of base)
 [20:57:06] 57 total files with Code Size differences (4 improved, 53 regressed), 17 unchanged.

[20:58:20] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.arm.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.arm.checked.1\diff
 [20:58:25] Total bytes of delta: -248 (-0.00% of base)
 [20:58:26] 108 total files with Code Size differences (66 improved, 42 regressed), 25 unchanged.

the regressions are for functions with many funclet epilogues when we restore callee-save xmm in each of them, also could be improved in LSRA.

Full diffs:

spmi.base.diff.txt

@sandreenko
Copy link
Contributor Author

@kunalspathak could you please review the new diff?

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks great. Few questions and it should be ready to go. I did check the diffs (specially those on x64) and most of them are coming because we need more registers and we would save the existing values on stack inside prolog before using them, so that should be fine.

E.g. Windows/x64 asp

38 total files with Code Size differences (14 improved, 24 regressed), 445 unchanged.

Top method regressions (bytes):
          74 (1.32 % of base) : 65671.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          74 (1.32 % of base) : 52754.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          56 (2.45 % of base) : 66005.dasm - <DisposeAsync>d__17:MoveNext():this
          56 (2.45 % of base) : 52980.dasm - <DisposeAsync>d__17:MoveNext():this
          36 (1.24 % of base) : 44251.dasm - <DisposeAsync>d__17:MoveNext():this
          36 (1.24 % of base) : 57988.dasm - <DisposeAsync>d__17:MoveNext():this
          26 (0.94 % of base) : 63133.dasm - <DisposeAsync>d__17:MoveNext():this
          25 (1.90 % of base) : 65969.dasm - RelationalCommandDiagnosticsLogger:CommandReaderExecutedAsync(IRelationalConnection,DbCommand,DbContext,Guid,Guid,DbDataReader,DateTimeOffset,TimeSpan,int,CancellationToken):ValueTask`1:this
          25 (1.90 % of base) : 53137.dasm - RelationalCommandDiagnosticsLogger:CommandReaderExecutedAsync(IRelationalConnection,DbCommand,DbContext,Guid,Guid,DbDataReader,DateTimeOffset,TimeSpan,int,CancellationToken):ValueTask`1:this
          17 (0.24 % of base) : 43711.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          17 (0.24 % of base) : 57590.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          15 (1.28 % of base) : 65917.dasm - RelationalCommandDiagnosticsLogger:CommandReaderExecutingAsync(IRelationalConnection,DbCommand,DbContext,Guid,Guid,DateTimeOffset,int,CancellationToken):ValueTask`1:this
          15 (1.28 % of base) : 53131.dasm - RelationalCommandDiagnosticsLogger:CommandReaderExecutingAsync(IRelationalConnection,DbCommand,DbContext,Guid,Guid,DateTimeOffset,int,CancellationToken):ValueTask`1:this
          14 (0.21 % of base) : 62916.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          12 (2.07 % of base) : 29218.dasm - KeyRing:.ctor(IKey,IEnumerable`1):this
           5 (0.12 % of base) : 60963.dasm - RBTree`1:RBInsert(int,int,int,int,bool):int:this
           3 (0.35 % of base) : 65731.dasm - RelationalCommandDiagnosticsLogger:CommandCreated(IRelationalConnection,DbCommand,int,DbContext,Guid,Guid,DateTimeOffset,TimeSpan,int):DbCommand:this
           3 (0.35 % of base) : 53129.dasm - RelationalCommandDiagnosticsLogger:CommandCreated(IRelationalConnection,DbCommand,int,DbContext,Guid,Guid,DateTimeOffset,TimeSpan,int):DbCommand:this
           1 (0.88 % of base) : 60985.dasm - RBTree`1:Minimum(int):int:this
           1 (0.76 % of base) : 60996.dasm - RBTree`1:Successor(int):int:this

// Do it now.
GenTreeIndir* indir = src->AsIndir();
LowerIndir(indir);
#if defined(TARGET_XARCH)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is defined(TARGET_XARCH) needed and rest of the code inside !defined(TARGET_ARM64) includes arm as well?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We have GTF_IND_DONT_EXTEND only for x64 because there mov rax, [addr] is smaller than INS_movzx/INS_movsx al, [addr]. On other platforms it is not defined and not needed.

{
if (src->OperIs(GT_OBJ, GT_BLK))
{
src->SetOper(GT_IND);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

unless there is an assert somewhere that makes sure that currOper != newOper, just remove the if (src->OperIs(GT_OBJ, GT_BLK)) ?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is RecordOperBashing calls inside and I remember that I used to have breakpoints there, I would prefer not to call it if it is not neccessary.

src/coreclr/jit/lower.cpp Outdated Show resolved Hide resolved
@ghost ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels May 10, 2022
@sandreenko sandreenko force-pushed the enregSupportIndAndLclFldV2 branch from ba27770 to 414f3d0 Compare May 20, 2022 23:35
@sandreenko
Copy link
Contributor Author

ping @kunalspathak

@kunalspathak
Copy link
Member

/azp run runtime-coreclr superpmi-diffs, runtime-coreclr superpmi-replay

@azure-pipelines
Copy link

Azure Pipelines successfully started running 2 pipeline(s).

@kunalspathak
Copy link
Member

ping @kunalspathak

Sorry, I forgot about this. I am rerunning superpmi jobs because they timed out in last run.

@kunalspathak
Copy link
Member

LGTM. Thank you!

@kunalspathak kunalspathak merged commit a0ff92c into dotnet:main May 23, 2022
@shushanhf
Copy link
Contributor

Hi, @jakobbotsch
There is an error for LA liking

/home/qiao/work_qiao/push_runtime/src/coreclr/jit/lower.cpp:7259:10: error: no member named 'ClearDontExtend' in 'GenTreeIndir'
    ind->ClearDontExtend();
    ~~~  ^
1 error generated.

@jakobbotsch
Copy link
Member

@shushanhf I assume the ifdef chain there should be

-#ifdef TARGET_ARM64
+#if defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64)
     bool useNullCheck = true;
 #elif TARGET_ARM
     bool           useNullCheck          = false;
 #else  // TARGET_XARCH
     bool useNullCheck = !ind->Addr()->isContained();
     ind->ClearDontExtend();
 #endif // !TARGET_XARCH

Feel free to submit this change with an upcoming PR.

@shushanhf
Copy link
Contributor

@shushanhf I assume the ifdef chain there should be

-#ifdef TARGET_ARM64
+#if defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64)
     bool useNullCheck = true;
 #elif TARGET_ARM
     bool           useNullCheck          = false;
 #else  // TARGET_XARCH
     bool useNullCheck = !ind->Addr()->isContained();
     ind->ClearDontExtend();
 #endif // !TARGET_XARCH

Feel free to submit this change with an upcoming PR.

Thanks.
I had pushed by PR #69711

@kunalspathak
Copy link
Member

kunalspathak commented Jun 2, 2022

Improvements on windows/x64: dotnet/perf-autofiling-issues#5496, dotnet/perf-autofiling-issues#5463

@sandreenko sandreenko deleted the enregSupportIndAndLclFldV2 branch June 3, 2022 03:00
@ghost ghost locked as resolved and limited conversation to collaborators Jul 3, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants