Allow StoreLclVar src to be IND/FLD. #59315

sandreenko · 2021-09-19T10:17:53Z

This change allows STORE_LCL_VAR(IND/LCL_FLD) to stay as is in lowering without transforming them into STORE_OBJ(ADDR(LCL_VAR), IND/LCL_FLD) and allows the local to be enregistered.

You can see the changes with COMPLUS_JitEnregStats=1, for example, for benchmarks.run.windows.x64.checked.mch:

-total number of locals: 346736, number of enregistered: 309780, notEnreg: 36956, ratio: 0.89
-total number of struct locals: 10920, number of enregistered: 86, notEnreg: 10834, ratio: 0.01
+total number of locals: 346736, number of enregistered: 309849, notEnreg: 36887, ratio: 0.89
+total number of struct locals: 10920, number of enregistered: 155, notEnreg: 10765, ratio: 0.01

-m_blockOp 449, ratio: 0.01
+m_blockOp 379, ratio: 0.01
+m_lclAddrNode 1, ratio: 0.00

there are some good diffs:

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.crossgen2.windows.x64.checked.mch
 Total bytes of delta: -18966 (-0.83% of base)
 696 total files with Code Size differences (615 improved, 81 regressed), 4834 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\coreclr_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -26884 (-0.78% of base)
 374 total files with Code Size differences (356 improved, 18 regressed), 2652 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\benchmarks.run.windows.x64.checked.mch
 Total bytes of delta: -1084 (-2.59% of base)
 31 total files with Code Size differences (29 improved, 2 regressed), 7 unchanged.

There are regressions due to SIMD register upper save/restore. Imagine that we write and read a local once, but between these 2 points we have N calls, it will cause 10 UpperVectoreStore/Restore to be generated. This is an LSRA issue, it should be able to see that SIMD lclVar is rarely used but has to be saved/restored often so it is not profitable to allocate it to a register.
I have discussed this with @kunalspathak and he showed me that there are a few places already where we make such decisions:

runtime/src/coreclr/jit/lsra.cpp

Lines 4985 to 5031 in cef245d

    
           #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE 
        
                   else if (currentInterval->isUpperVector) 
        
                   { 
        
                       // This is a save or restore of the upper half of a large vector lclVar. 
        
                       Interval* lclVarInterval = currentInterval->relatedInterval; 
        
                       assert(lclVarInterval->isLocalVar); 
        
                       if (refType == RefTypeUpperVectorSave) 
        
                       { 
        
                           if ((lclVarInterval->physReg == REG_NA) || 
        
                               (lclVarInterval->isPartiallySpilled && (currentInterval->physReg == REG_STK))) 
        
                           { 
        
                               allocate = false; 
        
                           } 
        
                           else 
        
                           { 
        
                               lclVarInterval->isPartiallySpilled = true; 
        
                           } 
        
                       } 
        
                       else if (refType == RefTypeUpperVectorRestore) 
        
                       { 
        
                           assert(currentInterval->isUpperVector); 
        
                           if (lclVarInterval->isPartiallySpilled) 
        
                           { 
        
                               lclVarInterval->isPartiallySpilled = false; 
        
                           } 
        
                           else 
        
                           { 
        
                               allocate = false; 
        
                           } 
        
                       } 
        
                   } 
        
                   else if (refType == RefTypeUpperVectorSave) 
        
                   { 
        
                       assert(!currentInterval->isLocalVar); 
        
                       // Note that this case looks a lot like the case below, but in this case we need to spill 
        
                       // at the previous RefPosition. 
        
                       // We may want to consider allocating two callee-save registers for this case, but it happens rarely 
        
                       // enough that it may not warrant the additional complexity. 
        
                       if (assignedRegister != REG_NA) 
        
                       { 
        
                           unassignPhysReg(getRegisterRecord(assignedRegister), currentInterval->firstRefPosition); 
        
                           INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval)); 
        
                       } 
        
                       currentRefPosition->registerAssignment = RBM_NONE; 
        
                       continue; 
        
                   } 
        
           #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE

runtime/src/coreclr/jit/lsra.cpp

Lines 5351 to 5359 in cef245d

    
           #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE && defined(TARGET_XARCH) 
        
                           // We can also avoid allocating a register (in fact we don't want to) if we have 
        
                           // an UpperVectorRestore on xarch where the value is on the stack. 
        
                           if ((currentRefPosition->refType == RefTypeUpperVectorRestore) && (currentInterval->physReg == REG_NA)) 
        
                           { 
        
                               assert(currentRefPosition->regOptional); 
        
                               allocate = false; 
        
                           } 
        
           #endif

The worst is:

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

The regressions are not in benchmarks/asp so I think we can create an issue about it but take the change as is and plan the fix for 7.0.

Full diffs

  Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\aspnet.run.windows.x64.checked.mch
 Total bytes of delta: 18 (0.00% of base)
 31 total files with Code Size differences (17 improved, 14 regressed), 261 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\benchmarks.run.windows.x64.checked.mch
 Total bytes of delta: -1084 (-2.59% of base)
 31 total files with Code Size differences (29 improved, 2 regressed), 7 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\coreclr_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -26884 (-0.78% of base)
 374 total files with Code Size differences (356 improved, 18 regressed), 2652 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.crossgen2.windows.x64.checked.mch
 Total bytes of delta: -18966 (-0.83% of base)
 696 total files with Code Size differences (615 improved, 81 regressed), 4834 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.pmi.windows.x64.checked.mch
 Total bytes of delta: -7902 (-0.62% of base)
 406 total files with Code Size differences (369 improved, 37 regressed), 3466 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -9172 (-1.28% of base)
 565 total files with Code Size differences (454 improved, 111 regressed), 174 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch
 Total bytes of delta: -9328 (-0.15% of base)
 282 total files with Code Size differences (223 improved, 59 regressed), 752 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.crossgen2.Linux.arm64.checked.mch
 Total bytes of delta: 26804 (1.06% of base)
 3112 total files with Code Size differences (731 improved, 2381 regressed), 970 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.pmi.Linux.arm64.checked.mch
 Total bytes of delta: 36544 (3.81% of base)
 2464 total files with Code Size differences (85 improved, 2379 regressed), 103 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries_tests.pmi.Linux.arm64.checked.mch
 Total bytes of delta: -1332 (-0.35% of base)
 235 total files with Code Size differences (192 improved, 43 regressed), 111 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\benchmarks.run.Linux.x64.checked.mch
 Total bytes of delta: -400 (-1.48% of base)
 27 total files with Code Size differences (26 improved, 1 regressed), 3 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\coreclr_tests.pmi.Linux.x64.checked.mch
 Total bytes of delta: -1749 (-0.06% of base)
 147 total files with Code Size differences (147 improved, 0 regressed), 2632 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries.crossgen2.Linux.x64.checked.mch
 Total bytes of delta: -9922 (-0.60% of base)
 270 total files with Code Size differences (254 improved, 16 regressed), 2788 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries.pmi.Linux.x64.checked.mch
 Total bytes of delta: -3411 (-0.49% of base)
 145 total files with Code Size differences (140 improved, 5 regressed), 1331 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries_tests.pmi.Linux.x64.checked.mch
 Total bytes of delta: -4927 (-1.87% of base)
 233 total files with Code Size differences (218 improved, 15 regressed), 39 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\benchmarks.run.windows.arm64.checked.mch
 Total bytes of delta: -532 (-0.99% of base)
 20 total files with Code Size differences (19 improved, 1 regressed), 26 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\coreclr_tests.pmi.windows.arm64.checked.mch
 Total bytes of delta: -9484 (-0.15% of base)
 291 total files with Code Size differences (231 improved, 60 regressed), 751 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.crossgen2.windows.arm64.checked.mch
 Total bytes of delta: 26372 (1.02% of base)
 3173 total files with Code Size differences (774 improved, 2399 regressed), 978 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries_tests.pmi.windows.arm64.checked.mch
 Total bytes of delta: -1060 (-0.20% of base)
 414 total files with Code Size differences (277 improved, 137 regressed), 133 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\benchmarks.run.windows.x86.checked.mch
 Total bytes of delta: -701 (-2.86% of base)
 20 total files with Code Size differences (20 improved, 0 regressed), 12 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\coreclr_tests.pmi.windows.x86.checked.mch
 Total bytes of delta: -2992 (-0.14% of base)
 404 total files with Code Size differences (404 improved, 0 regressed), 193 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries.crossgen2.windows.x86.checked.mch
 Total bytes of delta: -25537 (-1.59% of base)
 2593 total files with Code Size differences (2361 improved, 232 regressed), 1493 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries.pmi.windows.x86.checked.mch
 Total bytes of delta: -4008 (-0.79% of base)
 116 total files with Code Size differences (114 improved, 2 regressed), 1379 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries_tests.pmi.windows.x86.checked.mch
 Total bytes of delta: -10404 (-1.24% of base)
 315 total files with Code Size differences (313 improved, 2 regressed), 432 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\coreclr_tests.pmi.Linux.arm.checked.mch
 Total bytes of delta: -90 (-0.00% of base)
 14 total files with Code Size differences (14 improved, 0 regressed), 372 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries.crossgen2.Linux.arm.checked.mch
 Total bytes of delta: -1518 (-0.15% of base)
 424 total files with Code Size differences (418 improved, 6 regressed), 1069 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries.pmi.Linux.arm.checked.mch
 Total bytes of delta: -4 (-0.00% of base)
 6 total files with Code Size differences (4 improved, 2 regressed), 44 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries_tests.pmi.Linux.arm.checked.mch
 Total bytes of delta: -222 (-0.32% of base)
 52 total files with Code Size differences (52 improved, 0 regressed), 29 unchanged.

Diff example

-; Total bytes of code 35, prolog size 7, PerfScore 16.65, instruction count 9
+; Total bytes of code 15, prolog size 3, PerfScore 9.95, instruction count 5

-G_M9563_IG01:        ; func=00, offs=000000H, size=0007H
+G_M9563_IG01:        ; func=00, offs=000000H, size=0003H

-sub      rsp, 24
-vzeroupper 
+vzeroupper 

-G_M9563_IG02:        ; offs=000007H, size=0017H
+G_M9563_IG02:        ; offs=000003H, size=000BH

-vmovupd  xmm0, xmmword ptr [rcx]
-vmovupd  xmmword ptr [V03 rsp+08H], xmm0
-vmovupd  xmm0, xmmword ptr [V03 rsp+08H]
-vmovupd  xmmword ptr [rdx], xmm0
-mov      rax, rdx
+vmovupd  xmm0, xmmword ptr [rcx]
+vmovupd  xmmword ptr [rdx], xmm0
+mov      rax, rdx

-G_M9563_IG03:        ; offs=00001EH, size=0005H
+G_M9563_IG03:        ; offs=00000EH, size=0001H

-add      rsp, 24
ret

sandreenko · 2021-09-19T18:55:59Z

src/coreclr/jit/codegenxarch.cpp

+            // It usually happens when the real type is a small struct.
+            loadIns = INS_mov;
+        }
+        emit->emitInsLoadInd(loadIns, emitTypeSize(tree), tree->GetRegNum(), tree);


This logic allows us to generate STORE_LCL_VAR struct<1> V01(IND struct(addr) as

mov al, [addr] mov [V11], al

instead of

movsx rax, [addr] mov [V11], al

The solution with additional flag does not look very nice but it was better than alternatives that I have tried:

support contained IND under STORE_LCL_VAR: it required too many changes in LSRA/codegen/emitter and it was hard to make it work for all cases and it created many new checks for this scenario;

type the IND as int so we do mov rax, [addr] instead of mov al, [addr]. This could be a solution without any additional diffs but I am not sure if this transformation is correct.
We currently use it for nullchecks already since https://github.com/dotnet/runtime/pull/37991/files so NULLCHECK byte addr is currently transformed into NULLCHECK int addr but if the addr is the last byte on a last readable page won't it cause issues? Maybe we should reconsider this transformation (in

runtime/src/coreclr/jit/lower.cpp

Line 6736 in 78593b9

ind->gtType = TYP_INT;

)
could somebody please confirm if it can't cause issues?

sandreenko · 2021-09-19T18:57:11Z

PTAL @kunalspathak @BruceForstall. Could you please also trigger JitStress/JitStressRegs and tag jitcontrib?

ghost · 2021-09-19T23:00:07Z

Tagging subscribers to this area: @JulieLeeMSFT
See info in area-owners.md if you want to be subscribed.

Issue Details

This change allows STORE_LCL_VAR(IND/LCL_FLD) to stay as is in lowering without transforming them into STORE_OBJ(ADDR(LCL_VAR), IND/LCL_FLD) and allows the local to be enregistered.

You can see the changes with COMPLUS_JitEnregStats=1, for example, for benchmarks.run.windows.x64.checked.mch:

-total number of locals: 346736, number of enregistered: 309780, notEnreg: 36956, ratio: 0.89
-total number of struct locals: 10920, number of enregistered: 86, notEnreg: 10834, ratio: 0.01
+total number of locals: 346736, number of enregistered: 309849, notEnreg: 36887, ratio: 0.89
+total number of struct locals: 10920, number of enregistered: 155, notEnreg: 10765, ratio: 0.01

-m_blockOp 449, ratio: 0.01
+m_blockOp 379, ratio: 0.01
+m_lclAddrNode 1, ratio: 0.00

there are some good diffs:

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.crossgen2.windows.x64.checked.mch
 Total bytes of delta: -18966 (-0.83% of base)
 696 total files with Code Size differences (615 improved, 81 regressed), 4834 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\coreclr_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -26884 (-0.78% of base)
 374 total files with Code Size differences (356 improved, 18 regressed), 2652 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\benchmarks.run.windows.x64.checked.mch
 Total bytes of delta: -1084 (-2.59% of base)
 31 total files with Code Size differences (29 improved, 2 regressed), 7 unchanged.

There are regressions due to SIMD register upper save/restore. Imagine that we write and read a local once, but between these 2 points we have N calls, it will cause 10 UpperVectoreStore/Restore to be generated. This is an LSRA issue, it should be able to see that SIMD lclVar is rarely used but has to be saved/restored often so it is not profitable to allocate it to a register.
I have discussed this with @kunalspathak and he showed me that there are a few places already where we make such decisions:

runtime/src/coreclr/jit/lsra.cpp

Lines 4985 to 5031 in cef245d

    
           #if FEATURE_PARTIAL_SIMD_CALLEE_SAVE 
        
                   else if (currentInterval->isUpperVector) 
        
                   { 
        
                       // This is a save or restore of the upper half of a large vector lclVar. 
        
                       Interval* lclVarInterval = currentInterval->relatedInterval; 
        
                       assert(lclVarInterval->isLocalVar); 
        
                       if (refType == RefTypeUpperVectorSave) 
        
                       { 
        
                           if ((lclVarInterval->physReg == REG_NA) || 
        
                               (lclVarInterval->isPartiallySpilled && (currentInterval->physReg == REG_STK))) 
        
                           { 
        
                               allocate = false; 
        
                           } 
        
                           else 
        
                           { 
        
                               lclVarInterval->isPartiallySpilled = true; 
        
                           } 
        
                       } 
        
                       else if (refType == RefTypeUpperVectorRestore) 
        
                       { 
        
                           assert(currentInterval->isUpperVector); 
        
                           if (lclVarInterval->isPartiallySpilled) 
        
                           { 
        
                               lclVarInterval->isPartiallySpilled = false; 
        
                           } 
        
                           else 
        
                           { 
        
                               allocate = false; 
        
                           } 
        
                       } 
        
                   } 
        
                   else if (refType == RefTypeUpperVectorSave) 
        
                   { 
        
                       assert(!currentInterval->isLocalVar); 
        
                       // Note that this case looks a lot like the case below, but in this case we need to spill 
        
                       // at the previous RefPosition. 
        
                       // We may want to consider allocating two callee-save registers for this case, but it happens rarely 
        
                       // enough that it may not warrant the additional complexity. 
        
                       if (assignedRegister != REG_NA) 
        
                       { 
        
                           unassignPhysReg(getRegisterRecord(assignedRegister), currentInterval->firstRefPosition); 
        
                           INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval)); 
        
                       } 
        
                       currentRefPosition->registerAssignment = RBM_NONE; 
        
                       continue; 
        
                   } 
        
           #endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE

https://github.com/dotnet/runtime/blob/main/src/coreclr/jit/lsra.cpp#L5351-L5359

The worst is:

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

The regressions are not in benchmarks/asp so I think we can create an issue about it but take the change as is and plan the fix for 7.0.

Full diffs

  Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\aspnet.run.windows.x64.checked.mch
 Total bytes of delta: 18 (0.00% of base)
 31 total files with Code Size differences (17 improved, 14 regressed), 261 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\benchmarks.run.windows.x64.checked.mch
 Total bytes of delta: -1084 (-2.59% of base)
 31 total files with Code Size differences (29 improved, 2 regressed), 7 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\coreclr_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -26884 (-0.78% of base)
 374 total files with Code Size differences (356 improved, 18 regressed), 2652 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.crossgen2.windows.x64.checked.mch
 Total bytes of delta: -18966 (-0.83% of base)
 696 total files with Code Size differences (615 improved, 81 regressed), 4834 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries.pmi.windows.x64.checked.mch
 Total bytes of delta: -7902 (-0.62% of base)
 406 total files with Code Size differences (369 improved, 37 regressed), 3466 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x64\libraries_tests.pmi.windows.x64.checked.mch
 Total bytes of delta: -9172 (-1.28% of base)
 565 total files with Code Size differences (454 improved, 111 regressed), 174 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\coreclr_tests.pmi.Linux.arm64.checked.mch
 Total bytes of delta: -9328 (-0.15% of base)
 282 total files with Code Size differences (223 improved, 59 regressed), 752 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.crossgen2.Linux.arm64.checked.mch
 Total bytes of delta: 26804 (1.06% of base)
 3112 total files with Code Size differences (731 improved, 2381 regressed), 970 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.pmi.Linux.arm64.checked.mch
 Total bytes of delta: 36544 (3.81% of base)
 2464 total files with Code Size differences (85 improved, 2379 regressed), 103 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries_tests.pmi.Linux.arm64.checked.mch
 Total bytes of delta: -1332 (-0.35% of base)
 235 total files with Code Size differences (192 improved, 43 regressed), 111 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\benchmarks.run.Linux.x64.checked.mch
 Total bytes of delta: -400 (-1.48% of base)
 27 total files with Code Size differences (26 improved, 1 regressed), 3 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\coreclr_tests.pmi.Linux.x64.checked.mch
 Total bytes of delta: -1749 (-0.06% of base)
 147 total files with Code Size differences (147 improved, 0 regressed), 2632 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries.crossgen2.Linux.x64.checked.mch
 Total bytes of delta: -9922 (-0.60% of base)
 270 total files with Code Size differences (254 improved, 16 regressed), 2788 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries.pmi.Linux.x64.checked.mch
 Total bytes of delta: -3411 (-0.49% of base)
 145 total files with Code Size differences (140 improved, 5 regressed), 1331 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.x64\libraries_tests.pmi.Linux.x64.checked.mch
 Total bytes of delta: -4927 (-1.87% of base)
 233 total files with Code Size differences (218 improved, 15 regressed), 39 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\benchmarks.run.windows.arm64.checked.mch
 Total bytes of delta: -532 (-0.99% of base)
 20 total files with Code Size differences (19 improved, 1 regressed), 26 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\coreclr_tests.pmi.windows.arm64.checked.mch
 Total bytes of delta: -9484 (-0.15% of base)
 291 total files with Code Size differences (231 improved, 60 regressed), 751 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.crossgen2.windows.arm64.checked.mch
 Total bytes of delta: 26372 (1.02% of base)
 3173 total files with Code Size differences (774 improved, 2399 regressed), 978 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries_tests.pmi.windows.arm64.checked.mch
 Total bytes of delta: -1060 (-0.20% of base)
 414 total files with Code Size differences (277 improved, 137 regressed), 133 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\benchmarks.run.windows.x86.checked.mch
 Total bytes of delta: -701 (-2.86% of base)
 20 total files with Code Size differences (20 improved, 0 regressed), 12 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\coreclr_tests.pmi.windows.x86.checked.mch
 Total bytes of delta: -2992 (-0.14% of base)
 404 total files with Code Size differences (404 improved, 0 regressed), 193 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries.crossgen2.windows.x86.checked.mch
 Total bytes of delta: -25537 (-1.59% of base)
 2593 total files with Code Size differences (2361 improved, 232 regressed), 1493 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries.pmi.windows.x86.checked.mch
 Total bytes of delta: -4008 (-0.79% of base)
 116 total files with Code Size differences (114 improved, 2 regressed), 1379 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.x86\libraries_tests.pmi.windows.x86.checked.mch
 Total bytes of delta: -10404 (-1.24% of base)
 315 total files with Code Size differences (313 improved, 2 regressed), 432 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\coreclr_tests.pmi.Linux.arm.checked.mch
 Total bytes of delta: -90 (-0.00% of base)
 14 total files with Code Size differences (14 improved, 0 regressed), 372 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries.crossgen2.Linux.arm.checked.mch
 Total bytes of delta: -1518 (-0.15% of base)
 424 total files with Code Size differences (418 improved, 6 regressed), 1069 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries.pmi.Linux.arm.checked.mch
 Total bytes of delta: -4 (-0.00% of base)
 6 total files with Code Size differences (4 improved, 2 regressed), 44 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm\libraries_tests.pmi.Linux.arm.checked.mch
 Total bytes of delta: -222 (-0.32% of base)
 52 total files with Code Size differences (52 improved, 0 regressed), 29 unchanged.

Diff example

-; Total bytes of code 35, prolog size 7, PerfScore 16.65, instruction count 9
+; Total bytes of code 15, prolog size 3, PerfScore 9.95, instruction count 5

-G_M9563_IG01:        ; func=00, offs=000000H, size=0007H
+G_M9563_IG01:        ; func=00, offs=000000H, size=0003H

-sub      rsp, 24
-vzeroupper 
+vzeroupper 

-G_M9563_IG02:        ; offs=000007H, size=0017H
+G_M9563_IG02:        ; offs=000003H, size=000BH

-vmovupd  xmm0, xmmword ptr [rcx]
-vmovupd  xmmword ptr [V03 rsp+08H], xmm0
-vmovupd  xmm0, xmmword ptr [V03 rsp+08H]
-vmovupd  xmmword ptr [rdx], xmm0
-mov      rax, rdx
+vmovupd  xmm0, xmmword ptr [rcx]
+vmovupd  xmmword ptr [rdx], xmm0
+mov      rax, rdx

-G_M9563_IG03:        ; offs=00001EH, size=0005H
+G_M9563_IG03:        ; offs=00000EH, size=0001H

-add      rsp, 24
ret

Author:	sandreenko
Assignees:	-
Labels:	`area-CodeGen-coreclr`
Milestone:	-

kunalspathak · 2021-09-20T16:13:59Z

@dotnet/jit-contrib

jakobbotsch · 2021-09-20T16:19:00Z

type the IND as int so we do mov rax, [addr] instead of mov al, [addr]. This could be a solution without any additional diffs but I am not sure if this transformation is correct.
We currently use it for nullchecks already since https://github.com/dotnet/runtime/pull/37991/files so NULLCHECK byte addr is currently transformed into NULLCHECK int addr but if the addr is the last byte on a last readable page won't it cause issues?

Related: #58874

However do we ever see null-check nodes on addresses that aren't pointer aligned?

jakobbotsch · 2021-09-20T16:19:36Z

/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs

azure-pipelines · 2021-09-20T16:20:18Z

Azure Pipelines successfully started running 2 pipeline(s).

sandreenko · 2021-09-21T00:34:54Z

However do we ever see null-check nodes on addresses that aren't pointer aligned?

I don't see why we can't do something like:

    static void ByteNullCheck(ref byte bytePointer)
    {
        byte v = 0;
        v = bytePointer;

    }

if roslyn does not optimize it JIT will see:
ASG(v, INDIR(bytePointer)) and will do DeadStoreElimination and replace IND(bytePointer) with NULLCHECK(bytePointer)

I have tried it and it seems working:

    [ 0]   3 (0x003) ldarg.0
    [ 1]   4 (0x004) ldind.u1
    [ 1]   5 (0x005) stloc.0

STMT00003 (IL 0x003...  ???)
               [000008] -A-XG-------              *  ASG       int
               [000007] D------N----              +--*  LCL_VAR   int    V01 loc0
               [000006] *--XG-------              \--*  IND       ubyte
               [000005] ------------                 \--*  LCL_VAR   byref  V00 arg0

    [ 0]   6 (0x006) ret

STMT00004 (IL 0x006...  ???)
               [000009] ------------              *  RETURN    void

looks like we need to stop changing type in TransformUnusedIndirection and in LowerRetStruct and use the flag that I add in this PR to avoid longer instructions on XARCH.
There is a problem though, TransformUnusedIndirection changes types to int also because in some cases the original type is STRUCT and we don't know the real size of it, the user carries this information. We will need to add an additional argument to TransformUnusedIndirection(..., size).

sandreenko · 2021-09-21T00:41:15Z

The failures look unrelated, @kunalspathak could I ask you to open an issue about upper save/restore cause you can add labels and know the area better?

kunalspathak · 2021-09-21T05:48:01Z

jitstressregs/jitstress failures related to #58481

kunalspathak

While the changes seems good to me, the arm64 regression is not small to ignore.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.crossgen2.Linux.arm64.checked.mch
 Total bytes of delta: 26804 (1.06% of base)
 3112 total files with Code Size differences (731 improved, 2381 regressed), 970 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.Linux.arm64\libraries.pmi.Linux.arm64.checked.mch
 Total bytes of delta: 36544 (3.81% of base)
 2464 total files with Code Size differences (85 improved, 2379 regressed), 103 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.crossgen2.windows.arm64.checked.mch
 Total bytes of delta: 26372 (1.02% of base)
 3173 total files with Code Size differences (774 improved, 2399 regressed), 978 unchanged.

Running asm diffs of D:\Sergey\git\runtime\artifacts\spmi\mch\5ed35c58-857b-48dd-a818-7c0136dc9f73.windows.arm64\libraries.pmi.windows.arm64.checked.mch
 Total bytes of delta: 36396 (3.65% of base)
 2508 total files with Code Size differences (109 improved, 2399 regressed), 112 unchanged.

Do you mind sharing the perf score numbers along couple of diff examples? If the arm64 regressions are related to SIMD save/restore, the sensible thing to do would be to fix it first before taking this change.

sandreenko · 2021-09-22T17:20:00Z

Do you mind sharing the perf score numbers along couple of diff examples? If the arm64 regressions are related to SIMD save/restore, the sensible thing to do would be to fix it first before taking this change.

Here are the full diffs including PerfScore and an example of JitDump diff, search for "Active intervals at end of allocation:" in the diff to see the new refPositions
spmi.diffs.IndLclFld.txt
base.txt
diff.txt
.

kunalspathak · 2021-11-01T14:01:42Z

Ping to myself.

JulieLeeMSFT · 2021-11-29T18:13:18Z

Ping @kunalspathak.

kunalspathak · 2022-01-03T14:34:34Z

Hoping to look at it this week.

kunalspathak · 2022-01-06T18:38:34Z

@sandreenko - do you mind rebasing your changes to get the fresh test result? We also have asmdiff pipeline now which will help us see the diffs across various collections/platforms.

sandreenko · 2022-01-10T17:30:04Z

@sandreenko - do you mind rebasing your changes to get the fresh test result? We also have asmdiff pipeline now which will help us see the diffs across various collections/platforms.

Sure, unfortunately, I have just moved to NC and my desktop is still on its way, once I get it and assemble it I will update the diff.

sandreenko · 2022-01-24T13:30:09Z

@sandreenko - do you mind rebasing your changes to get the fresh test result? We also have asmdiff pipeline now which will help us see the diffs across various collections/platforms.

Done. It looks like SuperPmi has unrelated failures right now so I could not repeat the testing.

kunalspathak · 2022-01-28T06:53:32Z

@sandreenko - I want to look into your PR and understand the regressions. Unfortunately, I am busy with #64130 with various errors. Once I wrap that up, I will look into this PR. Thanks!

kunalspathak · 2022-02-21T16:48:03Z

Ping to myself.

sandreenko · 2022-04-25T17:57:14Z

I am back now, I am planning to change this PR to x64/arm32/x86 so there are no regressions. We will enable arm64 when storeuppervector issue is fixed.

sandreenko · 2022-04-30T01:11:51Z

@kunalspathak the diff was updated to exclude arm64.

New diffs:

[20:27:52] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.aspnet.run.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.aspnet.run.windows.x64.checked.1\diff
 [20:28:02] Total bytes of delta: 233 (0.00% of base)
 [20:28:03] 38 total files with Code Size differences (14 improved, 24 regressed), 445 unchanged.

[20:28:09] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.x64.checked.1\diff
 [20:28:10] Total bytes of delta: -968 (-0.01% of base)
 [20:28:10] 32 total files with Code Size differences (27 improved, 5 regressed), 7 unchanged.

[20:29:12] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.x64.checked.1\diff
 [20:30:33] Total bytes of delta: -14898 (-0.01% of base)
 [20:30:37] 214 total files with Code Size differences (186 improved, 28 regressed), 3007 unchanged.

[20:30:53] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.x64.checked.1\diff
 [20:32:30] Total bytes of delta: -31151 (-0.09% of base)
 [20:32:34] 670 total files with Code Size differences (558 improved, 112 regressed), 3522 unchanged.

[20:32:58] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.x64.checked.1\diff
 [20:34:26] Total bytes of delta: -26584 (-0.05% of base)
 [20:34:30] 424 total files with Code Size differences (320 improved, 104 regressed), 3458 unchanged.

[20:35:22] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.x64.checked.1\diff
 [20:35:43] Total bytes of delta: -11866 (-0.01% of base)
 [20:35:45] 722 total files with Code Size differences (515 improved, 207 regressed), 193 unchanged.

[20:36:48] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.arm64.checked.1\diff
 [20:36:51] Total bytes of delta: -504 (-0.00% of base)
 [20:36:52] 100 total files with Code Size differences (100 improved, 0 regressed), 36 unchanged.

[20:37:00] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.arm64.checked.1\diff
 [20:37:02] Total bytes of delta: -76 (-0.00% of base)
 [20:37:03] 15 total files with Code Size differences (15 improved, 0 regressed), 77 unchanged.

[20:37:25] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.arm64.checked.1\diff
 [20:37:28] Total bytes of delta: -36 (-0.00% of base)
 [20:37:29] 7 total files with Code Size differences (7 improved, 0 regressed), 106 unchanged.

[20:38:19] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.arm64.checked.1\diff
 [20:38:22] Total bytes of delta: -300 (-0.00% of base)
 [20:38:23] 52 total files with Code Size differences (52 improved, 0 regressed), 117 unchanged.

[20:38:29] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.Linux.x64.checked.1\diff
 [20:38:42] Total bytes of delta: -567 (-0.00% of base)
 [20:38:43] 29 total files with Code Size differences (26 improved, 3 regressed), 590 unchanged.

[20:39:39] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.x64.checked.1\diff
 [20:40:52] Total bytes of delta: -3391 (-0.00% of base)
 [20:40:56] 180 total files with Code Size differences (169 improved, 11 regressed), 2998 unchanged.

[20:41:01] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.x64.checked.1\diff
 [20:41:03] Total bytes of delta: -2877 (-0.03% of base)
 [20:41:03] 65 total files with Code Size differences (64 improved, 1 regressed), 34 unchanged.

[20:41:26] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.x64.checked.1\diff
 [20:41:58] Total bytes of delta: -1894 (-0.00% of base)
 [20:42:00] 145 total files with Code Size differences (74 improved, 71 regressed), 1339 unchanged.

[20:42:49] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.x64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.x64.checked.1\diff
 [20:42:59] Total bytes of delta: -8360 (-0.01% of base)
 [20:43:00] 315 total files with Code Size differences (275 improved, 40 regressed), 99 unchanged.

[20:43:06] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.arm64.checked.1\diff
 [20:43:07] Total bytes of delta: 0 (0.00% of base)
 [20:43:07] 0 total files with Code Size differences (0 improved, 0 regressed), 23 unchanged.

[20:44:07] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.arm64.checked.1\diff
 [20:44:09] Total bytes of delta: -504 (-0.00% of base)
 [20:44:10] 100 total files with Code Size differences (100 improved, 0 regressed), 36 unchanged.

[20:44:25] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.arm64.checked.1\diff
 [20:44:27] Total bytes of delta: -144 (-0.00% of base)
 [20:44:28] 28 total files with Code Size differences (28 improved, 0 regressed), 88 unchanged.

[20:44:51] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.arm64.checked.1\diff
 [20:44:54] Total bytes of delta: -36 (-0.00% of base)
 [20:44:54] 7 total files with Code Size differences (7 improved, 0 regressed), 113 unchanged.

[20:45:48] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.arm64.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.arm64.checked.1\diff
 [20:45:52] Total bytes of delta: -300 (-0.00% of base)
 [20:45:53] 52 total files with Code Size differences (52 improved, 0 regressed), 126 unchanged.

[20:46:01] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.benchmarks.run.windows.x86.checked.1\diff
 [20:46:02] Total bytes of delta: -667 (-0.01% of base)
 [20:46:03] 22 total files with Code Size differences (19 improved, 3 regressed), 7 unchanged.

[20:47:10] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.windows.x86.checked.1\diff
 [20:47:57] Total bytes of delta: -4096 (-0.00% of base)
 [20:47:58] 296 total files with Code Size differences (290 improved, 6 regressed), 361 unchanged.

[20:48:17] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.windows.x86.checked.1\diff
 [20:49:19] Total bytes of delta: -1744 (-0.01% of base)
 [20:49:22] 140 total files with Code Size differences (103 improved, 37 regressed), 1454 unchanged.

[20:49:56] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.windows.x86.checked.1\diff
 [20:50:57] Total bytes of delta: -3330 (-0.01% of base)
 [20:50:59] 184 total files with Code Size differences (114 improved, 70 regressed), 1359 unchanged.

[20:52:16] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.x86.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.windows.x86.checked.1\diff
 [20:52:51] Total bytes of delta: -12715 (-0.01% of base)
 [20:52:53] 390 total files with Code Size differences (362 improved, 28 regressed), 473 unchanged.

[20:54:15] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.arm.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.coreclr_tests.pmi.Linux.arm.checked.1\diff
 [20:56:08] Total bytes of delta: -528 (-0.00% of base)
 [20:56:10] 167 total files with Code Size differences (158 improved, 9 regressed), 360 unchanged.

[20:56:24] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.arm.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.crossgen2.Linux.arm.checked.1\diff
 [20:56:29] Total bytes of delta: -6 (-0.00% of base)
 [20:56:29] 61 total files with Code Size differences (34 improved, 27 regressed), 31 unchanged.

[20:57:02] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.arm.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries.pmi.Linux.arm.checked.1\diff
 [20:57:06] Total bytes of delta: 288 (0.00% of base)
 [20:57:06] 57 total files with Code Size differences (4 improved, 53 regressed), 17 unchanged.

[20:58:20] Creating dasm files: D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.arm.checked.1\base D:\Sergey\git\runtime\artifacts\spmi\asm.libraries_tests.pmi.Linux.arm.checked.1\diff
 [20:58:25] Total bytes of delta: -248 (-0.00% of base)
 [20:58:26] 108 total files with Code Size differences (66 improved, 42 regressed), 25 unchanged.

the regressions are for functions with many funclet epilogues when we restore callee-save xmm in each of them, also could be improved in LSRA.

Full diffs:

spmi.base.diff.txt

sandreenko · 2022-05-10T13:09:58Z

@kunalspathak could you please review the new diff?

kunalspathak

Looks great. Few questions and it should be ready to go. I did check the diffs (specially those on x64) and most of them are coming because we need more registers and we would save the existing values on stack inside prolog before using them, so that should be fine.

E.g. Windows/x64 asp

38 total files with Code Size differences (14 improved, 24 regressed), 445 unchanged.

Top method regressions (bytes):
          74 (1.32 % of base) : 65671.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          74 (1.32 % of base) : 52754.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          56 (2.45 % of base) : 66005.dasm - <DisposeAsync>d__17:MoveNext():this
          56 (2.45 % of base) : 52980.dasm - <DisposeAsync>d__17:MoveNext():this
          36 (1.24 % of base) : 44251.dasm - <DisposeAsync>d__17:MoveNext():this
          36 (1.24 % of base) : 57988.dasm - <DisposeAsync>d__17:MoveNext():this
          26 (0.94 % of base) : 63133.dasm - <DisposeAsync>d__17:MoveNext():this
          25 (1.90 % of base) : 65969.dasm - RelationalCommandDiagnosticsLogger:CommandReaderExecutedAsync(IRelationalConnection,DbCommand,DbContext,Guid,Guid,DbDataReader,DateTimeOffset,TimeSpan,int,CancellationToken):ValueTask`1:this
          25 (1.90 % of base) : 53137.dasm - RelationalCommandDiagnosticsLogger:CommandReaderExecutedAsync(IRelationalConnection,DbCommand,DbContext,Guid,Guid,DbDataReader,DateTimeOffset,TimeSpan,int,CancellationToken):ValueTask`1:this
          17 (0.24 % of base) : 43711.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          17 (0.24 % of base) : 57590.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          15 (1.28 % of base) : 65917.dasm - RelationalCommandDiagnosticsLogger:CommandReaderExecutingAsync(IRelationalConnection,DbCommand,DbContext,Guid,Guid,DateTimeOffset,int,CancellationToken):ValueTask`1:this
          15 (1.28 % of base) : 53131.dasm - RelationalCommandDiagnosticsLogger:CommandReaderExecutingAsync(IRelationalConnection,DbCommand,DbContext,Guid,Guid,DateTimeOffset,int,CancellationToken):ValueTask`1:this
          14 (0.21 % of base) : 62916.dasm - <ExecuteReaderAsync>d__19:MoveNext():this
          12 (2.07 % of base) : 29218.dasm - KeyRing:.ctor(IKey,IEnumerable`1):this
           5 (0.12 % of base) : 60963.dasm - RBTree`1:RBInsert(int,int,int,int,bool):int:this
           3 (0.35 % of base) : 65731.dasm - RelationalCommandDiagnosticsLogger:CommandCreated(IRelationalConnection,DbCommand,int,DbContext,Guid,Guid,DateTimeOffset,TimeSpan,int):DbCommand:this
           3 (0.35 % of base) : 53129.dasm - RelationalCommandDiagnosticsLogger:CommandCreated(IRelationalConnection,DbCommand,int,DbContext,Guid,Guid,DateTimeOffset,TimeSpan,int):DbCommand:this
           1 (0.88 % of base) : 60985.dasm - RBTree`1:Minimum(int):int:this
           1 (0.76 % of base) : 60996.dasm - RBTree`1:Successor(int):int:this

kunalspathak · 2022-05-10T16:49:19Z

src/coreclr/jit/lower.cpp

+                    // Do it now.
+                    GenTreeIndir* indir = src->AsIndir();
+                    LowerIndir(indir);
+#if defined(TARGET_XARCH)


Is defined(TARGET_XARCH) needed and rest of the code inside !defined(TARGET_ARM64) includes arm as well?

We have GTF_IND_DONT_EXTEND only for x64 because there mov rax, [addr] is smaller than INS_movzx/INS_movsx al, [addr]. On other platforms it is not defined and not needed.

kunalspathak · 2022-05-10T16:50:50Z

src/coreclr/jit/lower.cpp

+                {
+                    if (src->OperIs(GT_OBJ, GT_BLK))
+                    {
+                        src->SetOper(GT_IND);


unless there is an assert somewhere that makes sure that currOper != newOper, just remove the if (src->OperIs(GT_OBJ, GT_BLK)) ?

There is RecordOperBashing calls inside and I remember that I used to have breakpoints there, I would prefer not to call it if it is not neccessary.

src/coreclr/jit/lower.cpp

sandreenko · 2022-05-21T23:11:44Z

ping @kunalspathak

kunalspathak · 2022-05-23T04:47:44Z

/azp run runtime-coreclr superpmi-diffs, runtime-coreclr superpmi-replay

azure-pipelines · 2022-05-23T04:48:00Z

Azure Pipelines successfully started running 2 pipeline(s).

kunalspathak · 2022-05-23T04:48:14Z

ping @kunalspathak

Sorry, I forgot about this. I am rerunning superpmi jobs because they timed out in last run.

kunalspathak · 2022-05-23T13:47:03Z

LGTM. Thank you!

shushanhf · 2022-05-24T11:59:18Z

Hi, @jakobbotsch
There is an error for LA liking

/home/qiao/work_qiao/push_runtime/src/coreclr/jit/lower.cpp:7259:10: error: no member named 'ClearDontExtend' in 'GenTreeIndir'
    ind->ClearDontExtend();
    ~~~  ^
1 error generated.

jakobbotsch · 2022-05-24T12:17:41Z

@shushanhf I assume the ifdef chain there should be

-#ifdef TARGET_ARM64
+#if defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64)
     bool useNullCheck = true;
 #elif TARGET_ARM
     bool           useNullCheck          = false;
 #else  // TARGET_XARCH
     bool useNullCheck = !ind->Addr()->isContained();
     ind->ClearDontExtend();
 #endif // !TARGET_XARCH

Feel free to submit this change with an upcoming PR.

shushanhf · 2022-05-24T12:34:47Z

@shushanhf I assume the ifdef chain there should be

-#ifdef TARGET_ARM64
+#if defined(TARGET_ARM64) || defined(TARGET_LOONGARCH64)
     bool useNullCheck = true;
 #elif TARGET_ARM
     bool           useNullCheck          = false;
 #else  // TARGET_XARCH
     bool useNullCheck = !ind->Addr()->isContained();
     ind->ClearDontExtend();
 #endif // !TARGET_XARCH

Feel free to submit this change with an upcoming PR.

Thanks.
I had pushed by PR #69711

kunalspathak · 2022-06-02T16:28:20Z

Improvements on windows/x64: dotnet/perf-autofiling-issues#5496, dotnet/perf-autofiling-issues#5463

sandreenko commented Sep 19, 2021

View reviewed changes

danmoseley added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Sep 19, 2021

sandreenko marked this pull request as ready for review September 21, 2021 00:36

kunalspathak reviewed Sep 21, 2021

View reviewed changes

JulieLeeMSFT assigned kunalspathak Oct 18, 2021

SingleAccretion mentioned this pull request Jan 12, 2022

Reconfigure the field sequences for statics #62632

Merged

sandreenko force-pushed the enregSupportIndAndLclFldV2 branch 2 times, most recently from 532d2f6 to a653110 Compare January 24, 2022 06:52

JulieLeeMSFT assigned sandreenko Jan 25, 2022

JulieLeeMSFT added this to the 7.0.0 milestone Jan 25, 2022

EgorBo mentioned this pull request Feb 2, 2022

AccessViolationException in MicroBenchmarks.Serializers.Json_FromString<MyEventsListerViewModel>.Jil_ benchmark #64657

Closed

kunalspathak mentioned this pull request Mar 3, 2022

Skip Upper vector save/restore for blocks that THROW #66109

Merged

ghost removed the needs-author-action An issue or pull request that requires more info or actions from the author. label Apr 25, 2022

sandreenko force-pushed the enregSupportIndAndLclFldV2 branch from a653110 to 6897d87 Compare April 30, 2022 00:31

kunalspathak requested changes May 10, 2022

View reviewed changes

ghost added needs-author-action An issue or pull request that requires more info or actions from the author. and removed needs-author-action An issue or pull request that requires more info or actions from the author. labels May 10, 2022

runfoapp bot mentioned this pull request May 12, 2022

jit.hardwareintrinsics.general.notsupported failing recently #69267

Closed

sandreenko added 4 commits May 20, 2022 17:01

Allow StoreLclVar src to be IND/FLD.

287cbcb

disable for arm64

ee9c0c6

use OperIsInitVal

32b99b5

fix format

414f3d0

sandreenko force-pushed the enregSupportIndAndLclFldV2 branch from ba27770 to 414f3d0 Compare May 20, 2022 23:35

kunalspathak approved these changes May 23, 2022

View reviewed changes

kunalspathak merged commit a0ff92c into dotnet:main May 23, 2022

kunalspathak mentioned this pull request Jun 2, 2022

[Perf] Changes at 5/23/2022 1:47:22 PM dotnet/perf-autofiling-issues#5463

Closed

sandreenko deleted the enregSupportIndAndLclFldV2 branch June 3, 2022 03:00

JulieLeeMSFT mentioned this pull request Jun 3, 2022

What's new in .NET 7 Preview 5 [WIP] dotnet/core#7441

Closed

EgorBo mentioned this pull request Jun 28, 2022

[Perf] Changes at 5/23/2022 1:47:22 PM dotnet/perf-autofiling-issues#5611

Closed

ghost locked as resolved and limited conversation to collaborators Jul 3, 2022

	#if FEATURE_PARTIAL_SIMD_CALLEE_SAVE
	else if (currentInterval->isUpperVector)
	{
	// This is a save or restore of the upper half of a large vector lclVar.
	Interval* lclVarInterval = currentInterval->relatedInterval;
	assert(lclVarInterval->isLocalVar);
	if (refType == RefTypeUpperVectorSave)
	{
	if ((lclVarInterval->physReg == REG_NA) \|\|
	(lclVarInterval->isPartiallySpilled && (currentInterval->physReg == REG_STK)))
	{
	allocate = false;
	}
	else
	{
	lclVarInterval->isPartiallySpilled = true;
	}
	}
	else if (refType == RefTypeUpperVectorRestore)
	{
	assert(currentInterval->isUpperVector);
	if (lclVarInterval->isPartiallySpilled)
	{
	lclVarInterval->isPartiallySpilled = false;
	}
	else
	{
	allocate = false;
	}
	}
	}
	else if (refType == RefTypeUpperVectorSave)
	{
	assert(!currentInterval->isLocalVar);
	// Note that this case looks a lot like the case below, but in this case we need to spill
	// at the previous RefPosition.
	// We may want to consider allocating two callee-save registers for this case, but it happens rarely
	// enough that it may not warrant the additional complexity.
	if (assignedRegister != REG_NA)
	{
	unassignPhysReg(getRegisterRecord(assignedRegister), currentInterval->firstRefPosition);
	INDEBUG(dumpLsraAllocationEvent(LSRA_EVENT_NO_REG_ALLOCATED, currentInterval));
	}
	currentRefPosition->registerAssignment = RBM_NONE;
	continue;
	}
	#endif // FEATURE_PARTIAL_SIMD_CALLEE_SAVE

	#if FEATURE_PARTIAL_SIMD_CALLEE_SAVE && defined(TARGET_XARCH)
	// We can also avoid allocating a register (in fact we don't want to) if we have
	// an UpperVectorRestore on xarch where the value is on the stack.
	if ((currentRefPosition->refType == RefTypeUpperVectorRestore) && (currentInterval->physReg == REG_NA))
	{
	assert(currentRefPosition->regOptional);
	allocate = false;
	}
	#endif

Allow StoreLclVar src to be IND/FLD. #59315

Allow StoreLclVar src to be IND/FLD. #59315

Conversation

sandreenko commented Sep 19, 2021 • edited by kunalspathak Loading

sandreenko Sep 19, 2021

Choose a reason for hiding this comment

sandreenko commented Sep 19, 2021

ghost commented Sep 19, 2021

kunalspathak commented Sep 20, 2021

jakobbotsch commented Sep 20, 2021

jakobbotsch commented Sep 20, 2021

azure-pipelines bot commented Sep 20, 2021

sandreenko commented Sep 21, 2021

sandreenko commented Sep 21, 2021

kunalspathak commented Sep 21, 2021

kunalspathak left a comment

Choose a reason for hiding this comment

sandreenko commented Sep 22, 2021

kunalspathak commented Nov 1, 2021

JulieLeeMSFT commented Nov 29, 2021

kunalspathak commented Jan 3, 2022

kunalspathak commented Jan 6, 2022

sandreenko commented Jan 10, 2022

sandreenko commented Jan 24, 2022

kunalspathak commented Jan 28, 2022

kunalspathak commented Feb 21, 2022

sandreenko commented Apr 25, 2022

sandreenko commented Apr 30, 2022 • edited Loading

sandreenko commented May 10, 2022

kunalspathak left a comment

Choose a reason for hiding this comment

kunalspathak May 10, 2022

Choose a reason for hiding this comment

sandreenko May 12, 2022

Choose a reason for hiding this comment

kunalspathak May 10, 2022

Choose a reason for hiding this comment

sandreenko May 12, 2022

Choose a reason for hiding this comment

sandreenko commented May 21, 2022

kunalspathak commented May 23, 2022

azure-pipelines bot commented May 23, 2022

kunalspathak commented May 23, 2022

kunalspathak commented May 23, 2022

shushanhf commented May 24, 2022

jakobbotsch commented May 24, 2022

shushanhf commented May 24, 2022

kunalspathak commented Jun 2, 2022 • edited Loading

sandreenko commented Sep 19, 2021 •

edited by kunalspathak

Loading

sandreenko commented Apr 30, 2022 •

edited

Loading

kunalspathak commented Jun 2, 2022 •

edited

Loading