Disable `JitDoOldStructRetyping` by default. #37745

sandreenko · 2020-06-11T11:37:24Z

Commits:
f432437: Disable retyping by default.

06e7c52: Keep block init/copy as baseline.

f53bb30: Don't mark LCL_VAR that is used in RETURN(IND(ADDR(LCL_VAR))) as address taken when possible.
It is a beginning for #11413: we fold IND(ADDR()) and say that the user (in this case RETURN) has the necessary information to do the cast in lowering. This improves `return Unsafe.As<long>(double)` etc.

3965e9d: Replace 1-field structs with the field for returns.
It is needed for copy propagation optimizations, like when we do ASG(LCL_FLD long V01, 1); return LCL_VAR V01 where V01 has only one field and we want to propagate 1 to the return.

f7a90a3: Add SSA support.
This is specific SSA/CSE support for `ASG struct(LCL_VAR struct V01 (with 1 promoted field), call struct)` that fixes most regressions. I do not like this commit: it doesn't look solid, and it does not support VN for that LCL_VAR struct because we have a check that the tree type is enregistrable. However, it solves the problem and passes the tests, and the rest could be done later, maybe after we design a general solution for multireg SSA nodes.
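
To make the fold described for f53bb30 concrete, here is a minimal sketch on a toy IR. The `Node`, `Oper`, `MakeNode`, and `FoldReturnOfIndAddr` names are hypothetical and are not the RyuJIT `GenTree` API; the sketch only shows the tree shape being rewritten, not the type and cast handling that the PR leaves to lowering.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Toy IR, NOT the RyuJIT GenTree: just enough node kinds to show the fold.
enum class Oper { LclVar, Addr, Ind, Return };

struct Node
{
    Oper                  oper   = Oper::LclVar;
    int                   lclNum = -1; // only meaningful for LclVar
    std::unique_ptr<Node> op1;         // single operand of Addr/Ind/Return
};

// Hypothetical helper that builds a node with an optional operand.
std::unique_ptr<Node> MakeNode(Oper oper, std::unique_ptr<Node> op1 = nullptr, int lclNum = -1)
{
    auto node    = std::make_unique<Node>();
    node->oper   = oper;
    node->lclNum = lclNum;
    node->op1    = std::move(op1);
    return node;
}

// If `ret` is RETURN(IND(ADDR(LCL_VAR))), rewrite it in place to RETURN(LCL_VAR)
// and report success. Any other shape is left untouched.
bool FoldReturnOfIndAddr(Node& ret)
{
    if (ret.oper != Oper::Return || !ret.op1 || ret.op1->oper != Oper::Ind)
        return false;
    Node& ind = *ret.op1;
    if (!ind.op1 || ind.op1->oper != Oper::Addr)
        return false;
    Node& addr = *ind.op1;
    if (!addr.op1 || addr.op1->oper != Oper::LclVar)
        return false;

    // Detach the local first, then drop the IND and ADDR wrappers.
    std::unique_ptr<Node> lcl = std::move(addr.op1);
    ret.op1                   = std::move(lcl);
    return true;
}
```

In this toy model the local is no longer an operand of ADDR after the fold, which is the property that lets a compiler avoid treating it as address taken.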

Framework libraries

| Configuration | Diff, bytes | Diff, % | improved | regressed |
|---|---|---|---|---|
| X64 windows, crossgen | -6764 | -0.02 | 862 | 323 |
| X64 windows, PMI | -11591 | -0.02 | 1401 | 655 |
| X64 Linux, crossgen | -6205 | -0.01 | 816 | 310 |
| X64 Linux, PMI | -5970 | -0.01 | 1259 | 600 |
| ARM64 Windows, crossgen | 1076 | 0 | 603 | 317 |
| ARM64 Linux, PMI | -2388 | 0 | 1026 | 756 |
| X86 windows, crossgen | -1865 | -0.01 | 502 | 250 |
| X86 windows, PMI | -2170 | 0 | 949 | 614 |
| X86 Linux, crossgen | -2719 | 0 | 504 | 239 |
| X86 Linux, PMI | -1268 | 0 | 940 | 621 |
| ARM Windows, crossgen | -286 | 0 | 443 | 955 |
| ARM Linux, PMI | -1262 | 0 | 680 | 689 |

Runtime benchmarks

| Configuration | Diff, bytes | Diff, % | improved | regressed |
|---|---|---|---|---|
| X64 windows, crossgen | -212 | -0.06 | 5 | 1 (more phi resolution moves from register to memory, instead of keeping it in memory all the time) |
| X64 windows, PMI | -248 | -0.05 | 3 | 0 |
| X64 Linux, crossgen | -188 | -0.02 | 5 | 1 |
| X64 Linux, PMI | -149 | -0.03 | 2 | 1 |
| ARM64 Windows, crossgen | -120 | -0.01 | 4 | 0 |
| ARM64 Linux, PMI | -44 | -0.01 | 1 | 0 |

Improved benchmarks: SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel, SciMark\SciMark\SciMark.

Also, I was running dotnet/performance benchmarks to catch regressions, but have not seen any (there was a lot of noise within ±5%, but without code size diffs); I will check the CI run after this is merged.
SIMD.ConsoleMandel:ScalarFloatSingleThreadADT: -77% (as expected);

Regressions analysis

In short: nothing unexpected for x64 Crossgen so far, main issues are #8016, #11413. I am analyzing x64 PMI and arm64 crossgen diffs right now, but it is a long process, I will update the list once it is done.

136 (13.49% of base) : Microsoft.CodeAnalysis.CSharp.dasm - UnopEasyOut:TypeToIndex(TypeSymbol):Nullable`1
133 (13.12% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BinopEasyOut:TypeToIndex(TypeSymbol):Nullable`1
68 ( 3.82% of base) : Microsoft.CodeAnalysis.dasm - SyntaxDiffer:GetNextAction():DiffAction:this
35 (34.31% of base) : System.Private.DataContractSerialization.dasm - CodeGenerator:GetBranchCode(int):OpCode:this
34 ( 9.60% of base) : Microsoft.Extensions.DependencyInjection.dasm - CallSiteVisitor`2:VisitCallSiteMain(ServiceCallSite,__Canon):ILEmitCallSiteAnalysisResult:this
31 (13.78% of base) : Newtonsoft.Json.dasm - JsonValidatingReader:GetCurrentNodeSchemaType():Nullable`1:this
29 ( 9.03% of base) : Microsoft.Extensions.DependencyInjection.dasm - CallSiteVisitor`2:VisitCallSite(ServiceCallSite,__Canon):ILEmitCallSiteAnalysisResult:this
	can't enreg merged return struct with >1 field, https://github.com/dotnet/runtime/issues/8016
	
48 ( 5.87% of base) : Microsoft.CodeAnalysis.CSharp.dasm - TypeSymbol:ReportAnyMismatchedConstraints(MethodSymbol,TypeSymbol,MethodSymbol,DiagnosticBag)
	have to spill new enreg variables before calls, not a CQ regression.
	
	
35 ( 4.08% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MemberSignatureComparer:HaveSameParameterTypes(ImmutableArray`1,TypeMap,ImmutableArray`1,TypeMap,bool,bool,bool):bool
32 ( 5.25% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MemberSignatureComparer:HaveSameReturnTypes(Symbol,TypeMap,Symbol,TypeMap,bool,bool):bool
84 ( 6.70% of base) : System.Security.Cryptography.X509Certificates.dasm - CertificatePal:CopyWithPrivateKey(RSA):ICertificatePal:this
32 ( 6.57% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodSignatureComparer:HaveSameReturnTypes(MethodSymbol,TypeSubstitution,MethodSymbol,TypeSubstitution,bool):bool
28 ( 1.88% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - RetargetingSymbolTranslator:Retarget(NamedTypeSymbol,ubyte):NamedTypeSymbol:this
	mark as don't enreg after `OBJ<struct, 8>(ADDR(LCL_VAR ref/another struct<8>,long))` cast, https://github.com/dotnet/runtime/issues/11413

85 ( 6.31% of base) : xunit.console.dasm - ConsoleRunner:EntryPoint(ref):int:this
31 ( 1.52% of base) : System.Text.Json.dasm - JsonDocument:TryParseValue(byref,byref,bool):bool
	we promote more fields, so we get more instructions to load individual fields from memory into registers; not a 100% CQ regression, but can be improved.

66 ( 5.96% of base) : Microsoft.Extensions.DependencyModel.dasm - DependencyContextJsonReader:ReadCompilationOptions(byref):CompilationOptions
	we decide to promote a local var, but it is assigned as `ASG(LCL_VAR, call)`, so we have to forbid independent promotion.

56 ( 1.21% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:BindCollectionRangeVariables(QueryClauseSyntax,BoundQueryClauseBase,SeparatedSyntaxList`1,byref,DiagnosticBag):BoundQueryClauseBase:this
	can't do an assertion prop after `OBJ<struct, 8>(ADDR(LCL_VAR ref/another struct<8>,long))` cast, https://github.com/dotnet/runtime/issues/11413


50 ( 1.25% of base) : Microsoft.CodeAnalysis.dasm - CommonCompiler:RunCore(TextWriter,ErrorLogger,CancellationToken):int:this
	STORE_OBJ for a ref type generates worse code (does not try to create LEA), https://github.com/dotnet/runtime/issues/38538

31 ( 1.47% of base) : System.Text.Json.dasm - JsonSerializer:ReadValueCore(JsonSerializerOptions,byref,byref):__Canon
	allows us to enreg two additional fields that go to r14 and r15, but we have to push them in each FuncletProlog and pop them in each epilogue, so we get 14 additional push/pop instructions for 4 moves that could use registers (these fields have 2 uses each: 1 def, 1 use). It sounds like we should keep them in memory in this case, @carol?

27 ( 4.37% of base) : System.Reflection.Metadata.dasm - PEReader:TryOpenAssociatedPortablePdb(String,Func`2,byref,byref):bool:this
	we don't do a copy prop; we need better VN support for `ASG(LCL_VAR struct (1 promoted field), call)`.

This change affects many issues; the main one it fixes is #1231. The rest I will check and close after it goes in.

Total bytes of diff: -21971 (-0.07% of base)
3075 total methods with Code Size differences (1589 improved, 1486 regressed), 184523 unchanged.

Note: it improves code with retyping as well:
808 total methods with Code Size differences (808 improved, 0 regressed), 186790 unchanged.

Found 55 files with textual diffs.
Crossgen CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -22923 (-0.07% of base)

…ss taken when possible. Protect against a promoted struct with a hole like struct<8> {hole 4; int a;};
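
The "struct with a hole" case, `struct<8> {hole 4; int a;}`, can be illustrated with a small sketch. It assumes that a one-field struct is only interchangeable with its field when the field starts at offset 0 and covers the whole struct; `PromotedField`, `PromotedStruct`, and `CanReturnAsSingleField` are hypothetical names, not JIT types or functions.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical description of a promoted struct local; these are not JIT types.
struct PromotedField
{
    std::size_t offset; // byte offset of the field inside the struct
    std::size_t size;   // byte size of the field
};

struct PromotedStruct
{
    std::size_t   size;       // total struct size in bytes
    std::size_t   fieldCount; // number of promoted fields
    PromotedField firstField; // only inspected when fieldCount == 1
};

// ASSUMPTION: replacement is safe only when the single field starts at offset 0
// and covers every byte of the struct, so no padding ("hole") is lost.
bool CanReturnAsSingleField(const PromotedStruct& s)
{
    if (s.fieldCount != 1)
    {
        return false;
    }
    return (s.firstField.offset == 0) && (s.firstField.size == s.size);
}
```

With this predicate, an 8-byte struct wrapping one `long` qualifies, while the 8-byte struct whose only field is an `int` at offset 4 does not.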

sandreenko · 2020-06-30T22:36:35Z

PTAL @CarolEidt @dotnet/jit-contrib
I keep analyzing diffs, but I think it is ready for the first round of review.

CarolEidt

Overall LGTM - I have some questions, a couple things I think need changing, and mostly minor comment suggestions.

src/coreclr/src/jit/codegenarmarch.cpp

src/coreclr/src/jit/flowgraph.cpp

src/coreclr/src/jit/lclvars.cpp

src/coreclr/src/jit/lower.cpp

src/coreclr/src/jit/morph.cpp

src/coreclr/src/jit/lower.cpp

CarolEidt · 2020-07-01T21:33:37Z

src/coreclr/src/jit/rangecheck.cpp

+    {
+        varDsc = m_pCompiler->lvaGetDesc(varDsc->lvFieldLclStart);
+    }
+    LclSsaVarDsc* ssaDef = varDsc->GetPerSsaData(ssaNum);


Perhaps you could factor this out, e.g. into lvaGetDescForSSA that takes a GT_LCL_VAR or a lclNum.

I was thinking about it and made a prototype, but would prefer to postpone it if we could to the next PR and have a more precise design discussion there.
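
For readers following the suggestion, a rough sketch of what such a factored helper might look like is below. The types are minimal stand-ins rather than the real JIT classes, and the redirect condition is an assumption, since the quoted diff does not show the `if` that guards the block; SSA numbers are also simplified to plain vector indices.

```cpp
#include <cassert>
#include <vector>

// Minimal stand-ins for the JIT's descriptors; only the members used below are modeled.
struct LclSsaVarDsc
{
    int defBlock; // placeholder payload for one SSA definition
};

struct LclVarDsc
{
    bool                      lvPromoted      = false;
    unsigned                  lvFieldCnt      = 0;
    unsigned                  lvFieldLclStart = 0;
    bool                      lvInSsa         = false;
    std::vector<LclSsaVarDsc> perSsaData;

    // Simplified: indexes directly by ssaNum.
    LclSsaVarDsc* GetPerSsaData(unsigned ssaNum)
    {
        return &perSsaData[ssaNum];
    }
};

struct Compiler
{
    std::vector<LclVarDsc> lvaTable;

    LclVarDsc* lvaGetDesc(unsigned lclNum)
    {
        return &lvaTable[lclNum];
    }

    // Hypothetical helper named after the review suggestion.
    // ASSUMPTION: a struct local that is not itself in SSA but has exactly one
    // promoted field is redirected to that field's descriptor.
    LclVarDsc* lvaGetDescForSSA(unsigned lclNum)
    {
        LclVarDsc* varDsc = lvaGetDesc(lclNum);
        if (!varDsc->lvInSsa && varDsc->lvPromoted && (varDsc->lvFieldCnt == 1))
        {
            varDsc = lvaGetDesc(varDsc->lvFieldLclStart);
        }
        return varDsc;
    }
};
```

A caller would then write `comp->lvaGetDescForSSA(lclNum)->GetPerSsaData(ssaNum)` instead of repeating the redirect at each use site.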

CarolEidt · 2020-07-06T14:46:16Z

src/coreclr/src/jit/lclvars.cpp

    // If we return `struct A { SIMD16 a; }` we split the struct into several fields.
    // In order to do that we have to have its field `a` in memory. Right now lowering cannot
    // handle RETURN struct(multiple registers)->SIMD16(one register), but it can be improved.
    LclVarDsc* fieldDsc = comp->lvaGetDesc(lvFieldLclStart);
    if (fieldDsc->TypeGet() == TYP_SIMD12 || fieldDsc->TypeGet() == TYP_SIMD16)
    {
 #if defined(TARGET_ARM64)
-        return false;
+        if (!comp->isOpaqueSIMDLclVar(fieldDsc))


I think that this is backward. If it is opaque, that means that it will be passed in a single register on Arm64 (and on x64/ux after #9578 is fixed). For those cases, we can use a single field, e.g. for struct A { Vector128<T> a; } However, for struct A {Vector2 a; } we can't as it will be passed/returned in multiple registers.

I was not sure enough about this change, so I stepped back and played with it for a while.

As I found out we need this block (reject replacement for a SIMD local field) for:
-Linux x64 WrappedVector3/4;
-arm64 needs it for Vector2/3/4/T and Vector128.

Linux x64 WrappedVector2 does not need it because we return SIMD8 as double; Vector<T> does not need this block because we pass it byref.

Without this block, for the listed cases, when we were receiving IR like:

[000016] ------------ * RETURN struct (xmm0, xmm1)
[000015] -------N---- \--* LCL_VAR struct<Vector3Wrapper, 12>(P) V00 loc0
                      \--* simd12 V00.f1 (offs=0x00) -> V03 tmp1

we were changing it to:

[000016] -----+------ * RETURN struct (xmm0, xmm1)
[000015] -----+-N---- \--* LCL_VAR simd12<System.Numerics.Vector3> V03 tmp1

and we were failing in Lowering::ContainCheckRet or in codegen because RETURN and LCL_VAR had different number of registers (LCL_VAR - 1, RETURN > 1).

I was trying to make a special condition for isOpaqueSIMDLclVar, but it did not work well. For example,
struct A { Vector<short> f; } is reported as opaque by isOpaqueSIMDLclVar on arm64, but we return it in 2 registers, x0 and x1. It is a similar scenario with Vector128: it is also returned in x0, x1, so even for isOpaqueSIMDLclVar we were not returning the wrapping struct in one vector register.

Edit: Vector128 is returned as 2 registers only with altjit, on bare metal we keep it in one register.

So my current solution is to always block this field promotion for SIMD vars; it won't be a regression in comparison with oldRetyping, and I will open an issue to support it better later.

I have added more tests for these scenarios.

Note: I have checked that some of these new tests were failing with old retyping before, meaning that we have not seen these patterns often.
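
As a rough illustration of the policy described above (never replace the struct with its field when the field is a SIMD type), here is a minimal sketch. The `FieldType` enum is an illustrative subset and not the JIT's `var_types`, and both function names are hypothetical.

```cpp
#include <cassert>

// Illustrative subset of type tags; not the real JIT var_types enum.
enum class FieldType { Int, Long, Double, Simd8, Simd12, Simd16, Simd32 };

// True for the SIMD tags modeled above.
bool IsSimdField(FieldType type)
{
    switch (type)
    {
        case FieldType::Simd8:
        case FieldType::Simd12:
        case FieldType::Simd16:
        case FieldType::Simd32:
            return true;
        default:
            return false;
    }
}

// ASSUMPTION: models only the "always block SIMD fields" decision from this thread;
// the real check also depends on promotion state and other conditions not shown here.
bool CanReplaceReturnedStructWithField(FieldType onlyFieldType)
{
    return !IsSimdField(onlyFieldType);
}
```

The conservative rule trades some missed optimizations for avoiding the register count mismatch between RETURN and LCL_VAR described earlier in this reply.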

On Arm64, vectors are passed in a single register, and it sounds like we need to fix the altjit to do the right thing.

my current solution is to always block this field promotion for SIMD vars; it won't be a regression in comparison with oldRetyping, and I will open an issue to support it better later.

That seems reasonable, though I hope that we can address that issue in the near term.

sandreenko · 2020-07-07T20:03:34Z

The PR is ready for another round; the failures are unrelated.

CarolEidt

LGTM - thanks for adding the additional tests!

sandreenko added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Jun 11, 2020

sandreenko force-pushed the noRetyping branch from f5dd380 to 22c3a1b Compare June 11, 2020 11:57

jaredpar mentioned this pull request Jun 11, 2020

OSX machines are de-provisioned during CI / PR runs leading to failures #34472

Closed

sandreenko force-pushed the noRetyping branch 2 times, most recently from bcbe584 to f57c1eb Compare June 15, 2020 12:47

sandreenko mentioned this pull request Jun 17, 2020

[RyuJIT] Fully enregister structs that fit into a single register when profitable #8016

Closed

sandreenko force-pushed the noRetyping branch from f57c1eb to 1dbcae3 Compare June 18, 2020 12:49

sandreenko added the optimization label Jun 18, 2020

sandreenko force-pushed the noRetyping branch 4 times, most recently from 8f51cfa to d882b81 Compare June 21, 2020 09:19

jaredpar mentioned this pull request Jun 23, 2020

PayloadGroup0 is timing out #38284

Closed

Sergey Andreenko added 5 commits June 28, 2020 16:59

Disable retyping by default.

f432437

Don't mark LCL_VAR that is used in RETURN(IND(ADDR(LCL_VAR)) as addre…

f53bb30

…ss taken when possible. Protect against a promoted struct with a hole like struct<8> {hole 4; int a;};

Replace 1-field structs with the field for returns.

3965e9d

Add SSA support.

f7a90a3

sandreenko force-pushed the noRetyping branch from 85218f0 to f7a90a3 Compare June 29, 2020 04:31

sandreenko marked this pull request as ready for review June 29, 2020 08:52

CarolEidt suggested changes Jul 1, 2020

Sergey Andreenko added 2 commits July 2, 2020 02:59

Review response.

8911b08

isOpaqueSIMDLclVar fix

0d8cf17

CarolEidt reviewed Jul 6, 2020

Sergey Andreenko added 3 commits July 6, 2020 19:14

Add tests for structs with independently promoted SIMD fields.

c16fc9f

Old retyping fix.

e0fbd5f

Don't try to replace SIMD fields.

9daf9c1

sandreenko mentioned this pull request Jul 7, 2020

Test failure: JIT\\Directed\\StructABI\\structreturn\\structreturn.cmd #37880

Closed

CarolEidt approved these changes Jul 7, 2020

sandreenko merged commit 641161c into dotnet:master Jul 7, 2020

sandreenko deleted the noRetyping branch July 7, 2020 21:56

sandreenko mentioned this pull request Jul 9, 2020

ARM64 altjit uses wrong return type for a wrapped Vector128. #38980

Open

sandreenko mentioned this pull request Jul 18, 2020

Pmi run on some test dll-s hits some jit asserts. #39556

Closed

This was referenced Aug 24, 2020

Clean-up VN for promoted fields assigned using the whole parent. #41242

Closed

Updating genCodeForBinary to be VEX aware #1344

Merged

PathogenDavid mentioned this pull request Nov 25, 2020

Workaround unavoidable marshaling on function pointers MochiLibraries/Biohazrd#99

Closed

ghost locked as resolved and limited conversation to collaborators Dec 8, 2020
