Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Disable JitDoOldStructRetyping by default. #37745

Merged
merged 10 commits into from
Jul 7, 2020

Conversation

sandreenko
Copy link
Contributor

@sandreenko sandreenko commented Jun 11, 2020

Commits:
f432437: Disable retyping by default.

06e7c52: Keep block init/copy as baseline.

f53bb30: Don't mark LCL_VAR that is used in RETURN(IND(ADDR(LCL_VAR)) as address taken when possible.
It is a beginning for #11413, we fold (IND(ADDR()) and say that the user (in this case Return) has necessary information to do the case in lowering, improves return Unsafe.As<long>(double) etc.

3965e9d: Replace 1-field structs with the field for returns.
It is needed for copy propagation optimizations, like when we do ASG(LCL_FLD long V01, 1); return LCL_VAR V01 where V01 has only one field and we want to propagate 1 to the return.

f7a90a3: Add SSA support.
That is particular SSA/CSE support for ASG struct(LCL_VAR struct V01(with 1 promoted field), call struct) that fixes most regressions. I do not like that commit, it doesn't look solid and it does not support VN for that LCL_VAR struct , because we have a check that tree type is enregistarable. However, it solves the problem, passes the tests and the rest could be done later, maybe after we design a general solution for multireg SSA nodes.

Framework libraries

  Diff, bytes Diff, % improved regressed
X64 windows, crossgen -6764 -0.02 862 323
X64 windows, PMI -11591 -0.02 1401 655
X64 Linux, crossgen -6205 -0.01 816 310
X64 Linux, PMI -5970 -0.01 1259 600
ARM64 Windows, crossgen 1076 0 603 317
ARM64 Linux, PMI -2388 0 1026 756
X86 windows, crossgen -1865 -0.01 502 250
X86 windows, PMI -2170 0 949 614
X86 Linux, crossgen -2719 0 504 239
X6 Linux, PMI -1268 0 940 621
ARM Windows, crossgen -286 0 443 955
ARM Linux, PMI -1262 0 680 689

Runtime benchmarks

  Diff, bytes Diff, % improved regressed
X64 windows, crossgen -212 -0.06 5 1(more phi resolution moves from register to memory, instead of keeping it in memory all the time)
X64 windows, PMI -248 -0.05 3 0
X64 Linux, crossgen -188 -0.02 5 1
X64 Linux, PMI -149 -0.03 2 1
ARM64 Windows, crossgen -120 -0.01 4 0
ARM64 Linux, PMI -44 -0.01 1 0

Improved benchmarks: SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel, SciMark\SciMark\SciMark.

Also, I was running dotnet/performance benchmarks to catch regressions, but have not seen any(there was a lot of noise in +-5% but without code size diffs), will check CI run after it is merged.
SIMD.ConsoleMandel:ScalarFloatSingleThreadADT: -77% (as expected);

Regressions analysis

In short: nothing unexpected for x64 Crossgen so far, main issues are #8016, #11413. I am analyzing x64 PMI and arm64 crossgen diffs right now, but it is a long process, I will update the list once it is done.

136 (13.49% of base) : Microsoft.CodeAnalysis.CSharp.dasm - UnopEasyOut:TypeToIndex(TypeSymbol):Nullable`1
133 (13.12% of base) : Microsoft.CodeAnalysis.CSharp.dasm - BinopEasyOut:TypeToIndex(TypeSymbol):Nullable`1
68 ( 3.82% of base) : Microsoft.CodeAnalysis.dasm - SyntaxDiffer:GetNextAction():DiffAction:this
35 (34.31% of base) : System.Private.DataContractSerialization.dasm - CodeGenerator:GetBranchCode(int):OpCode:this
34 ( 9.60% of base) : Microsoft.Extensions.DependencyInjection.dasm - CallSiteVisitor`2:VisitCallSiteMain(ServiceCallSite,__Canon):ILEmitCallSiteAnalysisResult:this
31 (13.78% of base) : Newtonsoft.Json.dasm - JsonValidatingReader:GetCurrentNodeSchemaType():Nullable`1:this
29 ( 9.03% of base) : Microsoft.Extensions.DependencyInjection.dasm - CallSiteVisitor`2:VisitCallSite(ServiceCallSite,__Canon):ILEmitCallSiteAnalysisResult:this
	can't enreg merged return struct with >1 field, https://github.com/dotnet/runtime/issues/8016
	
48 ( 5.87% of base) : Microsoft.CodeAnalysis.CSharp.dasm - TypeSymbol:ReportAnyMismatchedConstraints(MethodSymbol,TypeSymbol,MethodSymbol,DiagnosticBag)
	have to spill new enreg variables before calls, not a CQ regression.
	
	
35 ( 4.08% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MemberSignatureComparer:HaveSameParameterTypes(ImmutableArray`1,TypeMap,ImmutableArray`1,TypeMap,bool,bool,bool):bool
32 ( 5.25% of base) : Microsoft.CodeAnalysis.CSharp.dasm - MemberSignatureComparer:HaveSameReturnTypes(Symbol,TypeMap,Symbol,TypeMap,bool,bool):bool
84 ( 6.70% of base) : System.Security.Cryptography.X509Certificates.dasm - CertificatePal:CopyWithPrivateKey(RSA):ICertificatePal:this
32 ( 6.57% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - MethodSignatureComparer:HaveSameReturnTypes(MethodSymbol,TypeSubstitution,MethodSymbol,TypeSubstitution,bool):bool
28 ( 1.88% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - RetargetingSymbolTranslator:Retarget(NamedTypeSymbol,ubyte):NamedTypeSymbol:this
	mark as don't enreg after `OBJ<struct, 8>(ADDR(LCL_VAR ref/another struct<8>,long))` cast, https://github.com/dotnet/runtime/issues/11413

85 ( 6.31% of base) : xunit.console.dasm - ConsoleRunner:EntryPoint(ref):int:this
31 ( 1.52% of base) : System.Text.Json.dasm - JsonDocument:TryParseValue(byref,byref,bool):bool
	promote more fields, so getting more instructions to load individual fields on registers from memory, not a 100% CQ regression, but can be improved.

66 ( 5.96% of base) : Microsoft.Extensions.DependencyModel.dasm - DependencyContextJsonReader:ReadCompilationOptions(byref):CompilationOptions
	we decide to promote a local var, but it is assign as `ASG(LCL_VAR, call)`, so we have to forbid independent promotion.

56 ( 1.21% of base) : Microsoft.CodeAnalysis.VisualBasic.dasm - Binder:BindCollectionRangeVariables(QueryClauseSyntax,BoundQueryClauseBase,SeparatedSyntaxList`1,byref,DiagnosticBag):BoundQueryClauseBase:this
	can't do an assertion prop after `OBJ<struct, 8>(ADDR(LCL_VAR ref/another struct<8>,long))` cast, https://github.com/dotnet/runtime/issues/11413


50 ( 1.25% of base) : Microsoft.CodeAnalysis.dasm - CommonCompiler:RunCore(TextWriter,ErrorLogger,CancellationToken):int:this
STORE_OBJ for a ref type generates worse code(does not try to create LEA), https://github.com/dotnet/runtime/issues/38538

31 ( 1.47% of base) : System.Text.Json.dasm - JsonSerializer:ReadValueCore(JsonSerializerOptions,byref,byref):__Canon
	allow to enreg two additional fields that go to r14 and t15, but have to push them in each FunclitProlog and pop in each epilogue, so 14 additional instructions of push/pop for 4 moves that could use registers(these fields have 2 uses each, 1 def, 1 use), sound like we should keep them in memory in this case, @carol?

27 ( 4.37% of base) : System.Reflection.Metadata.dasm - PEReader:TryOpenAssociatedPortablePdb(String,Func`2,byref,byref):bool:this
don't do a copy prop, need a better support for VN for asg(LCL_VAR struct(1 promoted field), call).

This change affects many issues, the main one it fixes #1231, the rest I will check and close after it goes in.

Sergey Andreenko added 5 commits June 28, 2020 16:59
Total bytes of diff: -21971 (-0.07% of base)
3075 total methods with Code Size differences (1589 improved, 1486 regressed), 184523 unchanged.

Note: it improves code with retyping as well:
808 total methods with Code Size differences (808 improved, 0 regressed), 186790 unchanged.
Found 55 files with textual diffs.
Crossgen CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for  default jit
Summary of Code Size diffs:
(Lower is better)
Total bytes of diff: -22923 (-0.07% of base)
…ss taken when possible.

Protect against a promoted struct with a hole like struct<8> {hole 4; int a;};
@sandreenko sandreenko marked this pull request as ready for review June 29, 2020 08:52
@sandreenko
Copy link
Contributor Author

PTAL @CarolEidt @dotnet/jit-contrib
I keep analyzing diffs, but I think it is ready for the first round of review.

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Overall LGTM - I have some questions, a couple things I think need changing, and mostly minor comment suggestions.

src/coreclr/src/jit/codegenarmarch.cpp Show resolved Hide resolved
src/coreclr/src/jit/flowgraph.cpp Show resolved Hide resolved
src/coreclr/src/jit/lclvars.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/lower.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/lower.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/morph.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/lower.cpp Outdated Show resolved Hide resolved
src/coreclr/src/jit/lower.cpp Outdated Show resolved Hide resolved
{
varDsc = m_pCompiler->lvaGetDesc(varDsc->lvFieldLclStart);
}
LclSsaVarDsc* ssaDef = varDsc->GetPerSsaData(ssaNum);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perhaps you could factor this out, e.g. into lvaGetDescForSSA that takes a GT_LCL_VAR or a lclNum.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was thinking about it and made a prototype, but would prefer to postpone it if we could to the next PR and have a more precise design discussion there.

// If we return `struct A { SIMD16 a; }` we split the struct into several fields.
// In order to do that we have to have its field `a` in memory. Right now lowering cannot
// handle RETURN struct(multiple registers)->SIMD16(one register), but it can be improved.
LclVarDsc* fieldDsc = comp->lvaGetDesc(lvFieldLclStart);
if (fieldDsc->TypeGet() == TYP_SIMD12 || fieldDsc->TypeGet() == TYP_SIMD16)
{
#if defined(TARGET_ARM64)
return false;
if (!comp->isOpaqueSIMDLclVar(fieldDsc))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that this is backward. If it is opaque, that means that it will be passed in a single register on Arm64 (and on x64/ux after #9578 is fixed). For those cases, we can use a single field, e.g. for struct A { Vector128<T> a; } However, for struct A {Vector2 a; } we can't as it will be passed/returned in multiple registers.

Copy link
Contributor Author

@sandreenko sandreenko Jul 7, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was not sure enough in this change so I stepped back and played with it for a while.

As I found out we need this block (reject replacement for a SIMD local field) for:
-Linux x64 WrappedVector3/4;
-arm64 needs it for Vector2/3/4/T and Vector128.

Linux x64 WrappedVector2 does not need it because we return SIMD8 as double; Vector<T> does not need this block because we pass it byref.

Without this block for listed cases when we were receiving IR like:

               [000016] ------------              *  RETURN    struct (xmm0, xmm1)
               [000015] -------N----              \--*  LCL_VAR   struct<Vector3Wrapper, 12>(P) V00 loc0
                                                  \--*    simd12 V00.f1 (offs=0x00) -> V03 tmp1											  

we were changing it to:

               [000016] -----+------              *  RETURN    struct (xmm0, xmm1)
               [000015] -----+-N----              \--*  LCL_VAR   simd12<System.Numerics.Vector3> V03 tmp1

and we were failing in Lowering::ContainCheckRet or in codegen because RETURN and LCL_VAR had different number of registers (LCL_VAR - 1, RETURN > 1).

I was trying to make a special condition for isOpaqueSIMDLclVar but it did not work well, for example:
struct A { Vector<short> f; } returns isOpaqueSIMDLclVar on arm64, but we return it as 2 registers x0, x1. A similar scenario with Vector128, it is also
returned as x0,x1, so even for isOpaqueSIMDLclVar we were not returning the wrapping struct as one Vector register.

Edit: Vector128 is returned as 2 registers only with altjit, on bare metal we keep it in one register.

So my current solution is to always block this field promotion for SIMD vars, it won't be a regression in comparasing with oldRetyping and open an issue to support it better later.

I have added more tests for these scenarios.

Note: I have checked that some of these new tests were failing with old retyping before, meaning that we have not seen these patterns often.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Arm64, vectors are passed in a single register, and it sounds like we need to fix the altjit to do the right thing.

my current solution is to always block this field promotion for SIMD vars, it won't be a regression in comparasing with oldRetyping and open an issue to support it better later.

That seems reasonable, though I hope that we can address that issue in the near term.

@sandreenko
Copy link
Contributor Author

The pr is ready for another round, the failures are unrelated.

Copy link
Contributor

@CarolEidt CarolEidt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - thanks for adding the additional tests!

Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI optimization
Projects
None yet
Development

Successfully merging this pull request may close these issues.

First Class Structs: stop lying about underlying type.
2 participants