-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Disable JitDoOldStructRetyping
by default.
#37745
Conversation
bcbe584
to
f57c1eb
Compare
8f51cfa
to
d882b81
Compare
Total bytes of diff: -21971 (-0.07% of base) 3075 total methods with Code Size differences (1589 improved, 1486 regressed), 184523 unchanged. Note: it improves code with retyping as well: 808 total methods with Code Size differences (808 improved, 0 regressed), 186790 unchanged. Found 55 files with textual diffs. Crossgen CodeSize Diffs for System.Private.CoreLib.dll, framework assemblies for default jit Summary of Code Size diffs: (Lower is better) Total bytes of diff: -22923 (-0.07% of base)
…ss taken when possible. Protect against a promoted struct with a hole like struct<8> {hole 4; int a;};
PTAL @CarolEidt @dotnet/jit-contrib |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM - I have some questions, a couple things I think need changing, and mostly minor comment suggestions.
{ | ||
varDsc = m_pCompiler->lvaGetDesc(varDsc->lvFieldLclStart); | ||
} | ||
LclSsaVarDsc* ssaDef = varDsc->GetPerSsaData(ssaNum); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps you could factor this out, e.g. into lvaGetDescForSSA
that takes a GT_LCL_VAR
or a lclNum.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was thinking about it and made a prototype, but would prefer to postpone it if we could to the next PR and have a more precise design discussion there.
src/coreclr/src/jit/lclvars.cpp
Outdated
// If we return `struct A { SIMD16 a; }` we split the struct into several fields. | ||
// In order to do that we have to have its field `a` in memory. Right now lowering cannot | ||
// handle RETURN struct(multiple registers)->SIMD16(one register), but it can be improved. | ||
LclVarDsc* fieldDsc = comp->lvaGetDesc(lvFieldLclStart); | ||
if (fieldDsc->TypeGet() == TYP_SIMD12 || fieldDsc->TypeGet() == TYP_SIMD16) | ||
{ | ||
#if defined(TARGET_ARM64) | ||
return false; | ||
if (!comp->isOpaqueSIMDLclVar(fieldDsc)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that this is backward. If it is opaque, that means that it will be passed in a single register on Arm64 (and on x64/ux after #9578 is fixed). For those cases, we can use a single field, e.g. for struct A { Vector128<T> a; }
However, for struct A {Vector2 a; }
we can't as it will be passed/returned in multiple registers.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was not sure enough in this change so I stepped back and played with it for a while.
As I found out we need this block (reject replacement for a SIMD local field) for:
-Linux x64 WrappedVector3/4;
-arm64 needs it for Vector2/3/4/T and Vector128.
Linux x64 WrappedVector2 does not need it because we return SIMD8
as double
; Vector<T>
does not need this block because we pass it byref.
Without this block for listed cases when we were receiving IR like:
[000016] ------------ * RETURN struct (xmm0, xmm1)
[000015] -------N---- \--* LCL_VAR struct<Vector3Wrapper, 12>(P) V00 loc0
\--* simd12 V00.f1 (offs=0x00) -> V03 tmp1
we were changing it to:
[000016] -----+------ * RETURN struct (xmm0, xmm1)
[000015] -----+-N---- \--* LCL_VAR simd12<System.Numerics.Vector3> V03 tmp1
and we were failing in Lowering::ContainCheckRet
or in codegen because RETURN
and LCL_VAR
had different number of registers (LCL_VAR
- 1, RETURN
> 1).
I was trying to make a special condition for isOpaqueSIMDLclVar
but it did not work well, for example:
struct A { Vector<short> f; }
returns isOpaqueSIMDLclVar
on arm64, but we return it as 2 registers x0, x1. A similar scenario with Vector128
, it is also
returned as x0,x1, so even for isOpaqueSIMDLclVar
we were not returning the wrapping struct as one Vector register.
Edit: Vector128
is returned as 2 registers only with altjit, on bare metal we keep it in one register.
So my current solution is to always block this field promotion for SIMD vars, it won't be a regression in comparasing with oldRetyping
and open an issue to support it better later.
I have added more tests for these scenarios.
Note: I have checked that some of these new tests were failing with old retyping before, meaning that we have not seen these patterns often.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On Arm64, vectors are passed in a single register, and it sounds like we need to fix the altjit to do the right thing.
my current solution is to always block this field promotion for SIMD vars, it won't be a regression in comparasing with oldRetyping and open an issue to support it better later.
That seems reasonable, though I hope that we can address that issue in the near term.
The pr is ready for another round, the failures are unrelated. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks for adding the additional tests!
Commits:
f432437: Disable retyping by default.
06e7c52: Keep block init/copy as baseline.
f53bb30: Don't mark LCL_VAR that is used in RETURN(IND(ADDR(LCL_VAR)) as address taken when possible.
It is a beginning for #11413, we fold (IND(ADDR()) and say that the user (in this case Return) has necessary information to do the case in lowering, improves
return Unsafe.As<long>(double)
etc.3965e9d: Replace 1-field structs with the field for returns.
It is needed for copy propagation optimizations, like when we do
ASG(LCL_FLD long V01, 1); return LCL_VAR V01
where V01 has only one field and we want to propagate1
to the return.f7a90a3: Add SSA support.
That is particular SSA/CSE support for
ASG struct(LCL_VAR struct V01(with 1 promoted field), call struct)
that fixes most regressions. I do not like that commit, it doesn't look solid and it does not support VN for thatLCL_VAR struct
, because we have a check that tree type is enregistarable. However, it solves the problem, passes the tests and the rest could be done later, maybe after we design a general solution for multireg SSA nodes.Framework libraries
Runtime benchmarks
Improved benchmarks: SIMD\ConsoleMandel\ConsoleMandel\ConsoleMandel, SciMark\SciMark\SciMark.
Also, I was running dotnet/performance benchmarks to catch regressions, but have not seen any(there was a lot of noise in +-5% but without code size diffs), will check CI run after it is merged.
SIMD.ConsoleMandel:ScalarFloatSingleThreadADT: -77% (as expected);
Regressions analysis
In short: nothing unexpected for x64 Crossgen so far, main issues are #8016, #11413. I am analyzing x64 PMI and arm64 crossgen diffs right now, but it is a long process, I will update the list once it is done.
This change affects many issues, the main one it fixes #1231, the rest I will check and close after it goes in.