-
Notifications
You must be signed in to change notification settings - Fork 4.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Struct Improvements #35544
Struct Improvements #35544
Conversation
CarolEidt
commented
Apr 27, 2020
- Allow expanding SIMD12 field to 16 bytes if it is the only field.
- Similarly, if we have a struct with a single field we can reference it as the full size of the struct.
- Eliminate old code for an integer zero RHS with a SIMD12 LHS, and unify the codegen for zero-init for SIMD12 (between block init and assignment of SIMD init of 0)
inline bool GenTree::IsSIMDZero() | ||
{ | ||
#ifdef FEATURE_SIMD | ||
if ((gtOper == GT_SIMD) && (AsSIMD()->gtSIMDIntrinsicID == SIMDIntrinsicInit)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is there any need/desire to recognize SIMDIntrinsicGetZero
or the equivalent HWIntrinsic
kinds?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
SIMDIntrinsicGetZero
becomes SIMDIntrinsicInit
so that's covered, but we may want to consider the HWIntrinsic
kinds if/when we unify their handling in codegen.
@dotnet/jit-contrib PTAL - this fixes some code size regressions I encountered in promoting multireg structs that later didn't get enregistered. |
The asm diffs from a superpmi run of the tests & frameworks: |
src/coreclr/src/jit/compiler.h
Outdated
// However, we don't know this until late, so we may have already pretended the field is bigger | ||
// before that. | ||
if ((varDsc->lvSize() == 16) && !lvaIsFieldOfDependentlyPromotedStruct(varDsc)) | ||
if ((varDsc->lvSize() == 16) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we check that parent exact size is 16? To protect from cases like:
[StructLayout(LayoutKind.Explicit, Pack = 1)]
public struct A {
[StructLayout(LayoutKind.Explicit, Pack = 1)]
public struct B {
SIMD12 c; // can't be written as 16 bytes.
}
}
Right now we won't generate a LCL_FLD
node to access c
, but it could change soon.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should that be SIMD12 c;
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Have you verified that a struct such as that will occupy only 12 bytes on the stack? I would be surprised, as I would expect it to always get 16. In any case, it is an inexpensive check to add.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, I take that back - lvExactSize
will be 12 so we will miss the optimization in all cases. But if we are actually packing such a type, e.g. on a 32-bit architecture, lvSize()
will return 12. So I believe that the check is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm pretty sure that we rely in many places on lvSize()
returning the size that the variable is allocated on the stack.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If lvSize() == 16
means that it takes 16 bytes on the stack, doesn't it mean that if we map it to SIMD16
it won't overlap with any parent fields that it was not overlapping before?
If the parent struct was promoted it means there were no overlapping fields, so it is not clear for me why we need to check anything except if (varDsc->lvSize() == 16)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason we need to make the other checks is that if it has multiple fields, it is possible (in particular on 32-bit targets) that it would have a 32-bit field following it. This could even happen on 64-bit targets with explicit layout.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
it is possible (in particular on 32-bit targets) that it would have a 32-bit field following it.
do I understand that correctly:
struct C { // size 16 on x86?
SIMD12 a; // offset 0, lvSize() == 16?, lvExactSize == 12
int b; // offset 12? , size == 4
}
how do we allocate the second field starting at 12 after we passed 16
to lvaAllocLocalAndSetVirtualOffset(lclNum, lvaLclSize(lclNum), stkOffs);
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I missed this comment earlier (I'm still struggling to get green CI testing on this!). We special-case locals of SIMD12
type on 32-bit targets so that they occupy a full 16 bytes, where normally it would only be 12, since that's an even multiple of TARGET_POINTER_SIZE
. However, if it is a member of another struct we can't do that (it would violate the struct packing rules). So on a 32-bit target the size of C.a
would be 12.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've backed out this change - it's not clear from the comment here (and I hadn't remembered) that lvSize()
always returns the "fake" size of TYP_SIMD12
, even if it may later become a "dependently promoted" local. This was causing the RayTracer benchmark to fail, because the local is 12 bytes but we were storing 16.
cb4ad9a
to
2f2896a
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
- Allow expanding SIMD12 field to 16 bytes if it is the only field. - Similarly, if we have a struct with a single field we can reference it as the full size of the struct. - Eliminate old code for an integer zero RHS with a SIMD12 LHS, and unify the codegen for zero-init for SIMD12 (between block init and assignment of SIMD init of 0)
ae77566
to
2c93671
Compare