Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Struct Improvements #35544

Merged
merged 3 commits into from
May 8, 2020
Merged

Struct Improvements #35544

merged 3 commits into from
May 8, 2020

Conversation

CarolEidt
Copy link
Contributor

  • Allow expanding SIMD12 field to 16 bytes if it is the only field.
  • Similarly, if we have a struct with a single field we can reference it as the full size of the struct.
  • Eliminate old code for an integer zero RHS with a SIMD12 LHS, and unify the codegen for zero-init for SIMD12 (between block init and assignment of SIMD init of 0)

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Apr 27, 2020
inline bool GenTree::IsSIMDZero()
{
#ifdef FEATURE_SIMD
if ((gtOper == GT_SIMD) && (AsSIMD()->gtSIMDIntrinsicID == SIMDIntrinsicInit))
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there any need/desire to recognize SIMDIntrinsicGetZero or the equivalent HWIntrinsic kinds?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

SIMDIntrinsicGetZero becomes SIMDIntrinsicInit so that's covered, but we may want to consider the HWIntrinsic kinds if/when we unify their handling in codegen.

@CarolEidt
Copy link
Contributor Author

@dotnet/jit-contrib PTAL - this fixes some code size regressions I encountered in promoting multireg structs that later didn't get enregistered.

@CarolEidt
Copy link
Contributor Author

The asm diffs from a superpmi run of the tests & frameworks:
Windows/x64: Total bytes of diff: -313, (26 improved, 0 regressed)
Linux/x64: Total bytes of diff: -110 (9 improved, 0 regressed)

// However, we don't know this until late, so we may have already pretended the field is bigger
// before that.
if ((varDsc->lvSize() == 16) && !lvaIsFieldOfDependentlyPromotedStruct(varDsc))
if ((varDsc->lvSize() == 16) &&
Copy link
Contributor

@sandreenko sandreenko Apr 29, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can we check that parent exact size is 16? To protect from cases like:

[StructLayout(LayoutKind.Explicit, Pack = 1)]
public struct A {
  [StructLayout(LayoutKind.Explicit, Pack = 1)]
  public struct B {
    SIMD12 c; // can't be written as 16 bytes.
  }
}

Right now we won't generate a LCL_FLD node to access c, but it could change soon.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should that be SIMD12 c;?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you verified that a struct such as that will occupy only 12 bytes on the stack? I would be surprised, as I would expect it to always get 16. In any case, it is an inexpensive check to add.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually, I take that back - lvExactSize will be 12 so we will miss the optimization in all cases. But if we are actually packing such a type, e.g. on a 32-bit architecture, lvSize() will return 12. So I believe that the check is correct.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm pretty sure that we rely in many places on lvSize() returning the size that the variable is allocated on the stack.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If lvSize() == 16 means that it takes 16 bytes on the stack, doesn't it mean that if we map it to SIMD16 it won't overlap with any parent fields that it was not overlapping before?
If the parent struct was promoted it means there were no overlapping fields, so it is not clear for me why we need to check anything except if (varDsc->lvSize() == 16)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason we need to make the other checks is that if it has multiple fields, it is possible (in particular on 32-bit targets) that it would have a 32-bit field following it. This could even happen on 64-bit targets with explicit layout.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it is possible (in particular on 32-bit targets) that it would have a 32-bit field following it.

do I understand that correctly:

struct C {   // size 16 on x86?
  SIMD12 a;  // offset 0, lvSize() == 16?, lvExactSize == 12
  int b;     // offset 12? , size == 4
}

how do we allocate the second field starting at 12 after we passed 16 to lvaAllocLocalAndSetVirtualOffset(lclNum, lvaLclSize(lclNum), stkOffs);

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry I missed this comment earlier (I'm still struggling to get green CI testing on this!). We special-case locals of SIMD12 type on 32-bit targets so that they occupy a full 16 bytes, where normally it would only be 12, since that's an even multiple of TARGET_POINTER_SIZE. However, if it is a member of another struct we can't do that (it would violate the struct packing rules). So on a 32-bit target the size of C.a would be 12.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've backed out this change - it's not clear from the comment here (and I hadn't remembered) that lvSize() always returns the "fake" size of TYP_SIMD12, even if it may later become a "dependently promoted" local. This was causing the RayTracer benchmark to fail, because the local is 12 bytes but we were storing 16.

Copy link
Contributor

@sandreenko sandreenko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

- Allow expanding SIMD12 field to 16 bytes if it is the only field.
- Similarly, if we have a struct with a single field we can reference it as the full size of the struct.
- Eliminate old code for an integer zero RHS with a SIMD12 LHS, and unify the codegen for zero-init for SIMD12 (between block init and assignment of SIMD init of 0)
@CarolEidt CarolEidt merged commit 57408d0 into dotnet:master May 8, 2020
@CarolEidt CarolEidt deleted the StructImprovements branch May 8, 2020 15:07
@ghost ghost locked as resolved and limited conversation to collaborators Dec 9, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants