Keep correct fieldSeq for 0-offset fields. #32085

Merged
merged 5 commits into from
Feb 15, 2020


sandreenko
Contributor

@sandreenko sandreenko commented Feb 11, 2020

Note: The PR was rewritten, the body description has changed.

There are two changes in that PR:
e3f0fe0 Adds missing zero-offset field seq, this is a correctness fix;
e820fa8 Optimize ADD(val, 0) simple cases to avoid LEA(val, 0) creation, fixes #13548.

Diffs for SPC x64:

Total bytes of diff: -15 (-0.00% of base)
    diff is an improvement.

Diffs for FX libraries x64:

Total bytes of diff: 51 (0.00% of base)
    diff is a regression.
The diffs for FX libraries x64 and their analysis.

The improved methods now optimize the ADD(val, 0) nodes that we were creating for dst in the past (and still create). Note that there was no offset != 0 check there:

GenTree* fieldOffsetNode = gtNewIconNode(lvaTable[fieldLclNum].lvFldOffset, TYP_I_IMPL);

like:

         -13 (-26.53% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.Ecma335.LiteralEncoder:TaggedVector(byref,byref):this
         -13 (-26.53% of base) : System.Reflection.Metadata.dasm - System.Reflection.Metadata.Ecma335.LiteralEncoder:TaggedScalar(byref,byref):this

The regressed method now can't CSE trees like:

N007 ( 20, 14)              [000300] --CXG-------              \--*  IND       long   <l:$1c6, c:$1c5>
N006 ( 18, 12)              [000294] --CXG--N----                 \--*  ADD       byref  <l:$286, c:$285>
N004 ( 17, 11) CSE #03 (def)[000295] --CXG-------                    +--*  IND       ref    <l:$2c3, c:$185>
N003 ( 15,  9)              [000296] --CXG--N----                    |  \--*  ADD       byref  $282
N001 ( 14,  5) CSE #02 (def)[000297] H-CXG-------                    |     +--*  CALL help r2r_ind byref  HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE $280
N002 (  1,  4)              [000298] ------------                    |     \--*  CNS_INT   int    984 Fseq[MaxDaylightDelta] $44
N005 (  1,  1)              [000299] ------------                    \--*  CNS_INT   long   8 Fseq[#FirstElem, _ticks] $340

with

N008 ( 20, 14)              [000023] --CXG------- arg0 in rcx     \--*  IND       long   <l:$1cc, c:$1cb>
N007 ( 18, 12)              [000022] --CXG--N----                    \--*  ADD       byref  <l:$286, c:$288>
N005 ( 17, 11) CSE #03 (use)[000020] --CXG-------                       +--*  IND       ref    <l:$2c3, c:$18d>
N004 ( 15,  9)              [000019] --CXG--N----                       |  \--*  ADD       byref  $282
N002 ( 14,  5) CSE #02 (use)[000017] H-CXG-------                       |     +--*  CALL help r2r_ind byref  HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE $280
N003 (  1,  4)              [000018] ------------                       |     \--*  CNS_INT   int    984 Fseq[MaxDaylightDelta] $44
N006 (  1,  1)              [000021] ------------                       \--*  CNS_INT   long   8 Fseq[#FirstElem] $340

because codegen sees that one accesses the struct #FirstElem and the other accesses its field #FirstElem._ticks.

In this case it was not dangerous to CSE them in the past, but soon, with struct indirs, we would get a CSE assert from IsCompatibleType when the outer access has a struct type and its field is, for example, long.

The other case is that assertion propagation now keeps trees like:

[000189] -A-XG-------              |  |  +--*  COMMA     void  
[000184] -A-XG---R---              |  |  |  +--*  not important trees here
[000188] -A-X----R---              |  |  |  \--*  ASG       ref   
[000185] D------N----              |  |  |     +--*  LCL_VAR   ref    V20 tmp15        d:2
[000187] ---X--------              |  |  |     \--*  IND       ref   
[000186] ------------              |  |  |        \--*  LCL_VAR   byref  V25 tmp20        u:2 Zero Fseq[Syntax]
[000195] ---X--------              |  |  \--*  COMMA     void  
[000194] ---X--------              |  |     +--*  IND       int   
[000193] -------N----              |  |     |  \--*  ADD       byref 
[000191] ------------              |  |     |     +--*  LCL_VAR   byref  V25 tmp20        u:2
[000192] ------------              |  |     |     \--*  CNS_INT   long   8 Fseq[PrecedingInitializersLength] 

000194 is useless: its IND can't fail because 000187 has already dereferenced the same local. After the change, however, the two addresses have different VNs, so optAssertionIsNonNull doesn't recognize that. I filed #32248 to track that; I believe there were preexisting cases like this.

The ADD(val, 0) commit catches some cases revealed by #1735 in Compiler::fgMorphSmpOp(GenTree* tree, MorphAddrContext* mac) for GT_ADD, where cns1 + cns2 == 0 but we don't delete that tree there. The whole method could be improved; there is a PR that makes it better, #656, and after that we will probably have time to refactor and clean up that logic.

All these changes prepare for actual struct indirections, struct returns, and struct call changes. From my experience, such changes always cause regressions somewhere, so I would like to merge them in as many separate PRs as possible to be able to find which change caused which regression faster.

@sandreenko sandreenko added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Feb 11, 2020
@mikedn
Contributor

mikedn commented Feb 11, 2020

The first change adds CNS_INT long 0 Fseq[_00] for source field in the field by field assignments even when offset is 0.

Hmm, I suspect that the "proper" way to do this is to add the address to the "zero offset field map". But then I happen to think that these kinds of side maps are a terrible idea and that it's best to keep this information in the IR, in special nodes - PTR_FLD and PTR_IDX. Basically LEAs with field sequences but without suffering from LEA's unary/binary split personality disorder. ADD(x, 0) should do, with some extra special cases. Haven't checked the actual changes yet but I suppose you may want to make such nodes non-CSE, 0-size/execution cost, 0-level in eval order etc.

Optimize ADD(val, 0) in lower.

Maybe this will also fix 13548?

@sandreenko
Contributor Author

sandreenko commented Feb 11, 2020

Maybe this will also fix 13548?

Thanks, have not seen that yet. I think yes. Link #13548.

@sandreenko
Contributor Author

sandreenko commented Feb 11, 2020

it's best to keep this information in the IR, in special nodes - PTR_FLD and PTR_IDX.

I think I do not understand, could you give me an example? Is it like IND(PTR_FLD(LCL_VAR or ADD))? How should it look if we access a field of a field?

Haven't checked the actual changes yet but I suppose you may want to make such nodes non-CSE, 0-size/execution cost, 0-level in eval order etc.

execution cost at this phase doesn't matter, so I did not bother setting that
Edit: it does matter, gtSetEvalOrder could be updated to have 0 cost for ADD(val, 0).

0-level in eval order is not required, and neither is the non-CSE flag.

The main issue that this PR solves is CSE of INDs that have incompatible types but identical trees under the IND nodes. After this change that can't happen, because one tree will have + 0 and the other won't.
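
To illustrate why an extra field-sequence annotation keeps such trees apart, here is a minimal sketch of a toy value-number store. ToyAddr, ToyVNStore and ToyCanCse are hypothetical names invented for this illustration and are not the JIT's ValueNumStore API; the sketch simply assumes that the field sequence is part of the key used to number an address.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <tuple>

// Hypothetical toy model, not the JIT's ValueNumStore.
// An address is described by the value number of its base, a byte offset,
// and the field sequence annotation attached to that offset.
struct ToyAddr
{
    int         baseVN;
    int         offset;
    std::string fieldSeq; // e.g. "#FirstElem" or "#FirstElem,_ticks"
};

class ToyVNStore
{
public:
    // Returns the same number for identical (base, offset, fieldSeq) triples
    // and a fresh number for any triple not seen before.
    int VNForAddr(const ToyAddr& addr)
    {
        assert(addr.offset >= 0);
        auto key = std::make_tuple(addr.baseVN, addr.offset, addr.fieldSeq);
        auto it  = m_map.find(key);
        if (it != m_map.end())
        {
            return it->second;
        }
        int vn = m_next++;
        m_map.emplace(key, vn);
        return vn;
    }

private:
    std::map<std::tuple<int, int, std::string>, int> m_map;
    int                                              m_next = 1;
};

// In this toy model two loads are CSE candidates only if their address VNs match.
inline bool ToyCanCse(ToyVNStore& store, const ToyAddr& a, const ToyAddr& b)
{
    return store.VNForAddr(a) == store.VNForAddr(b);
}
```

With this model, an access annotated with "#FirstElem" and one annotated with "#FirstElem,_ticks" get different numbers even when base and offset are equal, so they are never offered to CSE as the same expression.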

@mikedn
Contributor

mikedn commented Feb 11, 2020

I think I do not understand, could you give me an example? Is it like IND(PTR_FLD(LCL_VAR or ADD))?

IND(PTR_FLD(any TYP_REF, TYP_BYREF, TYP_I_IMPL tree))

Basically - an unary operator that also contains an unsigned (or maybe target_size_t) m_offset field (like LEA's gtOffset) and a FieldSeqNode* m_fieldSeq (possibly NotAField) corresponding to that offset. The result is of course - gtOp1's value + m_offset. m_offset may be 0 in order to have a field sequence associated with the address.
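
A rough sketch of what such a node could look like, purely as an illustration of the description above. ToyPtrFld, ToyFieldSeq and ToyFoldNested are hypothetical names; nothing here exists in the JIT, and the folding rule (add the offsets, concatenate the field sequences) is an assumption about how a field-of-a-field access might be combined.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical sketch only; none of these types exist in the JIT.
// Stand-in for FieldSeqNode: a list of field names, empty meaning "NotAField".
using ToyFieldSeq = std::vector<std::string>;

// Unary address node: its value is the operand's value plus m_offset.
struct ToyPtrFld
{
    std::uintptr_t m_op1Value; // value of the single operand (ref, byref or native int)
    unsigned       m_offset;   // may be 0; the node then exists only to carry m_fieldSeq
    ToyFieldSeq    m_fieldSeq; // field sequence that describes m_offset

    std::uintptr_t Evaluate() const
    {
        return m_op1Value + m_offset;
    }
};

// Folds an outer field access into an existing node: offsets are added and the
// field sequences are concatenated, so the annotation travels with the offset.
inline ToyPtrFld ToyFoldNested(const ToyPtrFld& inner, unsigned outerOffset, const ToyFieldSeq& outerSeq)
{
    ToyPtrFld folded = inner;
    assert(folded.m_offset + outerOffset >= folded.m_offset); // no unsigned wrap-around
    folded.m_offset += outerOffset;
    folded.m_fieldSeq.insert(folded.m_fieldSeq.end(), outerSeq.begin(), outerSeq.end());
    return folded;
}
```

Because the offset and its field sequence live in the same node, a transform that touches one has the other in front of it, which is the point made in the list below about not losing field sequences.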

PTR_IDX (or maybe PTR_INDEX to match INDEX) is basically a LEA with always non-null Base and Index so it is always binary. It too carries a FieldSeqNode* that may describe an array element access + a sequence of struct field accesses (for arrays of struct type).

PTR_IDX is probably a more complicated issue and requires further thought but PTR_FLD seems like an obvious win:

  • Smaller IR (it's one node instead of ADD + CNS_INT) for a common pattern
  • Different oper than ADD requires specific transform code (e.g. folding) and minimizes the risk of losing field sequences by forgetting to check if CNS_INT has a field sequence
  • Better matches codegen reality - there have been discussions of using LEA's in frontend to avoid having to recognize LEA patterns in frontend and disable CSE of such patterns because they may end up being contained memory operands.
  • Could probably remove FieldSeqNode from GT_CNS_INT which is a grab bag of everything today (real constants, handles, field offsets and whatnot). Though you probably need PTR_IDX too for this.

// Add the fieldSeq offset for all fields, even for 0 offset.
// We will read these field seq when we ask for `CORINFO_CLASS_HANDLE` for these trees.
// We can use `fieldSeq` from the dst because fieldHnd for src and dst must match.
GenTreeIntCon* fldOffset = new (this, GT_CNS_INT)
Contributor

Looks like you could shorten this a bit by using the gtNewIconNode(unsigned fieldOffset, FieldSeqNode* fieldSeq) overload I added a while ago.

Contributor Author

That helped me find another case where we did not create the node if the offset was zero:

GenTree* fldOffsetNode = new (this, GT_CNS_INT) GenTreeIntCon(TYP_INT, fldOffset, fieldSeq);

The strange thing is that it was created with TYP_INT instead of TYP_I_IMPL. It probably doesn't matter, but I will check the diffs.

// We should not access it after lowering, so we can drop it now.
GenTree* zero = nullptr;
GenTree* value = nullptr;
if (op1->IsCnsIntOrI() && (op1->AsIntCon()->IconValue() == 0))
Contributor

IsIntegralConst(0) ?

zero = op1;
value = op2;
}
else if (op2->IsCnsIntOrI() && (op2->AsIntCon()->IconValue() == 0))
Contributor

Typically it's enough to check only op2 for constants because gtSetEvalOrder tries to put constants in the second op.
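
For readers following along, the check being discussed can be sketched on a toy IR. ToyNode, ToyOper and ToyValueOfAddZero are hypothetical names for this illustration and not the JIT's GenTree API; the sketch tests op2 first, on the assumption stated above that constants usually end up in the second operand, and keeps the op1 test only as a fallback.

```cpp
#include <cassert>

// Hypothetical toy IR, not the JIT's GenTree.
enum class ToyOper
{
    Const,
    Local,
    Add
};

struct ToyNode
{
    ToyOper  oper;
    long     value; // constant value for Const, local number for Local, unused for Add
    ToyNode* op1;
    ToyNode* op2;

    bool IsIntegralConst(long v) const
    {
        return (oper == ToyOper::Const) && (value == v);
    }
};

// If `add` is ADD(val, 0) or ADD(0, val), returns `val`; otherwise returns nullptr.
inline ToyNode* ToyValueOfAddZero(ToyNode* add)
{
    assert((add != nullptr) && (add->oper == ToyOper::Add));
    if (add->op2->IsIntegralConst(0))
    {
        return add->op1; // the common shape: constant in the second operand
    }
    if (add->op1->IsIntegralConst(0))
    {
        return add->op2; // fallback for a constant left in the first operand
    }
    return nullptr;
}
```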


if (zero != nullptr)
{
LIR::Use use;
Contributor

There's already a "use" variable, you may want to reuse it.

@sandreenko
Contributor Author

Hmm, I suspect that the "proper" way to do this is to add the address to the "zero offset field map". But then I happen to think that these kinds of side maps are a terrible idea and that it's best to keep this information in the IR, in special nodes - PTR_FLD and PTR_IDX.

I have checked how it works with "zero offset field map" and you are right, overall results are better.
fgAddFieldSeqForZeroOffset(tlsRef, fieldSeq); has almost all the logic that we need, we are just not calling it sometimes.

The only downside is that optAssertionIsNonNull now cannot understand that it doesn't need a null check in cases like:

-A-X----R---              |  \--*  ASG       ref    <l:$22a, c:$229>
D------N----              |     +--*  LCL_VAR   ref    V20 tmp15        d:2 <l:$227, c:$155>
---X--------              |     \--*  IND       ref    <l:$22a, c:$229>
------------              |        \--*  LCL_VAR   byref  V25 tmp20        u:2 Zero Fseq[Syntax] $441
---X--------              \--*  COMMA     void   $580
---X--------                 +--*  IND       int    <l:$34b, c:$34a>
-------N----                 |  \--*  ADD       byref  $442
------------                 |     +--*  LCL_VAR   byref  V25 tmp20        u:2 $440
------------                 |     \--*  CNS_INT   long   8 

The first indirection is done on LCL_VAR byref V25 tmp20 u:2 Zero Fseq[Syntax] $441, the second on LCL_VAR byref V25 tmp20 u:2 $440, and after the change they have different VNs, which is too hard for optAssertionIsNonNull. That behaviour is probably responsible for other unnecessary null checks; I will look for an existing issue tomorrow.
When we had an ADD, we skipped it in optAssertionProp_Ind:

    // Check for add of a constant.
    GenTree* op1 = tree->AsIndir()->Addr();
    while ((op1->gtOper == GT_ADD) && (op1->AsOp()->gtOp2->gtOper == GT_CNS_INT))
    {
        op1 = op1->AsOp()->gtOp1;
    }
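
The interaction between this loop and the value numbers can be sketched with a toy model. ToyNode, ToyStripConstAdds and ToyIsKnownNonNull are hypothetical names, and representing the known non-null facts as a plain set of VNs is an assumption made for illustration, not how the JIT stores assertions.

```cpp
#include <cassert>
#include <set>

// Hypothetical toy IR, not the JIT's GenTree.
enum class ToyOper
{
    Const,
    Local,
    Add
};

struct ToyNode
{
    ToyOper  oper;
    int      vn; // toy value number of this node
    ToyNode* op1;
    ToyNode* op2;
};

// Walks through ADD(x, CNS) wrappers and returns the innermost x.
inline ToyNode* ToyStripConstAdds(ToyNode* addr)
{
    assert(addr != nullptr);
    while ((addr->oper == ToyOper::Add) && (addr->op2->oper == ToyOper::Const))
    {
        addr = addr->op1;
    }
    return addr;
}

// Returns true if an earlier dereference already proved this address base non-null.
// The lookup is by value number, so two uses of the same local with different VNs
// are treated as unrelated.
inline bool ToyIsKnownNonNull(const std::set<int>& nonNullVNs, ToyNode* addr)
{
    return nonNullVNs.count(ToyStripConstAdds(addr)->vn) != 0;
}
```

In this model, a fact recorded for VN $441 does not help a later use of the same local numbered $440, which mirrors the missed null-check elimination in the dump above.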

It is almost 3am, but my brain says that Fseq should not affect VN of LCL_VAR (because it is a pointer type, not a real value), but should affect IND node on top of it. Does it make sense?

@mikedn
Contributor

mikedn commented Feb 12, 2020

It is almost 3am, but my brain says that Fseq should not affect VN of LCL_VAR (because it is a pointer type, not a real value), but should affect IND node on top of it. Does it make sense?

Working late? Not good.

Does that lclvar contain a pointer (byref) to a local variable perhaps? VN attempts to track such pointers, including the associated field sequence, using VNF_PtrToLoc. So yes, VN's are likely to end up being different if the field sequence is different.

The interesting part is that optAssertionIsNonNull uses IsKnownNonNull which checks for VNFOA_KnownNonNull. VNF_PtrToLoc doesn't seem to have it set, despite supposedly representing a pointer to a local variable.

@sandreenko
Contributor Author

PTAL @dotnet/jit-contrib

Contributor

@briansull briansull left a comment

Left a few comments

@@ -4498,12 +4504,43 @@ bool Lowering::TryCreateAddrMode(GenTree* addr, bool isContainable)
// Arguments:
// node - the node we care about
//
void Lowering::LowerAdd(GenTreeOp* node)
// Returns:
Contributor

This also needs to be in the (missing) function header for
GenTree* Lowering::LowerNode(GenTree* node)

GenTree* next = LowerAdd(node->AsOp());
if (next != nullptr)
{
return next;
Contributor

This method is missing a proper function header comment

Contributor

@CarolEidt CarolEidt left a comment

I'd like to see you capture 2-3 examples of regressions, and file new (or annotate existing) issues to ensure that they are addressed going forward.

{
// Append the zero field sequences
zeroFieldSeq = GetFieldSeqStore()->Append(existingZeroOffsetFldSeq, zeroFieldSeq);
}
// Transfer the annotation to the new GT_ADDR node.
fgAddFieldSeqForZeroOffset(op1, zeroFieldSeq);
Contributor

Should there be an assert here that zeroFieldSeq != nullptr or is that self-evident from the fact that isZeroOffset is true?

Contributor Author

We could add it, but it looks self-evident.
Both values are initialized a few lines above with:
bool isZeroOffset = GetZeroOffsetFieldMap()->Lookup(tree, &zeroFieldSeq);
so isZeroOffset == true guarantees zeroFieldSeq != nullptr.
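
The reasoning relies on the usual "bool result plus out parameter" lookup pattern. A minimal sketch, assuming the map only ever stores non-null sequences; ToyZeroOffsetFieldMap and ToyFieldSeq are hypothetical stand-ins and not the JIT's real map type.

```cpp
#include <cassert>
#include <unordered_map>

// Hypothetical stand-ins, not the JIT's FieldSeqNode or zero-offset field map.
struct ToyFieldSeq
{
    const char* name;
};

class ToyZeroOffsetFieldMap
{
public:
    void Set(const void* tree, ToyFieldSeq* seq)
    {
        assert(seq != nullptr); // assumption: only non-null sequences are ever stored
        m_map[tree] = seq;
    }

    // On a hit, writes the stored sequence to *seq and returns true.
    // On a miss, returns false and leaves *seq untouched.
    bool Lookup(const void* tree, ToyFieldSeq** seq) const
    {
        auto it = m_map.find(tree);
        if (it == m_map.end())
        {
            return false;
        }
        *seq = it->second;
        return true;
    }

private:
    std::unordered_map<const void*, ToyFieldSeq*> m_map;
};
```

Under that assumption a true result implies a non-null out value, so a separate null assert after a successful Lookup would be redundant.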

The problem with that code was that the condition was written before fgAddFieldSeqForZeroOffset was introduced. When fgAddFieldSeqForZeroOffset was created that logic was encapsulated there, but the condition was not deleted.

Contributor

Thanks for the clarification

@AndyAyersMS
Member

@sandreenko do you see any systematic fix for the general issue of losing these zero offset annotations? This is not the first time we've run into this.

Or is there at least some way to write a checker?

@sandreenko
Contributor Author

sandreenko commented Feb 13, 2020

I'd like to see you capture 2-3 examples of regressions, and file new (or annotate existing) issues to ensure that they are addressed going forward.

Yes, so there are 2 types of regressions:

  1. We can't CSE a field and its parent struct now, even if they have the same size, like struct A { native int a; }. Regressions like that can be seen, for example, in SPC AdjustmentRule:AdjustDaylightDeltaToExpectedRange(byref,byref); I wrote that example in the body description.

before:
def:
N007 ( 20, 14) CSE #4 (def)[000300] --CXG------- \--* IND long
N006 ( 18, 12) [000294] --CXG--N---- \--* ADD byref
N004 ( 17, 11) CSE #3 (def)[000295] --CXG------- +--* IND ref
N003 ( 15, 9) [000296] --CXG--N---- | \--* ADD byref $282
N001 ( 14, 5) CSE #2 (def)[000297] H-CXG------- | +--* CALL help r2r_ind byref HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE $280
N002 ( 1, 4) [000298] ------------ | \--* CNS_INT int 984 Fseq[MaxDaylightDelta] $44
N005 ( 1, 1) [000299] ------------ \--* CNS_INT long 8 Fseq[#FirstElem] $340

use:
N008 ( 20, 14) CSE #4 (use)[000023] --CXG------- arg0 in rcx \--* IND long <l:$1c6, c:$1c9>
N007 ( 18, 12) [000022] --CXG--N---- \--* ADD byref <l:$286, c:$288>
N005 ( 17, 11) CSE #3 (use)[000020] --CXG------- +--* IND ref <l:$2c3, c:$18d>
N004 ( 15, 9) [000019] --CXG--N---- | \--* ADD byref $282
N002 ( 14, 5) CSE #2 (use)[000017] H-CXG------- | +--* CALL help r2r_ind byref HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE $280
N003 ( 1, 4) [000018] ------------ | \--* CNS_INT int 984 Fseq[MaxDaylightDelta] $44
N006 ( 1, 1) [000021] ------------ \--* CNS_INT long 8 Fseq[#FirstElem] $340

after:
previous def:
N007 ( 20, 14) [000300] --CXG------- \--* IND long <l:$1c6, c:$1c5>
N006 ( 18, 12) [000294] --CXG--N---- \--* ADD byref <l:$286, c:$285>
N004 ( 17, 11) CSE #3 (def)[000295] --CXG------- +--* IND ref <l:$2c3, c:$185>
N003 ( 15, 9) [000296] --CXG--N---- | \--* ADD byref $282
N001 ( 14, 5) CSE #2 (def)[000297] H-CXG------- | +--* CALL help r2r_ind byref HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE $280
N002 ( 1, 4) [000298] ------------ | \--* CNS_INT int 984 Fseq[MaxDaylightDelta] $44
N005 ( 1, 1) [000299] ------------ \--* CNS_INT long 8 Fseq[#FirstElem, _ticks] $340

previous use:
N008 ( 20, 14) [000023] --CXG------- arg0 in rcx \--* IND long <l:$1cc, c:$1cb>
N007 ( 18, 12) [000022] --CXG--N---- \--* ADD byref <l:$286, c:$288>
N005 ( 17, 11) CSE #3 (use)[000020] --CXG------- +--* IND ref <l:$2c3, c:$18d>
N004 ( 15, 9) [000019] --CXG--N---- | \--* ADD byref $282
N002 ( 14, 5) CSE #2 (use)[000017] H-CXG------- | +--* CALL help r2r_ind byref HELPER.CORINFO_HELP_READYTORUN_STATIC_BASE $280
N003 ( 1, 4) [000018] ------------ | \--* CNS_INT int 984 Fseq[MaxDaylightDelta] $44
N006 ( 1, 1) [000021] ------------ \--* CNS_INT long 8 Fseq[#FirstElem] $340

So before, both indirections had VN <l:$1c6, c:$1c5>; now they are different. I have comments about that in #1231, and I will create a separate issue after #1231 is closed, when I have a complete understanding of its impact.

[000023] IND has type LONG, but really it has type STRUCT, which we retype to LONG during import.

  2. Assertion propagation doesn't delete null checks and X flags from indirections for field/struct accesses. The example was also in the body; I have created an issue for it: "optAssertionProp_Ind should eliminate GTF_EXCEPT better", #32248.
    We could probably try to iterate over the different VNs in optAssertionIsNonNull: check if we have information about the struct, then check if we have information about any of its fields; if we do, optAssertionIsNonNull should return true.

@sandreenko
Contributor Author

@sandreenko do you see any systematic fix for the general issue of losing these zero offset annotations? This is not the first time we've run into this.

Or is there at least some way to write a checker?

That is a great question. As I discussed with Mike above, I think we could have FldSeq on indirection nodes, not on addresses, but it will be hard to do (because we don't have parent links).
Mike suggested adding PTR_FLD and PTR_IDX nodes and removing FieldSeqNode from GT_CNS_INT; that is easier to implement, and it will make it easier to catch all places like that.

However, my upcoming changes will help with that as well: we will have real STRUCT types on indirection nodes, so CSE will assert if it sees identical VNs on INDs that are not compatible, like a STRUCT and a non-struct (that is how the places from this PR were found).

And yes, I would like to refactor the fgMorphCopyBlock function in the near future; its complexity allows these bugs to sneak in.

FieldSeqNode* zeroOffsetFldSeq = nullptr;
if (GetZeroOffsetFieldMap()->Lookup(srcAddr, &zeroOffsetFldSeq))
{
fieldSeqVN =
vnStore->FieldSeqVNAppend(fieldSeqVN, vnStore->VNForFieldSeq(zeroOffsetFldSeq));
// Check that the zero offset field seq was attached for `srcAddr`.
Contributor Author

There was an x86 assert failure that Brian helped me to debug and fix; the PR was updated.
I have also run SPMI tests over pri1 and internal collections and they have not shown any other asserts.

The previous code was added before TFS so I don't know the context, but I suspect it was dead code because we were checking this GetZeroOffsetFieldMap for srcAddr twice: either both lookups were true, and then the assert should fire, or both were false, and we did not pass the check.

@sandreenko
Contributor Author

The failures are infra.

@sandreenko sandreenko merged commit 7d3de08 into dotnet:master Feb 15, 2020
@sandreenko sandreenko deleted the prepareForMike'sChanges branch February 15, 2020 02:37
briansull added a commit to briansull/runtime that referenced this pull request Feb 25, 2020
briansull added a commit to briansull/runtime that referenced this pull request Feb 25, 2020
briansull added a commit that referenced this pull request Feb 26, 2020
…31834)

* Added ValueNumbering support for GT_SIMD and GT_HWINTRINSIC tree nodes

* Allow SIMD and HW Intrinsics to be CSE candidates

* Correctness fix for optAssertionPropMain
  - Zero out the bbAssertionIn values, as these can be referenced in RangeCheck::MergeAssertion
    and this is shared state with the CSE phase: bbCseIn

* Improve the VNFOA_ArityMask

* Update to use the new TARGET macros

* Include node type when value numbering SIMDIntrinsicInit
Mutate the global heap when performing a HW_INTRINSIC memory store operation
Printing of SIMD constants only support 0

* Disable CSE's for some special HW_INTRINSIC categories

* Code review feedback

* Record csdStructHnd; // The class handle, currently needed to create a SIMD LclVar in PerformCSE

* Instead of asserting on a struct handle mismatch, we record it in csdStructHndMismatch and avoid making the candidate into a CSE

* Fix the JITDUMP messages to print the CseIndex

* add check for (newElemStructHnd != NO_CLASS_HANDLE)

* Additional checks for SIMD struct types when setting csdStructHnd
Added Mismatched Struct Handle assert in ConsiderCandidates

* fix GenTreeSIMD::OperIsMemoryLoad for ARM64
Removed Mismatched Struct Handle assert

* Fix the printing of BitSets on Linux, change the printf format specifier

* Added check for simdNode->OperIsMemoryLoad()) to fgValueNumberSimd
Added bool OperIsMemoryLoad() to GenTreeSIMD, returns true for SIMDIntrinsicInitArray
Added valuenumfuncs.h to src/coreclr/src/jit/CMakeLists.txt

* Avoid calling gtGetStructHandleIfPresent to set csdStructHnd when we have a GT_IND node

* Fix check for (newElemStructHnd != hashDsc->csdStructHnd)

* added extra value number argument VNF_SimdType for Most SIMD operations
added VNF_SimdType // A value number function to compose a SIMD type
added vnDumpSimdType

* Added bool methods vnEncodesResultTypeForSIMDIntrinsic and vnEncodesResultTypeForHWIntrinsic
these return true when a SIMD or HW Intrinsic will use an extra arg to record the result type during value numbering
Changes the ValueNumFuncDef to set the arity to zero when a -1 value is passed in
Updated InitValueNumStoreStatics to fixup the arity of SIMD or HW Intrinsic that have an extra arg to record the result type
Allow a type mismatch when we have a GT_BLK as the lhs of an assignment, as it is used to zero out Simd structs

* Fix for SIMD_WidenLo arg count

* fix typo

* Fix x86 build breaks
Fix SIMD_WidenHi

* Added method header comment for vnEncodesResultTypeForHWIntrinsic
Added & VNFOA_ArityMask when assigning to vnfOpAttribs[]

* Codereview feedback and some more comments

* fix typo

* Moved the code that sets the arg count for the three SIMD intrinsics

* clang-format

* Adjust CSE for SIMD types that are live across a call

* Proposed fix for #32085

* Revert "Proposed fix for #32085"

This reverts commit 169c24e.

* Added better comments for optcse SIMD caller saved register heuristics

* Added CONFIG_INTEGER: JitDisableSimdVN,
   Default 0, ValueNumbering of SIMD nodes and HW Intrinsic nodes enabled
   If 1, then disable ValueNumbering of SIMD nodes
   If 2, then disable ValueNumbering of HW Intrinsic nodes
   If 3, disable both SIMD and HW Intrinsic nodes

* Moved JitDisableSimdVN from DEBUG to RETAIL
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI
Development

Successfully merging this pull request may close these issues.

Do not create unnecessary LEA(b+0) nodes
5 participants