Simplify some relop/jtrue related optimizations #14027

mikedn · 2017-09-17T09:49:44Z

These optimizations tend to be spread across multiple functions and add more work to common code paths.

SIMD equality is first recognized in ContainCheckJTrue and then it needs to be recognized again during codegen

Lines 6289 to 6293 in c90a41b

    if ((targetReg == REG_NA) && tree->OperIs(GT_EQ, GT_NE))
    {
        // Is it a SIMD (in)Equality that doesn't need to materialize result into a register?
        if ((op1->gtRegNum == REG_NA) && op1->IsSIMDEqualityOrInequality())
        {

because ContainCheckJTrue did not actually remove the redundant compare. SIMD equality compares are obviously rare and we're trying to recognize them every time we handle a JTRUE or a compare.

Similarly, the case where the condition flags set by a previous instruction can be used instead of emitting a zero test is handled in both ContainCheckCompare and genCompareInt. The latter is supposed to recognize such redundant compares by checking the GTF_USE_FLAGS flag, but that flag is really intended to indicate that the node consumes the condition flags, not that no code should be generated for it. And, as in the SIMD case, it's again more work pushed down to a common code path: this optimization kicks in for less than 0.5% of all compares.

In both cases lowering can take advantage of CMP, SETCC and JCC to alter the IR in such a way that no special casing is required in subsequent phases.
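As a rough illustration of that idea, here is a minimal sketch on a toy linear IR. It is not the JIT's IR or API: `Node`, `Oper` and `LowerJTrue` are hypothetical names chosen for this example, and floating point conditions, SIMD and the flags reuse case are ignored. It only shows how rewriting `RELOP` + `JTRUE` into `CMP` + `JCC` moves the "this compare feeds a branch" knowledge into the IR itself.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy linear IR. All names are hypothetical and only loosely echo the JIT's.
enum class Oper { LclVar, Const, Eq, Ne, Lt, JTrue, Cmp, Jcc };

struct Node
{
    Oper oper      = Oper::Const;
    int  op1       = -1;        // index of the first operand, -1 if none
    int  op2       = -1;        // index of the second operand, -1 if none
    Oper condition = Oper::Eq;  // only meaningful when oper == Jcc
    bool setsFlags = false;     // node must leave the condition flags set
};

inline bool IsRelop(Oper o)
{
    return o == Oper::Eq || o == Oper::Ne || o == Oper::Lt;
}

// Rewrites "t = RELOP(a, b); JTRUE(t)" into "CMP(a, b); JCC(cond)".
// Afterwards the compare produces no value, so a later phase does not
// have to rediscover that the relop only feeds a branch.
inline bool LowerJTrue(std::vector<Node>& range, std::size_t jtrueIndex)
{
    assert(jtrueIndex < range.size());

    Node& jtrue = range[jtrueIndex];
    if (jtrue.oper != Oper::JTrue || jtrue.op1 < 0)
    {
        return false;
    }

    Node& relop = range[static_cast<std::size_t>(jtrue.op1)];
    if (!IsRelop(relop.oper))
    {
        return false;
    }

    jtrue.condition = relop.oper; // remember which condition to branch on
    jtrue.oper      = Oper::Jcc;  // the branch now consumes flags, not a value
    jtrue.op1       = -1;
    relop.oper      = Oper::Cmp;  // the compare only sets flags
    relop.setsFlags = true;
    return true;
}
```

In this toy model a codegen pass would emit a compare for `Cmp` and a conditional jump for `Jcc` without looking at neighbouring nodes.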

mikedn · 2017-09-18T16:06:37Z

@CarolEidt @pgavlin I keep toying with the idea of lowering more RELOP/JTRUE nodes to CMP/TEST/SETCC/JCC. The main advantage is that various pieces of logic end up concentrated in one place (LowerCompare or LowerSIMD for SIMD equality/inequality) instead of spanning lowering/tree node info init/codegen. The main disadvantage would be that we need to use TryGetUse more often. However:

TryGetUse should be pretty fast in this case as the use typically follows the relop
In the case of LowerSIMD it doesn't matter as it's not common to have SIMD == and !=
When a JTRUE/JCC uses the flags set by a previous instruction the RELOP/CMP node gets removed, so there are fewer nodes to process in subsequent phases

The impact on JIT throughput seems to be very small, below 0.05% instructions retired. This is basically below the noise level I get with ETW profiling. And it seems that there are fewer branch mispredictions, but that too is below the noise level.

Opinions? Even if this does have impact on JIT throughput would you consider this a worthwhile change due to code simplification?

pgavlin · 2017-09-19T03:51:41Z

> The main disadvantage would be that we need to use TryGetUse more often. However:

It would be interesting to instrument TGU s.t. we can figure out the average search length before it returns. We might be able to get away with limiting the window in which it searches, especially for throughput-oriented scenarios (e.g. minopts/debuggable code).

> Opinions? Even if this does have impact on JIT throughput would you consider this a worthwhile change due to code simplification?

Can you try getting an instructions retired count using pin or callgrind? If the impact is as low as you say, then I'd imagine that this could be worth it.

mikedn · 2017-09-19T18:07:16Z

> Can you try getting an instructions retired count using pin or callgrind?

Ha ha, I'm still trying to build pin.

mikedn · 2017-09-20T05:20:05Z

@pgavlin Can you tell me which make/mingw you used to build pin? It would seem that those makefiles do not work well with mingw, at least. They pass compiler parameters using / instead of - and the / gets treated by the mingw shell as if it were a path. I tried fixing the makefile and now building doesn't show any errors anymore, it just hangs. Sheesh.

mikedn · 2017-09-20T14:13:59Z

> I tried fixing the makefile and now building doesn't show any errors anymore, it just hangs

Figured it out, I converted one too many / into -.

I used the icount pin tool and I've got numbers that are similar to what ETW reports but with a smaller (~order of magnitude) standard deviation. Now to analyze the numbers...

mikedn · 2017-09-20T15:48:56Z

ETW and PIN data here: https://1drv.ms/x/s!Av4baJYSo5pjgrkusSKacdbhZjDttg

Both show a 0.03-0.04% increase in instructions retired. Let me see if I can improve this.

mikedn · 2017-09-20T18:33:28Z

> It would be interesting to instrument TGU s.t. we can figure out the average search length before it returns.

| Function | Before | After | Increase |
| --- | --- | --- | --- |
| `LIR::TryGetUse` | 108590 | 108725 | 135 |
| `GenTree::TryGetUse` | 152056 | 152191 | 135 |
| Ratio | 1.40027 | 1.39977 | |

Hrm, the increase is so small that it makes the ETW/PIN numbers completely irrelevant. These additional 135 calls can't possibly turn into a few million instructions. And it's not like I only added code, I also removed code.

mikedn · 2017-09-25T20:21:17Z

I've improved a few things and now ETW/PIN show a 0.01-0.02% improvement. I'd take that with a grain of salt. Let's just say that it's as fast as the old code.

It's probably more useful to look at various counts reported by manual instrumentation (crossgen corelib numbers):

| Code | Count |
| --- | --- |
| `Lowering::LowerCompare` | 60011 |
| `LIR::TryGetUse` | 109801 |
| "Flags reuse" optimization | 249 (0.4% of compares) |
| "Flags reuse" calls to `LIR::TryGetUse` | 14 (0.01% of all `TryGetUse` calls) |

Improvements since my first attempt:

The common case (95%) of GT_JTRUE immediately following a relop is handled without calling TryGetUse.
Got rid of GTF_ZSF_SET. That flag was set on ALL nodes that may trigger this optimization but only a few nodes will actually be used by a relop. It's preferable to simply use OperIs(GT_AND, GT_OR, GT_XOR, GT_ADD, GT_SUB) in LowerCompare than to waste time on all GT_AND & co. nodes.
Extended the optimization to all relops. It was somewhat arbitrarily limited to EQ/NE and that required yet another conditional branch.
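The first item in the list above (checking the node that immediately follows before doing a general use search) can be sketched on a toy node list. This is not `LIR::TryGetUse`: `ToyNode` and `UseFinder` are hypothetical names, operands are plain indices into a vector, and the slow path is a simple forward scan standing in for the real lookup.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical toy node: up to two operands, referenced by index into the range.
struct ToyNode
{
    int op1 = -1;
    int op2 = -1;

    bool Uses(int def) const
    {
        return op1 == def || op2 == def;
    }
};

struct UseFinder
{
    std::size_t slowSearches = 0; // how many times the fallback scan ran

    // Returns the index of the node that uses `def` (assumed >= 0),
    // or -1 if no later node uses it.
    int FindUse(const std::vector<ToyNode>& range, int def)
    {
        const std::size_t next = static_cast<std::size_t>(def) + 1;

        // Fast path: the user is very often the next node
        // (for example a branch right after its relop).
        if (next < range.size() && range[next].Uses(def))
        {
            return static_cast<int>(next);
        }

        // Slow path: scan the rest of the range.
        ++slowSearches;
        for (std::size_t i = next + 1; i < range.size(); ++i)
        {
            if (range[i].Uses(def))
            {
                return static_cast<int>(i);
            }
        }
        return -1;
    }
};
```

Counting `slowSearches` is the same kind of manual instrumentation as the counts in the table above, just on toy data.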

There are some additional improvements that may be made in the codegen code for JCC/SETCC (it's rather convoluted), but I'll leave that for another PR, if any.

CarolEidt

I think this looks good overall, though it should be squashed & merged.
Have you run diffs?

CarolEidt · 2017-09-26T00:25:10Z

src/jit/lower.cpp

@@ -2153,7 +2142,7 @@ GenTree* Lowering::LowerTailCallViaHelper(GenTreeCall* call, GenTree* callTarget
 //      be used for ARM as well if support for GT_TEST_EQ/GT_TEST_NE is added).
 //    - Transform TEST(x, LSH(1, y)) into BT(x, y) (XARCH specific)

-void Lowering::LowerCompare(GenTree* cmp)


The function header needs to be updated to describe the return value.

CarolEidt · 2017-09-26T00:33:18Z

src/jit/lowerxarch.cpp

+    {
+        LIR::Use simdUse;
+
+        if (BlockRange().TryGetUse(simdNode, &simdUse) && simdUse.User()->OperIs(GT_EQ, GT_NE) &&


This code needs comments explaining what is being done. The code removed from lower.cpp was pretty well described, and I think we need similar level of detail here.

CarolEidt · 2017-09-26T00:37:45Z

src/jit/simdcodegenxarch.cpp

@@ -2148,8 +2148,10 @@ void CodeGen::genSIMDIntrinsicRelOp(GenTreeSIMD* simdNode)
                getEmitter()->emitIns_R_I(INS_cmp, EA_4BYTE, intReg, mask);
            }

-            if (targetReg != REG_NA)
+            if ((simdNode->gtFlags & GTF_SET_FLAGS) == 0)


I would add an else clause here, asserting that targetReg == REG_NA. I would actually be more inclined to reverse the sense of this. That is, I would check whether targetReg != REG_NA, and then assert that GTF_SET_FLAGS is not set/set in the if and else clause, but I guess they are basically equivalent.

Yes, the existing if should stay as is but I have problems getting this to work because the RA insists on allocating a register even if dstCount is 0. Need to look into this a bit more.

This is caused by this piece of code in lsra.cpp

coreclr/src/jit/lsra.cpp

Lines 4821 to 4828 in 10c320c

    TreeNodeInfoInit(node);

    // If the node produces an unused value, mark it as a local def-use
    if (node->IsValue() && node->IsUnusedValue())
    {
        node->gtLsraInfo.isLocalDefUse = true;
        node->gtLsraInfo.dstCount = 0;
    }

It forces isLocalDefUse to true for unused value nodes. That's probably intended to cover the common case of x86 instructions (e.g. add eax, ebx) but it's not suitable in this situation.

Ah, I think this is similar to compares, where I had to add an isNoRegCompare bit to the TreeNodeInfo to handle the case where you've got a compare that you don't want to allocate a register for.

I looked into isNoRegCompare before but its purpose seems to be rather different: it's used by "Contain" code to communicate to "TreeNodeInfoInit" code that dstCount should be 0. But LSRA itself doesn't appear to use isNoRegCompare in any way, so setting it on the SIMD node has no effect.

What's not clear to me is why LSRA forces isLocalDefUse to true based on IsValue/IsUnusedValue instead of relying on the information provided by TreeNodeInfoInit.

Also note that the TreeNodeInfoInit function LSRA calls ends with the following piece of code:

coreclr/src/jit/lsraxarch.cpp

Lines 719 to 725 in 43cf34f

        if (tree->IsUnusedValue() && (info->dstCount != 0))
        {
            info->isLocalDefUse = true;
        }
        // We need to be sure that we've set info->srcCount and info->dstCount appropriately
        assert((info->dstCount < 2) || (tree->IsMultiRegCall() && info->dstCount == MAX_RET_REG_COUNT));
    }

This is very similar to the code I quoted above yet slightly different. Are these 2 pieces of code correct?

> What's not clear to me is why LSRA forces isLocalDefUse to true based on IsValue/IsUnusedValue instead of relying on the information provided by TreeNodeInfoInit.

Ultimately, the setting of dstCount is/should be based on:

If the node is contained it is 0 (though that's largely irrelevant because the value won't be used)

If !IsValue() it is 0.

If IsNoRegCompare() it is 0

If IsValue() it is 1, or more if it is a node that defines multiple registers

There are places outside of LSRA that need to know the number of registers defined by a node. So the plan is that one should be able to determine the dstCount without gtLsraInfo, which is being eliminated. And in my next round of changes, I'm adding an assert at the end of LinearScan::TreeNodeInfoInit():

    assert(info->dstCount == tree->GetRegisterDstCount());

Where GetRegisterDstCount() is a new method that does the above checks.
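A minimal sketch of the four rules listed above, applied in the order given. This is not the JIT's method: `ToyTreeNode` and its fields are hypothetical stand-ins for the node properties named in the list, and the free function only borrows the `GetRegisterDstCount` name for readability.

```cpp
#include <cassert>

// Hypothetical stand-in for a JIT node; it carries only the properties
// that the rules above mention.
struct ToyTreeNode
{
    bool isContained    = false;
    bool isValue        = true;
    bool isNoRegCompare = false;
    int  multiRegCount  = 1; // greater than 1 only for nodes that define several registers
};

// Number of registers the node defines, following the listed rules.
inline int GetRegisterDstCount(const ToyTreeNode& node)
{
    assert(node.multiRegCount >= 1);

    if (node.isContained)
    {
        return 0; // contained: the value is never used on its own
    }
    if (!node.isValue)
    {
        return 0; // not a value: nothing to define
    }
    if (node.isNoRegCompare)
    {
        return 0; // compare that only sets flags
    }
    return node.multiRegCount; // 1, or more for multi-register nodes
}
```

With a function of this shape, the assert quoted above compares the count computed by TreeNodeInfoInit with one derived purely from node properties.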

CarolEidt

The changes look great overall. I had a couple of requests for additional comments.
I would also like to see the instrumenting of TGU in a separate PR, if that's not too much trouble.
(And it would be great to squash the remaining commits).

mikedn · 2017-09-27T23:45:22Z

Thanks, the instrumentation code should not have been committed; looks like I forgot to unstage a file. I also need to fix conflicts and merge with ARM64 work.

I'm not feeling so well at the moment so I'm not sure I'll be able to finish this in the next couple of days. Besides, you have your own changes that will likely conflict with this (especially isNoRegCompare).

mikedn · 2017-09-29T20:12:07Z

src/jit/lowerxarch.cpp

+
+        if (BlockRange().TryGetUse(simdNode, &simdUse) && simdUse.User()->OperIs(GT_EQ, GT_NE) &&
+            simdUse.User()->gtGetOp2()->IsCnsIntOrI())
+        {


The above condition mirrors the original code but it appears to be incomplete because it doesn't check that there are no nodes between JTRUE and SIMD (except the relop and its second op) that may change flags.

mikedn · 2017-09-29T22:15:08Z

I wonder if adding more uses of isNoRegCompare is a good idea. The only other use of this flag is in ContainCheckJTrue and there we can replace it with something like:

    node->ChangeOper(GT_JCC);
    GenTreeCC* cc = node->AsCC();
    cc->gtCondition = node->OperGet();
    cc->gtFlags |= (node->gtFlags & GTF_UNSIGNED);
    node->SetOper(GT_CMP);

More expensive, yes. But then this code runs only for JTRUE nodes while isNoRegCompare has to be tested on every used value, and those are far more common than JTRUE nodes.

A better solution would be to add GT_SIMD_CMP - a node similar to GT_CMP that doesn't produce a value and sets the flags. Then there would be no need for isNoRegCompare.

mikedn · 2017-10-01T13:51:32Z

@sdmaclea I moved all the code from ContainCheckCompare, see 2nd and 4th commits. It would be nice to run an ARM64 build to be sure it works fine.

mikedn · 2017-10-01T15:53:18Z

> Have you run diffs?

The 3rd commit (Extend flag reuse optimization to all relops) generates diffs:

Total bytes of diff: -345 (0.00% of base)
    diff is an improvement.
Total byte diff includes 0 bytes from reconciling methods
        Base had    0 unique methods,        0 unique bytes
        Diff had    0 unique methods,        0 unique bytes
Top file improvements by size (bytes):
        -133 : System.Private.CoreLib.dasm (0.00% of base)
        -116 : System.Text.RegularExpressions.dasm (-0.12% of base)
         -27 : Microsoft.CSharp.dasm (-0.01% of base)
         -13 : Microsoft.CodeAnalysis.CSharp.dasm (0.00% of base)
         -12 : System.Collections.Concurrent.dasm (-0.02% of base)
14 total files with size differences (14 improved, 0 regressed), 65 unchanged.
Top method regessions by size (bytes):
           1 : System.Private.CoreLib.dasm - Task:FinishSlow(bool):this
           1 : System.Private.CoreLib.dasm - Task:ProcessChildCompletion(ref):this
           1 : System.Private.CoreLib.dasm - Task:WaitAllBlockingCore(ref,int,struct):bool
           1 : System.Private.CoreLib.dasm - SetOnCountdownMres:Invoke(ref):this
           1 : System.Private.CoreLib.dasm - WhenAllPromise:Invoke(ref):this
Top method improvements by size (bytes):
         -38 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:PickPivotAndPartition(ref,int,int):int (18 methods)
         -22 : System.Text.RegularExpressions.dasm - RegexParser:ScanCharClass(bool,bool):ref:this
         -20 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:DownHeap(ref,int,int,int) (18 methods)
         -18 : System.Text.RegularExpressions.dasm - RegexParser:ScanGroupOpen():ref:this
         -15 : System.Private.CoreLib.dasm - GenericArraySortHelper`1:SwapIfGreaterWithItems(ref,int,int) (18 methods)
88 total methods with size differences (80 improved, 8 regressed), 66865 unchanged.

I was after better throughput and simpler code but this is also a CQ improvement, we have #7566 for this.

Sample diffs:

  sub      ecx, dword ptr [rsi+56]
- test     ecx, ecx
  jle      SHORT G_M41791_IG61

- dec      eax
+ add      eax, -1
  jne      SHORT G_M51025_IG12

- dec      ecx
- test     ecx, ecx
+ add      ecx, -1
  jle      SHORT G_M59796_IG05

As seen above this sometimes results in a regression because it prevents ADD(x, +/-1) from being transformed into INC/DEC. This is because INC/DEC instructions do not set the CF flag that is required by many relops other than EQ/NE. This is ultimately a compromise between sometimes wasting a code byte versus having more complex means to communicate to codegen what condition flags are actually needed.
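To make the CF point concrete, here is a small sketch of which status flags each x86 condition reads and which of them INC/DEC write. The flag sets follow the x86 instruction definitions; the names (`Cond`, `FlagsRead`, `WritesAllFlagsReadBy`) are hypothetical and this is not how the JIT decides anything. It also only models which flags are read and written, not whether the values left by the arithmetic instruction equal what a separate test against zero would have produced.

```cpp
#include <cstdint>

// x86 status flags as a bit set (AF and PF are left out; no condition below reads them).
enum Flag : std::uint8_t { CF = 1, ZF = 2, SF = 4, OF = 8 };

// Condition codes as used by Jcc/SETcc: signed (L, GE, LE, G) and unsigned (B, AE, BE, A).
enum class Cond { E, NE, L, GE, LE, G, B, AE, BE, A };

// Flags each condition reads.
inline std::uint8_t FlagsRead(Cond c)
{
    switch (c)
    {
        case Cond::E:  case Cond::NE: return ZF;
        case Cond::L:  case Cond::GE: return SF | OF;
        case Cond::LE: case Cond::G:  return ZF | SF | OF;
        case Cond::B:  case Cond::AE: return CF;
        case Cond::BE: case Cond::A:  return CF | ZF;
    }
    return 0;
}

// ADD/SUB write all four flags; INC/DEC write everything except CF.
constexpr std::uint8_t kAddSubWrites = CF | ZF | SF | OF;
constexpr std::uint8_t kIncDecWrites = ZF | SF | OF;

// True if every flag the condition reads is written by the instruction.
inline bool WritesAllFlagsReadBy(std::uint8_t written, Cond c)
{
    return (FlagsRead(c) & ~written) == 0;
}
```

Tracking this per condition is the "more complex means to communicate to codegen" mentioned above; always emitting ADD when the flags are reused avoids it at the cost of an occasional extra code byte.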

mikedn · 2017-10-01T15:56:47Z

> And it would be great to squash the remaining commits

I squashed the original changes down to 3 commits and added another one for JCMP. They're pretty much independent and I don't think squashing to a single commit is helpful.

mikedn · 2017-10-01T16:02:01Z

> I wonder if adding more uses of isNoRegCompare is a good idea. The only other use of this flag is in ContainCheckJTrue and there we can replace it with something like:

Unfortunately it's not that simple, SETCC/JCC do not currently support floating point conditions. We'll see, I'll have to run more throughput tests to see if such an approach is feasible. For now I changed the lowering of SIMD<OpEquality|OpInEquality> so that it always sets the condition flags and never produces a value. The 0/1 value, if needed, is produced via a SETCC.

mikedn · 2017-10-01T16:16:29Z

@CarolEidt This will conflict with your own changes. Would you prefer to rebase this on top of your ElimLsraInfo branch to avoid conflicts?

CarolEidt · 2017-10-02T04:02:35Z

> This will conflict with your own changes. Would you prefer to rebase this on top of your ElimLsraInfo branch to avoid conflicts?

No; after getting to zero diffs with eliminating gtLsraInfo, I found that compile time had actually increased, due to accessing the map twice per node. So, I'm reworking and I expect that it will take some time. I'll review this again tomorrow, and consider your thoughts on NoRegCompare. But I wouldn't hold off for my changes at this point.

mikedn · 2017-10-02T04:18:57Z

src/jit/lsraxarch.cpp

                info->internalFloatCount = 1;
                info->setInternalCandidates(this, allSIMDRegs());
            }
-            if (info->isNoRegCompare)
+            info->dstCount = 0;
+            // Codegen of SIMD (in)Equality uses target integer reg only for setting flags.


This comment needs updating, it still mentions the target register even though it is never used.

mikedn · 2017-10-02T04:31:31Z

> I found that compile time had actually increased, due to accessing the map twice per node

Hmm, that's unfortunate. But I see that TreeNodeInfo still exists, wasn't the initial idea to completely remove it? I was hoping that instead of setting things like dstCount TreeNodeInfoInit* functions would simply build the necessary ref positions to avoid doing intermediary setup work.

Unlike many other relop transforms we do, this one is only triggered by the presence of a conditional branch (JTRUE), so it makes more sense to do it when lowering JTRUE nodes; this avoids unnecessary calls to TryGetUse.

CarolEidt · 2017-10-02T14:45:54Z

> I was hoping that instead of setting things like dstCount TreeNodeInfoInit* functions would simply build the necessary ref positions to avoid doing intermediary setup work.

Yes, I am hoping to do that eventually (i.e. #7257, the next step after this), but I was hoping to make a smaller increment by breaking into two issues (#7255 then #7257) (and, obviously, without actually making things worse in the meantime).

It will still be the case that we'll have to find the use information (TreeNodeInfo now, Def RefPositions/Intervals later) in the map when building the use RefPositions. So I think the work I'm doing now will be directly leveragable to the next step.

sdmaclea · 2017-10-02T17:00:13Z

test Windows_NT arm64 Cross Checked Build and Test

CarolEidt

LGTM with one suggested comment update

CarolEidt · 2017-10-03T15:36:20Z

src/jit/lower.cpp

@@ -2156,8 +2146,11 @@ GenTree* Lowering::LowerTailCallViaHelper(GenTreeCall* call, GenTree* callTarget
 //    - Transform cmp(and(x, y), 0) into test(x, y) (XARCH/Arm64 specific but could
 //      be used for ARM as well if support for GT_TEST_EQ/GT_TEST_NE is added).
 //    - Transform TEST(x, LSH(1, y)) into BT(x, y) (XARCH specific)
+//    - Transform RELOP(OP, 0) into SETCC(OP) or JCC(OP) if OP can set the
+//      condition flags appropriately (XARCH/ARM64 specific but could be extended


I believe this is now handling ARM64, so this comment should be updated.

Hmm, it already says that ARM64 is handled, only ARM32 left to do.

Right; sorry for the confusion.

CarolEidt · 2017-10-03T15:37:12Z

src/jit/lower.cpp

+            else // The relop is not used by a JTRUE or it is not used at all.
+            {
+                // Transform the relop node it into a SETCC. If it's not used we could remove
+                // it completely but that means doing more work to handle a rare case.


If/when we support some sort of limited or general data flow on the flags, this would be something we would expect liveness to do, as it is generally responsible for eliminating dead definitions after Lowering

Incidentally, the lack of such data flow analysis prevents us from lowering all JTRUEs to JCCs. If we try this then we'll end up with cases where JCCs are removed by liveness but the associated CMPs are not.

CarolEidt · 2017-10-03T16:58:07Z

@sdmaclea @jashook - do you have any clues about the arm64 failure? It doesn't appear to be JIT-related.

sdmaclea · 2017-10-03T17:02:43Z

Failure was an orthogonal issue fixed in tip.

test Windows_NT arm64 Cross Checked Build and Test

CarolEidt · 2017-10-03T17:03:04Z

@mikedn - I think this is nearly ready to merge. One question - the changes to the handling of SIMD won't really show up as diffs if anything changed in that codegen (it doesn't look like it should, but ...) because jit-diff uses crossgen. Have you looked at any of the JIT tests, e.g. JIT/SIMD/VectorRelOp.cs, to see whether the codegen is the same?

mikedn · 2017-10-03T17:09:36Z

> Have you looked at any of the JIT tests, e.g. JIT/SIMD/VectorRelOp.cs, to see whether the codegen is the same?

Hmm, I ran the first 2 commits through jit-diff --tests so SIMD differences should have appeared if present. AFAIK crossgen does handle SIMD, the only difference from normal jitting is that it doesn't do AVX, only SSE, right?

In any case, I did manually check (using corerun) that the generated code looks as expected.

CarolEidt · 2017-10-03T17:12:58Z

> crossgen does handle SIMD, the only difference from normal jitting is that it doesn't do AVX, only SSE, right?

No, when you crossgen it skips anything with Vector<T> since the size can't be determined until runtime. But if you did manual checks, that's fine. I'll try to run desktop diffs (since they run in JIT mode, not crossgen), but I wouldn't wait on that.

mikedn · 2017-10-03T17:18:39Z

> No, when you crossgen it skips anything with Vector<T> since the size can't be determined until runtime.

Ah, of course. I was confusing this with the more general case of using SSE or AVX instructions.

mikedn · 2017-10-03T17:23:42Z

Quick example:

[MethodImpl(MethodImplOptions.NoInlining)]
static bool Test(Vector<int> x, Vector<int> y) => x != y;

G_M1752_IG02:
       C4E17D1001           vmovupd  ymm0, ymmword ptr[rcx]
       C4E17D100A           vmovupd  ymm1, ymmword ptr[rdx]
       C4E17C28D0           vmovaps  ymm2, ymm0
       C4E16D76D1           vpcmpeqd ymm2, ymm1
       C4E17DD7C2           vpmovmskb eax, ymm2
       83F8FF               cmp      eax, -1
       0F95C0               setne    al
       0FB6C0               movzx    rax, al
       0FB6C0               movzx    rax, al

Even the redundant movzx is still there. I was tempted to get rid of it but it's really a different issue.

dnfclas added the cla-already-signed label Sep 17, 2017

mikedn force-pushed the simd-eq-opt branch 2 times, most recently from 018cc67 to bbc477f Compare September 18, 2017 05:24

This was referenced Sep 20, 2017

[Arm64] Use GTF_SET_FLAGS/GTF_USE_FLAGS #14041

Merged

[Arm64] Implement JCC/SETCC nodes #14101

Merged

mikedn force-pushed the simd-eq-opt branch 2 times, most recently from 523e66d to 911f621 Compare September 23, 2017 19:53

CarolEidt reviewed Sep 26, 2017

CarolEidt suggested changes Sep 27, 2017

mikedn commented Sep 29, 2017

mikedn force-pushed the simd-eq-opt branch 2 times, most recently from e793359 to a6371be Compare October 1, 2017 09:06

mikedn changed the title ~~[WIP] Simplify SIMD EQ/NE optimization~~ [WIP] Simplify some relop/jtrue related optimizations Oct 1, 2017

mikedn commented Oct 2, 2017

mikedn force-pushed the simd-eq-opt branch from a22f15e to f5c1ced Compare October 2, 2017 05:07

mikedn added 4 commits October 2, 2017 08:30

Simplify SIMD EQ/NE optimization

0b637a6

Reimplement compare flags reuse using SETCC/JCC

f1f8cd2

Extend flag reuse optimization to all relops

848fd74

Move JCMP transform to LowerJTrue

5402ec5

Unlike many other relop transforms we do, this one is only triggered by the presence of a conditional branch (JTRUE), so it makes more sense to do it when lowering JTRUE nodes; this avoids unnecessary calls to TryGetUse.

mikedn force-pushed the simd-eq-opt branch from f5c1ced to 5402ec5 Compare October 2, 2017 05:30

CarolEidt approved these changes Oct 3, 2017

mikedn changed the title ~~[WIP] Simplify some relop/jtrue related optimizations~~ Simplify some relop/jtrue related optimizations Oct 3, 2017

CarolEidt merged commit a27c269 into dotnet:master Oct 3, 2017

mikedn mentioned this pull request Oct 4, 2017

Fix condition flags reuse optimization #14323

Merged

mikedn deleted the simd-eq-opt branch December 16, 2017 09:16

mikedn mentioned this pull request Jan 26, 2019

Lower SSE compare scalar and test nodes #22043

Merged

mikedn mentioned this pull request Jan 31, 2020

[RyuJIT] The "reuse flags" optimization handles compares incorrectly dotnet/runtime#9059

Closed

anthonycanino mentioned this pull request Oct 14, 2021

Inefficient codegen for certain op and comparison dotnet/runtime#11414

Closed

anthonycanino mentioned this pull request Nov 3, 2021

XARCH: Remove redudant tests for GT_LT/GT_GE relops. dotnet/runtime#61152

Merged
