This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

[Arm64] Use GTF_SET_FLAGS/GTF_USE_FLAGS #14041

Merged
merged 3 commits into dotnet:master on Sep 25, 2017

Conversation

sdmaclea

Uses the same approach as XARCH to eliminate the easiest-to-identify redundant compares.

Includes #13941 changes to resolve merge conflict.

@sdmaclea

@dotnet/arm64-contrib @dotnet/jit-contrib @mikedn FYI

assert(genTypeSize(op1Type) == genTypeSize(op2Type));

if (varTypeIsFloating(op1Type))
if ((tree->gtFlags & GTF_USE_FLAGS) == 0)
@sdmaclea (author):

This is consistent with the use of flags for XARCH.

However, IMO, this flag is not really necessary; it should be inverted and called GTF_SET_FLAGS.


It's really neither, flags are set by the ADD/SUB/AND instruction and used by the conditional branch instruction. This is the result of this solution being a hack :)


@sdmaclea sdmaclea Sep 18, 2017


Yes, but when they are not set by the ADD/SUB/AND instruction, they are set by the compare and used by the conditional branch instruction.

So using GTF_SET_FLAGS would be consistent when the compare should be setting flags.


Hmm? We're talking about the compare instruction, an instruction which is not actually emitted in this case. So it can't set nor use the flags.

@sdmaclea (author):

Which is why it should logically be
if ((tree->gtFlags & GTF_SET_FLAGS) != 0) (implied by the inversion suggested above). If we are not setting flags, it makes no sense to emit a compare instruction.


It probably should be if ((op1->gtFlags & GTF_SET_FLAGS) == 0). That is, if op1 doesn't set flags, we need to emit a compare instruction.

@sdmaclea (author):

We agreed to leave this consistent with XARCH


I think this is the right approach. However, since it's rather unexpected to consume operands and generate no code, it might be good to add a comment explaining what's going on and/or refer to the similar code in codegenxarch.cpp

{
assert(varTypeIsFloating(op2Type));
@sdmaclea (author):

The rest of the changes in the file are whitespace-only.


It probably should be if ((op1->gtFlags & GTF_SET_FLAGS) == 0). That is, if op1 doesn't set flags, we need to emit a compare instruction.

But isn't the point of the code in Lowering to set GTF_USE_FLAGS on the compare when it can simply use the flags set by the operand?

@sdmaclea (author):

OK

@@ -1529,7 +1529,7 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits
 #define FEATURE_MULTIREG_STRUCT_PROMOTE 1 // True when we want to promote fields of a multireg struct into registers
 #define FEATURE_FASTTAILCALL 1 // Tail calls made as epilog+jmp
 #define FEATURE_TAILCALL_OPT 1 // opportunistic Tail calls (i.e. without ".tail" prefix) made as fast tail calls.
-#define FEATURE_SET_FLAGS 1 // Set to true to force the JIT to mark the trees with GTF_SET_FLAGS when the flags need to be set
+#define FEATURE_SET_FLAGS 0 // Set to true to force the JIT to mark the trees with GTF_SET_FLAGS when the flags need to be set
@sdmaclea (author):

We are using the XARCH solution, not the ARM solution.


To me, this comment just adds to the confusion. AFAICT, this behavior is, in fact, required, but in the !LEGACY_BACKEND && !FEATURE_SET_FLAGS path vs. the LEGACY_BACKEND && FEATURE_SET_FLAGS path.
Am I missing something here?

@sdmaclea (author):

FEATURE_SET_FLAGS brings in a lot of code which was designed and tested with the legacy JIT and ARM32. Since XARCH has this feature disabled but has functionally equivalent code working with RyuJIT, I took the approach of following the XARCH/RyuJIT pattern.


@briansull briansull Sep 25, 2017


Maybe we should rename FEATURE_SET_FLAGS to LEGACY_SET_FLAGS

@sdmaclea (author):

Do you want it in this change or a separate change?


Definitely a separate change, or just open an issue for someone else to do

@sdmaclea (author):

#14175

// GenTree node we need to set the info whether its codegen
// will modify flags.
if (op2->IsIntegralConst(0) && (op1->gtNext == op2) && (op2->gtNext == cmp) &&
cmp->OperIs(GT_EQ, GT_NE, GT_GT, GT_GE, GT_LT, GT_LE) && op1->OperIs(GT_ADD, GT_AND, GT_SUB))

@sdmaclea sdmaclea Sep 18, 2017


ARM64 can only set flags on adc, add, and, bic, sub, sbc (and their aliases cmp, cmn, tst ...).


So this is the only difference between the ARM64 version and the XARCH version. It really should be in LowerCompare, with or without my refactoring.

@sdmaclea (author):

This PR? Or a subsequent PR?

@sdmaclea (author):

Given the very restrictive set of ops which can set flags, it may be easier to relax the
&& (op1->gtNext == op2) && (op2->gtNext == cmp) && restriction.

Also, cbz, cbnz, tbz, tbnz will change things.

@sdmaclea (author):

I guess my preference would be to refactor a bit later. (Next week?)


Given the very restrictive set of ops which can set flags, it may be easier to relax the
&& (op1->gtNext == op2) && (op2->gtNext == cmp) && restriction.

Right, unlike on xarch, few instructions set flags, and only when the S version is used. On xarch we need to be more careful because most typical instructions do set the flags.


This PR? Or a subsequent PR?

IMO subsequent. Let's see what happens with my proposed refactoring


@CarolEidt CarolEidt left a comment


The handling of flags is pretty confusing; it's unclear to me that these flags are used consistently. These changes seem fine to me, but I'm quite uncomfortable with the (lack of) documentation on these flags plus FEATURE_SET_FLAGS. @briansull @BruceForstall could one of you take a look?

@@ -8296,7 +8296,7 @@ bool GenTree::gtSetFlags() const
#endif

#else // !LEGACY_BACKEND
-#ifdef _TARGET_XARCH_
+#if defined(_TARGET_XARCH_) || defined(_TARGET_ARM64_)


The comments in this method need a serious overhaul, including a better explanation of why the code for FEATURE_SET_FLAGS and !FEATURE_SET_FLAGS && !LEGACY_BACKEND is the same.
However, I suppose that's a fix for another PR.


That method should be a giant #if LEGACY_BACKEND/#else/#endif where #else contains just (((gtFlags & GTF_SET_FLAGS) != 0) && (gtOper != GT_IND)). For some reason it keeps getting patched up, and every time it becomes more convoluted.



@mikedn

mikedn commented Sep 18, 2017

@CarolEidt For some reason your comments are posted in the wrong place (wrong line number) and cannot be replied to.

But isn't the point of the code in Lowering to set GTF_USE_FLAGS on the compare when it can simply use the flags set by the operand?

@pgavlin should comment on this as he introduced GTF_USE_FLAGS in #12892. My understanding is that they're supposed to indicate nodes that define and use the flags register for liveness purposes. Setting GTF_USE_FLAGS on the compare is dubious considering that no compare instruction is actually emitted.

@CarolEidt

@CarolEidt For some reason your comments are posted in the wrong place (wrong line number) and cannot be replied to.

Sorry; I was having similar issues with responding to @sdmaclea's comments.

My understanding is that they're supposed to indicate nodes that define and use the flags register for liveness purposes. Setting GTF_USE_FLAGS on the compare is dubious considering that no compare instruction is actually emitted.

If you look at this code: https://github.com/dotnet/coreclr/blob/master/src/jit/codegenxarch.cpp#L6256, it is checking the value of this flag, and then, if it is set, it doesn't generate anything (just consumes the registers). This is also a case where GTF_USE_FLAGS is set on the compare, but no compare instruction is emitted.

@mikedn

mikedn commented Sep 18, 2017

If you look at this code: https://github.com/dotnet/coreclr/blob/master/src/jit/codegenxarch.cpp#L6256, it is checking the value of this flag, and then, if it is set, it doesn't generate anything (just consumes the registers). This is also a case where GTF_USE_FLAGS is set on the compare, but no compare instruction is emitted.

Yes, that's what I'm talking about. GTF_USE_FLAGS has overloaded meaning. In some cases it indicates a node that uses the flags register (usually JCC/SETCC nodes). In this case it just indicates that a cmp instruction should not be generated.

@sdmaclea sdmaclea changed the title [Arm64] Use GTF_SET_FLAGS/GTF_USE_FLAGS WIP [Arm64] Use GTF_SET_FLAGS/GTF_USE_FLAGS Sep 19, 2017
@sdmaclea

GTF_USE_FLAGS has overloaded meaning.

I think the implementation is just incomplete and therefore inconsistent.

Set GTF_USE_FLAGS if and only if a node consumes the flags register
Set GTF_SET_FLAGS if and only if a node produces the flags register

When we LowerCompare we are not setting GTF_SET_FLAGS on compare nodes which will write the Flag register.

When we lower JTRUE (in ContainCheckJTrue), and set cmp->gtLsraInfo.isNoRegCompare, we fail to clear the GTF_USE_FLAGS

Let me rework the Arm64 patch to be what I think would be consistent use of the flags.

I'll also add support for cbz, cbnz since that will make the discussion more interesting.

@mikedn

mikedn commented Sep 19, 2017

Let me rework the Arm64 patch to be what I think would be consistent use of the flags.

I'm not sure if you can do that without also changing the xarch code.

I'll also add support for cbz, cbnz since that will make the discussion more interesting.

tbz might be more interesting :)

@sdmaclea

I added cbz, cbnz, tbz, tbnz lowering.

I made the flag usage more consistent with meaning.

I think the only controversial things would be

  • Leaving the LIR node structure as is while using the GTF_SET_FLAGS and GTF_USE_FLAGS to guide code generation.
  • Allowing genCodeForCompare to call genConsumeOperands() while the registers are being consumed by the JTRUE node

There is one test, DevDiv_278372, failing when I run on my local CI. I just got it to reproduce with COMPlus_JitStressRegs=0x200. I will look at it tonight/tomorrow.

PTAL.

@sdmaclea

I will fix formatting

@mikedn

mikedn commented Sep 20, 2017

I made the flag usage more consistent with meaning.

It seems to me that this just adds yet another meaning. We're going from

  • If GTF_SET_FLAGS is set then this instruction must set the flags (on XARCH this prevents codegen from changing ADD to LEA) and must not be removed (by liveness)

to

  • If GTF_SET_FLAGS is NOT set on a compare node then a compare instruction must not be generated.

Not to say that this is bad.

It's just that it's still a bit convoluted and potentially problematic. For example, even when a compare instruction is not generated, genConsumeOperands must still be called because LSRA doesn't know that the instruction won't be generated. The "normal" hint to LSRA that an instruction won't be generated is containment, not these flags.

@sdmaclea

sdmaclea commented Sep 20, 2017

@CarolEidt @briansull @mikedn @pgavlin Please review.

Key areas.

  • Addition of new emit code for cbz, cbnz, tbz, tbnz
  • New proposed semantics of GTF_SET_FLAGS & GTF_USE_FLAGS
  • Lowering of JTRUE
  • genCodeForJumpTrue with special attention to usage of the cmp registers
  1. What is the right way for genCodeForJumpTrue to be able to use the cmp input operands?
  2. Should Lowering::ContainCheckJTrue call MakeSrcContained(node, cmp);? It looks like this would be a new type of containment

@CarolEidt

I am not in favor of the proposed change to the flags values. While I don't think any of this is adequately documented, I don't think there's a strong case to change the existing usage (i.e. in codegenxarch.cpp, CodeGen::genCompareInt()) where GTF_USE_FLAGS implies that, instead of generating a compare, it should use the flags set by a previous instruction.

@mikedn said:

GTF_USE_FLAGS has overloaded meaning. In some cases it indicates a node that uses the flags register (usually JCC/SETCC nodes). In this case it just indicates that a cmp instruction should not be generated.

I don't see that this is inconsistent - just poorly documented. The cmp doesn't need to be generated because it is "using" the flags register as it was previously set (rather than setting it itself). I agree that it is rather weird that the compare codegen just consumes the register, but it doesn't seem as potentially problematic as the alternative.

@sdmaclea

@CarolEidt

I would have expected the logical meaning of the two flags to be

  • GTF_SET_FLAGS node produces the Flags register
  • GTF_USE_FLAGS node consumes the Flags register

The relops were not setting GTF_SET_FLAGS, because the flag was implicit for this class of nodes.
JTrue was not setting GTF_USE_FLAGS, because the flag was implicit for this class of nodes. Or potentially the containment of the relop was implicit on this type of node.

So my proposed change was trying to change the implicit flags to explicit flags. Effectively completing what appeared to be an incomplete implementation.

I guess you and @mikedn are suggesting the meaning of the flags should be

  • GTF_SET_FLAGS non-compare node must produce the Flags register
  • GTF_USE_FLAGS compare node consumes the Flags register instead of producing the flags register

So how do we handle the fused compare and branch instructions cbz et al.?

  • Add GTF_NO_SET_FLAGS meaning compare node not required to set the Flags register?
  • Explicitly contain the Relop?

@mikedn

mikedn commented Sep 20, 2017

GTF_SET_FLAGS non-compare node must produce the Flags register

Pretty much yes. I'd add that GTF_SET_FLAGS should be set on all nodes that are supposed to set the flags, including relops and GT_CMP. Though I don't think we set it now on relops.

GTF_USE_FLAGS compare node consumes the Flags register instead of producing the flags register

That is the current and only meaning. It turns out that this flag is only checked by codegenxarch's genCompareInt. However, it is set on other nodes (e.g. GT_JCC) to indicate that the node uses the flags register, not to indicate that an instruction should not be generated for that node. That's why I say that it is inconsistent.

So how do we handle the fused compare and branch instructions cbz et al.?

Containment might make more sense than this GTF acrobatics but I don't know if it's currently possible. Usually only constants and indirs are contained. Personally I'd prefer adding GT_CBZ & co. but adding 4 new GT opers might be hard to swallow.

@CarolEidt

Containment might make more sense than this GTF acrobatics but I don't know if it's currently possible. Usually only constants and indirs are contained.

Actually LEAs are also contained. It seems like the right answer is one of:

  • Allow compares to be contained, when the compare and branch are generated together without modifying condition flags.
    OR
  • Add new opcodes (GT_CBZ et al).

In either case, neither the SET_FLAGS nor USE_FLAGS would be set.

Probably the containment option would be the cleanest and most efficient (it would not require modifying the IR as much), but there are probably dragons there ...

Other thoughts on this?

@sdmaclea

sdmaclea commented Sep 20, 2017

Other thoughts on this?

I like the containment approach. I will try to fix the containment dragons

I think there should be a separate effort to clean up these flags. I will open an issue: #14093

@mikedn

mikedn commented Sep 20, 2017

Add new opcodes (GT_CBZ et al).

If we do that we might get away with only one, say, GT_CB. And a new GenTree class that is a unary op (well, really a binary op; AFAIK we can't derive from GenTreeUnOp) and has 2 new data members:

char kind; // Z, NZ, TZ, TNZ
char testBit; // the bit index used by TZ and TNZ

@sdmaclea

I have drafted the containment change. It is mostly passing, but there are two new crossgen asserts, in addition to the earlier-mentioned DevDiv failure.

One new failure seems to be related to liveness, and the other to calling genConsumeRegs() on Op1 of the contained relop. I am probably missing something required in LSRA when a node is contained.

@sdmaclea

@mikedn @CarolEidt

Adding a new node is more consistent with #14027. Also note: Arm64 does not yet support GT_SETCC or GT_JCC.

Looking at 4351a27 & 7a4b14e it does not look too difficult to introduce a new node.

say, GT_CB

CB is a little too cryptic. I would prefer GT_CMP_TEST_J, but I am open.

I'll try drafting this change.

@sdmaclea sdmaclea changed the title WIP [Arm64] Use GTF_SET_FLAGS/GTF_USE_FLAGS [Arm64] Use GTF_SET_FLAGS/GTF_USE_FLAGS Sep 22, 2017
@sdmaclea

OK, since we agreed to leave the flags as is, I reverted the flag changes and the addition of cbz, cbnz, tbz, and tbnz. I think these are largely orthogonal and can be done in a separate PR.

@CarolEidt Please take a look. This is effectively the original patch with revision of gentree code per discussion.

FEATURE_SET_FLAGS is not set because it is a LEGACY feature which enables early-phase compiler changes. The RyuJIT/XARCH approach of deferring this to target lowering seems right.

if (op2->IsIntegralConst(0))
assert(genTypeSize(op1Type) == genTypeSize(op2Type));

if (varTypeIsFloating(op1Type))

Does this optimization ever apply to floating point compares? Should the GTF_USE_FLAGS check be done only for integer compares perhaps?

@sdmaclea (author):

It does not yet. I just glanced through the armv8 reference manual. It looks like there might eventually be cases we could support. Unless there is an objection, I am happy with it here as it is gated by the setting of the flag.

@sdmaclea

@dotnet-bot test Windows_NT arm64 Cross Checked Build and Test

@sdmaclea

test Windows_NT arm Cross Checked Build and Test
test Windows_NT armlb Cross Checked Build and Test

@sdmaclea

@dotnet/jit-contrib All tests have passed, including Windows Arm64. This can be reviewed/merged.

@CarolEidt wanted someone else to look at this. @pgavlin was suggested. PTAL

@sdmaclea

Sorry it was

@briansull @BruceForstall could one of you take a look?


@CarolEidt CarolEidt left a comment


LGTM, and thanks for doing the cleanup in gentree.cpp.
I still hope that @BruceForstall or @briansull can have a look.



-#ifdef LEGACY_BACKEND
-#ifdef _TARGET_XARCH_
+#if defined(LEGACY_BACKEND) && !FEATURE_SET_FLAGS && defined(_TARGET_XARCH_)


Thanks for cleaning up this code. We should probably have an issue to better document existing usage and/or determine what the ideal usage would be and what should change. But for now, I'm very happy with this improvement :-)


@briansull briansull left a comment


Approve with comments

ins = INS_ands;
break;
default:
assert(!"Unexpected BinaryOp with GTF_SET_FLAGS set");


Shouldn't this be a noway_assert instead?

@sdmaclea (author):

@briansull I would have to defer to your judgment. I assume noway_assert exists in release mode too.

@briansull briansull Sep 25, 2017


Yes, these are "release" asserts, whereas a vanilla assert is a checked-only assert.
If you hit a noway_assert when running a release build:

  1. We can get telemetry indicating that it was hit
  2. And the VM layer catches the noway_assert, and will retry the JIT compilation once with optimization disabled. (Often the assert won't get hit on the second attempt to JIT the method)


if (op2->IsIntegralConst(0))
{
emit->emitIns_R_F(INS_fcmp, cmpSize, op1->gtRegNum, 0.0);


Should we also assert op2->isContained() here?

@sdmaclea (author):

I can add it, but this section of changed code was previously only whitespace.


This code is "containing" the const. If Lowering is out of sync, it could reserve an extra register for the constant, possibly leading to sub-optimal register allocation when the codegen phase doesn't use the "extra" register.

// GenTree node we need to set the info whether its codegen
// will modify flags.
if (op2->IsIntegralConst(0) && (op1->gtNext == op2) && (op2->gtNext == cmp) &&
cmp->OperIs(GT_EQ, GT_NE, GT_GT, GT_GE, GT_LT, GT_LE) && op1->OperIs(GT_ADD, GT_AND, GT_SUB))


Instead of cmp->OperIs(GT_EQ, GT_NE, GT_GT, GT_GE, GT_LT, GT_LE) we typically use cmp->OperIsCompare(). I note that it now also contains TEST_EQ and TEST_NE.


I'm not a big fan of large OperIs(...) calls:

    template <typename... T>
    bool OperIs(genTreeOps oper, T... rest)
    {
        return OperIs(oper) || OperIs(rest...);
    }

@sdmaclea (author):

OK. I was hoping it was more efficient than that.


You should check how the TEST_EQ and TEST_NE cases should be handled here.


I'm not a big fan of large OperIs(...) calls:

When I introduced OperIs I verified that C++ compilers handle it as expected. For example, this particular use ends up as sub/cmp/ja (which happens to be more efficient than OperIsCompare but that's another story).

@briansull briansull Sep 25, 2017


For a call with 6 arguments it would still be expanded into 6 consecutive compare|subtract and branch sequences right?

What was/is the issue with OperIsCompare ?
It should be one memory fetch and a bit test right?

static bool OperIsCompare(genTreeOps gtOper)
{
    return (OperKind(gtOper) & GTK_RELOP) != 0;
}
static unsigned OperKind(unsigned gtOper)
{
    assert(gtOper < GT_COUNT);

    return gtOperKindTable[gtOper];
}

So for a small number (3 or less) OperIs would be OK, but not for 5 or more...


It would still be expanded into 6 consecutive compare|subtract and branch sequences right?

Nope. GT_EQ & co. form a continuous range so it's something like if ((x - GT_EQ) < 6).

What was/is the issue with OperIsCompare ?

Yes, one memory fetch. Even if the memory is in cache it's still 4-5 cycles, that's already more than the 2 cycles (sub+cmp) for the OperIs version. And the bit test adds another cycle.

@sdmaclea

@mikedn @CarolEidt @briansull Revised per comments. Restructured the changes in genCodeForCompare to match xarch genCompareInt.

@sdmaclea

@dotnet-bot test Windows_NT arm64 Cross Checked Build and Test

@sdmaclea

@CarolEidt @briansull All tests have passed, including Windows Arm64 Checked. I think this is ready to be merged.

@briansull briansull merged commit 40f63d9 into dotnet:master Sep 25, 2017
@sdmaclea sdmaclea deleted the PR-ARM64-SET-FLAGS branch December 4, 2017 16:34