-
Notifications
You must be signed in to change notification settings - Fork 2.7k
[Arm64] Use GTF_SET_FLAGS/GTF_USE_FLAGS #14041
Conversation
@dotnet/arm64-contrib @dotnet/jit-contrib @mikedn FYI |
src/jit/codegenarm64.cpp
Outdated
assert(genTypeSize(op1Type) == genTypeSize(op2Type)); | ||
|
||
if (varTypeIsFloating(op1Type)) | ||
if ((tree->gtFlags & GTF_USE_FLAGS) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is consistent with use of flags for XARCH.
However, IMO, this flag is not really necessary and should be inverted and called GTF_SET_FLAGS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's really neither, flags are set by the ADD/SUB/AND instruction and used by the conditional branch instruction. This is the result of this solution being a hack :)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but when they are not set by the ADD/SUB/AND instruction, they are set by the compare and used by the conditional branch instruction
So using GTF_SET_FLAGS
would be consistent when the compare should be setting flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm? We're talking about the compare instruction, an instruction which is not actually emitted in this case. So it can't set nor use the flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Which is why it should logically be
if ((tree->gtFlags & GTF_SET_FLAGS) != 0)
(implied by inverted above). If we are not setting flags it makes no sense to emit a compare instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably should be if ((op1-gtFlags & GTF_SET_FLAGS) == 0)
. That is, if op1 doesn't set flags we need to emit a compare instruction.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We agreed to leave this consistent with XARCH
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the right approach. However, since it's rather unexpected to consume operands and generate no code, it might be good to add a comment explaining what's going on and/or refer to the similar code in codegenxarch.cpp
src/jit/codegenarm64.cpp
Outdated
{ | ||
assert(varTypeIsFloating(op2Type)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rest of changes in the file are whitespace
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably should be if ((op1-gtFlags & GTF_SET_FLAGS) == 0). That is, if op1 doesn't set flags we need to emit a compare instruction.
But isn't the point of the code in Lowering
to set GTF_USE_FLAGS
on the compare when it can simply use the flags set by the operand?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK
@@ -1529,7 +1529,7 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits | |||
#define FEATURE_MULTIREG_STRUCT_PROMOTE 1 // True when we want to promote fields of a multireg struct into registers | |||
#define FEATURE_FASTTAILCALL 1 // Tail calls made as epilog+jmp | |||
#define FEATURE_TAILCALL_OPT 1 // opportunistic Tail calls (i.e. without ".tail" prefix) made as fast tail calls. | |||
#define FEATURE_SET_FLAGS 1 // Set to true to force the JIT to mark the trees with GTF_SET_FLAGS when the flags need to be set | |||
#define FEATURE_SET_FLAGS 0 // Set to true to force the JIT to mark the trees with GTF_SET_FLAGS when the flags need to be set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We are using XARCH solution not ARM solution
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this comment just adds to the confusion. AFAICT, this behavior is, in fact, required, but in the !LEGACY_BACKEND && !FEATURE_SET_FLAGS
path vs. the LEGACY_BACKEND && FEATURE_SET_FLAGS
path.
Am I missing something here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FEATURE_SET_FLAGS bring is a lot of code which is designed and tested with Legacy JIT and ARM32. Since XARCH has this feature disabled but has functionally equivalent code working with RyuJIT, I took the approach to follow XARCH/RyuJIT pattern.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe we should rename FEATURE_SET_FLAGS
to LEGACY_SET_FLAGS
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want it this change or a separate change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely a separate change, or just open an issue for someone else to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
#14175
src/jit/lowerarmarch.cpp
Outdated
// GenTree node we need to set the info whether its codegen | ||
// will modify flags. | ||
if (op2->IsIntegralConst(0) && (op1->gtNext == op2) && (op2->gtNext == cmp) && | ||
cmp->OperIs(GT_EQ, GT_NE, GT_GT, GT_GE, GT_LT, GT_LE) && op1->OperIs(GT_ADD, GT_AND, GT_SUB)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ARM64 can only set flags on adc
, add
, and
, bic
, sub
, sbc
(and their aliases cmp
, cmn
, tst
...).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So this is the only difference between the ARM64 version and the XARCH version. It really should be in LowerCompare
, with or without my refactoring.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR? Or a subsequent PR?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the very restrictive set of ops which can set flags, it may be easier to relax the
&& (op1->gtNext == op2) && (op2->gtNext == cmp) &&
restriction.
Also cbz
, cbnz
, tbz
, tbnz
will also change things.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess my preference would be to refactor a bit later. (Next week?)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Given the very restrictive set of ops which can set flags, it may be easier to relax the
&& (op1->gtNext == op2) && (op2->gtNext == cmp) && restriction.
Right, unlike on xarch few instruction set flags and only when the S version is used. On xarch we need to be more careful because most typical instructions do set the flags.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This PR? Or a subsequent PR?
IMO subsequent. Let's see what happens with my proposed refactoring
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The handling of flags is pretty confusing; it's unclear to me that these flags are used consistently. These changes seem fine to me, but I'm quite uncomfortable with the (lack of) documentation on these flags plus FEATURE_SET_FLAGS
. @briansull @BruceForstall could one of you take a look?
src/jit/gentree.cpp
Outdated
@@ -8296,7 +8296,7 @@ bool GenTree::gtSetFlags() const | |||
#endif | |||
|
|||
#else // !LEGACY_BACKEND | |||
#ifdef _TARGET_XARCH_ | |||
#if defined(_TARGET_XARCH_) || defined(_TARGET_ARM64_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comments in this method need a serious overhaul, including a better explanation of why the code for FEATURE_SET_FLAGS
and !FEATURE_SET_FLAGS && ~LEGACY_BACKEND
is the same.
However, I suppose that's a fix for another PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That method should be a giant #if LEGACY_BACKEND
/#else
/#endif
where #else
contains just (((gtFlags & GTF_SET_FLAGS) != 0) && (gtOper != GT_IND))
. For some reason it keeps getting patched up and it every time it becomes more convoluted.
@@ -1529,7 +1529,7 @@ typedef unsigned short regPairNoSmall; // arm: need 12 bits | |||
#define FEATURE_MULTIREG_STRUCT_PROMOTE 1 // True when we want to promote fields of a multireg struct into registers | |||
#define FEATURE_FASTTAILCALL 1 // Tail calls made as epilog+jmp | |||
#define FEATURE_TAILCALL_OPT 1 // opportunistic Tail calls (i.e. without ".tail" prefix) made as fast tail calls. | |||
#define FEATURE_SET_FLAGS 1 // Set to true to force the JIT to mark the trees with GTF_SET_FLAGS when the flags need to be set | |||
#define FEATURE_SET_FLAGS 0 // Set to true to force the JIT to mark the trees with GTF_SET_FLAGS when the flags need to be set |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To me, this comment just adds to the confusion. AFAICT, this behavior is, in fact, required, but in the !LEGACY_BACKEND && !FEATURE_SET_FLAGS
path vs. the LEGACY_BACKEND && FEATURE_SET_FLAGS
path.
Am I missing something here?
src/jit/codegenarm64.cpp
Outdated
{ | ||
assert(varTypeIsFloating(op2Type)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It probably should be if ((op1-gtFlags & GTF_SET_FLAGS) == 0). That is, if op1 doesn't set flags we need to emit a compare instruction.
But isn't the point of the code in Lowering
to set GTF_USE_FLAGS
on the compare when it can simply use the flags set by the operand?
@CarolEidt For some reason your comments are posted in the wrong place (wrong line number) and cannot be replied to.
@pgavlin should comment on this as he introduced GTF_USE_FLAGS in #12892. My understanding is that they're supposed to indicate nodes that defined and use the flags register for liveness purposes. Setting |
Sorry; I was having similar issues with responding to @sdmaclea's comments.
If you look at this code: https://github.com/dotnet/coreclr/blob/master/src/jit/codegenxarch.cpp#L6256, it is checking the value of this flag, and then, if it is set, it doesn't generate anything (just consumes the registers). This is also a case where |
Yes, that's what I'm talking about. |
I think the implementation is just incomplete and therefore inconsistent. Set When we LowerCompare we are not setting When we lower JTRUE (in Let me rework the Arm64 patch to be what I think would be consistent use of the flags. I'll also add support for |
I'm not sure if you can do that without also changing the xarch code.
tbz might be more interesting :) |
77af575
to
d13274a
Compare
I added I made the flag usage more consistent with meaning. I think the only controversial things would be
There is one test PTAL. |
I will fix formatting |
a139380
to
8f42b9f
Compare
It seems to me that this just adds yet another meaning. We're going from
to
Not to say that this is bad. It's just that it's still a bit convoluted and potentially problematic. For example, even if a compare instruction must be generated |
@CarolEidt @briansull @mikedn @pgavlin Please review. Key areas.
|
I am not in favor of the proposed change to the flags values. While I don't think any of this is adequately documented, I don't think there's a strong case to change the existing usage (i.e. in codegenxarch.cpp, @mikedn said:
I don't see that this is inconsistent - just poorly documented. The cmp doesn't need to be generated because it is "using" the flags register as it was previously set (rather than setting it itself). I agree that it is rather weird that the compare codegen just consumes the register, but it doesn't seem as potentially problematic as the alternative. |
I would have expected the logical meaning of the two flags to be
The relop were not setting the So my proposed change was trying to change the implicit flags to explicit flags. Effectively completing what appeared to be an incomplete implementation. I guess you and @mikedn are suggesting the meaning of the flags should be
So how do we handle the fused compare and branch instructions
|
Pretty much yes. I'd add that GTF_SET_FLAGS should be set on all nodes that are supposed to set the flags, including relops and GT_CMP. Though I don't think we set it now on relops.
That is the current and only meaning. It turns out that this flag is only checked by codegenxarch's
Containment might make more sense than this GTF acrobatics but I don't know if it's currently possible. Usually only constants and indirs are contained. Personally I'd prefer adding GT_CBZ & co. but adding 4 new GT opers might be hard to swallow. |
Actually LEAs are also contained. It seems like the right answer is one of:
In either case, neither the SET_FLAGS nor USE_FLAGS would be set. Probably the containment option would be the cleanest and most efficient (it would not require modifying the IR as much), but there are probably dragons there ... Other thoughts on this? |
I like the containment approach. I will try to fix the containment dragons I think there should be a separate effort to clean up these Flags. I will open an issue. #14093 |
If we do that we might get away with only one, say, char kind; // Z, NZ, TZ, TNZ
char testBit; // the bit index used by TZ and TNZ |
I have drafted the contain change. It is mostly passing, but there are two new crossgen asserts. In addition to the earlier mention DevDiv fail. The one new failure seems to be related to liveness and the other related to calling |
@miedn @CarolEidt Adding a new node is more consistent with #14027. Also Note : Arm64 does not yet support Looking at 4351a27 & 7a4b14e it does not look too difficult to introduce a new node.
I'll try drafting this change. |
8f42b9f
to
9c91233
Compare
OK since we agreed to leave the flags as is, I reverted the flag changes and the addition of the @CarolEidt Please take a look. This is effectively the original patch with revision of gentree code per discussion.
|
src/jit/codegenarm64.cpp
Outdated
if (op2->IsIntegralConst(0)) | ||
assert(genTypeSize(op1Type) == genTypeSize(op2Type)); | ||
|
||
if (varTypeIsFloating(op1Type)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this optimization ever apply to floating point compares? Should the GTF_USE_FLAGS
check be done only for integer compares perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does not yet. I just glanced through the armv8 reference manual. It looks like there might eventually be cases we could support. Unless there is an objection, I am happy with it here as it is gated by the setting of the flag.
@dotnet-bot test Windows_NT arm64 Cross Checked Build and Test |
test Windows_NT arm Cross Checked Build and Test |
@dotnet/jit-contrib All tests have passed. Including Windows Arm64. This can be reviewed/merged. @CarolEidt wanted someone else to look at this. @pgavlin was suggested. PTAL |
Sorry it was
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM, and thanks for doing the cleanup in gentree.cpp.
I still hope that @BruceForstall or @briansull can have a look.
src/jit/codegenarm64.cpp
Outdated
assert(genTypeSize(op1Type) == genTypeSize(op2Type)); | ||
|
||
if (varTypeIsFloating(op1Type)) | ||
if ((tree->gtFlags & GTF_USE_FLAGS) == 0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is the right approach. However, since it's rather unexpected to consume operands and generate no code, it might be good to add a comment explaining what's going on and/or refer to the similar code in codegenxarch.cpp
|
||
#ifdef LEGACY_BACKEND | ||
#ifdef _TARGET_XARCH_ | ||
#if defined(LEGACY_BACKEND) && !FEATURE_SET_FLAGS && defined(_TARGET_XARCH_) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for cleaning up this code. We should probably have an issue to better document existing usage and/or determine what the ideal usage would be and what should change. But for now, I'm very happy with this improvement :-)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Approve with comments
src/jit/codegenarm64.cpp
Outdated
ins = INS_ands; | ||
break; | ||
default: | ||
assert(!"Unexpected BinaryOp with GTF_SET_FLAGS set"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this be a noway_assert instead?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@briansull I would have to defer to your judgment. I assume noway_assert exists in release mode too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, these are "release" asserts, where as a vanilla assert is a checked only assert.
If you hit a noway_assert, when running a release build :
- We can get telemetry indicating that it was hit
- And the VM layer catches the noway_assert,and will retry the JIT compilation once with optimization disabled. (Often the assert won't get hit the on the second attempt to JIT the method)
src/jit/codegenarm64.cpp
Outdated
|
||
if (op2->IsIntegralConst(0)) | ||
{ | ||
emit->emitIns_R_F(INS_fcmp, cmpSize, op1->gtRegNum, 0.0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we also assert that op2->isContained() here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can add it, but this section of changed code was previously only whitespace.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This code is "containing" the const. If lower is out of sync it could reserve an extra register for the constant and possibly lead to sub-optimal register allocation when the codegen phase doesn't use the "extra" register.
src/jit/lowerarmarch.cpp
Outdated
// GenTree node we need to set the info whether its codegen | ||
// will modify flags. | ||
if (op2->IsIntegralConst(0) && (op1->gtNext == op2) && (op2->gtNext == cmp) && | ||
cmp->OperIs(GT_EQ, GT_NE, GT_GT, GT_GE, GT_LT, GT_LE) && op1->OperIs(GT_ADD, GT_AND, GT_SUB)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Instead of cmp->OperIs(GT_EQ, GT_NE, GT_GT, GT_GE, GT_LT, GT_LE)
we typically use cmp->OperIsCompare()
, I note that it now also contains TEST_EQ and TEST_NE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of large OperIs(...) calls:
template <typename... T>
bool OperIs(genTreeOps oper, T... rest)
{
return OperIs(oper) || OperIs(rest...);
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK. I was hoping it was more efficient than that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You should check how the TEST_EQ and TEST_NE cases shodul be handled here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not a big fan of large OperIs(...) calls:
When I introduced OperIs
I verified that C++ compilers handle it as expected. For example, this particular use ends up as sub
/cmp
/ja
(which happens to be more efficient than OperIsCompare
but that's another story).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For a call with 6 arguments it would still be expanded into 6 consecutive compare|subtract and branch sequences right?
What was/is the issue with OperIsCompare
?
It should be one memory fetch and a bit test right?
static bool OperIsCompare(genTreeOps gtOper)
{
return (OperKind(gtOper) & GTK_RELOP) != 0;
}
static unsigned OperKind(unsigned gtOper)
{
assert(gtOper < GT_COUNT);
return gtOperKindTable[gtOper];
}
So for a small number (3 or less) OperIs
would be OK, but not for 5 or more...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It would still be expanded into 6 consecutive compare|subtract and branch sequences right?
Nope. GT_EQ & co. form a continuous range so it's something like if ((x - GT_EQ) < 6)
.
What was/is the issue with OperIsCompare ?
Yes, one memory fetch. Even if the memory is in cache it's still 4-5 cycles, that's already more than the 2 cycles (sub+cmp) for the OperIs
version. And the bit test adds another cycle.
9c91233
to
d11ca21
Compare
@mikedn @CarolEidt @briansull Revised per comments. Restructured the changes in |
@dotnet-bot test Windows_NT arm64 Cross Checked Build and Test |
@CarolEidt @briansull All tests have passed including Windows Arm64 Checked. I think this is ready to be merged |
Uses same approach as XARCH to eliminate easiest to identify redundant compares
Includes #13941 changes to resolve merge conflict.