-
Notifications
You must be signed in to change notification settings - Fork 4.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Updating genCodeForBinary to be VEX aware #1344
Conversation
CC. @CarolEidt |
This comment has been minimized.
This comment has been minimized.
The majority of diffs are positive changes, such as: - vdivsd xmm0, qword ptr [reloc @RWD00]
- vmovsd qword ptr [rbp-48H], xmm0
+ vdivsd xmm6, xmm0, qword ptr [reloc @RWD00] However, their are a few cases (such as |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks OK overall, but I'm not comfortable with continuing to have fundamental support for 3-operand encodings under FEATURE_HW_INTRINSICS
. I think that's something that should be cleaned up.
Right, and I called that out above. I'm working on updating the function to work without FEATURE_HW_INTRINSIC. |
25026ae
to
628ce55
Compare
@@ -1336,6 +1336,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX | |||
#if defined(_TARGET_XARCH_) | |||
void inst_RV_RV_IV(instruction ins, emitAttr size, regNumber reg1, regNumber reg2, unsigned ival); | |||
void inst_RV_TT_IV(instruction ins, emitAttr attr, regNumber reg1, GenTree* rmOp, int ival); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was conditioned under FEATURE_HW_INTRINSICS
for the implementation, but not the declaration here. So, I did the minimal fixup to allow it to be available without `FEATURE_HW_INTRINSICS.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
@@ -909,6 +909,18 @@ void CodeGen::genCodeForBinary(GenTreeOp* treeNode) | |||
regNumber op1reg = op1->isUsedFromReg() ? op1->GetRegNum() : REG_NA; | |||
regNumber op2reg = op2->isUsedFromReg() ? op2->GetRegNum() : REG_NA; | |||
|
|||
if (varTypeIsFloating(treeNode->TypeGet())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This just hijacks all the floating-point types and forwards to inst_RV_RV_TT
, the emit helpers (such as emitIns_SIMD_R_R_R
) deal with the difference between VEX and non-VEX for dst
and op1Reg
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great - I think that's the right approach.
@@ -5810,6 +5807,7 @@ void emitter::emitIns_SIMD_R_R_S( | |||
} | |||
} | |||
|
|||
#ifdef FEATURE_HW_INTRINSICS |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I only made the minimum number of emitIns_SIMD_*
methods needed available outside FEATURE_HW_INTRINSICS
as the others can't be encountered during normal codegen (and likely never will be).
CC. @dotnet/jit-contrib |
// isRMW -- true if the instruction is RMW; otherwise, false | ||
// | ||
void CodeGen::inst_RV_RV_TT( | ||
instruction ins, emitAttr size, regNumber targetReg, regNumber op1Reg, GenTree* op2, bool isRMW) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
isRMW
should come before targetReg
& operands. It's already pretty dubious that such information has to be communicated separately from instructions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I put this at the end to keep parameter ordering with other overloads consistent and because this should likely be looked up a different way eventually (either based on the instruction or the node
)
Any other feedback here? If not, I think it should be good to merge once CI passes |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - thanks for eliminating the FEATURE_HW_INTRINSICS
dependencies.
@@ -909,6 +909,18 @@ void CodeGen::genCodeForBinary(GenTreeOp* treeNode) | |||
regNumber op1reg = op1->isUsedFromReg() ? op1->GetRegNum() : REG_NA; | |||
regNumber op2reg = op2->isUsedFromReg() ? op2->GetRegNum() : REG_NA; | |||
|
|||
if (varTypeIsFloating(treeNode->TypeGet())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great - I think that's the right approach.
not sure what you mean and why it is important now but on the left part the main problem was that these ADT were not put in registers. This problem was fixed by #37745. |
Hmm, I'm seeing 4.4s in 3.1 and 0.6s in 5.0; with it being 2.1s if just this PR is reverted; the additional moves inserted from the ops being interpreted as RMW were making a big difference in the pipelining of the code (at least from what I was able to discern from uProf). |
yeah, maybe it will be 3x time slower without this PR, I am just saying that these diffs that you have shown are not all caused by this PR and this PR, applied to 3.1 alone, won't produce any diffs on this test. |
This is a rough draft attempt at resolving #1342
This essentially just moves some of the
genHWIntrinsic_R_R_RM
logic into ainst_RV_RV_TT
method and updatesgenCodeForBinary
to call it for floating-point types ifFEATURE_HW_INTRINSICS
is defined.As can be seen below, the numbers are overall fairly promising.