-
Notifications
You must be signed in to change notification settings - Fork 4.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Delete mov_i2xmm and mov_xmm2i. #47843
Conversation
0fb409e
to
2207b2f
Compare
src/coreclr/jit/emitxarch.cpp
Outdated
if (ins == INS_movd) | ||
{ | ||
assert(isFloatReg(reg1) != isFloatReg(reg2)); | ||
if (isFloatReg(reg1)) | ||
{ | ||
code = insCodeRM(ins); | ||
} | ||
else | ||
{ | ||
code = insCodeMR(ins); | ||
} | ||
} | ||
else | ||
{ | ||
code = insCodeRM(ins); | ||
} | ||
code = AddVexPrefixIfNeeded(ins, code, size); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we simplify to:
assert((ins != INS_movd) || (isFloatReg(reg1) != isFloatReg(reg2)));
if ((ins != INS_movd) || isFloatReg(reg1))
{
code = insCodeRM(ins);
}
else
{
code = insCodeMR(ins);
}
I'd expect ins != INS_movd
is the common case for codegen and it helps reduce nesting
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
will do before the merge, thanks for the suggestion.
@@ -14681,18 +14698,6 @@ emitter::insExecutionCharacteristics emitter::getInsExecutionCharacteristics(ins | |||
result.insThroughput = PERFSCORE_THROUGHPUT_25C; | |||
break; | |||
|
|||
case INS_mov_xmm2i: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks like these perfscores and the ones used for INS_movd
are tracked slightly differently?
Sorry, @tannergooding I have not marked as draft, could you please review it later after I fix the failures and add a description? |
f4014fa
to
d8c7247
Compare
d8c7247
to
c9357cb
Compare
/azp list |
/azp run runtime-coreclr jitstress, runtime-coreclr jitstressregs, runtime-coreclr outerloop |
Azure Pipelines successfully started running 3 pipeline(s). |
The failure is unrelated (#47096). |
PTAL @dotnet/jit-contrib , the PR fixes a recent regression and fixes wrong abstraction for |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM. look like a nice cleanup.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Overall LGTM with few questions.
@@ -11461,6 +11466,9 @@ BYTE* emitter::emitOutputRR(BYTE* dst, instrDesc* id) | |||
} | |||
else if ((ins == INS_movsx) || (ins == INS_movzx) || (insIsCMOV(ins))) | |||
{ | |||
assert(hasCodeRM(ins) && !hasCodeMI(ins) && !hasCodeMR(ins)); | |||
code = insCodeRM(ins); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you confirm why we added insCodeRM()
and AddVexPrefixIfNeeded()
here and below?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was it missing currently?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we had logic like:
// Get the 'base' opcode
code = insCodeRM(ins);
code = AddVexPrefixIfNeeded(ins, code, size);
if ()
{
code = insEncodeRMreg(ins, code); // use the existing value.
}
else if ()
{
code = insCodeMR(ins); //rewrite the existing value without reading
}
so it was hard to follow, so I moved this
code = insCodeRM(ins);
code = AddVexPrefixIfNeeded(ins, code, size);
under conditions where the result code is used.
see 11451 in the base.
@@ -1451,7 +1451,7 @@ void CodeGen::genSSE2Intrinsic(GenTreeHWIntrinsic* node) | |||
{ | |||
assert(baseType == TYP_INT || baseType == TYP_UINT || baseType == TYP_LONG || baseType == TYP_ULONG); | |||
op1Reg = op1->GetRegNum(); | |||
emit->emitIns_R_R(ins, emitActualTypeSize(baseType), op1Reg, targetReg); | |||
emit->emitIns_R_R(ins, emitActualTypeSize(baseType), targetReg, op1Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was this an existing bug?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
the instruction that we were generating here was xmm2i
and for it we were passing dst
as reg2
. After this PR we pass it as reg1
as we do for all other instructions (dst first, source second).
@@ -1688,7 +1688,7 @@ void CodeGen::genAvxOrAvx2Intrinsic(GenTreeHWIntrinsic* node) | |||
assert(numArgs == 1); | |||
assert((baseType == TYP_INT) || (baseType == TYP_UINT)); | |||
instruction ins = HWIntrinsicInfo::lookupIns(intrinsicId, baseType); | |||
emit->emitIns_R_R(ins, emitActualTypeSize(baseType), op1Reg, targetReg); | |||
emit->emitIns_R_R(ins, emitActualTypeSize(baseType), targetReg, op1Reg); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am surprised this we never noticed any failure because of this.
src/coreclr/jit/simdcodegenxarch.cpp
Outdated
instruction ins = ins_CopyFloatToInt(TYP_FLOAT, TYP_INT); | ||
// (Note that for mov_xmm2i, the int register is always in the reg2 position. | ||
inst_RV_RV(ins, op2Reg, tmpReg, baseType); | ||
// Another error, op2Reg is our xmm source but it was passed second. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess you want to delete this comment before merging?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeah, I thought I did, thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
the pr was updated with the review feedback, once tests pass I will merge it. @echesakovMSFT asked me to run spmi diffs and they showed some improvements because of:
becomes
because runtime/src/coreclr/jit/emitxarch.cpp Lines 190 to 194 in c29727a
|
The issue was in the xmm2i definition that was accepting the source argument on the first position, so each time we returned it from
ins_Copy
we had a potential bad code because the natural order for the Jit and for Intel is dst first.The strange order was implemented because of intel encoding for this instruction, but even in intel disasm and manual the instruction is shown naturally with dst on the first place.
So the fix is to make the right level of abstraction and hide this logic in the instruction encoding, then delete all places where we passed source xmm as the first argument.
It makes our disasm looking like Intel and prevents issues with
emitIns_R_R(ins_Copy(src, destType), size, dst, src)
.The issue was tricky because jit-diff does not show such diffs (because it was using JitDisasm that printed wrong order).
I have checked that there are no stress failures and that all possible pairs can be encoded.
A simple test
Thanks to @echesakovMSFT for helping with the fix.
Fixes #47259.
There is a follow-up issue to separate mod/movq for general reg/movq for xmm reg #47943