Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ARM64 - Emitting msub instruction #66621

Merged
merged 7 commits into from
Apr 12, 2022
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/coreclr/jit/codegen.h
Original file line number Diff line number Diff line change
Expand Up @@ -1378,6 +1378,7 @@ XXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXXX
#if defined(TARGET_ARM64)
void genCodeForJumpCompare(GenTreeOp* tree);
void genCodeForMadd(GenTreeOp* tree);
void genCodeForMsub(GenTreeOp* tree);
void genCodeForBfiz(GenTreeOp* tree);
void genCodeForAddEx(GenTreeOp* tree);
#endif // TARGET_ARM64
Expand Down
27 changes: 26 additions & 1 deletion src/coreclr/jit/codegenarm64.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -10040,7 +10040,7 @@ void CodeGen::instGen_MemoryBarrier(BarrierKind barrierKind)
}

//-----------------------------------------------------------------------------------
// genCodeForMadd: Emit a madd/msub (Multiply-Add) instruction
// genCodeForMadd: Emit a madd (Multiply-Add) instruction
//
// Arguments:
// tree - GT_MADD tree where op1 or op2 is GT_ADD
Expand Down Expand Up @@ -10084,6 +10084,31 @@ void CodeGen::genCodeForMadd(GenTreeOp* tree)
genProduceReg(tree);
}

//-----------------------------------------------------------------------------------
// genCodeForMsub: Emit a msub (Multiply-Subtract) instruction
//
// Arguments:
// tree - GT_MSUB tree where op2 is GT_MUL
//
void CodeGen::genCodeForMsub(GenTreeOp* tree)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Have you considered to just handle it as part of GT_MADD? I bet it'd be much less lines of code

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I did consider it - I guess it's a design choice whether or not to use GT_MADD or introduce GT_MSUB.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm actually conflicted on it. On the one hand, just using GT_MADD is sufficient, but on the other hand, the codegen for GT_MADD is a bit complicated since we are making GT_NEG nodes as contained.

The goal of introducing GT_MSUB and its code-gen was to make it easy to understand.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The name GT_MADD as a lowering-only op that is specific for ARM64 which reflects the actual instruction 'madd' - at least for me, I wouldn't it to expect to emit 'msub', but I totally understand why it does though.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider actually just making GT_MADD only emit 'madd' ? Then you could do the decision to use GT_MADD or GT_MSUB in lowering rather than in code-gen. It would make containment less complicated and code-gen simpler.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we consider actually just making GT_MADD only emit 'madd'

I like this and it makes it have "parity" with GT_ADD and GT_SUB.

{
assert(tree->OperIs(GT_MSUB) && varTypeIsIntegral(tree) && !(tree->gtFlags & GTF_SET_FLAGS));
genConsumeOperands(tree);

assert(tree->gtGetOp2()->OperIs(GT_MUL));
assert(tree->gtGetOp2()->isContained());

GenTree* a = tree->gtGetOp1();
GenTree* b = tree->gtGetOp2()->gtGetOp1();
GenTree* c = tree->gtGetOp2()->gtGetOp2();

// d = a - b * c
// MSUB d, b, c, a
GetEmitter()->emitIns_R_R_R_R(INS_msub, emitActualTypeSize(tree), tree->GetRegNum(), b->GetRegNum(), c->GetRegNum(),
a->GetRegNum());
genProduceReg(tree);
}

//------------------------------------------------------------------------
// genCodeForBfiz: Generates the code sequence for a GenTree node that
// represents a bitfield insert in zero with sign/zero extension.
Expand Down
4 changes: 4 additions & 0 deletions src/coreclr/jit/codegenarmarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -306,6 +306,10 @@ void CodeGen::genCodeForTreeNode(GenTree* treeNode)
genCodeForMadd(treeNode->AsOp());
break;

case GT_MSUB:
genCodeForMsub(treeNode->AsOp());
break;

case GT_INC_SATURATE:
genCodeForIncSaturate(treeNode);
break;
Expand Down
4 changes: 3 additions & 1 deletion src/coreclr/jit/gtlist.h
Original file line number Diff line number Diff line change
Expand Up @@ -221,8 +221,10 @@ GTNODE(MUL_LONG , GenTreeOp ,1,GTK_BINOP|DBK_NOTHIR)
GTNODE(AND_NOT , GenTreeOp ,0,GTK_BINOP|DBK_NOTHIR)

#ifdef TARGET_ARM64
GTNODE(MADD , GenTreeOp ,0,GTK_BINOP|DBK_NOTHIR) // Generates the Multiply-Add instruction (madd/msub) In the future, we might consider
GTNODE(MADD , GenTreeOp ,0,GTK_BINOP|DBK_NOTHIR) // Generates the Multiply-Add instruction. In the future, we might consider
// enabling it for both armarch and xarch for floating-point MADD "unsafe" math.
GTNODE(MSUB , GenTreeOp ,0,GTK_BINOP|DBK_NOTHIR) // Generates the Multiply-Subtract instruction. In the future, we might consider
// enabling it for both armarch and xarch for floating-point MSUB "unsafe" math.
GTNODE(ADDEX, GenTreeOp ,0,GTK_BINOP|DBK_NOTHIR) // Add with sign/zero extension.
GTNODE(BFIZ , GenTreeOp ,0,GTK_BINOP|DBK_NOTHIR) // Bitfield Insert in Zero.
#endif
Expand Down
74 changes: 45 additions & 29 deletions src/coreclr/jit/lowerarmarch.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -1609,43 +1609,59 @@ void Lowering::ContainCheckBinary(GenTreeOp* node)
CheckImmedAndMakeContained(node, op2);

#ifdef TARGET_ARM64
// Find "a * b + c" or "c + a * b" in order to emit MADD/MSUB
if (comp->opts.OptimizationEnabled() && varTypeIsIntegral(node) && !node->isContained() && node->OperIs(GT_ADD) &&
!node->gtOverflow() && (op1->OperIs(GT_MUL) || op2->OperIs(GT_MUL)))
if (comp->opts.OptimizationEnabled() && varTypeIsIntegral(node) && !node->isContained())
{
GenTree* mul;
GenTree* c;
if (op1->OperIs(GT_MUL))
// Find "a * b + c" or "c + a * b" in order to emit MADD/MSUB
if (node->OperIs(GT_ADD) && !node->gtOverflow() && (op1->OperIs(GT_MUL) || op2->OperIs(GT_MUL)))
{
mul = op1;
c = op2;
}
else
{
mul = op2;
c = op1;
}
GenTree* mul;
GenTree* c;
if (op1->OperIs(GT_MUL))
{
mul = op1;
c = op2;
}
else
{
mul = op2;
c = op1;
}

GenTree* a = mul->gtGetOp1();
GenTree* b = mul->gtGetOp2();
GenTree* a = mul->gtGetOp1();
GenTree* b = mul->gtGetOp2();

if (!mul->isContained() && !mul->gtOverflow() && !a->isContained() && !b->isContained() && !c->isContained() &&
varTypeIsIntegral(mul))
{
if (a->OperIs(GT_NEG) && !a->gtGetOp1()->isContained() && !a->gtGetOp1()->IsRegOptional())
if (!mul->isContained() && !mul->gtOverflow() && !a->isContained() && !b->isContained() &&
!c->isContained() && varTypeIsIntegral(mul))
{
// "-a * b + c" to MSUB
MakeSrcContained(mul, a);
if (a->OperIs(GT_NEG) && !a->gtGetOp1()->isContained() && !a->gtGetOp1()->IsRegOptional())
{
// "-a * b + c" to MSUB
MakeSrcContained(mul, a);
}
if (b->OperIs(GT_NEG) && !b->gtGetOp1()->isContained())
{
// "a * -b + c" to MSUB
MakeSrcContained(mul, b);
}
// If both 'a' and 'b' are GT_NEG - MADD will be emitted.

node->ChangeOper(GT_MADD);
MakeSrcContained(node, mul);
}
if (b->OperIs(GT_NEG) && !b->gtGetOp1()->isContained())
}
// Find "a - b * c" in order to emit MSUB
else if (node->OperIs(GT_SUB) && !node->gtOverflow() && op2->OperIs(GT_MUL) && !op2->isContained() &&
!op2->gtOverflow() && varTypeIsIntegral(op2))
{
GenTree* a = op1;
GenTree* b = op2->gtGetOp1();
GenTree* c = op2->gtGetOp2();

if (!a->isContained() && !b->isContained() && !c->isContained())
{
// "a * -b + c" to MSUB
MakeSrcContained(mul, b);
node->ChangeOper(GT_MSUB);
MakeSrcContained(node, op2);
}
// If both 'a' and 'b' are GT_NEG - MADD will be emitted.

node->ChangeOper(GT_MADD);
MakeSrcContained(node, mul);
}
}

Expand Down