[NVPTX] Prefer prmt.b32 over bfi.b32 (llvm#110766)
In [[NVPTX] Improve lowering of v4i8](llvm@cbafb6f), @Artem-B added the
ability to lower ISD::BUILD_VECTOR with bfi PTX instructions. @Artem-B
did this because ([source](llvm#67866 (comment))):

> Under the hood byte extraction/insertion ends up as BFI/BFE
instructions, so we may as well do that in PTX, too.
https://godbolt.org/z/Tb3zWbj9b

However, the example that @Artem-B linked was targeting sm_52. On modern
architectures, ptxas uses prmt.b32.
[Example](https://godbolt.org/z/Ye4W1n84o).

Thus, remove uses of NVPTXISD::BFI in favor of NVPTXISD::PRMT.
justinfargnoli authored Oct 10, 2024
1 parent 43ba97e commit 3f9998a
Showing 3 changed files with 335 additions and 328 deletions.
31 changes: 17 additions & 14 deletions llvm/lib/Target/NVPTX/NVPTXISelLowering.cpp
@@ -2332,20 +2332,23 @@ SDValue NVPTXTargetLowering::LowerBUILD_VECTOR(SDValue Op,
     // Lower non-const v4i8 vector as byte-wise constructed i32, which allows us
     // to optimize calculation of constant parts.
     if (VT == MVT::v4i8) {
-      SDValue C8 = DAG.getConstant(8, DL, MVT::i32);
-      SDValue E01 = DAG.getNode(
-          NVPTXISD::BFI, DL, MVT::i32,
-          DAG.getAnyExtOrTrunc(Op->getOperand(1), DL, MVT::i32),
-          DAG.getAnyExtOrTrunc(Op->getOperand(0), DL, MVT::i32), C8, C8);
-      SDValue E012 =
-          DAG.getNode(NVPTXISD::BFI, DL, MVT::i32,
-                      DAG.getAnyExtOrTrunc(Op->getOperand(2), DL, MVT::i32),
-                      E01, DAG.getConstant(16, DL, MVT::i32), C8);
-      SDValue E0123 =
-          DAG.getNode(NVPTXISD::BFI, DL, MVT::i32,
-                      DAG.getAnyExtOrTrunc(Op->getOperand(3), DL, MVT::i32),
-                      E012, DAG.getConstant(24, DL, MVT::i32), C8);
-      return DAG.getNode(ISD::BITCAST, DL, VT, E0123);
+      SDValue PRMT__10 = DAG.getNode(
+          NVPTXISD::PRMT, DL, MVT::v4i8,
+          {DAG.getAnyExtOrTrunc(Op->getOperand(0), DL, MVT::i32),
+           DAG.getAnyExtOrTrunc(Op->getOperand(1), DL, MVT::i32),
+           DAG.getConstant(0x3340, DL, MVT::i32),
+           DAG.getConstant(NVPTX::PTXPrmtMode::NONE, DL, MVT::i32)});
+      SDValue PRMT32__ = DAG.getNode(
+          NVPTXISD::PRMT, DL, MVT::v4i8,
+          {DAG.getAnyExtOrTrunc(Op->getOperand(2), DL, MVT::i32),
+           DAG.getAnyExtOrTrunc(Op->getOperand(3), DL, MVT::i32),
+           DAG.getConstant(0x4033, DL, MVT::i32),
+           DAG.getConstant(NVPTX::PTXPrmtMode::NONE, DL, MVT::i32)});
+      SDValue PRMT3210 = DAG.getNode(
+          NVPTXISD::PRMT, DL, MVT::v4i8,
+          {PRMT__10, PRMT32__, DAG.getConstant(0x5410, DL, MVT::i32),
+           DAG.getConstant(NVPTX::PTXPrmtMode::NONE, DL, MVT::i32)});
+      return DAG.getNode(ISD::BITCAST, DL, VT, PRMT3210);
     }
     return Op;
   }
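
To make the selector constants in this hunk easier to follow, here is a minimal host-side sketch of the two PTX instructions involved, written from the PTX ISA description of `bfi.b32` and of `prmt.b32` in its default mode. It is not part of the commit. The function names `prmt`, `bfi`, `pack_bfi`, and `pack_prmt` are hypothetical helpers introduced only for illustration. In this emulation each hex digit of the selector picks one source byte (0-3 from the first operand, 4-7 from the second), with the lowest digit controlling the lowest destination byte.

```cpp
#include <cstdint>

// Hypothetical host-side model of PTX `prmt.b32 d, a, b, c` (default mode).
// Source bytes are numbered 0-7: bytes 0-3 come from `a`, bytes 4-7 from `b`.
// Nibble i of `c` selects the source for destination byte i.
constexpr uint32_t prmt(uint32_t a, uint32_t b, uint32_t c) {
  const uint64_t src = (static_cast<uint64_t>(b) << 32) | a;
  uint32_t d = 0;
  for (int i = 0; i < 4; ++i) {
    const uint32_t sel = (c >> (4 * i)) & 0xF;
    uint32_t byte = static_cast<uint32_t>(src >> (8 * (sel & 0x7))) & 0xFF;
    if (sel & 0x8)  // selector msb set: replicate the byte's sign bit
      byte = (byte & 0x80) ? 0xFF : 0x00;
    d |= byte << (8 * i);
  }
  return d;
}

// Hypothetical model of PTX `bfi.b32 d, a, b, pos, len`: insert the low
// `len` bits of `a` into `b` at bit `pos`. Only valid here for 0 < len < 32
// and pos + len <= 32, which covers the byte inserts used below.
constexpr uint32_t bfi(uint32_t a, uint32_t b, uint32_t pos, uint32_t len) {
  const uint32_t mask = ((1u << len) - 1u) << pos;
  return (b & ~mask) | ((a << pos) & mask);
}

// The removed lowering: three chained byte inserts at bits 8, 16, and 24.
constexpr uint32_t pack_bfi(uint32_t e0, uint32_t e1, uint32_t e2, uint32_t e3) {
  const uint32_t e01 = bfi(e1, e0, 8, 8);
  const uint32_t e012 = bfi(e2, e01, 16, 8);
  return bfi(e3, e012, 24, 8);
}

// A prmt-based packing: 0x3340 puts the low byte of each operand into bytes
// 0 and 1 of the result (bytes 2 and 3 are don't-care), and 0x5410 then
// joins the low half-words of the two intermediate values.
constexpr uint32_t pack_prmt(uint32_t e0, uint32_t e1, uint32_t e2, uint32_t e3) {
  const uint32_t lo = prmt(e0, e1, 0x3340);
  const uint32_t hi = prmt(e2, e3, 0x3340);
  return prmt(lo, hi, 0x5410);
}

// Upper bits of the any-extended inputs are arbitrary; both packings ignore them.
static_assert(pack_bfi(0xAABBCC11u, 0xDEADBE22u, 0x12345633u, 0x9ABCDE44u) ==
                  0x44332211u,
              "bfi chain packs the four low bytes");
static_assert(pack_prmt(0xAABBCC11u, 0xDEADBE22u, 0x12345633u, 0x9ABCDE44u) ==
                  0x44332211u,
              "prmt tree packs the four low bytes");
```

Note that this sketch uses 0x3340 for both pair steps, so elements 2 and 3 land in the low half-word that 0x5410 reads from its second operand. The hunk above uses 0x4033 for the second pair, which under this emulation places elements 2 and 3 in the upper two bytes instead; pairing that with 0x5410 is worth checking against the PTX ISA, since a selector of 0x7610 would be the one that reads those upper bytes here.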