[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… #78414

mariusz-sikora-at-amd · 2024-01-17T09:13:28Z

…bf8 instructions

Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
instructions that were supported on GFX940 (MI300):
- V_CVT_F32_FP8
- V_CVT_F32_BF8
- V_CVT_PK_F32_FP8
- V_CVT_PK_F32_BF8
- V_CVT_PK_FP8_F32
- V_CVT_PK_BF8_F32
- V_CVT_SR_FP8_F32
- V_CVT_SR_BF8_F32

…bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32

llvmbot · 2024-01-17T09:13:58Z

@llvm/pr-subscribers-mc

@llvm/pr-subscribers-backend-amdgpu

Author: Mariusz Sikora (mariusz-sikora-at-amd)

Changes

…bf8 instructions

Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16
instructions that were supported on GFX940 (MI300):
- V_CVT_F32_FP8
- V_CVT_F32_BF8
- V_CVT_PK_F32_FP8
- V_CVT_PK_F32_BF8
- V_CVT_PK_FP8_F32
- V_CVT_PK_BF8_F32
- V_CVT_SR_FP8_F32
- V_CVT_SR_BF8_F32

Patch is 106.83 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/78414.diff

29 Files Affected:

(modified) llvm/lib/Target/AMDGPU/AMDGPU.td (+3-1)
(modified) llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp (+66-2)
(modified) llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp (+49-1)
(modified) llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp (+3-1)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp (+11)
(modified) llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h (+3)
(modified) llvm/lib/Target/AMDGPU/VOP1Instructions.td (+93-5)
(modified) llvm/lib/Target/AMDGPU/VOP3Instructions.td (+49-4)
(added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll (+105)
(added) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.mir (+197)
(modified) llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.ll (+309-61)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop1.s (+45)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop1_dpp16.s (+12)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop1_dpp8.s (+12)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop3.s (+36)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp16.s (+108)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop3_dpp8.s (+48)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop3_from_vop1.s (+138)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop3_from_vop1_dpp16.s (+12)
(modified) llvm/test/MC/AMDGPU/gfx12_asm_vop3_from_vop1_dpp8.s (+12)
(modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop1.txt (+36)
(modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop1_dpp16.txt (+12)
(modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop1_dpp8.txt (+12)
(modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3.txt (+36)
(modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp16.txt (+108)
(modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_dpp8.txt (+48)
(modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_from_vop1.txt (+36)
(modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_from_vop1_dpp16.txt (+12)
(modified) llvm/test/MC/Disassembler/AMDGPU/gfx12_dasm_vop3_from_vop1_dpp8.txt (+12)

diff --git a/llvm/lib/Target/AMDGPU/AMDGPU.td b/llvm/lib/Target/AMDGPU/AMDGPU.td
index c1c863d885c3a9e..852a99786efffd6 100644
--- a/llvm/lib/Target/AMDGPU/AMDGPU.td
+++ b/llvm/lib/Target/AMDGPU/AMDGPU.td
@@ -1495,6 +1495,7 @@ def FeatureISAVersion12 : FeatureSet<
    FeatureFlatAtomicFaddF32Inst,
    FeatureImageInsts,
    FeatureExtendedImageInsts,
+   FeatureFP8Insts,
    FeaturePackedTID,
    FeatureVcmpxPermlaneHazard,
    FeatureSALUFloatInsts,
@@ -1502,7 +1503,8 @@ def FeatureISAVersion12 : FeatureSet<
    FeatureHasRestrictedSOffset,
    FeatureVGPRSingleUseHintInsts,
    FeatureMADIntraFwdBug,
-   FeatureScalarDwordx3Loads]>;
+   FeatureScalarDwordx3Loads,
+   FeatureDPPSrc1SGPR]>;
 
 //===----------------------------------------------------------------------===//
 
diff --git a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
index ba79affe683d6f7..b2b81446016ec71 100644
--- a/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
+++ b/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp
@@ -3500,6 +3500,9 @@ bool AMDGPUAsmParser::usesConstantBus(const MCInst &Inst, unsigned OpIdx) {
     return !isInlineConstant(Inst, OpIdx);
   } else if (MO.isReg()) {
     auto Reg = MO.getReg();
+    if (!Reg) {
+      return false;
+    }
     const MCRegisterInfo *TRI = getContext().getRegisterInfo();
     auto PReg = mc2PseudoReg(Reg);
     return isSGPR(PReg, TRI) && PReg != SGPR_NULL;
@@ -8273,6 +8276,16 @@ void AMDGPUAsmParser::cvtVOP3(MCInst &Inst, const OperandVector &Operands,
     ((AMDGPUOperand &)*Operands[I++]).addRegOperands(Inst, 1);
   }
 
+  if (isVOP1Cvt_F32_Fp8_Bf8_e64(Opc) &&
+      Opc != AMDGPU::V_CVT_PK_F32_BF8_e64_gfx12 &&
+      Opc != AMDGPU::V_CVT_PK_F32_FP8_e64_gfx12) {
+    AMDGPUOperand &Op = ((AMDGPUOperand &)*Operands[I++]);
+    Op.addRegOrImmWithFPInputModsOperands(Inst, 1); // src0
+    // Add dummy src1
+    Inst.addOperand(MCOperand::createImm(0));
+    Inst.addOperand(MCOperand::createReg(AMDGPU::getMCReg(0, getSTI())));
+  }
+
   for (unsigned E = Operands.size(); I != E; ++I) {
     AMDGPUOperand &Op = ((AMDGPUOperand &)*Operands[I]);
     if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {
@@ -8321,12 +8334,20 @@ void AMDGPUAsmParser::cvtVOP3P(MCInst &Inst, const OperandVector &Operands,
   const bool IsPacked = (Desc.TSFlags & SIInstrFlags::IsPacked) != 0;
 
   if (Opc == AMDGPU::V_CVT_SR_BF8_F32_vi ||
-      Opc == AMDGPU::V_CVT_SR_FP8_F32_vi) {
+      Opc == AMDGPU::V_CVT_SR_FP8_F32_vi ||
+      Opc == AMDGPU::V_CVT_SR_BF8_F32_e64_gfx12 ||
+      Opc == AMDGPU::V_CVT_SR_FP8_F32_e64_gfx12) {
     Inst.addOperand(MCOperand::createImm(0)); // Placeholder for src2_mods
     Inst.addOperand(Inst.getOperand(0));
   }
 
-  if (AMDGPU::hasNamedOperand(Opc, AMDGPU::OpName::vdst_in)) {
+  // Adding vdst_in operand is already covered for these DPP instructions in
+  // cvtVOP3DPP.
+  if (AMDGPU::hasNamedOperand(Opc, AMDGPU::OpName::vdst_in) &&
+      !(Opc == AMDGPU::V_CVT_PK_BF8_F32_e64_dpp_gfx12 ||
+        Opc == AMDGPU::V_CVT_PK_FP8_F32_e64_dpp_gfx12 ||
+        Opc == AMDGPU::V_CVT_PK_BF8_F32_e64_dpp8_gfx12 ||
+        Opc == AMDGPU::V_CVT_PK_FP8_F32_e64_dpp8_gfx12)) {
     assert(!IsPacked);
     Inst.addOperand(Inst.getOperand(0));
   }
@@ -8765,6 +8786,11 @@ void AMDGPUAsmParser::cvtVOP3DPP(MCInst &Inst, const OperandVector &Operands,
   int OldIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::old);
   int Src2ModIdx =
       AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src2_modifiers);
+  int VdstInIdx = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::vdst_in);
+  bool IsVOP3CvtSrDpp = Opc == AMDGPU::V_CVT_SR_BF8_F32_e64_dpp8_gfx12 ||
+                        Opc == AMDGPU::V_CVT_SR_FP8_F32_e64_dpp8_gfx12 ||
+                        Opc == AMDGPU::V_CVT_SR_BF8_F32_e64_dpp_gfx12 ||
+                        Opc == AMDGPU::V_CVT_SR_FP8_F32_e64_dpp_gfx12;
   bool IsMAC = OldIdx != -1 && Src2ModIdx != -1 &&
                Desc.getOperandConstraint(OldIdx, MCOI::TIED_TO) == -1;
 
@@ -8788,6 +8814,20 @@ void AMDGPUAsmParser::cvtVOP3DPP(MCInst &Inst, const OperandVector &Operands,
       }
     }
 
+    if (VdstInIdx != -1) {
+      int NumOperands = Inst.getNumOperands();
+      if (VdstInIdx == NumOperands)
+        Inst.addOperand(Inst.getOperand(0));
+    }
+
+    if (IsVOP3CvtSrDpp) {
+      int NumOperands = Inst.getNumOperands();
+      if (Src2ModIdx == NumOperands) {
+        Inst.addOperand(MCOperand::createImm(0));
+        Inst.addOperand(MCOperand::createReg(AMDGPU::getMCReg(0, getSTI())));
+      }
+    }
+
     auto TiedTo = Desc.getOperandConstraint(Inst.getNumOperands(),
                                             MCOI::TIED_TO);
     if (TiedTo != -1) {
@@ -8801,6 +8841,13 @@ void AMDGPUAsmParser::cvtVOP3DPP(MCInst &Inst, const OperandVector &Operands,
       Fi = Op.getImm();
     } else if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {
       Op.addRegOrImmWithFPInputModsOperands(Inst, 2);
+      if (isVOP1Cvt_F32_Fp8_Bf8_e64(Inst.getOpcode()) &&
+          Inst.getOpcode() != AMDGPU::V_CVT_PK_F32_BF8_e64_gfx12 &&
+          Inst.getOpcode() != AMDGPU::V_CVT_PK_F32_FP8_e64_gfx12) {
+        // Add dummy src1
+        Inst.addOperand(MCOperand::createImm(0));
+        Inst.addOperand(MCOperand::createReg(AMDGPU::getMCReg(0, getSTI())));
+      }
     } else if (Op.isReg()) {
       Op.addRegOperands(Inst, 1);
     } else if (Op.isImm() &&
@@ -8847,6 +8894,7 @@ void AMDGPUAsmParser::cvtDPP(MCInst &Inst, const OperandVector &Operands, bool I
   OptionalImmIndexMap OptionalIdx;
 
   unsigned I = 1;
+  const unsigned Opc = Inst.getOpcode();
   const MCInstrDesc &Desc = MII.get(Inst.getOpcode());
   for (unsigned J = 0; J < Desc.getNumDefs(); ++J) {
     ((AMDGPUOperand &)*Operands[I++]).addRegOperands(Inst, 1);
@@ -8874,6 +8922,14 @@ void AMDGPUAsmParser::cvtDPP(MCInst &Inst, const OperandVector &Operands, bool I
         Op.addImmOperands(Inst, 1);
       } else if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {
         Op.addRegWithFPInputModsOperands(Inst, 2);
+        if (Opc == AMDGPU::V_CVT_F32_BF8_dpp_gfx12 ||
+            Opc == AMDGPU::V_CVT_F32_FP8_dpp_gfx12 ||
+            Opc == AMDGPU::V_CVT_F32_BF8_dpp8_gfx12 ||
+            Opc == AMDGPU::V_CVT_F32_FP8_dpp8_gfx12) {
+          // Add dummy src1
+          Inst.addOperand(MCOperand::createImm(0));
+          Inst.addOperand(MCOperand::createReg(AMDGPU::getMCReg(0, getSTI())));
+        }
       } else if (Op.isDppFI()) {
         Fi = Op.getImm();
       } else if (Op.isReg()) {
@@ -8884,6 +8940,14 @@ void AMDGPUAsmParser::cvtDPP(MCInst &Inst, const OperandVector &Operands, bool I
     } else {
       if (isRegOrImmWithInputMods(Desc, Inst.getNumOperands())) {
         Op.addRegWithFPInputModsOperands(Inst, 2);
+        if (Opc == AMDGPU::V_CVT_F32_BF8_dpp_gfx12 ||
+            Opc == AMDGPU::V_CVT_F32_FP8_dpp_gfx12 ||
+            Opc == AMDGPU::V_CVT_F32_BF8_dpp8_gfx12 ||
+            Opc == AMDGPU::V_CVT_F32_FP8_dpp8_gfx12) {
+          // Add dummy src1
+          Inst.addOperand(MCOperand::createImm(0));
+          Inst.addOperand(MCOperand::createReg(AMDGPU::getMCReg(0, getSTI())));
+        }
       } else if (Op.isReg()) {
         Op.addRegOperands(Inst, 1);
       } else if (Op.isDPPCtrl()) {
diff --git a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
index 9dff3f6c2efd025..75d0511b567bbef 100644
--- a/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
+++ b/llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp
@@ -522,6 +522,15 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
           convertVOPCDPPInst(MI); // Special VOP3 case
         } else {
           assert(MCII->get(MI.getOpcode()).TSFlags & SIInstrFlags::VOP3);
+
+          if (AMDGPU::isVOP1Cvt_F32_Fp8_Bf8_e64(MI.getOpcode())) {
+            // Add omod and clamp modifiers.
+            insertNamedMCOperand(MI, MCOperand::createImm(0),
+                                 AMDGPU::OpName::omod);
+            insertNamedMCOperand(MI, MCOperand::createImm(0),
+                                 AMDGPU::OpName::clamp);
+          }
+
           convertVOP3DPPInst(MI); // Regular VOP3 case
         }
       };
@@ -691,8 +700,15 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
 
     Res = tryDecodeInst(DecoderTableGFX1264, DecoderTableGFX12_FAKE1664, MI, QW,
                         Address, CS);
-    if (Res)
+    if (Res) {
+      if (AMDGPU::isVOP1Cvt_F32_Fp8_Bf8_e64(MI.getOpcode())) {
+        // Add omod and clamp modifiers.
+        insertNamedMCOperand(MI, MCOperand::createImm(0), AMDGPU::OpName::omod);
+        insertNamedMCOperand(MI, MCOperand::createImm(0),
+                             AMDGPU::OpName::clamp);
+      }
       break;
+    }
 
     Res = tryDecodeInst(DecoderTableGFX1164, DecoderTableGFX11_FAKE1664, MI, QW,
                         Address, CS);
@@ -708,6 +724,13 @@ DecodeStatus AMDGPUDisassembler::getInstruction(MCInst &MI, uint64_t &Size,
                          AMDGPU::OpName::src2_modifiers);
   }
 
+  if (Res && (MI.getOpcode() == AMDGPU::V_CVT_SR_BF8_F32_e64_dpp ||
+              MI.getOpcode() == AMDGPU::V_CVT_SR_FP8_F32_e64_dpp)) {
+    // Insert dummy unused src2_modifiers.
+    insertNamedMCOperand(MI, MCOperand::createImm(0),
+                         AMDGPU::OpName::src2_modifiers);
+  }
+
   if (Res && (MCII->get(MI.getOpcode()).TSFlags & SIInstrFlags::DS) &&
       !AMDGPU::hasGDS(STI)) {
     insertNamedMCOperand(MI, MCOperand::createImm(0), AMDGPU::OpName::gds);
@@ -938,6 +961,13 @@ void AMDGPUDisassembler::convertMacDPPInst(MCInst &MI) const {
 // first add optional MI operands to check FI
 DecodeStatus AMDGPUDisassembler::convertDPP8Inst(MCInst &MI) const {
   unsigned Opc = MI.getOpcode();
+
+  if (AMDGPU::isVOP1Cvt_F32_Fp8_Bf8_e64(Opc)) {
+    // Add omod and clamp modifiers.
+    insertNamedMCOperand(MI, MCOperand::createImm(0), AMDGPU::OpName::omod);
+    insertNamedMCOperand(MI, MCOperand::createImm(0), AMDGPU::OpName::clamp);
+  }
+
   if (MCII->get(Opc).TSFlags & SIInstrFlags::VOP3P) {
     convertVOP3PDPPInst(MI);
   } else if ((MCII->get(Opc).TSFlags & SIInstrFlags::VOPC) ||
@@ -947,6 +977,15 @@ DecodeStatus AMDGPUDisassembler::convertDPP8Inst(MCInst &MI) const {
     if (isMacDPP(MI))
       convertMacDPPInst(MI);
 
+    int VDstInIdx =
+        AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::vdst_in);
+    if (VDstInIdx != -1)
+      insertNamedMCOperand(MI, MI.getOperand(0), AMDGPU::OpName::vdst_in);
+
+    if (MI.getOpcode() == AMDGPU::V_CVT_SR_BF8_F32_e64_dpp8_gfx12 ||
+        MI.getOpcode() == AMDGPU::V_CVT_SR_FP8_F32_e64_dpp8_gfx12)
+      insertNamedMCOperand(MI, MI.getOperand(0), AMDGPU::OpName::src2);
+
     unsigned DescNumOps = MCII->get(Opc).getNumOperands();
     if (MI.getNumOperands() < DescNumOps &&
         AMDGPU::hasNamedOperand(Opc, AMDGPU::OpName::op_sel)) {
@@ -973,6 +1012,15 @@ DecodeStatus AMDGPUDisassembler::convertVOP3DPPInst(MCInst &MI) const {
   if (isMacDPP(MI))
     convertMacDPPInst(MI);
 
+  int VDstInIdx =
+      AMDGPU::getNamedOperandIdx(MI.getOpcode(), AMDGPU::OpName::vdst_in);
+  if (VDstInIdx != -1)
+    insertNamedMCOperand(MI, MI.getOperand(0), AMDGPU::OpName::vdst_in);
+
+  if (MI.getOpcode() == AMDGPU::V_CVT_SR_BF8_F32_e64_dpp_gfx12 ||
+      MI.getOpcode() == AMDGPU::V_CVT_SR_FP8_F32_e64_dpp_gfx12)
+    insertNamedMCOperand(MI, MI.getOperand(0), AMDGPU::OpName::src2);
+
   unsigned Opc = MI.getOpcode();
   unsigned DescNumOps = MCII->get(Opc).getNumOperands();
   if (MI.getNumOperands() < DescNumOps &&
diff --git a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
index 6c7977e22599c6e..1fc70f0bbbd2d9a 100644
--- a/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
+++ b/llvm/lib/Target/AMDGPU/MCTargetDesc/AMDGPUInstPrinter.cpp
@@ -1300,7 +1300,9 @@ void AMDGPUInstPrinter::printOpSel(const MCInst *MI, unsigned,
                                    const MCSubtargetInfo &STI,
                                    raw_ostream &O) {
   unsigned Opc = MI->getOpcode();
-  if (isPermlane16(Opc)) {
+  if (isPermlane16(Opc) || (isVOP1Cvt_F32_Fp8_Bf8_e64(Opc) &&
+                            Opc != AMDGPU::V_CVT_PK_F32_BF8_e64_gfx12 &&
+                            Opc != AMDGPU::V_CVT_PK_F32_FP8_e64_gfx12)) {
     auto FIN = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src0_modifiers);
     auto BCN = AMDGPU::getNamedOperandIdx(Opc, AMDGPU::OpName::src1_modifiers);
     unsigned FI = !!(MI->getOperand(FIN).getImm() & SISrcMods::OP_SEL_0);
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
index 26ba2575ff34ac0..ae197ee83acc053 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.cpp
@@ -503,6 +503,17 @@ bool isPermlane16(unsigned Opc) {
          Opc == AMDGPU::V_PERMLANEX16_VAR_B32_e64_gfx12;
 }
 
+bool isVOP1Cvt_F32_Fp8_Bf8_e64(unsigned Opc) {
+  return Opc == AMDGPU::V_CVT_F32_BF8_e64_gfx12 ||
+         Opc == AMDGPU::V_CVT_F32_FP8_e64_gfx12 ||
+         Opc == AMDGPU::V_CVT_F32_BF8_e64_dpp_gfx12 ||
+         Opc == AMDGPU::V_CVT_F32_FP8_e64_dpp_gfx12 ||
+         Opc == AMDGPU::V_CVT_F32_BF8_e64_dpp8_gfx12 ||
+         Opc == AMDGPU::V_CVT_F32_FP8_e64_dpp8_gfx12 ||
+         Opc == AMDGPU::V_CVT_PK_F32_BF8_e64_gfx12 ||
+         Opc == AMDGPU::V_CVT_PK_F32_FP8_e64_gfx12;
+}
+
 bool isGenericAtomic(unsigned Opc) {
   return Opc == AMDGPU::G_AMDGPU_ATOMIC_FMIN ||
          Opc == AMDGPU::G_AMDGPU_ATOMIC_FMAX ||
diff --git a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
index 50c741760d7143e..9d0bac084feabe2 100644
--- a/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
+++ b/llvm/lib/Target/AMDGPU/Utils/AMDGPUBaseInfo.h
@@ -542,6 +542,9 @@ bool isPermlane16(unsigned Opc);
 LLVM_READNONE
 bool isGenericAtomic(unsigned Opc);
 
+LLVM_READNONE
+bool isVOP1Cvt_F32_Fp8_Bf8_e64(unsigned Opc);
+
 namespace VOPD {
 
 enum Component : unsigned {
diff --git a/llvm/lib/Target/AMDGPU/VOP1Instructions.td b/llvm/lib/Target/AMDGPU/VOP1Instructions.td
index d604990dc88c207..48202e2250c8500 100644
--- a/llvm/lib/Target/AMDGPU/VOP1Instructions.td
+++ b/llvm/lib/Target/AMDGPU/VOP1Instructions.td
@@ -571,6 +571,7 @@ let SubtargetPredicate = isGFX9Only in {
 } // End SubtargetPredicate = isGFX9Only
 
 class VOPProfile_Base_CVT_F32_F8<ValueType vt> : VOPProfileI2F <vt, i32> {
+  let HasExtDPP = 1;
   let HasExtSDWA = 1;
   let HasExtSDWA9 = 1;
   let HasExt = 1;
@@ -599,6 +600,7 @@ class Cvt_F32_F8_Pat<SDPatternOperator node, int index,
     (inst_sdwa 0, $src, 0, 0, index)
 >;
 
+let SubtargetPredicate = isGFX9Only in {
 let OtherPredicates = [HasCvtFP8VOP1Bug] in {
   def : GCNPat<(f32 (int_amdgcn_cvt_f32_fp8 i32:$src, 0)),
                (V_CVT_F32_FP8_sdwa 0, $src, 0, 0, 0)>;
@@ -617,6 +619,7 @@ foreach Index = [1, 2, 3] in {
   def : Cvt_F32_F8_Pat<int_amdgcn_cvt_f32_fp8, Index, V_CVT_F32_FP8_sdwa>;
   def : Cvt_F32_F8_Pat<int_amdgcn_cvt_f32_bf8, Index, V_CVT_F32_BF8_sdwa>;
 }
+} // End SubtargetPredicate = isGFX9Only
 
 class Cvt_PK_F32_F8_Pat<SDPatternOperator node, int index,
     VOP1_Pseudo inst_e32, VOP1_SDWA_Pseudo inst_sdwa> : GCNPat<
@@ -626,11 +629,82 @@ class Cvt_PK_F32_F8_Pat<SDPatternOperator node, int index,
          (inst_e32 $src))
 >;
 
-foreach Index = [0, -1] in {
-  def : Cvt_PK_F32_F8_Pat<int_amdgcn_cvt_pk_f32_fp8, Index,
-                          V_CVT_PK_F32_FP8_e32, V_CVT_PK_F32_FP8_sdwa>;
-  def : Cvt_PK_F32_F8_Pat<int_amdgcn_cvt_pk_f32_bf8, Index,
-                          V_CVT_PK_F32_BF8_e32, V_CVT_PK_F32_BF8_sdwa>;
+let SubtargetPredicate = isGFX9Only in {
+  foreach Index = [0, -1] in {
+    def : Cvt_PK_F32_F8_Pat<int_amdgcn_cvt_pk_f32_fp8, Index,
+                            V_CVT_PK_F32_FP8_e32, V_CVT_PK_F32_FP8_sdwa>;
+    def : Cvt_PK_F32_F8_Pat<int_amdgcn_cvt_pk_f32_bf8, Index,
+                            V_CVT_PK_F32_BF8_e32, V_CVT_PK_F32_BF8_sdwa>;
+  }
+}
+
+
+// Similar to VOPProfile_Base_CVT_F32_F8, but for VOP3 instructions.
+def VOPProfile_Base_CVT_PK_F32_F8_OpSel : VOPProfileI2F <v2f32, i32> {
+  let InsVOP3OpSel = (ins Src0Mod:$src0_modifiers, Src0RC64:$src0,
+                          clampmod:$clamp, omod:$omod, op_sel0:$op_sel);
+
+  let HasOpSel = 1;
+  let HasExtVOP3DPP = 0;
+}
+
+def VOPProfile_Base_CVT_F32_F8_OpSel : VOPProfile<[f32, i32, i32, untyped]> {
+  let InsVOP3OpSel = (ins Src0Mod:$src0_modifiers, Src0RC64:$src0,
+                          Src1Mod:$src1_modifiers, Src1RC64:$src1,
+                          clampmod:$clamp, omod:$omod, op_sel0:$op_sel);
+  let AsmVOP3OpSel = !subst(", $src1_modifiers", "", getAsmVOP3OpSel<2, 0, 0, 1, 1, 0>.ret);
+
+  let HasOpSel = 1;
+  let HasExtDPP = 1;
+  let HasExtVOP3DPP = 1;
+
+  let Src1VOP3DPP = Src1RC64;
+  let AsmVOP3DPP8 = getAsmVOP3DPP8<AsmVOP3OpSel>.ret;
+  let AsmVOP3DPP16 = getAsmVOP3DPP16<AsmVOP3OpSel>.ret;
+}
+
+let SubtargetPredicate = isGFX12Plus, mayRaiseFPException = 0,
+    SchedRW = [WriteFloatCvt] in {
+  defm V_CVT_F32_FP8_OP_SEL    : VOP1Inst<"v_cvt_f32_fp8_op_sel", VOPProfile_Base_CVT_F32_F8_OpSel>;
+  defm V_CVT_F32_BF8_OP_SEL    : VOP1Inst<"v_cvt_f32_bf8_op_sel", VOPProfile_Base_CVT_F32_F8_OpSel>;
+  defm V_CVT_PK_F32_FP8_OP_SEL : VOP1Inst<"v_cvt_pk_f32_fp8_op_sel", VOPProfile_Base_CVT_PK_F32_F8_OpSel>;
+  defm V_CVT_PK_F32_BF8_OP_SEL : VOP1Inst<"v_cvt_pk_f32_bf8_op_sel", VOPProfile_Base_CVT_PK_F32_F8_OpSel>;
+}
+
+class Cvt_F32_F8_Pat_OpSel<SDPatternOperator node, bits<2> index,
+    VOP1_Pseudo inst_e32, VOP3_Pseudo inst_e64> : GCNPat<
+    (f32 (node i32:$src, index)),
+    !if (index,
+         (inst_e64 !if(index{0}, SRCMODS.OP_SEL_0, SRCMODS.OP_SEL_1), $src,
+                   !if(index{1}, SRCMODS.OP_SEL_0, SRCMODS.OP_SEL_1), (i32 0),
+                   0, 0, 0),
+         (inst_e32 $src))
+>;
+
+let SubtargetPredicate = isGFX12Plus in {
+  foreach Index = [0, 1, 2, 3] in {
+    def : Cvt_F32_F8_Pat_OpSel<int_amdgcn_cvt_f32_fp8, Index,
+                               V_CVT_F32_FP8_e32, V_CVT_F32_FP8_OP_SEL_e64>;
+    def : Cvt_F32_F8_Pat_OpSel<int_amdgcn_cvt_f32_bf8, Index,
+                               V_CVT_F32_BF8_e32, V_CVT_F32_BF8_OP_SEL_e64>;
+  }
+}
+
+class Cvt_PK_F32_F8_Pat_OpSel<SDPatternOperator node, int index,
+    VOP1_Pseudo inst_e32, VOP3_Pseudo inst_e64> : GCNPat<
+    (v2f32 (node i32:$src, index)),
+    !if (index,
+         (inst_e64 SRCMODS.OP_SEL_0, $src, 0, 0, SRCMODS.NONE),
+         (inst_e32 $src))
+>;
+
+let SubtargetPredicate = isGFX12Plus in {
+  foreach Index = [0, -1] in {
+    def : Cvt_PK_F32_F8_Pat_OpSel<int_amdgcn_cvt_pk_f32_fp8, Index,
+                                  V_CVT_PK_F32_FP8_e32, V_CVT_PK_F32_FP8_OP_SEL_e64>;
+    def : Cvt_PK_F32_F8_Pat_OpSel<int_amdgcn_cvt_pk_f32_bf8, Index,
+                                  V_CVT_PK_F32_BF8_e32, V_CVT_PK_F32_BF8_OP_SEL_e64>;
+  }
 }
 
 let SubtargetPredicate = isGFX10Plus in {
@@ -854,6 +928,20 @@ multiclass VOP1_Real_NO_DPP_OP_SEL_with_name<GFXGen Gen, bits<9> op,
   VOP3_Real_with_name<Gen, {0, 1, 1, op{6-0}}, opName, asmName>;
 
 
+// Define VOP1 instructions using the pseudo instruction with its old profile and
+// VOP3 using the OpSel profile for the pseudo instruction.
+defm V_CVT_F32_FP8      : VOP1_Real_NO_VOP3_with_name_gfx12<0x06c, "V_CVT_F32_FP8", "v_cvt_f32_fp8">;
+defm V_CVT_F32_FP8      : VOP1_Realtriple_e64_with_name<GFX12Gen, 0x06c, "V_CVT_F32_FP8_OP_SEL", "v_cvt_f32_fp8">;
+
+defm V_CVT_F32_BF8      : VOP1_Real_NO_VOP3_with_name_gfx12<0x06d, "V_CVT_F32_BF8", "v_cvt_f32_bf8">;
+defm V_CVT_F32_BF8      : VOP1_Realtriple_e64_with_name<GFX12Gen, 0x06d, "V_CVT_F32_BF8_OP_SEL", "v_cvt_f32_bf8">;
+
+defm V_CVT_PK_F32_FP8   : VOP1_Real_e32_with_name<GFX12Gen, 0x06e, "V_CVT_PK_F32_FP8", "v_cvt_pk_f32_fp8">;
+defm V_CVT_PK_F32_FP8   : VOP3_Real_with_name<GFX12Gen, 0x1ee, "V_CVT_PK_F32_FP8_OP_SEL", "v_cvt_pk_f32_fp8">;
+
+defm V_CVT_PK_F32_BF8   : VOP1_Real_e32_with_name<GFX12Gen, 0x06f, "V_CVT_PK_F32_BF8", "v_cvt_pk_f32_bf8">;
+defm V_CVT_PK_F32_BF8   : VOP3_Real_with_name<GFX12Gen, 0x1ef, "V_CVT_PK_F32_BF8_OP_SEL", "v_cvt_pk_f32_bf8">;
+
 defm V_CVT_NEAREST_I32_F32 : VOP1_Real_FULL_with_name_gfx11_gfx12<0x00c,
   "V_CVT_RPI_I32_F...
[truncated]

llvm/lib/Target/AMDGPU/AMDGPU.td

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll

jayfoad · 2024-01-17T16:02:05Z

@Sisyph and @kosarev for the assembler/disassembler parts.

kosarev · 2024-01-18T08:59:20Z

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

+    Op.addRegOrImmWithFPInputModsOperands(Inst, 1); // src0
+    // Add dummy src1
+    Inst.addOperand(MCOperand::createImm(0));
+    Inst.addOperand(MCOperand::createReg(AMDGPU::getMCReg(0, getSTI())));


Probably no need to call getMCReg() for NoRegister?

Are these dummy operands really necessary? By having them, we just seem to give ourselves more work handling them with custom code. NoRegister register operands look a bit weird.

Yes, I think we can remove getMCReg(). I will change in other places also.

We will need these dummy operands when doing cvtVOP3P here:

llvm-project/llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

Line 8414 in 99cae9a

OpSel = Inst.getOperand(OpSelIdx).getImm();

. Without them we will crash on out-of-bound access.

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

llvm/lib/Target/AMDGPU/Disassembler/AMDGPUDisassembler.cpp

jayfoad · 2024-01-19T13:56:43Z

Can you add a GFX12 RUN line to clang/test/CodeGenOpenCL/builtins-amdgcn-fp8.cl? That will probably require adding "fp8-conversion-insts" to the GFX12 part of TargetParser.cpp. You can do this in a separate patch if you want.

mariusz-sikora-at-amd · 2024-01-19T15:50:39Z

Can you add a GFX12 RUN line to clang/test/CodeGenOpenCL/builtins-amdgcn-fp8.cl? That will probably require adding "fp8-conversion-insts" to the GFX12 part of TargetParser.cpp. You can do this in a separate patch if you want.

Done

arsenm

Why is so there so much special casing in the assembler/disassembler?

mariusz-sikora-at-amd · 2024-01-22T10:02:05Z

Why is so there so much special casing in the assembler/disassembler?

I'm not an original author of these change, but from what I understand it is a workaround to handle VOP3 instructions which have a single source but require the use of two bits from OPSEL.
V_CVT_F32_FP8 has one source but is using two bits from OPSEL to specify which part from 32 bit register to convert ([7:0], [15:8], [23: 16] or 31 : 24]). And since OPSELs are correlated with sources/destination (one bit from OPSEL with one soruce/destination) these is required without any deeper changes to TableGen.

I'm open to change TableGen, but I would prefer to create new ticket and do it with new PR. These change may take longer than one day and we would like to have these PR merged before LLVM branching.

mbrkusanin · 2024-01-22T10:05:27Z

Why is so there so much special casing in the assembler/disassembler?

I'm not an original author of these change, but from what I understand it is a workaround to handle VOP3 instructions which have a single source but require the use of two bits from OPSEL. V_CVT_F32_FP8 has one source but is using two bits from OPSEL to specify which part from 32 bit register to convert ([7:0], [15:8], [23: 16] or 31 : 24]). And since OPSELs are correlated with sources/destination (one bit from OPSEL with one soruce/destination) these is required without any deeper changes to TableGen.

I'm open to change TableGen, but I would prefer to create new ticket and do it with new PR. These change may take longer than one day and we would like to have these PR merged before LLVM branching.

Correct, some of these instructions use opsel[1] which in LLVM in stored in src1_modifiers so a dummy src1 is used. And as far as I know we can not have src1_modfiers without src1 operand.

mbrkusanin · 2024-01-22T10:22:37Z

Why is so there so much special casing in the assembler/disassembler?

I'm not an original author of these change, but from what I understand it is a workaround to handle VOP3 instructions which have a single source but require the use of two bits from OPSEL. V_CVT_F32_FP8 has one source but is using two bits from OPSEL to specify which part from 32 bit register to convert ([7:0], [15:8], [23: 16] or 31 : 24]). And since OPSELs are correlated with sources/destination (one bit from OPSEL with one soruce/destination) these is required without any deeper changes to TableGen.
I'm open to change TableGen, but I would prefer to create new ticket and do it with new PR. These change may take longer than one day and we would like to have these PR merged before LLVM branching.

Correct, some of these instructions use opsel[1] which in LLVM in stored in src1_modifiers so a dummy src1 is used. And as far as I know we can not have src1_modfiers without src1 operand.

Similarly V_CVT_SR_BF8_F32 for example uses opsel[2] and opsel[3] so we need src2_modifiers and src2.

llvm/lib/Target/AMDGPU/VOPInstructions.td

Sisyph

DPP changes look good, and functionally I'm fine with the patch.

I don't think the tablegen 'bit IsFP8' version of managing the op_sel bits is any better than adding a fake src1. It doesn't scale up to any more op_sel bits (Hence why we can't use it for V_CVT_SR_BF8_F32_e64_dpp_gfx12) and it is a new abstraction, whereas we have many instances of fake src operands already. Consider it a +1 but not +2 from me as is, based on that.

arsenm · 2024-01-24T00:20:03Z

llvm/lib/Target/AMDGPU/AsmParser/AMDGPUAsmParser.cpp

+    bool IsVOP3CvtSrDpp = Opc == AMDGPU::V_CVT_SR_BF8_F32_e64_dpp8_gfx12 ||
+                          Opc == AMDGPU::V_CVT_SR_FP8_F32_e64_dpp8_gfx12 ||
+                          Opc == AMDGPU::V_CVT_SR_BF8_F32_e64_dpp_gfx12 ||
+                          Opc == AMDGPU::V_CVT_SR_FP8_F32_e64_dpp_gfx12;


I don't want to hold this up for the release, but I do think this needs to be revisited. We should really avoid having more random lists of opcodes

Thanks, I will prepare different PRs to cover this and what Joe pointed out.

llvm#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com> (cherry picked from commit cfddb59)

#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com> (cherry picked from commit cfddb59)

llvm#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com> (cherry picked from commit cfddb59)

llvm#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com> Change-Id: I62e37982868d9f5b400bf794b82c59ae530080ed

llvm#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com> (cherry picked from commit cfddb59)

@topperc

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com> [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (llvm#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com> (cherry picked from commit cfddb59) [RISCV] Support __riscv_v_fixed_vlen for vbool types. (llvm#76551) This adopts a similar behavior to AArch64 SVE, where bool vectors are represented as a vector of chars with 1/8 the number of elements. This ensures the vector always occupies a power of 2 number of bytes. A consequence of this is that vbool64_t, vbool32_t, and vool16_t can only be used with a vector length that guarantees at least 8 bits. [Docs] Fix documentation build. Missing ending `` after c92ad41 Backport '[clang] static operators should evaluate object argument (reland)' to release/18.x (llvm#80109) Cherry picked from commit ee01a2c. Closes llvm#80041, backport llvm#80108. Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com> Co-authored-by: cor3ntin <corentinjabot@gmail.com> Co-authored-by: Aaron Ballman <aaron@aaronballman.com> PR for llvm#79568 (llvm#80120) Backporting llvm#79568 to clang 18. [docs] Add release notes for Windows specific changes in 18.x (llvm#80011) [AArch64] Add some release notes items (llvm#79983) [C++20] [Modules] Don't perform ODR checks in GMF Close llvm#79240. See the linked issue for details. Given the frequency of issue reporting about false positive ODR checks (I received private issue reports too), I'd like to backport this to 18.x too. [clang] Fix unexpected `-Wconstant-logical-operand` in C23 (llvm#80724) C23 has `bool`, but logical operators still return int. Check that we're not in C to avoid false-positive -Wconstant-logical-operand. Fixes llvm#64356 (cherry picked from commit a18e92d) [18.x][Docs] Add release note about Clang-defined target OS macros (llvm#80044) The change is included in the 18.x release. Move the release note to the release branch and reformat. (cherry picked from commit b40d5b1) ReleaseNotes: mention -mtls-dialect=desc (llvm#82731) [Clang] Fixes to immediate-escalating functions (llvm#82281) * Consider that immediate escalating function can appear at global scope, fixing a crash * Lambda conversion to function pointer was sometimes not performed in an immediate function context when it should be. Fixes llvm#82258 (cherry picked from commit baf6bd3) [Clang] [Sema] Handle placeholders in '.*' expressions (llvm#83103) When analysing whether we should handle a binary expression as an overloaded operator call or a builtin operator, we were calling `checkPlaceholderForOverload()`, which takes care of any placeholders that are not overload sets—which would usually make sense since those need to be handled as part of overload resolution. Unfortunately, we were also doing that for `.*`, which is not overloadable, and then proceeding to create a builtin operator anyway, which would crash if the RHS happened to be an unresolved overload set (due hitting an assertion in `CreateBuiltinBinOp()`—specifically, in one of its callees—in the `.*` case that makes sure its arguments aren’t placeholders). This pr instead makes it so we check for *all* placeholders early if the operator is `.*`. It’s worth noting that, 1. In the `.*` case, we now additionally also check for *any* placeholders (not just non-overload-sets) in the LHS; this shouldn’t make a difference, however—at least I couldn’t think of a way to trigger the assertion with an overload set as the LHS of `.*`; it is worth noting that the assertion in question would also complain if the LHS happened to be of placeholder type, though. 2. There is another case in which we also don’t perform overload resolution—namely `=` if the LHS is not of class or enumeration type after handling non-overload-set placeholders—as in the `.*` case, but similarly to 1., I first couldn’t think of a way of getting this case to crash, and secondly, `CreateBuiltinBinOp()` doesn’t seem to care about placeholders in the LHS or RHS in the `=` case (from what I can tell, it, or rather one of its callees, only checks that the LHS is not a pseudo-object type, but those will have already been handled by the call to `checkPlaceholderForOverload()` by the time we get to this function), so I don’t think this case suffers from the same problem. This fixes llvm#53815. --------- Co-authored-by: Aaron Ballman <aaron@aaronballman.com> [InstCombine] Fix miscompilation in PR83947 (llvm#83993) https://github.com/llvm/llvm-project/blob/762f762504967efbe159db5c737154b989afc9bb/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L394-L407 Comment from @topperc: > This transforms assumes the mask is a non-zero splat. We only know its a splat and not provably all 0s. The mask is a constexpr that includes the address of the global variable. We can't resolve the constant expression to an exact value. Fixes llvm#83947. SystemZ release notes for 18.x. (llvm#84560) Remove support for EXPORTAS in def files to maintain ABI compatibility for COFFShortExport [clang][Sema] Fix a CTAD regression after 42239d2 (llvm#86914) The most recent declaration of a template as a friend can introduce a different template parameter depth compared to what we anticipate from a CTAD guide. Fixes llvm#86769 [clang] Avoid -Wshadow warning when init-capture named same as class field (llvm#74512) Shadowing warning doesn't make much sense since field is not available in lambda's body without capturing this. Fixes llvm#71976 [SLP]Fix a crash if the argument of call was affected by minbitwidth analysis. Need to support proper type conversion for function arguments to avoid compiler crash. Fix override keyword being print to the left side Previously, the `override` keyword in C++ was being print in the left side of a method decl, which is unsupported by C++ standard. This commit fixes that by setting the `CanPrintOnLeft` field to 0, forcing it to be print on the right side of the decl. Signed-off-by: Giuliano Belinassi <gbelinassi@suse.de> [clang codegen] Fix MS ABI detection of user-provided constructors. (llvm#90151) In the context of determining whether a class counts as an "aggregate", a constructor template counts as a user-provided constructor. Fixes llvm#86384 (cherry picked from commit 3ab4ae9) release/18.x: [libclc] Fix linking against libIRReader Fixes llvm#91551 Update llvm/test/Transforms/InstCombine/bit_ceil.ll Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com> [RISCV] Add a unaligned-scalar-mem feature like we had in clang 17. This is ORed with the fast-unaligned-access feature which applies to scalar and vector together.: Regular squash. Cobol/PlI changes from 785ddc60@https://gitlab.phidani.be/Chirag.Patel/lldb.git Cobol/PLI support added from 6cefb217f097ac@https://gitlab.phidani.be/Chirag.Patel/llvm.git [LLDB] added lldb rpmbuild spec file [RPMBuild] Added lldb rpmbuild support. [LLDBRpm] added version for yum update. [lldb_rpm] minor cleanup. Build fix rleated to RTTI. build fix. [DWARFASTParserLegacy] Initial support for union type. [lldb][LegacyTypeSystem] Changed struct/union member offset from bytes to bits to support DW_AT_data_bit_offset. [lldbrpm] changed version suffix. [LegacyASTContext] Fix var string length display. [LLDB][CobolUserExpression] Added AST node Function call place holder. build fix. [LLVM][Test] Fixed assembler round trip test. [LLDB][CobolUserExpression] Added ast evaluation for sizeof operator. [LLDB][CobolUserExpression] Added placeholder in parser for func call. [LLDB][CobolUserExpression] Added sizeof operator, temporary placeholder for LENGTH OF. [LLDB][PLIUserExpression] Fixed array indexing. [LLDB][ValueObjectPrinter] Skip summary if custom format is requested. [LLDB][PLILanguage] changed bitset size, read from array type name. [LLDB][ValueObject] Fixed pli var string length read. [LLDB][PLILanguage] Fixed support for var string summary formatter. [LLVM][DIBuilder][C-API] Added changes to add lexical scope info for auto variable for functions. [LLVM][AsmCodegen] Added raincode extention AT_lexical_scope. [LLDB][SymbolFileDWARF] Added support for RAINCODE_lexical_scope attribute. [LLDB][PLIUserExpression] Added sizeof operator support, it will be renamed to proper functiona name later. [LLDB][ValueObject] minor cleanup. [LLDB][PLIUserExpression] Added STORAGE/STG builtin func support. [LLDB][PLIUserExpression][CobolUserExpression] Added LENGTH() for Cobol and STG/STORAGE() for PL/I, removed sizeof operator for both. [LLDB][DWARFASTParserLegacy] Added placeholder for DW_TAG_reference_type. [LLVM][CodeGen][AsmPrinter] Added DW_AT_name attribute to TAG_array_type. [LLDB][DWARFExpression] minor cleanup. [LLDB][PLILanguage] Fixed bitset array size read from array type. [LLDB][LegacyASTContext] Moved bit array calculation to type system. [LLVM][AsmPrinter] Fixed TAG_array duplicate attribute export. [LLVM][C-API] Changes array type api. [LLDB][PLI/Cobol UserExpression] Fixed array indexing, removed c style [] index access. [LLDB][PLIUserExpression] minor fix. [LLDB] Fixed bug relating struct member name access for Cobol/PL1. fixed coding style/whitespace/typos. [LLDB][LegacyTypeSystem] Place holder beautification for edited types. [LLDB] rpm version upgrade. [LLDB][CobolUserExpression] Fixed pic string ref modifier. [LLDB][TypeSystem] Added support to mutate existing type length, fixed cobolUserExpression refmod for display type. [LLDB][CobolUserExpression] Fixed lower bound ref modifier. [LLDB][Cobol/PLI UserExpression] Fixed error while searching for var, do not fully resolve type. [LLDB][RPM] fixed rpm version. cleanup, reduce number of changes from trunk. cleanup, reduce number of changes fomr trunk. build fix. Build fix. Build fix. [LegacyASTContext] Fixed bug relating packed decimal going through ebcdic iconv. [LLVM][DebugInfo] Fixed DIExpression node uniquness issue. build fix. cleanup. Refactoring, moved LegacyASTContext to Plugin typeSystem Legacy. Refactoring, renamed TypeSystem class. build fix. build fix, cleanup. cleanup fixed assert failure. [CobolUserExpression] Added placeholder for simpe assignment operator. [CobolUserExpression] Added basic support for assigment to variables. [CobolUserExpression] Added cobol move-to set-to syntex support. [LegacyTypeSystem][CobolUserExpression] Added literal type double,string. Added TypeSystem Encoding helper functions. [CobolLexer] Added string,float literal type support. [PLIUserExpression] Added assignment operator support. [CobolUserExpression] Added assignment comp-3 support. [PLIUserExpression] Assignment added pli display type support. temporary build fix. DIExpression asmprinter, print null for invalid entry. [Cobol/PLI UserExpression] Assignment endianity bug fix. [Cobol/PLIUserExpression] fixed data extractpr assert, fixed int precision convertion written as zero. [Cobol/PLIUserExpression] Assignment display type assert failure. build fix. [PLIUserExpression] Added support for var string assignment. [LegacyUserExpression] Simple semantic check place-holder. [CobolUserexpression] Assignment expression fixed display, array types. [TypeSystem] fixed encode int precision bug for i64 to i32. [PLI/Cobol UserExpression] fixed support for assinment into refmod data type. [CobolUserExpression] Fixed assignment display/comp-3 regression. temporary build fix. python3 lib on server needs few changes for this build. [TypeSystemLegacy] Fixed edited type display, skip formatting for edited type. [lldbrpm] package lldb-python-script too. [CobolUserExpression] Fixed SelctorOf expression with array index access e.g (lldb)p LastName1 of VAR(1) of TAB. [CobolUserExpression] Assignment to packed decimal fixed, added digit count read support from dwarf instead of runtime calculation. [CoboUserExpression] Fixed assignment string invalid byte order. [PLIUserExpression] Fixed string padding with space. [CobolUserExpression] Fixed Assignment string space padding. [LegacyTypeSysten] fixed crash in encoding due to long length the assignment. [CobolUserExpression][PLIUserExpression] fixed segfault. [DebugInfo] export identifier case as insensitive for PLI/Cobol compiled units. [TypeSystemLegacy] Fixed minor bug with dataencoding. rebase build fix. [StackFrame] fixed support for cobol/pli modref select syntex. case-insensitive breakpoint resolution for PLI/Cobol languages. cleanup. build fix. added initial support for TAG_dynamic_type. added c/c++ api to create dynamic type debug info. [DebugInfo] Added support to generate dwarf attribute DW_AT_allocated for DW_TAG_dynamic_type [PLIUserExpression][CobolUserExpression] Fixed name variable lookup for few cases. [DWARFASTParserLegacy] Initial support to parse TAG_dynamic_type. [AsmPrinter] Fix minor mistake for TAG_dynamic DW_AT_allocated. [TypeSystemLegacy] Added dynamic type place holder. [LLVM][AsmPrinter] Allow OP_call2/4 expression on local variable location. build fix. [LLDB][CompilerType] Added support to fetch dynamic type info. [lldb][ValueObjectVariable] Added dynamic variable read support. [LLDB][ValueObjectVariable] Added allocated check for dynamic types. [LLDB][ValueObjectVariable] fixed TAG_dynamic type attributes optional. [LLVM][DebugInfoMetadata] Fixed minor function call. [LLDB][TypeSystemLegacy] Added dynamic type info support. build fix. for jekins, use python3 sharedlibs lldbrpm use python3. temporary build fix. [LLDB][DWARFExpression] Added temporary operation extension for address calculation with file address in dwarf v5. [LLVM][CodeGen] Fixed dynamic type dwarf expression call2/call4 assert. [LLVM][Verifier] Added dynamic type check. [LLVM][Verifier] Added debugInfo verifier dynamic type extra checks. [LLDB][TypeSystemLegacy] Added check to avoid direct nested dynaic types. [LLVM][DebugInfo] Adding DW_OP_call2/4 support in TAG_subrange attributes DW_AT_lower_bound, DW_AT_upper_bound. [LLDB] Added option to hide frames with invalid line entry target.hide-invalid-legacy-frames, this is a temporary placeholder and it will be moved to more suitable location in future. [LLDB][DataFormatters] Fixed printing of char arrays with non-default format. [LLDB][StackFrame] Added check for member name lookup to reject array of structs. [lldb][DataFormatters] fixed multi-dimesional string formatting. [LLDB][ValueObjectVariable] cleanup: proper error message. [LLVM][DwarfUnit] Added DW_OP_call2/call4 support for array type. [LLVM][DwarfCompileUnit] fixed assert failure with DW_OP_call2/call4. [DIBuilder] Added DW_AT_static_link support. [LLVM][C-API][DebugInfo] Added support for DW_AT_static_link. [DebugInfo] fixed minor bug with Staticlink attribute generation. [DebugInfo] static link cleanup. rebase build fix. [LLDB][DWARFParser] Added initial support to parse DW_AT_static_link. [LLDB][StackFrame] Added support to read static link address. [LLDB][StackFrameList] Added helper function to search stack list using static link. [LLDB][ValueObjectPrinter] regression fix for hex format value print. [LLDB] build fix. [LLDB][ExpressionParser] bug fixed for positive int expression e.g. p move +3 to var. [LLDB][TypeSystemLegacy] Fixed bcd signed preferred value encoding. [LLVM][DebuggerTuning] default tune for lldb. [LLDB][TypeSystemLegacy] iconv try approximate and ignore if not possible, for character decoding. rebase build fix. [LLDB][CobolUserExpression][PLIUserExpression] fixed variable name overwriting. [LLDB][UserExpression] Temporary revert variable name bug. rebase build fix. rebase build fix. rebase build fix. initial placeholder for DW_AT_RAINCODE_static_link_recv. [LLDB][CobolUserExpression][PLIUserExpression] fixed variable name overwrite. [LLDB][Test] fixed UnsupportedLanguage test failure. [LLDB][CobolUserExpression] Place holder for compare operations. lldbrpm, temporary skip python dir. [CobolUserExpression] Adding placeholder for equality comparision. [PLIUserExpression] PLILexer, added partial support for comparision operators. [LLDB][DataExtractor] bytes compare func. rebase build fix. rebase build fix. Added DW_AT_RAINCODE_frame_base Patch by Amin! [LLDB][DWARFParser] Added support to parse DW_AT_RAINCODE_frame_base. build fix. [LLVM] Fix dynamic type [LLVM-C][API] Add api to create a dynamic DISubrange [LLDB] Add support for DW_AT_count as a DWARFExpression - Add DWARFExpression in ArrayInfo; - Add LegacyDynamicArray type for dynamic arrays; - Evaluate count expression every time we re-evaluate DW_AT_location. Rebase and fix compilation failures Only print case sensitiveness if source language is Cobol or PL/1. Fixes the following regressions: LLVM :: DebugInfo/X86/dwarf-public-names.ll LLVM :: DebugInfo/X86/length_symbol_difference.ll LLVM :: MC/X86/dwarf-size-field-overflow.test LLVM :: tools/llvm-dwarfdump/X86/statistics.ll (cherry picked from commit ff848081162f81ef3c5d8f447b6c28dd564d4ada) Use correct record size of DIDerivedType Use last index for Annotations replace dyn_cast with dyn_cast_or_null to handle invalid input smoothly Rebasing on LLVM-17-init and fixes regressions LZLANG-2470 valgrind vs. lldb_private::TargetCharsetReader::convert - remove the static buffer_length variable, which may not be big enough. - remove the loop - add lldb console errno logging when there is an iconv error. (cherry picked from commit 120402f28f787a90f65f725307519343b5937fee) LZLANG-2470 Fixes for previous lldb_private::TargetCharsetReader::convert changes. (cherry picked from commit 918c9b62a63b71347ebee5a7ccd0bd42bbdfc118) Lexer Bug Fix COBOL/PLI lexer would return variable name with '\n' at the start. 1155199180 (cherry picked from commit 7266c35747b19a11081b3fab07f6773bfb15fa1f) Ported Abhishek's Fix -Set is_singed for int variables [lldb] Bridge the gap when debugging the variable with command and codelldb (cherry picked from commit d88ad8abed856d239628d4cda3fad393fef1ba0e) Build Fixes after cherry-pick previous commit strings set by codelldb must be enclosed in quotes (cherry picked from commit 0072c09fbe9f5ead6bde25060dc8e9f4265989b3) Bug fix: p var = val in PLI didn't work (cherry picked from commit 9f3d16f85434cbd17e26d429622cd6b557eddacb) Port Abhisheks Fixes -Fix for MOVE val TO VAR [lldb] Added the DemangledNameContainsPath overload for pli/cobol (cherry picked from commit 552cf62d001beb59327e4fb81cd4620ee0d62c55) Fix warnings Fields of a struct array can now be used with `p` e.g FIELD(5) is equivalent to FIELD OF ARR(5) See ticket 1152892604 (cherry picked from commit 5e02341b015fddaca13a674b34228fe2b080a54c) Cobol-style multi-index support added (cherry picked from commit 7b0e7ae494ca2a9799e1f09d87146113de2e0f38) Fixed LENGTH(var) expression -get the size of var from lldb (cherry picked from commit 50657e2e7b2ec81a13764ca0105c130cc95ccfc7) Warning Fixes Make breakpoint Cases Insensitive Fixed Build and Regression failure after rebase Fixed warnings seen during lldb build [lldb] Store real bitwidth from debuginfo in Scalar Type Storing in higher bitwidth than required or specified by debug info creates problem when byteswap is done. Make comparison of breakpoint names case insensitive in `findEntryOffsetInCurrentIndex` 1156642284 typo fix: s/key/Key/ [lldb] Fix DWARFASTParser to correctly parse DW_AT_count for dynamic arrays [lldb] Change the way we look for variables in StackFrame for Legacy Languages 1156032652 [lldb] Bugfix in LENGTH(var) [cobol] and STG(var) [pli] We were encoding 4 bytes of LENGTH data and reading 8 bytes which cause a problem. Using size_t instead of uint32_t fixes the problem. [lldb] Fix cast failure in FindFieldInStructArray Complicated expressions in lldb broke the assumption that the expression is an identifier, thus we got a cast error. This fix removes that from happening and also fixes the bug that if the identifier is an array itself the last index specified in the input is used to index that variable itself. e.g 01 SAMPLE-TABLE. 05 TABLE-DEPTH OCCURS 3 TIMES. 10 TABLE-ROW OCCURS 3 TIMES. 15 TABLE-COLUMN OCCURS 3 TIMES PIC 9(8). Here TABLE-ROW(1, 2) means second element of TABLE-ROW OF TABLE-DEPTH(1). Revert "[lldb] Fix cast failure in FindFieldInStructArray" This reverts commit c1bab0e0b6a798698196434c7bb6cbe391fcdc1b. [lldb] Add support for IBM array-indexing syntax see 1156841764 [lldb] Fix cast error and support non-ibm indexing syntax see 1156841764 [lldb] Fixes After Rebase on llvmorg-18.1.4 [lldb] Fix bug in display of varying PLI strings See 1156884604 The STG function also should include the prefix when counting the size, which for now is 2 bytes for all strings because the PLI compiler doesn't support COMPAT(V3) version. If in the future we do support it, we would need to fix this again. (cherry picked from commit 4b39f3e1b55c3df09f5cb89dcdd347682f790ba9) [lldb] Add basic support for Level88 conditions [lldb] Add support for calling the runtime function rc_cob_level88 directly from the "p" command [lldb] Print the value of level88 variables as true/false with parent name. Prints the value of level88 condition names by calling the runtime functions and formatting it nicely. [lldb] Add support for indexed level88 variables [lldb] Fixes After Rebase on llvm main [LLDB] Preparation for upstream

@topperc

Co-authored-by: Petar Avramovic <Petar.Avramovic@amd.com> Co-authored-by: Piotr Sobczak <piotr.sobczak@amd.com> [AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… (llvm#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com> (cherry picked from commit cfddb59) [RISCV] Support __riscv_v_fixed_vlen for vbool types. (llvm#76551) This adopts a similar behavior to AArch64 SVE, where bool vectors are represented as a vector of chars with 1/8 the number of elements. This ensures the vector always occupies a power of 2 number of bytes. A consequence of this is that vbool64_t, vbool32_t, and vool16_t can only be used with a vector length that guarantees at least 8 bits. [Docs] Fix documentation build. Missing ending `` after c92ad41 Backport '[clang] static operators should evaluate object argument (reland)' to release/18.x (llvm#80109) Cherry picked from commit ee01a2c. Closes llvm#80041, backport llvm#80108. Co-authored-by: Shafik Yaghmour <shafik@users.noreply.github.com> Co-authored-by: cor3ntin <corentinjabot@gmail.com> Co-authored-by: Aaron Ballman <aaron@aaronballman.com> PR for llvm#79568 (llvm#80120) Backporting llvm#79568 to clang 18. [docs] Add release notes for Windows specific changes in 18.x (llvm#80011) [AArch64] Add some release notes items (llvm#79983) [C++20] [Modules] Don't perform ODR checks in GMF Close llvm#79240. See the linked issue for details. Given the frequency of issue reporting about false positive ODR checks (I received private issue reports too), I'd like to backport this to 18.x too. [clang] Fix unexpected `-Wconstant-logical-operand` in C23 (llvm#80724) C23 has `bool`, but logical operators still return int. Check that we're not in C to avoid false-positive -Wconstant-logical-operand. Fixes llvm#64356 (cherry picked from commit a18e92d) [18.x][Docs] Add release note about Clang-defined target OS macros (llvm#80044) The change is included in the 18.x release. Move the release note to the release branch and reformat. (cherry picked from commit b40d5b1) ReleaseNotes: mention -mtls-dialect=desc (llvm#82731) [Clang] Fixes to immediate-escalating functions (llvm#82281) * Consider that immediate escalating function can appear at global scope, fixing a crash * Lambda conversion to function pointer was sometimes not performed in an immediate function context when it should be. Fixes llvm#82258 (cherry picked from commit baf6bd3) [Clang] [Sema] Handle placeholders in '.*' expressions (llvm#83103) When analysing whether we should handle a binary expression as an overloaded operator call or a builtin operator, we were calling `checkPlaceholderForOverload()`, which takes care of any placeholders that are not overload sets—which would usually make sense since those need to be handled as part of overload resolution. Unfortunately, we were also doing that for `.*`, which is not overloadable, and then proceeding to create a builtin operator anyway, which would crash if the RHS happened to be an unresolved overload set (due hitting an assertion in `CreateBuiltinBinOp()`—specifically, in one of its callees—in the `.*` case that makes sure its arguments aren’t placeholders). This pr instead makes it so we check for *all* placeholders early if the operator is `.*`. It’s worth noting that, 1. In the `.*` case, we now additionally also check for *any* placeholders (not just non-overload-sets) in the LHS; this shouldn’t make a difference, however—at least I couldn’t think of a way to trigger the assertion with an overload set as the LHS of `.*`; it is worth noting that the assertion in question would also complain if the LHS happened to be of placeholder type, though. 2. There is another case in which we also don’t perform overload resolution—namely `=` if the LHS is not of class or enumeration type after handling non-overload-set placeholders—as in the `.*` case, but similarly to 1., I first couldn’t think of a way of getting this case to crash, and secondly, `CreateBuiltinBinOp()` doesn’t seem to care about placeholders in the LHS or RHS in the `=` case (from what I can tell, it, or rather one of its callees, only checks that the LHS is not a pseudo-object type, but those will have already been handled by the call to `checkPlaceholderForOverload()` by the time we get to this function), so I don’t think this case suffers from the same problem. This fixes llvm#53815. --------- Co-authored-by: Aaron Ballman <aaron@aaronballman.com> [InstCombine] Fix miscompilation in PR83947 (llvm#83993) https://github.com/llvm/llvm-project/blob/762f762504967efbe159db5c737154b989afc9bb/llvm/lib/Transforms/InstCombine/InstCombineCalls.cpp#L394-L407 Comment from @topperc: > This transforms assumes the mask is a non-zero splat. We only know its a splat and not provably all 0s. The mask is a constexpr that includes the address of the global variable. We can't resolve the constant expression to an exact value. Fixes llvm#83947. SystemZ release notes for 18.x. (llvm#84560) Remove support for EXPORTAS in def files to maintain ABI compatibility for COFFShortExport [clang][Sema] Fix a CTAD regression after 42239d2 (llvm#86914) The most recent declaration of a template as a friend can introduce a different template parameter depth compared to what we anticipate from a CTAD guide. Fixes llvm#86769 [clang] Avoid -Wshadow warning when init-capture named same as class field (llvm#74512) Shadowing warning doesn't make much sense since field is not available in lambda's body without capturing this. Fixes llvm#71976 [SLP]Fix a crash if the argument of call was affected by minbitwidth analysis. Need to support proper type conversion for function arguments to avoid compiler crash. Fix override keyword being print to the left side Previously, the `override` keyword in C++ was being print in the left side of a method decl, which is unsupported by C++ standard. This commit fixes that by setting the `CanPrintOnLeft` field to 0, forcing it to be print on the right side of the decl. Signed-off-by: Giuliano Belinassi <gbelinassi@suse.de> [clang codegen] Fix MS ABI detection of user-provided constructors. (llvm#90151) In the context of determining whether a class counts as an "aggregate", a constructor template counts as a user-provided constructor. Fixes llvm#86384 (cherry picked from commit 3ab4ae9) release/18.x: [libclc] Fix linking against libIRReader Fixes llvm#91551 Update llvm/test/Transforms/InstCombine/bit_ceil.ll Co-authored-by: Yingwei Zheng <dtcxzyw@qq.com> [RISCV] Add a unaligned-scalar-mem feature like we had in clang 17. This is ORed with the fast-unaligned-access feature which applies to scalar and vector together.: Regular squash. Cobol/PlI changes from 785ddc60@https://gitlab.phidani.be/Chirag.Patel/lldb.git Cobol/PLI support added from 6cefb217f097ac@https://gitlab.phidani.be/Chirag.Patel/llvm.git [LLDB] added lldb rpmbuild spec file [RPMBuild] Added lldb rpmbuild support. [LLDBRpm] added version for yum update. [lldb_rpm] minor cleanup. Build fix rleated to RTTI. build fix. [DWARFASTParserLegacy] Initial support for union type. [lldb][LegacyTypeSystem] Changed struct/union member offset from bytes to bits to support DW_AT_data_bit_offset. [lldbrpm] changed version suffix. [LegacyASTContext] Fix var string length display. [LLDB][CobolUserExpression] Added AST node Function call place holder. build fix. [LLVM][Test] Fixed assembler round trip test. [LLDB][CobolUserExpression] Added ast evaluation for sizeof operator. [LLDB][CobolUserExpression] Added placeholder in parser for func call. [LLDB][CobolUserExpression] Added sizeof operator, temporary placeholder for LENGTH OF. [LLDB][PLIUserExpression] Fixed array indexing. [LLDB][ValueObjectPrinter] Skip summary if custom format is requested. [LLDB][PLILanguage] changed bitset size, read from array type name. [LLDB][ValueObject] Fixed pli var string length read. [LLDB][PLILanguage] Fixed support for var string summary formatter. [LLVM][DIBuilder][C-API] Added changes to add lexical scope info for auto variable for functions. [LLVM][AsmCodegen] Added raincode extention AT_lexical_scope. [LLDB][SymbolFileDWARF] Added support for RAINCODE_lexical_scope attribute. [LLDB][PLIUserExpression] Added sizeof operator support, it will be renamed to proper functiona name later. [LLDB][ValueObject] minor cleanup. [LLDB][PLIUserExpression] Added STORAGE/STG builtin func support. [LLDB][PLIUserExpression][CobolUserExpression] Added LENGTH() for Cobol and STG/STORAGE() for PL/I, removed sizeof operator for both. [LLDB][DWARFASTParserLegacy] Added placeholder for DW_TAG_reference_type. [LLVM][CodeGen][AsmPrinter] Added DW_AT_name attribute to TAG_array_type. [LLDB][DWARFExpression] minor cleanup. [LLDB][PLILanguage] Fixed bitset array size read from array type. [LLDB][LegacyASTContext] Moved bit array calculation to type system. [LLVM][AsmPrinter] Fixed TAG_array duplicate attribute export. [LLVM][C-API] Changes array type api. [LLDB][PLI/Cobol UserExpression] Fixed array indexing, removed c style [] index access. [LLDB][PLIUserExpression] minor fix. [LLDB] Fixed bug relating struct member name access for Cobol/PL1. fixed coding style/whitespace/typos. [LLDB][LegacyTypeSystem] Place holder beautification for edited types. [LLDB] rpm version upgrade. [LLDB][CobolUserExpression] Fixed pic string ref modifier. [LLDB][TypeSystem] Added support to mutate existing type length, fixed cobolUserExpression refmod for display type. [LLDB][CobolUserExpression] Fixed lower bound ref modifier. [LLDB][Cobol/PLI UserExpression] Fixed error while searching for var, do not fully resolve type. [LLDB][RPM] fixed rpm version. cleanup, reduce number of changes from trunk. cleanup, reduce number of changes fomr trunk. build fix. Build fix. Build fix. [LegacyASTContext] Fixed bug relating packed decimal going through ebcdic iconv. [LLVM][DebugInfo] Fixed DIExpression node uniquness issue. build fix. cleanup. Refactoring, moved LegacyASTContext to Plugin typeSystem Legacy. Refactoring, renamed TypeSystem class. build fix. build fix, cleanup. cleanup fixed assert failure. [CobolUserExpression] Added placeholder for simpe assignment operator. [CobolUserExpression] Added basic support for assigment to variables. [CobolUserExpression] Added cobol move-to set-to syntex support. [LegacyTypeSystem][CobolUserExpression] Added literal type double,string. Added TypeSystem Encoding helper functions. [CobolLexer] Added string,float literal type support. [PLIUserExpression] Added assignment operator support. [CobolUserExpression] Added assignment comp-3 support. [PLIUserExpression] Assignment added pli display type support. temporary build fix. DIExpression asmprinter, print null for invalid entry. [Cobol/PLI UserExpression] Assignment endianity bug fix. [Cobol/PLIUserExpression] fixed data extractpr assert, fixed int precision convertion written as zero. [Cobol/PLIUserExpression] Assignment display type assert failure. build fix. [PLIUserExpression] Added support for var string assignment. [LegacyUserExpression] Simple semantic check place-holder. [CobolUserexpression] Assignment expression fixed display, array types. [TypeSystem] fixed encode int precision bug for i64 to i32. [PLI/Cobol UserExpression] fixed support for assinment into refmod data type. [CobolUserExpression] Fixed assignment display/comp-3 regression. temporary build fix. python3 lib on server needs few changes for this build. [TypeSystemLegacy] Fixed edited type display, skip formatting for edited type. [lldbrpm] package lldb-python-script too. [CobolUserExpression] Fixed SelctorOf expression with array index access e.g (lldb)p LastName1 of VAR(1) of TAB. [CobolUserExpression] Assignment to packed decimal fixed, added digit count read support from dwarf instead of runtime calculation. [CoboUserExpression] Fixed assignment string invalid byte order. [PLIUserExpression] Fixed string padding with space. [CobolUserExpression] Fixed Assignment string space padding. [LegacyTypeSysten] fixed crash in encoding due to long length the assignment. [CobolUserExpression][PLIUserExpression] fixed segfault. [DebugInfo] export identifier case as insensitive for PLI/Cobol compiled units. [TypeSystemLegacy] Fixed minor bug with dataencoding. rebase build fix. [StackFrame] fixed support for cobol/pli modref select syntex. case-insensitive breakpoint resolution for PLI/Cobol languages. cleanup. build fix. added initial support for TAG_dynamic_type. added c/c++ api to create dynamic type debug info. [DebugInfo] Added support to generate dwarf attribute DW_AT_allocated for DW_TAG_dynamic_type [PLIUserExpression][CobolUserExpression] Fixed name variable lookup for few cases. [DWARFASTParserLegacy] Initial support to parse TAG_dynamic_type. [AsmPrinter] Fix minor mistake for TAG_dynamic DW_AT_allocated. [TypeSystemLegacy] Added dynamic type place holder. [LLVM][AsmPrinter] Allow OP_call2/4 expression on local variable location. build fix. [LLDB][CompilerType] Added support to fetch dynamic type info. [lldb][ValueObjectVariable] Added dynamic variable read support. [LLDB][ValueObjectVariable] Added allocated check for dynamic types. [LLDB][ValueObjectVariable] fixed TAG_dynamic type attributes optional. [LLVM][DebugInfoMetadata] Fixed minor function call. [LLDB][TypeSystemLegacy] Added dynamic type info support. build fix. for jekins, use python3 sharedlibs lldbrpm use python3. temporary build fix. [LLDB][DWARFExpression] Added temporary operation extension for address calculation with file address in dwarf v5. [LLVM][CodeGen] Fixed dynamic type dwarf expression call2/call4 assert. [LLVM][Verifier] Added dynamic type check. [LLVM][Verifier] Added debugInfo verifier dynamic type extra checks. [LLDB][TypeSystemLegacy] Added check to avoid direct nested dynaic types. [LLVM][DebugInfo] Adding DW_OP_call2/4 support in TAG_subrange attributes DW_AT_lower_bound, DW_AT_upper_bound. [LLDB] Added option to hide frames with invalid line entry target.hide-invalid-legacy-frames, this is a temporary placeholder and it will be moved to more suitable location in future. [LLDB][DataFormatters] Fixed printing of char arrays with non-default format. [LLDB][StackFrame] Added check for member name lookup to reject array of structs. [lldb][DataFormatters] fixed multi-dimesional string formatting. [LLDB][ValueObjectVariable] cleanup: proper error message. [LLVM][DwarfUnit] Added DW_OP_call2/call4 support for array type. [LLVM][DwarfCompileUnit] fixed assert failure with DW_OP_call2/call4. [DIBuilder] Added DW_AT_static_link support. [LLVM][C-API][DebugInfo] Added support for DW_AT_static_link. [DebugInfo] fixed minor bug with Staticlink attribute generation. [DebugInfo] static link cleanup. rebase build fix. [LLDB][DWARFParser] Added initial support to parse DW_AT_static_link. [LLDB][StackFrame] Added support to read static link address. [LLDB][StackFrameList] Added helper function to search stack list using static link. [LLDB][ValueObjectPrinter] regression fix for hex format value print. [LLDB] build fix. [LLDB][ExpressionParser] bug fixed for positive int expression e.g. p move +3 to var. [LLDB][TypeSystemLegacy] Fixed bcd signed preferred value encoding. [LLVM][DebuggerTuning] default tune for lldb. [LLDB][TypeSystemLegacy] iconv try approximate and ignore if not possible, for character decoding. rebase build fix. [LLDB][CobolUserExpression][PLIUserExpression] fixed variable name overwriting. [LLDB][UserExpression] Temporary revert variable name bug. rebase build fix. rebase build fix. rebase build fix. initial placeholder for DW_AT_RAINCODE_static_link_recv. [LLDB][CobolUserExpression][PLIUserExpression] fixed variable name overwrite. [LLDB][Test] fixed UnsupportedLanguage test failure. [LLDB][CobolUserExpression] Place holder for compare operations. lldbrpm, temporary skip python dir. [CobolUserExpression] Adding placeholder for equality comparision. [PLIUserExpression] PLILexer, added partial support for comparision operators. [LLDB][DataExtractor] bytes compare func. rebase build fix. rebase build fix. Added DW_AT_RAINCODE_frame_base Patch by Amin! [LLDB][DWARFParser] Added support to parse DW_AT_RAINCODE_frame_base. build fix. [LLVM] Fix dynamic type [LLVM-C][API] Add api to create a dynamic DISubrange [LLDB] Add support for DW_AT_count as a DWARFExpression - Add DWARFExpression in ArrayInfo; - Add LegacyDynamicArray type for dynamic arrays; - Evaluate count expression every time we re-evaluate DW_AT_location. Rebase and fix compilation failures Only print case sensitiveness if source language is Cobol or PL/1. Fixes the following regressions: LLVM :: DebugInfo/X86/dwarf-public-names.ll LLVM :: DebugInfo/X86/length_symbol_difference.ll LLVM :: MC/X86/dwarf-size-field-overflow.test LLVM :: tools/llvm-dwarfdump/X86/statistics.ll (cherry picked from commit ff848081162f81ef3c5d8f447b6c28dd564d4ada) Use correct record size of DIDerivedType Use last index for Annotations replace dyn_cast with dyn_cast_or_null to handle invalid input smoothly Rebasing on LLVM-17-init and fixes regressions LZLANG-2470 valgrind vs. lldb_private::TargetCharsetReader::convert - remove the static buffer_length variable, which may not be big enough. - remove the loop - add lldb console errno logging when there is an iconv error. (cherry picked from commit 120402f28f787a90f65f725307519343b5937fee) LZLANG-2470 Fixes for previous lldb_private::TargetCharsetReader::convert changes. (cherry picked from commit 918c9b62a63b71347ebee5a7ccd0bd42bbdfc118) Lexer Bug Fix COBOL/PLI lexer would return variable name with '\n' at the start. 1155199180 (cherry picked from commit 7266c35747b19a11081b3fab07f6773bfb15fa1f) Ported Abhishek's Fix -Set is_singed for int variables [lldb] Bridge the gap when debugging the variable with command and codelldb (cherry picked from commit d88ad8abed856d239628d4cda3fad393fef1ba0e) Build Fixes after cherry-pick previous commit strings set by codelldb must be enclosed in quotes (cherry picked from commit 0072c09fbe9f5ead6bde25060dc8e9f4265989b3) Bug fix: p var = val in PLI didn't work (cherry picked from commit 9f3d16f85434cbd17e26d429622cd6b557eddacb) Port Abhisheks Fixes -Fix for MOVE val TO VAR [lldb] Added the DemangledNameContainsPath overload for pli/cobol (cherry picked from commit 552cf62d001beb59327e4fb81cd4620ee0d62c55) Fix warnings Fields of a struct array can now be used with `p` e.g FIELD(5) is equivalent to FIELD OF ARR(5) See ticket 1152892604 (cherry picked from commit 5e02341b015fddaca13a674b34228fe2b080a54c) Cobol-style multi-index support added (cherry picked from commit 7b0e7ae494ca2a9799e1f09d87146113de2e0f38) Fixed LENGTH(var) expression -get the size of var from lldb (cherry picked from commit 50657e2e7b2ec81a13764ca0105c130cc95ccfc7) Warning Fixes Make breakpoint Cases Insensitive Fixed Build and Regression failure after rebase Fixed warnings seen during lldb build [lldb] Store real bitwidth from debuginfo in Scalar Type Storing in higher bitwidth than required or specified by debug info creates problem when byteswap is done. Make comparison of breakpoint names case insensitive in `findEntryOffsetInCurrentIndex` 1156642284 typo fix: s/key/Key/ [lldb] Fix DWARFASTParser to correctly parse DW_AT_count for dynamic arrays [lldb] Change the way we look for variables in StackFrame for Legacy Languages 1156032652 [lldb] Bugfix in LENGTH(var) [cobol] and STG(var) [pli] We were encoding 4 bytes of LENGTH data and reading 8 bytes which cause a problem. Using size_t instead of uint32_t fixes the problem. [lldb] Fix cast failure in FindFieldInStructArray Complicated expressions in lldb broke the assumption that the expression is an identifier, thus we got a cast error. This fix removes that from happening and also fixes the bug that if the identifier is an array itself the last index specified in the input is used to index that variable itself. e.g 01 SAMPLE-TABLE. 05 TABLE-DEPTH OCCURS 3 TIMES. 10 TABLE-ROW OCCURS 3 TIMES. 15 TABLE-COLUMN OCCURS 3 TIMES PIC 9(8). Here TABLE-ROW(1, 2) means second element of TABLE-ROW OF TABLE-DEPTH(1). Revert "[lldb] Fix cast failure in FindFieldInStructArray" This reverts commit c1bab0e0b6a798698196434c7bb6cbe391fcdc1b. [lldb] Add support for IBM array-indexing syntax see 1156841764 [lldb] Fix cast error and support non-ibm indexing syntax see 1156841764 [lldb] Fixes After Rebase on llvmorg-18.1.4 [lldb] Fix bug in display of varying PLI strings See 1156884604 The STG function also should include the prefix when counting the size, which for now is 2 bytes for all strings because the PLI compiler doesn't support COMPAT(V3) version. If in the future we do support it, we would need to fix this again. (cherry picked from commit 4b39f3e1b55c3df09f5cb89dcdd347682f790ba9) [lldb] Add basic support for Level88 conditions [lldb] Add support for calling the runtime function rc_cob_level88 directly from the "p" command [lldb] Print the value of level88 variables as true/false with parent name. Prints the value of level88 condition names by calling the runtime functions and formatting it nicely. [lldb] Add support for indexed level88 variables [lldb] Fixes After Rebase on llvm main [LLDB] Preparation for upstream

llvm#78414) …bf8 instructions Add VOP1, VOP1_DPP8, VOP1_DPP16, VOP3, VOP3_DPP8, VOP3_DPP16 instructions that were supported on GFX940 (MI300): - V_CVT_F32_FP8 - V_CVT_F32_BF8 - V_CVT_PK_F32_FP8 - V_CVT_PK_F32_BF8 - V_CVT_PK_FP8_F32 - V_CVT_PK_BF8_F32 - V_CVT_SR_FP8_F32 - V_CVT_SR_BF8_F32 --------- Co-authored-by: Mateja Marjanovic <mateja.marjanovic@amd.com> Co-authored-by: Mirko Brkušanin <Mirko.Brkusanin@amd.com> (cherry picked from commit cfddb59)

mariusz-sikora-at-amd requested review from jayfoad, arsenm, piotrAMD and mbrkusanin January 17, 2024 09:13

llvmbot added backend:AMDGPU mc Machine (object) code labels Jan 17, 2024

arsenm reviewed Jan 17, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/AMDGPU.td Outdated Show resolved Hide resolved

llvm/test/CodeGen/AMDGPU/llvm.amdgcn.cvt.fp8.dpp.ll Show resolved Hide resolved

Update tests

a91e42a

jayfoad requested review from kosarev and Sisyph January 17, 2024 16:02

kosarev reviewed Jan 18, 2024

View reviewed changes

mariusz-sikora-at-amd added 7 commits January 18, 2024 12:41

Merge main into masikora/gfx12-cvt-fp8

01d8b5e

Update feature naming to FP8ConversionInsts

8e058e0

Update test

6149014

Move instruction closer to place where used

4996c77

Don't use getMCReg

a36de86

Remove VOP1 suffix from function name

afe32a9

Merge main into masikora/gfx12-cvt-fp8

685d767

Add fp8-conversion-insts feature to clang

7de5e75

mariusz-sikora-at-amd added 2 commits January 19, 2024 22:06

Merge main into masikora/gfx12-cvt-fp8

9ad8493

Merge main into masikora/gfx12-cvt-fp8

9a7989f

arsenm reviewed Jan 22, 2024

View reviewed changes

mbrkusanin reviewed Jan 22, 2024

View reviewed changes

llvm/lib/Target/AMDGPU/VOPInstructions.td Show resolved Hide resolved

mbrkusanin added 2 commits January 22, 2024 20:13

Remove dummy operands by Mirko Brkušanin

55d611a

[AMDGPU] Properly check op_sel in GCNDPPCombine

585e918

mariusz-sikora-at-amd mentioned this pull request Jan 23, 2024

[AMDGPU] Properly check op_sel in GCNDPPCombine #79122

Merged

mariusz-sikora-at-amd added 2 commits January 23, 2024 18:45

Merge main into masikora/gfx12-cvt-fp8

a3610b5

Update tests

1db3bf4

Sisyph reviewed Jan 23, 2024

View reviewed changes

arsenm approved these changes Jan 24, 2024

View reviewed changes

Merge main into masikora/gfx12-cvt-fp8

5270f1e

mariusz-sikora-at-amd merged commit cfddb59 into llvm:main Jan 24, 2024
3 of 4 checks passed

pointhex mentioned this pull request May 7, 2024

getStyleDiagHandler #91314

Closed

aemerson mentioned this pull request May 9, 2024

release/18.x: [AArc64][GlobalISel] Fix legalizer assert for G_INSERT_VECTOR_ELT - manual merge #91672

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… #78414

[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… #78414

mariusz-sikora-at-amd commented Jan 17, 2024

llvmbot commented Jan 17, 2024 •

edited

Loading

jayfoad commented Jan 17, 2024

kosarev Jan 18, 2024

mariusz-sikora-at-amd Jan 18, 2024

jayfoad commented Jan 19, 2024

mariusz-sikora-at-amd commented Jan 19, 2024

arsenm left a comment

mariusz-sikora-at-amd commented Jan 22, 2024

mbrkusanin commented Jan 22, 2024 •

edited

Loading

mbrkusanin commented Jan 22, 2024

Sisyph left a comment

arsenm Jan 24, 2024

mariusz-sikora-at-amd Jan 24, 2024

[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… #78414

[AMDGPU][GFX12] VOP encoding and codegen - add support for v_cvt fp8/… #78414

Conversation

mariusz-sikora-at-amd commented Jan 17, 2024

llvmbot commented Jan 17, 2024 • edited Loading

jayfoad commented Jan 17, 2024

kosarev Jan 18, 2024

Choose a reason for hiding this comment

mariusz-sikora-at-amd Jan 18, 2024

Choose a reason for hiding this comment

jayfoad commented Jan 19, 2024

mariusz-sikora-at-amd commented Jan 19, 2024

arsenm left a comment

Choose a reason for hiding this comment

mariusz-sikora-at-amd commented Jan 22, 2024

mbrkusanin commented Jan 22, 2024 • edited Loading

mbrkusanin commented Jan 22, 2024

Sisyph left a comment

Choose a reason for hiding this comment

arsenm Jan 24, 2024

Choose a reason for hiding this comment

mariusz-sikora-at-amd Jan 24, 2024

Choose a reason for hiding this comment

llvmbot commented Jan 17, 2024 •

edited

Loading

mbrkusanin commented Jan 22, 2024 •

edited

Loading