[NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup #109417
Instrumenting DenseMap allocations has indicated that these are the variables most often filled with only a small number of elements, so they will benefit the most from having inline elements. I picked 16 inline elements at callsites which occasionally have more than 12 elements inserted, and four inline elements for callsites where there typically aren't any elements inserted.
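To make the trade-off concrete, here is a minimal sketch of the "inline first, heap on overflow" storage policy that motivates picking an inline element count per callsite. `ToySmallMap` is a hypothetical toy written for this illustration and is not LLVM's SmallDenseMap: it uses a linear search rather than hashing, and a real hash table grows before it is completely full, so the exact overflow point differs.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical toy container, NOT LLVM's SmallDenseMap. It only illustrates
// the storage policy: elements live inside the object until the inline slots
// run out, and only then is heap memory used.
template <typename K, typename V, std::size_t InlineN>
class ToySmallMap {
  std::array<std::pair<K, V>, InlineN> Inline{}; // stored inside the object
  std::size_t InlineUsed = 0;
  std::vector<std::pair<K, V>> Heap;             // used only after overflow
  bool OnHeap = false;

public:
  // Returns a pointer to the value for Key, or nullptr if Key is absent.
  V *find(const K &Key) {
    if (OnHeap) {
      for (auto &KV : Heap)
        if (KV.first == Key)
          return &KV.second;
      return nullptr;
    }
    for (std::size_t I = 0; I < InlineUsed; ++I)
      if (Inline[I].first == Key)
        return &Inline[I].second;
    return nullptr;
  }

  // Inserts Key or overwrites its value. The first insertion that does not
  // fit in the inline slots moves everything to the heap.
  void insert(const K &Key, const V &Val) {
    if (V *Existing = find(Key)) {
      *Existing = Val;
      return;
    }
    if (!OnHeap && InlineUsed < InlineN) {
      Inline[InlineUsed++] = std::pair<K, V>(Key, Val);
      return;
    }
    if (!OnHeap) {
      // Overflow: this is the point where a heap allocation becomes necessary.
      Heap.assign(Inline.begin(), Inline.begin() + InlineUsed);
      OnHeap = true;
    }
    Heap.emplace_back(Key, Val);
  }

  std::size_t size() const { return OnHeap ? Heap.size() : InlineUsed; }
  bool usesHeap() const { return OnHeap; }
};
```

A `ToySmallMap<int, int, 4>` stays heap-free for its first four distinct keys and falls back to the heap on the fifth. The cost is a larger object, which is why a small inline count suits callsites that are usually empty and a larger one suits callsites that usually hold a handful of entries.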
@llvm/pr-subscribers-llvm-analysis @llvm/pr-subscribers-llvm-transforms

Author: Jeremy Morse (jmorse)

Changes

tl;dr: if we use SmallDenseMaps instead of DenseMaps at these locations, we get a substantial speedup because there's less spurious malloc traffic.

Background: inspired by @SLTozer's introspective collection of stacktraces for some debug-info things, I've instrumented DenseMap to print where it was allocated and the max number of elements it contained. Run over CTMark and with the addition of some filtering, this has picked out the locations in LLVM where we allocate a DenseMap hashtable off the heap but could instead get away with using the inline buckets of a SmallDenseMap and avoid calling malloc.

I picked 16 inline elements at callsites which occasionally have more than 12 elements inserted, and four inline elements for some callsites where there typically aren't any elements inserted.

One drawback of this technique is that it's fully tuned to making the compile-time-tracker happy, so it might not be representative in general. Counterpoints would be that CTMark is chosen to have a range of different inputs and is vaguely representative, that avoiding allocations is almost always a win, and that in scenarios where we will /always/ insert at least one element it makes sense to spend a little stack memory to avoid that allocation.

(I've got two more patches that contribute another ~0.3% speedup, but it's now hit diminishing returns.)

Patch is 32.79 KiB, truncated to 20.00 KiB below, full version: https://github.com/llvm/llvm-project/pull/109417.diff

14 Files Affected:
diff --git a/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h b/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
index decb33e6af6bcb..c31e663498d5f3 100644
--- a/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
@@ -492,7 +492,7 @@ class MemoryDependenceResults {
const MemoryLocation &Loc, bool isLoad,
BasicBlock *BB,
SmallVectorImpl<NonLocalDepResult> &Result,
- DenseMap<BasicBlock *, Value *> &Visited,
+ SmallDenseMap<BasicBlock *, Value *, 16> &Visited,
bool SkipFirstBlock = false,
bool IsIncomplete = false);
MemDepResult getNonLocalInfoForBlock(Instruction *QueryInst,
diff --git a/llvm/include/llvm/Analysis/SparsePropagation.h b/llvm/include/llvm/Analysis/SparsePropagation.h
index d5805a7314757f..194f4787a8de91 100644
--- a/llvm/include/llvm/Analysis/SparsePropagation.h
+++ b/llvm/include/llvm/Analysis/SparsePropagation.h
@@ -89,7 +89,7 @@ template <class LatticeKey, class LatticeVal> class AbstractLatticeFunction {
/// \p ChangedValues.
virtual void
ComputeInstructionState(Instruction &I,
- DenseMap<LatticeKey, LatticeVal> &ChangedValues,
+ SmallDenseMap<LatticeKey, LatticeVal, 16> &ChangedValues,
SparseSolver<LatticeKey, LatticeVal> &SS) = 0;
/// PrintLatticeVal - Render the given LatticeVal to the specified stream.
@@ -401,7 +401,7 @@ void SparseSolver<LatticeKey, LatticeVal, KeyInfo>::visitPHINode(PHINode &PN) {
// computed from its incoming values. For example, SSI form stores its sigma
// functions as PHINodes with a single incoming value.
if (LatticeFunc->IsSpecialCasedPHI(&PN)) {
- DenseMap<LatticeKey, LatticeVal> ChangedValues;
+ SmallDenseMap<LatticeKey, LatticeVal, 16> ChangedValues;
LatticeFunc->ComputeInstructionState(PN, ChangedValues, *this);
for (auto &ChangedValue : ChangedValues)
if (ChangedValue.second != LatticeFunc->getUntrackedVal())
@@ -456,7 +456,7 @@ void SparseSolver<LatticeKey, LatticeVal, KeyInfo>::visitInst(Instruction &I) {
// Otherwise, ask the transfer function what the result is. If this is
// something that we care about, remember it.
- DenseMap<LatticeKey, LatticeVal> ChangedValues;
+ SmallDenseMap<LatticeKey, LatticeVal, 16> ChangedValues;
LatticeFunc->ComputeInstructionState(I, ChangedValues, *this);
for (auto &ChangedValue : ChangedValues)
if (ChangedValue.second != LatticeFunc->getUntrackedVal())
diff --git a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
index 79504ca7b73c8f..c5fba184cd0850 100644
--- a/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
+++ b/llvm/lib/Analysis/MemoryDependenceAnalysis.cpp
@@ -888,7 +888,7 @@ void MemoryDependenceResults::getNonLocalPointerDependency(
// each block. Because of critical edges, we currently bail out if querying
// a block with multiple different pointers. This can happen during PHI
// translation.
- DenseMap<BasicBlock *, Value *> Visited;
+ SmallDenseMap<BasicBlock *, Value *, 16> Visited;
if (getNonLocalPointerDepFromBB(QueryInst, Address, Loc, isLoad, FromBB,
Result, Visited, true))
return;
@@ -1038,7 +1038,7 @@ bool MemoryDependenceResults::getNonLocalPointerDepFromBB(
Instruction *QueryInst, const PHITransAddr &Pointer,
const MemoryLocation &Loc, bool isLoad, BasicBlock *StartBB,
SmallVectorImpl<NonLocalDepResult> &Result,
- DenseMap<BasicBlock *, Value *> &Visited, bool SkipFirstBlock,
+ SmallDenseMap<BasicBlock *, Value *, 16> &Visited, bool SkipFirstBlock,
bool IsIncomplete) {
// Look up the cached info for Pointer.
ValueIsLoadPair CacheKey(Pointer.getAddr(), isLoad);
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index 1d3443588ce60d..b2c2944c57978d 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -2255,7 +2255,7 @@ const SCEV *ScalarEvolution::getAnyExtendExpr(const SCEV *Op,
/// the common case where no interesting opportunities are present, and
/// is also used as a check to avoid infinite recursion.
static bool
-CollectAddOperandsWithScales(DenseMap<const SCEV *, APInt> &M,
+CollectAddOperandsWithScales(SmallDenseMap<const SCEV *, APInt, 16> &M,
SmallVectorImpl<const SCEV *> &NewOps,
APInt &AccumulatedConstant,
ArrayRef<const SCEV *> Ops, const APInt &Scale,
@@ -2753,7 +2753,7 @@ const SCEV *ScalarEvolution::getAddExpr(SmallVectorImpl<const SCEV *> &Ops,
// operands multiplied by constant values.
if (Idx < Ops.size() && isa<SCEVMulExpr>(Ops[Idx])) {
uint64_t BitWidth = getTypeSizeInBits(Ty);
- DenseMap<const SCEV *, APInt> M;
+ SmallDenseMap<const SCEV *, APInt, 16> M;
SmallVector<const SCEV *, 8> NewOps;
APInt AccumulatedConstant(BitWidth, 0);
if (CollectAddOperandsWithScales(M, NewOps, AccumulatedConstant,
diff --git a/llvm/lib/CodeGen/CalcSpillWeights.cpp b/llvm/lib/CodeGen/CalcSpillWeights.cpp
index 9d8c9119f7719d..88ed2291313c95 100644
--- a/llvm/lib/CodeGen/CalcSpillWeights.cpp
+++ b/llvm/lib/CodeGen/CalcSpillWeights.cpp
@@ -222,7 +222,7 @@ float VirtRegAuxInfo::weightCalcHelper(LiveInterval &LI, SlotIndex *Start,
bool IsExiting = false;
std::set<CopyHint> CopyHints;
- DenseMap<unsigned, float> Hint;
+ SmallDenseMap<unsigned, float, 8> Hint;
for (MachineRegisterInfo::reg_instr_nodbg_iterator
I = MRI.reg_instr_nodbg_begin(LI.reg()),
E = MRI.reg_instr_nodbg_end();
diff --git a/llvm/lib/CodeGen/MachineLICM.cpp b/llvm/lib/CodeGen/MachineLICM.cpp
index 6768eeeb4364c8..c1f3d5ac4ff957 100644
--- a/llvm/lib/CodeGen/MachineLICM.cpp
+++ b/llvm/lib/CodeGen/MachineLICM.cpp
@@ -239,7 +239,7 @@ namespace {
bool IsCheapInstruction(MachineInstr &MI) const;
- bool CanCauseHighRegPressure(const DenseMap<unsigned, int> &Cost,
+ bool CanCauseHighRegPressure(const SmallDenseMap<unsigned, int> &Cost,
bool Cheap);
void UpdateBackTraceRegPressure(const MachineInstr *MI);
@@ -264,7 +264,7 @@ namespace {
void InitRegPressure(MachineBasicBlock *BB);
- DenseMap<unsigned, int> calcRegisterCost(const MachineInstr *MI,
+ SmallDenseMap<unsigned, int> calcRegisterCost(const MachineInstr *MI,
bool ConsiderSeen,
bool ConsiderUnseenAsDef);
@@ -977,10 +977,10 @@ void MachineLICMImpl::UpdateRegPressure(const MachineInstr *MI,
/// If 'ConsiderSeen' is true, updates 'RegSeen' and uses the information to
/// figure out which usages are live-ins.
/// FIXME: Figure out a way to consider 'RegSeen' from all code paths.
-DenseMap<unsigned, int>
+SmallDenseMap<unsigned, int>
MachineLICMImpl::calcRegisterCost(const MachineInstr *MI, bool ConsiderSeen,
bool ConsiderUnseenAsDef) {
- DenseMap<unsigned, int> Cost;
+ SmallDenseMap<unsigned, int> Cost;
if (MI->isImplicitDef())
return Cost;
for (unsigned i = 0, e = MI->getDesc().getNumOperands(); i != e; ++i) {
@@ -1248,7 +1248,7 @@ bool MachineLICMImpl::IsCheapInstruction(MachineInstr &MI) const {
/// Visit BBs from header to current BB, check if hoisting an instruction of the
/// given cost matrix can cause high register pressure.
bool MachineLICMImpl::CanCauseHighRegPressure(
- const DenseMap<unsigned, int> &Cost, bool CheapInstr) {
+ const SmallDenseMap<unsigned, int>& Cost, bool CheapInstr) {
for (const auto &RPIdAndCost : Cost) {
if (RPIdAndCost.second <= 0)
continue;
diff --git a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
index 53ce21906204c8..738319d44d2a53 100644
--- a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
@@ -83,7 +83,7 @@ static unsigned countOperands(SDNode *Node, unsigned NumExpUses,
/// implicit physical register output.
void InstrEmitter::EmitCopyFromReg(SDNode *Node, unsigned ResNo, bool IsClone,
Register SrcReg,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
Register VRBase;
if (SrcReg.isVirtual()) {
// Just use the input register directly!
@@ -187,7 +187,7 @@ void InstrEmitter::CreateVirtualRegisters(SDNode *Node,
MachineInstrBuilder &MIB,
const MCInstrDesc &II,
bool IsClone, bool IsCloned,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
assert(Node->getMachineOpcode() != TargetOpcode::IMPLICIT_DEF &&
"IMPLICIT_DEF should have been handled as a special case elsewhere!");
@@ -266,7 +266,7 @@ void InstrEmitter::CreateVirtualRegisters(SDNode *Node,
/// getVR - Return the virtual register corresponding to the specified result
/// of the specified node.
Register InstrEmitter::getVR(SDValue Op,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
if (Op.isMachineOpcode() &&
Op.getMachineOpcode() == TargetOpcode::IMPLICIT_DEF) {
// Add an IMPLICIT_DEF instruction before every use.
@@ -280,7 +280,7 @@ Register InstrEmitter::getVR(SDValue Op,
return VReg;
}
- DenseMap<SDValue, Register>::iterator I = VRBaseMap.find(Op);
+ VRBaseMapType::iterator I = VRBaseMap.find(Op);
assert(I != VRBaseMap.end() && "Node emitted out of order - late");
return I->second;
}
@@ -318,7 +318,7 @@ InstrEmitter::AddRegisterOperand(MachineInstrBuilder &MIB,
SDValue Op,
unsigned IIOpNum,
const MCInstrDesc *II,
- DenseMap<SDValue, Register> &VRBaseMap,
+ VRBaseMapType &VRBaseMap,
bool IsDebug, bool IsClone, bool IsCloned) {
assert(Op.getValueType() != MVT::Other &&
Op.getValueType() != MVT::Glue &&
@@ -399,7 +399,7 @@ void InstrEmitter::AddOperand(MachineInstrBuilder &MIB,
SDValue Op,
unsigned IIOpNum,
const MCInstrDesc *II,
- DenseMap<SDValue, Register> &VRBaseMap,
+ VRBaseMapType &VRBaseMap,
bool IsDebug, bool IsClone, bool IsCloned) {
if (Op.isMachineOpcode()) {
AddRegisterOperand(MIB, Op, IIOpNum, II, VRBaseMap,
@@ -500,7 +500,7 @@ Register InstrEmitter::ConstrainForSubReg(Register VReg, unsigned SubIdx,
/// EmitSubregNode - Generate machine code for subreg nodes.
///
void InstrEmitter::EmitSubregNode(SDNode *Node,
- DenseMap<SDValue, Register> &VRBaseMap,
+ VRBaseMapType &VRBaseMap,
bool IsClone, bool IsCloned) {
Register VRBase;
unsigned Opc = Node->getMachineOpcode();
@@ -634,7 +634,7 @@ void InstrEmitter::EmitSubregNode(SDNode *Node,
///
void
InstrEmitter::EmitCopyToRegClassNode(SDNode *Node,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
Register VReg = getVR(Node->getOperand(0), VRBaseMap);
// Create the new VReg in the destination class and emit a copy.
@@ -654,7 +654,7 @@ InstrEmitter::EmitCopyToRegClassNode(SDNode *Node,
/// EmitRegSequence - Generate machine code for REG_SEQUENCE nodes.
///
void InstrEmitter::EmitRegSequence(SDNode *Node,
- DenseMap<SDValue, Register> &VRBaseMap,
+ VRBaseMapType &VRBaseMap,
bool IsClone, bool IsCloned) {
unsigned DstRCIdx = Node->getConstantOperandVal(0);
const TargetRegisterClass *RC = TRI->getRegClass(DstRCIdx);
@@ -703,7 +703,7 @@ void InstrEmitter::EmitRegSequence(SDNode *Node,
///
MachineInstr *
InstrEmitter::EmitDbgValue(SDDbgValue *SD,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
DebugLoc DL = SD->getDebugLoc();
assert(cast<DILocalVariable>(SD->getVariable())
->isValidLocationForIntrinsic(DL) &&
@@ -755,7 +755,7 @@ MachineOperand GetMOForConstDbgOp(const SDDbgOperand &Op) {
void InstrEmitter::AddDbgValueLocationOps(
MachineInstrBuilder &MIB, const MCInstrDesc &DbgValDesc,
ArrayRef<SDDbgOperand> LocationOps,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
for (const SDDbgOperand &Op : LocationOps) {
switch (Op.getKind()) {
case SDDbgOperand::FRAMEIX:
@@ -786,7 +786,7 @@ void InstrEmitter::AddDbgValueLocationOps(
MachineInstr *
InstrEmitter::EmitDbgInstrRef(SDDbgValue *SD,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
MDNode *Var = SD->getVariable();
const DIExpression *Expr = (DIExpression *)SD->getExpression();
DebugLoc DL = SD->getDebugLoc();
@@ -862,7 +862,7 @@ InstrEmitter::EmitDbgInstrRef(SDDbgValue *SD,
// Look up the corresponding VReg for the given SDNode, if any.
SDNode *Node = DbgOperand.getSDNode();
SDValue Op = SDValue(Node, DbgOperand.getResNo());
- DenseMap<SDValue, Register>::iterator I = VRBaseMap.find(Op);
+ VRBaseMapType::iterator I = VRBaseMap.find(Op);
// No VReg -> produce a DBG_VALUE $noreg instead.
if (I == VRBaseMap.end())
break;
@@ -928,7 +928,7 @@ MachineInstr *InstrEmitter::EmitDbgNoLocation(SDDbgValue *SD) {
MachineInstr *
InstrEmitter::EmitDbgValueList(SDDbgValue *SD,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
MDNode *Var = SD->getVariable();
DIExpression *Expr = SD->getExpression();
DebugLoc DL = SD->getDebugLoc();
@@ -944,7 +944,7 @@ InstrEmitter::EmitDbgValueList(SDDbgValue *SD,
MachineInstr *
InstrEmitter::EmitDbgValueFromSingleOp(SDDbgValue *SD,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
MDNode *Var = SD->getVariable();
DIExpression *Expr = SD->getExpression();
DebugLoc DL = SD->getDebugLoc();
@@ -996,7 +996,7 @@ InstrEmitter::EmitDbgLabel(SDDbgLabel *SD) {
///
void InstrEmitter::
EmitMachineNode(SDNode *Node, bool IsClone, bool IsCloned,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
unsigned Opc = Node->getMachineOpcode();
// Handle subreg insert/extract specially
@@ -1238,7 +1238,7 @@ EmitMachineNode(SDNode *Node, bool IsClone, bool IsCloned,
/// needed dependencies.
void InstrEmitter::
EmitSpecialNode(SDNode *Node, bool IsClone, bool IsCloned,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
switch (Node->getOpcode()) {
default:
#ifndef NDEBUG
diff --git a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h
index 959bce31c8b278..fcfcaa8d35b848 100644
--- a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h
+++ b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h
@@ -30,6 +30,8 @@ class TargetLowering;
class TargetMachine;
class LLVM_LIBRARY_VISIBILITY InstrEmitter {
+ using VRBaseMapType = SmallDenseMap<SDValue, Register, 16>;
+
MachineFunction *MF;
MachineRegisterInfo *MRI;
const TargetInstrInfo *TII;
@@ -45,18 +47,17 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
/// EmitCopyFromReg - Generate machine code for an CopyFromReg node or an
/// implicit physical register output.
void EmitCopyFromReg(SDNode *Node, unsigned ResNo, bool IsClone,
- Register SrcReg, DenseMap<SDValue, Register> &VRBaseMap);
+ Register SrcReg, VRBaseMapType &VRBaseMap);
void CreateVirtualRegisters(SDNode *Node,
MachineInstrBuilder &MIB,
const MCInstrDesc &II,
bool IsClone, bool IsCloned,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBaseMap);
/// getVR - Return the virtual register corresponding to the specified result
/// of the specified node.
- Register getVR(SDValue Op,
- DenseMap<SDValue, Register> &VRBaseMap);
+ Register getVR(SDValue Op, VRBaseMapType &VRBaseMap);
/// AddRegisterOperand - Add the specified register as an operand to the
/// specified machine instr. Insert register copies if the register is
@@ -65,7 +66,7 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
SDValue Op,
unsigned IIOpNum,
const MCInstrDesc *II,
- DenseMap<SDValue, Register> &VRBaseMap,
+ VRBaseMapType &VRBaseMap,
bool IsDebug, bool IsClone, bool IsCloned);
/// AddOperand - Add the specified operand to the specified machine instr. II
@@ -76,7 +77,7 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
SDValue Op,
unsigned IIOpNum,
const MCInstrDesc *II,
- DenseMap<SDValue, Register> &VRBaseMap,
+ VRBaseMapType &VRBaseMap,
bool IsDebug, bool IsClone, bool IsCloned);
/// ConstrainForSubReg - Try to constrain VReg to a register class that
@@ -87,7 +88,7 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
/// EmitSubregNode - Generate machine code for subreg nodes.
///
- void EmitSubregNode(SDNode *Node, DenseMap<SDValue, Register> &VRBaseMap,
+ void EmitSubregNode(SDNode *Node, VRBaseMapType &VRBaseMap,
bool IsClone, bool IsCloned);
/// EmitCopyToRegClassNode - Generate machine code for COPY_TO_REGCLASS nodes.
@@ -95,11 +96,11 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
/// register is constrained to be in a particular register class.
///
void EmitCopyToRegClassNode(SDNode *Node,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBaseMap);
/// EmitRegSequence - Generate machine code for REG_SEQUENCE nodes.
///
- void EmitRegSequence(SDNode *Node, DenseMap<SDValue, Register> &VRBaseMap,
+ void EmitRegSequence(SDNode *Node, VRBaseMapType &VRBaseMap,
bool IsClone, bool IsCloned);
public:
/// CountResults - The results of target nodes have register or immediate
@@ -110,29 +111,29 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
void AddDbgValueLocationOps(MachineInstrBuilder &MIB,
const MCInstrDesc &DbgValDesc,
ArrayRef<SDDbgOperand> Locations,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBaseMap);
/// EmitDbgValue - Generate machine instruction for a dbg_value node.
///
MachineInstr *EmitDbgValue(SDDbgValue *SD,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBaseMap);
/// Emit a dbg_value as a DBG_INSTR_REF. May produce DBG_VALUE $noreg instead
/// if there is no variable location; alternately a half-formed DBG_INSTR_REF
/// that refers to a virtual register and is corrected later in isel.
MachineInstr *EmitDbgInstrRef(SDDbgValue *SD,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBa...
[truncated]
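The callsites changed in the diff above were picked by instrumenting DenseMap to report where each map was allocated and the maximum number of elements it held. The patch does not include that instrumentation, so the following is only a sketch of the idea under stated assumptions: `TrackedMap`, `maxSizeBySite`, and `fitsInline` are hypothetical names, the wrapper stands in for DenseMap by using `std::unordered_map`, and a caller-supplied string label stands in for the allocation stack trace.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical registry: call-site label -> largest element count observed
// in any map created at that site.
inline std::map<std::string, std::size_t> &maxSizeBySite() {
  static std::map<std::string, std::size_t> Registry;
  return Registry;
}

// Hypothetical wrapper standing in for an instrumented DenseMap. It records
// the high-water mark of its size and reports it when it is destroyed.
template <typename K, typename V>
class TrackedMap {
  std::unordered_map<K, V> Impl;
  std::string Site;
  std::size_t HighWater = 0;

public:
  explicit TrackedMap(std::string SiteLabel) : Site(std::move(SiteLabel)) {}

  ~TrackedMap() {
    // operator[] creates the entry, so sites whose maps always stay empty
    // are still recorded, with a maximum of zero.
    std::size_t &Slot = maxSizeBySite()[Site];
    Slot = std::max(Slot, HighWater);
  }

  void insert(const K &Key, const V &Val) {
    Impl[Key] = Val;
    HighWater = std::max(HighWater, Impl.size());
  }

  void erase(const K &Key) { Impl.erase(Key); }
  std::size_t size() const { return Impl.size(); }
};

// Returns true if every map observed at Site held at most InlineN elements.
// Assumption: this ignores that a real hash table grows before it is
// completely full, so a practical threshold would be somewhat lower.
inline bool fitsInline(const std::string &Site, std::size_t InlineN) {
  auto It = maxSizeBySite().find(Site);
  return It != maxSizeBySite().end() && It->second <= InlineN;
}
```

After a representative workload has run, sites for which `fitsInline(Site, 16)` holds would be candidates for 16 inline elements, and sites whose recorded maximum is usually zero would be candidates for a small inline count. In practice the label could be built from `__FILE__` and `__LINE__` at the point of construction.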
@@ -1238,7 +1238,7 @@ EmitMachineNode(SDNode *Node, bool IsClone, bool IsCloned,
/// needed dependencies.
void InstrEmitter::
EmitSpecialNode(SDNode *Node, bool IsClone, bool IsCloned,
- DenseMap<SDValue, Register> &VRBaseMap) {
+ VRBaseMapType &VRBaseMap) {
switch (Node->getOpcode()) {
default:
#ifndef NDEBUG
diff --git a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h
index 959bce31c8b278..fcfcaa8d35b848 100644
--- a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h
+++ b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h
@@ -30,6 +30,8 @@ class TargetLowering;
class TargetMachine;
class LLVM_LIBRARY_VISIBILITY InstrEmitter {
+ using VRBaseMapType = SmallDenseMap<SDValue, Register, 16>;
+
MachineFunction *MF;
MachineRegisterInfo *MRI;
const TargetInstrInfo *TII;
@@ -45,18 +47,17 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
/// EmitCopyFromReg - Generate machine code for an CopyFromReg node or an
/// implicit physical register output.
void EmitCopyFromReg(SDNode *Node, unsigned ResNo, bool IsClone,
- Register SrcReg, DenseMap<SDValue, Register> &VRBaseMap);
+ Register SrcReg, VRBaseMapType &VRBaseMap);
void CreateVirtualRegisters(SDNode *Node,
MachineInstrBuilder &MIB,
const MCInstrDesc &II,
bool IsClone, bool IsCloned,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBaseMap);
/// getVR - Return the virtual register corresponding to the specified result
/// of the specified node.
- Register getVR(SDValue Op,
- DenseMap<SDValue, Register> &VRBaseMap);
+ Register getVR(SDValue Op, VRBaseMapType &VRBaseMap);
/// AddRegisterOperand - Add the specified register as an operand to the
/// specified machine instr. Insert register copies if the register is
@@ -65,7 +66,7 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
SDValue Op,
unsigned IIOpNum,
const MCInstrDesc *II,
- DenseMap<SDValue, Register> &VRBaseMap,
+ VRBaseMapType &VRBaseMap,
bool IsDebug, bool IsClone, bool IsCloned);
/// AddOperand - Add the specified operand to the specified machine instr. II
@@ -76,7 +77,7 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
SDValue Op,
unsigned IIOpNum,
const MCInstrDesc *II,
- DenseMap<SDValue, Register> &VRBaseMap,
+ VRBaseMapType &VRBaseMap,
bool IsDebug, bool IsClone, bool IsCloned);
/// ConstrainForSubReg - Try to constrain VReg to a register class that
@@ -87,7 +88,7 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
/// EmitSubregNode - Generate machine code for subreg nodes.
///
- void EmitSubregNode(SDNode *Node, DenseMap<SDValue, Register> &VRBaseMap,
+ void EmitSubregNode(SDNode *Node, VRBaseMapType &VRBaseMap,
bool IsClone, bool IsCloned);
/// EmitCopyToRegClassNode - Generate machine code for COPY_TO_REGCLASS nodes.
@@ -95,11 +96,11 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
/// register is constrained to be in a particular register class.
///
void EmitCopyToRegClassNode(SDNode *Node,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBaseMap);
/// EmitRegSequence - Generate machine code for REG_SEQUENCE nodes.
///
- void EmitRegSequence(SDNode *Node, DenseMap<SDValue, Register> &VRBaseMap,
+ void EmitRegSequence(SDNode *Node, VRBaseMapType &VRBaseMap,
bool IsClone, bool IsCloned);
public:
/// CountResults - The results of target nodes have register or immediate
@@ -110,29 +111,29 @@ class LLVM_LIBRARY_VISIBILITY InstrEmitter {
void AddDbgValueLocationOps(MachineInstrBuilder &MIB,
const MCInstrDesc &DbgValDesc,
ArrayRef<SDDbgOperand> Locations,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBaseMap);
/// EmitDbgValue - Generate machine instruction for a dbg_value node.
///
MachineInstr *EmitDbgValue(SDDbgValue *SD,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBaseMap);
/// Emit a dbg_value as a DBG_INSTR_REF. May produce DBG_VALUE $noreg instead
/// if there is no variable location; alternately a half-formed DBG_INSTR_REF
/// that refers to a virtual register and is corrected later in isel.
MachineInstr *EmitDbgInstrRef(SDDbgValue *SD,
- DenseMap<SDValue, Register> &VRBaseMap);
+ VRBaseMapType &VRBa...
[truncated]
You can test this locally with the following command:
git-clang-format --diff efdb3ae23247850d3886e3708400f0d991ed59e1 1e079343e73248de60f8b6279f2fc66b3ccd689a --extensions h,cpp -- llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h llvm/include/llvm/Analysis/SparsePropagation.h llvm/lib/Analysis/MemoryDependenceAnalysis.cpp llvm/lib/Analysis/ScalarEvolution.cpp llvm/lib/CodeGen/CalcSpillWeights.cpp llvm/lib/CodeGen/MachineLICM.cpp llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h llvm/lib/CodeGen/SelectionDAG/ScheduleDAGFast.cpp llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.h llvm/lib/Transforms/IPO/CalledValuePropagation.cpp llvm/lib/Transforms/Utils/BasicBlockUtils.cpp llvm/lib/Transforms/Vectorize/SLPVectorizer.cpp
View the diff from clang-format here.
diff --git a/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h b/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
index c31e663498..700343561f 100644
--- a/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
+++ b/llvm/include/llvm/Analysis/MemoryDependenceAnalysis.h
@@ -487,14 +487,12 @@ private:
MemDepResult getCallDependencyFrom(CallBase *Call, bool isReadOnlyCall,
BasicBlock::iterator ScanIt,
BasicBlock *BB);
- bool getNonLocalPointerDepFromBB(Instruction *QueryInst,
- const PHITransAddr &Pointer,
- const MemoryLocation &Loc, bool isLoad,
- BasicBlock *BB,
- SmallVectorImpl<NonLocalDepResult> &Result,
- SmallDenseMap<BasicBlock *, Value *, 16> &Visited,
- bool SkipFirstBlock = false,
- bool IsIncomplete = false);
+ bool getNonLocalPointerDepFromBB(
+ Instruction *QueryInst, const PHITransAddr &Pointer,
+ const MemoryLocation &Loc, bool isLoad, BasicBlock *BB,
+ SmallVectorImpl<NonLocalDepResult> &Result,
+ SmallDenseMap<BasicBlock *, Value *, 16> &Visited,
+ bool SkipFirstBlock = false, bool IsIncomplete = false);
MemDepResult getNonLocalInfoForBlock(Instruction *QueryInst,
const MemoryLocation &Loc, bool isLoad,
BasicBlock *BB, NonLocalDepInfo *Cache,
diff --git a/llvm/lib/Analysis/ScalarEvolution.cpp b/llvm/lib/Analysis/ScalarEvolution.cpp
index b2c2944c57..d2ffda254d 100644
--- a/llvm/lib/Analysis/ScalarEvolution.cpp
+++ b/llvm/lib/Analysis/ScalarEvolution.cpp
@@ -2254,12 +2254,10 @@ const SCEV *ScalarEvolution::getAnyExtendExpr(const SCEV *Op,
/// may be exposed. This helps getAddRecExpr short-circuit extra work in
/// the common case where no interesting opportunities are present, and
/// is also used as a check to avoid infinite recursion.
-static bool
-CollectAddOperandsWithScales(SmallDenseMap<const SCEV *, APInt, 16> &M,
- SmallVectorImpl<const SCEV *> &NewOps,
- APInt &AccumulatedConstant,
- ArrayRef<const SCEV *> Ops, const APInt &Scale,
- ScalarEvolution &SE) {
+static bool CollectAddOperandsWithScales(
+ SmallDenseMap<const SCEV *, APInt, 16> &M,
+ SmallVectorImpl<const SCEV *> &NewOps, APInt &AccumulatedConstant,
+ ArrayRef<const SCEV *> Ops, const APInt &Scale, ScalarEvolution &SE) {
bool Interesting = false;
// Iterate over the add operands. They are sorted, with constants first.
diff --git a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
index 12a48ab06f..bd834871c9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.cpp
@@ -183,10 +183,10 @@ void InstrEmitter::EmitCopyFromReg(SDNode *Node, unsigned ResNo, bool IsClone,
}
void InstrEmitter::CreateVirtualRegisters(SDNode *Node,
- MachineInstrBuilder &MIB,
- const MCInstrDesc &II,
- bool IsClone, bool IsCloned,
- VRBaseMapType &VRBaseMap) {
+ MachineInstrBuilder &MIB,
+ const MCInstrDesc &II, bool IsClone,
+ bool IsCloned,
+ VRBaseMapType &VRBaseMap) {
assert(Node->getMachineOpcode() != TargetOpcode::IMPLICIT_DEF &&
"IMPLICIT_DEF should have been handled as a special case elsewhere!");
@@ -311,13 +311,10 @@ static bool isConvergenceCtrlMachineOp(SDValue Op) {
/// AddRegisterOperand - Add the specified register as an operand to the
/// specified machine instr. Insert register copies if the register is
/// not in the required register class.
-void
-InstrEmitter::AddRegisterOperand(MachineInstrBuilder &MIB,
- SDValue Op,
- unsigned IIOpNum,
- const MCInstrDesc *II,
- VRBaseMapType &VRBaseMap,
- bool IsDebug, bool IsClone, bool IsCloned) {
+void InstrEmitter::AddRegisterOperand(MachineInstrBuilder &MIB, SDValue Op,
+ unsigned IIOpNum, const MCInstrDesc *II,
+ VRBaseMapType &VRBaseMap, bool IsDebug,
+ bool IsClone, bool IsCloned) {
assert(Op.getValueType() != MVT::Other &&
Op.getValueType() != MVT::Glue &&
"Chain and glue operands should occur at end of operand list!");
@@ -627,9 +624,8 @@ void InstrEmitter::EmitSubregNode(SDNode *Node, VRBaseMapType &VRBaseMap,
/// COPY_TO_REGCLASS is just a normal copy, except that the destination
/// register is constrained to be in a particular register class.
///
-void
-InstrEmitter::EmitCopyToRegClassNode(SDNode *Node,
- VRBaseMapType &VRBaseMap) {
+void InstrEmitter::EmitCopyToRegClassNode(SDNode *Node,
+ VRBaseMapType &VRBaseMap) {
Register VReg = getVR(Node->getOperand(0), VRBaseMap);
// Create the new VReg in the destination class and emit a copy.
@@ -695,9 +691,8 @@ void InstrEmitter::EmitRegSequence(SDNode *Node, VRBaseMapType &VRBaseMap,
/// EmitDbgValue - Generate machine instruction for a dbg_value node.
///
-MachineInstr *
-InstrEmitter::EmitDbgValue(SDDbgValue *SD,
- VRBaseMapType &VRBaseMap) {
+MachineInstr *InstrEmitter::EmitDbgValue(SDDbgValue *SD,
+ VRBaseMapType &VRBaseMap) {
DebugLoc DL = SD->getDebugLoc();
assert(cast<DILocalVariable>(SD->getVariable())
->isValidLocationForIntrinsic(DL) &&
@@ -746,10 +741,10 @@ MachineOperand GetMOForConstDbgOp(const SDDbgOperand &Op) {
/* SubReg */ 0, /* isDebug */ true);
}
-void InstrEmitter::AddDbgValueLocationOps(
- MachineInstrBuilder &MIB, const MCInstrDesc &DbgValDesc,
- ArrayRef<SDDbgOperand> LocationOps,
- VRBaseMapType &VRBaseMap) {
+void InstrEmitter::AddDbgValueLocationOps(MachineInstrBuilder &MIB,
+ const MCInstrDesc &DbgValDesc,
+ ArrayRef<SDDbgOperand> LocationOps,
+ VRBaseMapType &VRBaseMap) {
for (const SDDbgOperand &Op : LocationOps) {
switch (Op.getKind()) {
case SDDbgOperand::FRAMEIX:
@@ -778,9 +773,8 @@ void InstrEmitter::AddDbgValueLocationOps(
}
}
-MachineInstr *
-InstrEmitter::EmitDbgInstrRef(SDDbgValue *SD,
- VRBaseMapType &VRBaseMap) {
+MachineInstr *InstrEmitter::EmitDbgInstrRef(SDDbgValue *SD,
+ VRBaseMapType &VRBaseMap) {
MDNode *Var = SD->getVariable();
const DIExpression *Expr = (DIExpression *)SD->getExpression();
DebugLoc DL = SD->getDebugLoc();
@@ -920,9 +914,8 @@ MachineInstr *InstrEmitter::EmitDbgNoLocation(SDDbgValue *SD) {
return BuildMI(*MF, DL, Desc, false, 0U, Var, Expr);
}
-MachineInstr *
-InstrEmitter::EmitDbgValueList(SDDbgValue *SD,
- VRBaseMapType &VRBaseMap) {
+MachineInstr *InstrEmitter::EmitDbgValueList(SDDbgValue *SD,
+ VRBaseMapType &VRBaseMap) {
MDNode *Var = SD->getVariable();
DIExpression *Expr = SD->getExpression();
DebugLoc DL = SD->getDebugLoc();
@@ -936,9 +929,8 @@ InstrEmitter::EmitDbgValueList(SDDbgValue *SD,
return &*MIB;
}
-MachineInstr *
-InstrEmitter::EmitDbgValueFromSingleOp(SDDbgValue *SD,
- VRBaseMapType &VRBaseMap) {
+MachineInstr *InstrEmitter::EmitDbgValueFromSingleOp(SDDbgValue *SD,
+ VRBaseMapType &VRBaseMap) {
MDNode *Var = SD->getVariable();
DIExpression *Expr = SD->getExpression();
DebugLoc DL = SD->getDebugLoc();
@@ -988,9 +980,8 @@ InstrEmitter::EmitDbgLabel(SDDbgLabel *SD) {
/// EmitMachineNode - Generate machine code for a target-specific node and
/// needed dependencies.
///
-void InstrEmitter::
-EmitMachineNode(SDNode *Node, bool IsClone, bool IsCloned,
- VRBaseMapType &VRBaseMap) {
+void InstrEmitter::EmitMachineNode(SDNode *Node, bool IsClone, bool IsCloned,
+ VRBaseMapType &VRBaseMap) {
unsigned Opc = Node->getMachineOpcode();
// Handle subreg insert/extract specially
@@ -1230,9 +1221,8 @@ EmitMachineNode(SDNode *Node, bool IsClone, bool IsCloned,
/// EmitSpecialNode - Generate machine code for a target-independent node and
/// needed dependencies.
-void InstrEmitter::
-EmitSpecialNode(SDNode *Node, bool IsClone, bool IsCloned,
- VRBaseMapType &VRBaseMap) {
+void InstrEmitter::EmitSpecialNode(SDNode *Node, bool IsClone, bool IsCloned,
+ VRBaseMapType &VRBaseMap) {
switch (Node->getOpcode()) {
default:
#ifndef NDEBUG
diff --git a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h
index 16d754cdc2..99f8af97b9 100644
--- a/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h
+++ b/llvm/lib/CodeGen/SelectionDAG/InstrEmitter.h
@@ -51,11 +51,9 @@ private:
void EmitCopyFromReg(SDNode *Node, unsigned ResNo, bool IsClone,
Register SrcReg, VRBaseMapType &VRBaseMap);
- void CreateVirtualRegisters(SDNode *Node,
- MachineInstrBuilder &MIB,
- const MCInstrDesc &II,
- bool IsClone, bool IsCloned,
- VRBaseMapType &VRBaseMap);
+ void CreateVirtualRegisters(SDNode *Node, MachineInstrBuilder &MIB,
+ const MCInstrDesc &II, bool IsClone,
+ bool IsCloned, VRBaseMapType &VRBaseMap);
/// getVR - Return the virtual register corresponding to the specified result
/// of the specified node.
@@ -64,23 +62,18 @@ private:
/// AddRegisterOperand - Add the specified register as an operand to the
/// specified machine instr. Insert register copies if the register is
/// not in the required register class.
- void AddRegisterOperand(MachineInstrBuilder &MIB,
- SDValue Op,
- unsigned IIOpNum,
- const MCInstrDesc *II,
- VRBaseMapType &VRBaseMap,
- bool IsDebug, bool IsClone, bool IsCloned);
+ void AddRegisterOperand(MachineInstrBuilder &MIB, SDValue Op,
+ unsigned IIOpNum, const MCInstrDesc *II,
+ VRBaseMapType &VRBaseMap, bool IsDebug, bool IsClone,
+ bool IsCloned);
/// AddOperand - Add the specified operand to the specified machine instr. II
/// specifies the instruction information for the node, and IIOpNum is the
/// operand number (in the II) that we are adding. IIOpNum and II are used for
/// assertions only.
- void AddOperand(MachineInstrBuilder &MIB,
- SDValue Op,
- unsigned IIOpNum,
- const MCInstrDesc *II,
- VRBaseMapType &VRBaseMap,
- bool IsDebug, bool IsClone, bool IsCloned);
+ void AddOperand(MachineInstrBuilder &MIB, SDValue Op, unsigned IIOpNum,
+ const MCInstrDesc *II, VRBaseMapType &VRBaseMap, bool IsDebug,
+ bool IsClone, bool IsCloned);
/// ConstrainForSubReg - Try to constrain VReg to a register class that
/// supports SubIdx sub-registers. Emit a copy if that isn't possible.
diff --git a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
index 31939ae592..3398bca669 100644
--- a/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
+++ b/llvm/lib/CodeGen/SelectionDAG/ScheduleDAGSDNodes.cpp
@@ -736,7 +736,7 @@ void ScheduleDAGSDNodes::VerifyScheduledSequence(bool isBottomUp) {
/// ProcessSDDbgValues - Process SDDbgValues associated with this node.
static void
ProcessSDDbgValues(SDNode *N, SelectionDAG *DAG, InstrEmitter &Emitter,
- SmallVectorImpl<std::pair<unsigned, MachineInstr*> > &Orders,
+ SmallVectorImpl<std::pair<unsigned, MachineInstr *>> &Orders,
InstrEmitter::VRBaseMapType &VRBaseMap, unsigned Order) {
if (!N->getHasDebugValue())
return;
@@ -807,9 +807,9 @@ ProcessSourceNode(SDNode *N, SelectionDAG *DAG, InstrEmitter &Emitter,
ProcessSDDbgValues(N, DAG, Emitter, Orders, VRBaseMap, Order);
}
-void ScheduleDAGSDNodes::
-EmitPhysRegCopy(SUnit *SU, SmallDenseMap<SUnit *, Register, 16> &VRBaseMap,
- MachineBasicBlock::iterator InsertPos) {
+void ScheduleDAGSDNodes::EmitPhysRegCopy(
+ SUnit *SU, SmallDenseMap<SUnit *, Register, 16> &VRBaseMap,
+ MachineBasicBlock::iterator InsertPos) {
for (const SDep &Pred : SU->Preds) {
if (Pred.isCtrl())
continue; // ignore chain preds
LGTM. As long as it's not inside nested maps, there's basically no downside to having inline storage.
Also nice results on clang, so this clearly does generalize beyond CTMark:
bin/clang-20 6913606M 6858510M (-0.80%)
bin/llvm-tblgen 261829M 261017M (-0.31%)
bin/clang-tblgen 121228M 118689M (-2.09%)
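The caveat about nested maps comes down to object size: inline buckets live inside the map object itself, so a map whose values are maps with inline storage pays for the inner buckets in every outer bucket. A minimal sketch of that arithmetic follows. InlineBuckets and HeapBuckets are hypothetical stand-in structs made up for this illustration; they are not the real layouts of SmallDenseMap or DenseMap.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Hypothetical stand-in for a map with N inline buckets: the bucket array
// is a member, so it contributes directly to sizeof.
template <typename K, typename V, std::size_t N>
struct InlineBuckets {
  std::array<std::pair<K, V>, N> Buckets;
  std::size_t Used = 0;
};

// Hypothetical stand-in for a heap-backed map: only a pointer and two
// counters live in the object, whatever the element count.
template <typename K, typename V>
struct HeapBuckets {
  std::pair<K, V> *Buckets = nullptr;
  std::size_t Used = 0;
  std::size_t Capacity = 0;
};

using Flat = InlineBuckets<int, int, 16>;
// Outer map whose values also carry 16 inline buckets each.
using NestedInline = InlineBuckets<int, Flat, 16>;
// Outer map whose values keep their buckets on the heap.
using NestedHeap = InlineBuckets<int, HeapBuckets<int, int>, 16>;

static_assert(sizeof(Flat) >= 16 * sizeof(std::pair<int, int>),
              "inline buckets are part of the object");
static_assert(sizeof(NestedInline) >= 16 * sizeof(Flat),
              "nesting multiplies the inline footprint");
static_assert(sizeof(NestedHeap) < sizeof(NestedInline),
              "heap-backed values keep the outer map small");
```

For a single top-level map on the stack the extra bytes are a one-off cost, which is the case the comment above describes as having basically no downside.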
@@ -770,7 +770,7 @@ void ScheduleDAGLinearize::Schedule() {
MachineBasicBlock*
ScheduleDAGLinearize::EmitSchedule(MachineBasicBlock::iterator &InsertPos) {
InstrEmitter Emitter(DAG->getTarget(), BB, InsertPos);
- DenseMap<SDValue, Register> VRBaseMap;
+ SmallDenseMap<SDValue, Register, 16> VRBaseMap;
Make VRBaseMapType public and use InstrEmitter::VRBaseMapType here?
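The suggestion is the usual pattern of exposing a class's map type as a public alias, so that callers name the type through the class instead of repeating the template arguments. A rough, self-contained sketch of the idea follows, with std::map standing in for SmallDenseMap. FakeSDValue, FakeRegister, EmitterSketch and emitScheduleSketch are hypothetical names invented for the example, and getVR here is a simplified lookup, not the real InstrEmitter::getVR.

```cpp
#include <cassert>
#include <map>
#include <type_traits>

// Hypothetical stand-ins for SDValue and Register so the sketch compiles
// without any LLVM headers.
struct FakeSDValue {
  int Id;
  bool operator<(const FakeSDValue &Other) const { return Id < Other.Id; }
};
using FakeRegister = unsigned;

class EmitterSketch {
public:
  // Public alias: the one place where the concrete map type is spelled out.
  using VRBaseMapType = std::map<FakeSDValue, FakeRegister>;

  // Simplified lookup that returns 0 when the value has no register yet.
  FakeRegister getVR(FakeSDValue Op, VRBaseMapType &VRBaseMap) const {
    auto It = VRBaseMap.find(Op);
    return It == VRBaseMap.end() ? 0u : It->second;
  }
};

// A caller outside the class declares the map via the alias, so a later
// change to the alias needs no edit here.
inline FakeRegister emitScheduleSketch() {
  EmitterSketch::VRBaseMapType VRBaseMap;
  VRBaseMap[FakeSDValue{1}] = 42u;
  EmitterSketch Emitter;
  return Emitter.getVR(FakeSDValue{1}, VRBaseMap);
}
```

With the alias in place, changing the inline bucket count later would be a one-line edit in the class rather than a sweep over every caller.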
NB: I can't fix all the clang-format errors as the existing files aren't clean, and it'd transform this patch into a 75% clang-format patch.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/144/builds/7942 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/175/builds/5873 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/185/builds/5854 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/3/builds/5226 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/2/builds/7575 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/65/builds/5179 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/180/builds/5753 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/160/builds/5755 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/154/builds/5012 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/133/builds/4240 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/33/builds/3713 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/153/builds/10043 Here is the relevant piece of the build log for the reference
This time with 100% more building unit tests. Original commit message follows. [NFC] Switch a number of DenseMaps to SmallDenseMaps for speedup (#109417) If we use SmallDenseMaps instead of DenseMaps at these locations, we get a substantial speedup because there's less spurious malloc traffic. Discovered by instrumenting DenseMap with some accounting code, then selecting sites where we'll get the most bang for our buck.
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/56/builds/8371 Here is the relevant piece of the build log for the reference
LLVM Buildbot has detected a new failure on builder Full details are available at: https://lab.llvm.org/buildbot/#/builders/60/builds/8618 Here is the relevant piece of the build log for the reference
…m#109417) If we use SmallDenseMaps instead of DenseMaps at these locations, we get a substantial speedup because there's less spurious malloc traffic. Discovered by instrumenting DenseMap with some accounting code, then selecting sites where we'll get the most bang for our buck.
…dup (llvm#109417)" This reverts commit 3f37c51. Lo and behold, I missed a unit test
tl;dr, if we use SmallDenseMaps instead of DenseMaps at these locations, we get a substantial speedup because there's less spurious malloc traffic:
https://llvm-compile-time-tracker.com/compare.php?from=983635c014767a2a37f87a4b564b59a78b866154&to=1198cb4aa9c1715d18212a21328801b6985b8ca7&stat=instructions:u
Background: inspired by @SLTozer's introspective collection of stacktraces for some debug-info things, I've instrumented DenseMap to print where it was allocated and the maximum number of elements it contained. Running this over CTMark, with some filtering added, picked out the locations in LLVM where we allocate a DenseMap hashtable off the heap but could instead use the inline buckets of a SmallDenseMap and avoid calling malloc. I picked 16 inline elements at callsites which occasionally have more than 12 elements inserted, and four inline elements for some callsites where there typically aren't any elements inserted.
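As a rough illustration of that kind of accounting (this is not the instrumentation used for this patch, which isn't shown here), a wrapper can remember the largest size a map ever reached and report it under a call-site label when the map is destroyed. PeakTrackingMap and peakRegistry are hypothetical names, and std::unordered_map stands in for DenseMap.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <map>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical registry: call-site label -> largest element count observed
// in any map created at that site.
inline std::map<std::string, std::size_t> &peakRegistry() {
  static std::map<std::string, std::size_t> Registry;
  return Registry;
}

// Hypothetical wrapper that tracks its own peak size. The peak is recorded
// on destruction, so erasing elements never lowers the reported value.
template <typename K, typename V>
class PeakTrackingMap {
  std::unordered_map<K, V> Map;
  std::string Site;
  std::size_t Peak = 0;

public:
  explicit PeakTrackingMap(std::string SiteLabel)
      : Site(std::move(SiteLabel)) {}

  ~PeakTrackingMap() {
    std::size_t &Slot = peakRegistry()[Site];
    Slot = std::max(Slot, Peak);
  }

  void insert(const K &Key, const V &Val) {
    Map.emplace(Key, Val);
    Peak = std::max(Peak, Map.size());
  }

  void erase(const K &Key) { Map.erase(Key); }
  std::size_t size() const { return Map.size(); }
};
```

Sites whose recorded peak stays small across a benchmark run are the candidates for inline buckets; sites with large peaks would gain nothing and only grow the object.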
One drawback of this technique is that it's fully tuned to making the compile-time-tracker happy, so it might not be representative in general. Counterpoints would be that CTMark is chosen to have a range of different inputs and is vaguely representative, avoiding allocations is almost always a win, and in scenarios where we will /always/ insert at least one element it makes sense to spend a little stack memory to avoid the allocation.
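The stack-for-malloc trade can be sketched with a toy map that keeps its first N entries in an array embedded in the object and only moves to a heap-backed table once that array is full. InlineFirstMap is a hypothetical illustration, not SmallDenseMap: it scans the inline slots linearly instead of hashing, and it assumes K and V are default-constructible.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <unordered_map>
#include <utility>

template <typename K, typename V, std::size_t N>
class InlineFirstMap {
  std::array<std::pair<K, V>, N> Inline{}; // storage embedded in the object
  std::size_t InlineCount = 0;
  std::unordered_map<K, V> Overflow; // stays empty until the inline slots fill
  bool Spilled = false;

public:
  // Insert a new key or overwrite an existing one.
  void set(const K &Key, const V &Val) {
    if (!Spilled) {
      for (std::size_t I = 0; I < InlineCount; ++I) {
        if (Inline[I].first == Key) {
          Inline[I].second = Val;
          return;
        }
      }
      if (InlineCount < N) {
        Inline[InlineCount++] = {Key, Val};
        return;
      }
      // Inline slots are full: move every entry to the heap-backed table.
      for (std::size_t I = 0; I < InlineCount; ++I)
        Overflow.emplace(Inline[I].first, Inline[I].second);
      InlineCount = 0;
      Spilled = true;
    }
    Overflow[Key] = Val;
  }

  // Returns a pointer to the stored value, or nullptr if the key is absent.
  const V *find(const K &Key) const {
    if (!Spilled) {
      for (std::size_t I = 0; I < InlineCount; ++I)
        if (Inline[I].first == Key)
          return &Inline[I].second;
      return nullptr;
    }
    auto It = Overflow.find(Key);
    return It == Overflow.end() ? nullptr : &It->second;
  }

  std::size_t size() const { return Spilled ? Overflow.size() : InlineCount; }
  bool usesHeap() const { return Spilled; }
};
```

A map declared as InlineFirstMap<int, int, 4> holds up to four entries without touching the overflow table; the fifth distinct key triggers the move, after which it behaves like an ordinary heap-backed map.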
(I've got two more patches that contribute another ~0.3% speedup, but it's now hit diminishing returns).