Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[VPlan] Model branch cond to enter scalar epilogue in VPlan. #92651

Merged
merged 32 commits into from
Jul 5, 2024
Merged
Show file tree
Hide file tree
Changes from 30 commits
Commits
Show all changes
32 commits
Select commit Hold shift + click to select a range
53e8e56
[VPlan] Use DomTreeUpdater to automatically update DT for vector loop.
fhahn May 17, 2024
3a4ecfc
[VPlan] Model branch cond to enter scalar epilogue in VPlan.
fhahn May 11, 2023
10e747a
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn May 24, 2024
4dfba5d
!fixup update
fhahn May 24, 2024
fe03d51
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn May 31, 2024
2372421
!fixup fix after merge.
fhahn May 31, 2024
ab78e2a
!fixup fix formatting.
fhahn May 31, 2024
446b313
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 5, 2024
5c6e07e
!fixup clean up after merge, fix branch weights as post-processing
fhahn Jun 5, 2024
0a34761
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 6, 2024
a5c1fd6
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 7, 2024
045f051
!fixup fix formatting
fhahn Jun 7, 2024
2e4bb36
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 13, 2024
189aa60
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 13, 2024
014c393
!ifxup update setBranchWeights call.
fhahn Jun 13, 2024
c983acf
!fixup preserve DT via DTU, simplifications.
fhahn Jun 13, 2024
b165cca
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 13, 2024
852d28d
!fixup update remaining tests
fhahn Jun 13, 2024
b506722
!fixup remove unneeded onlyFirstPartUsed
fhahn Jun 13, 2024
3236f1d
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 20, 2024
a3c7747
!fixup fix after merge
fhahn Jun 20, 2024
8e7f8dd
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 22, 2024
0e74f48
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 23, 2024
bd99d7b
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 23, 2024
887a56d
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jun 27, 2024
8cab6d2
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jul 4, 2024
2ba349c
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jul 4, 2024
fd9975c
!fixup address latest comments, thanks!
fhahn Jul 4, 2024
dee166d
!fixup update tests
fhahn Jul 4, 2024
d5cd98d
!fixup update new tests.
fhahn Jul 4, 2024
ab6c49a
!fixup add assert messages, update remaining unit tests
fhahn Jul 5, 2024
7867bfa
Merge remote-tracking branch 'origin/main' into vplan-middle-block-br…
fhahn Jul 5, 2024
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
116 changes: 34 additions & 82 deletions llvm/lib/Transforms/Vectorize/LoopVectorize.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -2964,34 +2964,6 @@ void InnerLoopVectorizer::createVectorLoopSkeleton(StringRef Prefix) {
LoopScalarPreHeader =
SplitBlock(LoopMiddleBlock, LoopMiddleBlock->getTerminator(), DT, LI,
nullptr, Twine(Prefix) + "scalar.ph");

// Set up the middle block terminator. Two cases:
// 1) If we know that we must execute the scalar epilogue, retain the existing
// unconditional branch from the middle block to the scalar preheader. In that
// case, there's no edge from the middle block to exit blocks and thus no
// need to update the immediate dominator of the exit blocks.
if (Cost->requiresScalarEpilogue(VF.isVector())) {
assert(
LoopMiddleBlock->getSingleSuccessor() == LoopScalarPreHeader &&
" middle block should have the scalar preheader as single successor");
return;
}

// 2) Otherwise, we must have a single unique exit block (due to how we
// implement the multiple exit case). In this case, set up a conditional
// branch from the middle block to the loop scalar preheader, and the
// exit block. completeLoopSkeleton will update the condition to use an
// iteration check, if required to decide whether to execute the remainder.
BranchInst *BrInst =
BranchInst::Create(LoopExitBlock, LoopScalarPreHeader, Builder.getTrue());
auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();
BrInst->setDebugLoc(ScalarLatchTerm->getDebugLoc());
ReplaceInstWithInst(LoopMiddleBlock->getTerminator(), BrInst);

// Update dominator for loop exit. During skeleton creation, only the vector
// pre-header and the middle block are created. The vector loop is entirely
// created during VPlan exection.
DT->changeImmediateDominator(LoopExitBlock, LoopMiddleBlock);
}

PHINode *InnerLoopVectorizer::createInductionResumeValue(
Expand Down Expand Up @@ -3088,51 +3060,6 @@ void InnerLoopVectorizer::createInductionResumeValues(
}
}

BasicBlock *InnerLoopVectorizer::completeLoopSkeleton() {
// The trip counts should be cached by now.
Value *Count = getTripCount();
Value *VectorTripCount = getOrCreateVectorTripCount(LoopVectorPreHeader);

auto *ScalarLatchTerm = OrigLoop->getLoopLatch()->getTerminator();

// Add a check in the middle block to see if we have completed
// all of the iterations in the first vector loop. Three cases:
// 1) If we require a scalar epilogue, there is no conditional branch as
// we unconditionally branch to the scalar preheader. Do nothing.
// 2) If (N - N%VF) == N, then we *don't* need to run the remainder.
// Thus if tail is to be folded, we know we don't need to run the
// remainder and we can use the previous value for the condition (true).
// 3) Otherwise, construct a runtime check.
if (!Cost->requiresScalarEpilogue(VF.isVector()) &&
!Cost->foldTailByMasking()) {
// Here we use the same DebugLoc as the scalar loop latch terminator instead
// of the corresponding compare because they may have ended up with
// different line numbers and we want to avoid awkward line stepping while
// debugging. Eg. if the compare has got a line number inside the loop.
// TODO: At the moment, CreateICmpEQ will simplify conditions with constant
// operands. Perform simplification directly on VPlan once the branch is
// modeled there.
IRBuilder<> B(LoopMiddleBlock->getTerminator());
B.SetCurrentDebugLocation(ScalarLatchTerm->getDebugLoc());
Value *CmpN = B.CreateICmpEQ(Count, VectorTripCount, "cmp.n");
BranchInst &BI = *cast<BranchInst>(LoopMiddleBlock->getTerminator());
BI.setCondition(CmpN);
if (hasBranchWeightMD(*ScalarLatchTerm)) {
// Assume that `Count % VectorTripCount` is equally distributed.
unsigned TripCount = UF * VF.getKnownMinValue();
assert(TripCount > 0 && "trip count should not be zero");
const uint32_t Weights[] = {1, TripCount - 1};
setBranchWeights(BI, Weights, /*IsExpected=*/false);
}
}

#ifdef EXPENSIVE_CHECKS
assert(DT->verify(DominatorTree::VerificationLevel::Fast));
#endif

return LoopVectorPreHeader;
}

std::pair<BasicBlock *, Value *>
InnerLoopVectorizer::createVectorizedLoopSkeleton(
const SCEV2ValueTy &ExpandedSCEVs) {
Expand All @@ -3155,17 +3082,18 @@ InnerLoopVectorizer::createVectorizedLoopSkeleton(
| [ ]_| <-- vector loop (created during VPlan execution).
| |
| v
\ -[ ] <--- middle-block.
\ -[ ] <--- middle-block (wrapped in VPIRBasicBlock with the branch to
| | successors created during VPlan execution)
\/ |
/\ v
| ->[ ] <--- new preheader.
| ->[ ] <--- new preheader (wrapped in VPIRBasicBlock).
| |
(opt) v <-- edge from middle to exit iff epilogue is not required.
| [ ] \
| [ ]_| <-- old scalar loop to handle remainder (scalar epilogue).
\ |
\ v
>[ ] <-- exit block(s).
>[ ] <-- exit block(s). (wrapped in VPIRBasicBlock)
...
*/

Expand All @@ -3192,7 +3120,7 @@ InnerLoopVectorizer::createVectorizedLoopSkeleton(
// Emit phis for the new starting index of the scalar loop.
createInductionResumeValues(ExpandedSCEVs);

return {completeLoopSkeleton(), nullptr};
return {LoopVectorPreHeader, nullptr};
}

// Fix up external users of the induction variable. At this point, we are
Expand Down Expand Up @@ -7477,6 +7405,9 @@ LoopVectorizationPlanner::executePlan(
std::tie(State.CFG.PrevBB, CanonicalIVStartValue) =
ILV.createVectorizedLoopSkeleton(ExpandedSCEVs ? *ExpandedSCEVs
: State.ExpandedSCEVs);
#ifdef EXPENSIVE_CHECKS
assert(DT->verify(DominatorTree::VerificationLevel::Fast));
#endif
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Better move the expensive assert to the end of createVectorizedLoopSkeleton()'s than here after calling it? This appeared at the end of completeLoopSkeleton(), called at the end of all three create[Epilogue]VectorizedLoopSkeleton()'s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we would have to move it to all implementations of createVectorizedLoopSkeleton. Left here for now, as this keeps it at a single place and makes sure it is called fro all all different ILV specializations.


Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good to add some explanation somewhere what IR blocks are wrapped when building VPlan(s), what IR blocks are built when creating the Skeleton and need VPlan to be wrap them now, and what blocks are created during VPlan execution.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Extended comment in createVectorizaedLoopSkeleton, moved the replacement to VPlan::execute and added comment there as well.

// Only use noalias metadata when using memory checks guaranteeing no overlap
// across all iterations.
Expand Down Expand Up @@ -7557,6 +7488,18 @@ LoopVectorizationPlanner::executePlan(

ILV.printDebugTracesAtEnd();

// 4. Adjust branch weight of the branch in the middle block.
auto *MiddleTerm =
cast<BranchInst>(State.CFG.VPBB2IRBB[ExitVPBB]->getTerminator());
if (MiddleTerm->isConditional() &&
hasBranchWeightMD(*OrigLoop->getLoopLatch()->getTerminator())) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thoughts independent of this patch:

  • record an indicator that the branch has weights.
  • assign new branch weights regardless of existing ones, as the new weights are independent of the old ones.
  • assign more accurate weights if trip-count and VFxUF are known to have a (greatest) common divider.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where should we record it? Ideally on the branch-on-cond VPInst, but that would require supporting metadata on VPinst at least.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assigning weights deserves further discussion; the indicator could be recorded in VPlan itself - whether weights should be assigned to all new branches (possibly adjusting weights of existing branches).

// Assume that `Count % VectorTripCount` is equally distributed.
unsigned TripCount = State.UF * State.VF.getKnownMinValue();
assert(TripCount > 0 && "trip count should not be zero");
const uint32_t Weights[] = {1, TripCount - 1};
setBranchWeights(*MiddleTerm, Weights, /*IsExpected=*/false);
}

return {State.ExpandedSCEVs, ReductionResumeValues};
}

Expand Down Expand Up @@ -7613,7 +7556,7 @@ EpilogueVectorizerMainLoop::createEpilogueVectorizedLoopSkeleton(
// inductions in the epilogue loop are created before executing the plan for
// the epilogue loop.

return {completeLoopSkeleton(), nullptr};
return {LoopVectorPreHeader, nullptr};
}

void EpilogueVectorizerMainLoop::printDebugTracesAtStart() {
Expand Down Expand Up @@ -7802,7 +7745,7 @@ EpilogueVectorizerEpilogueLoop::createEpilogueVectorizedLoopSkeleton(
{VecEpilogueIterationCountCheck,
EPI.VectorTripCount} /* AdditionalBypass */);

return {completeLoopSkeleton(), EPResumeVal};
return {LoopVectorPreHeader, EPResumeVal};
}

BasicBlock *
Expand Down Expand Up @@ -7847,7 +7790,6 @@ EpilogueVectorizerEpilogueLoop::emitMinimumVectorEpilogueIterCountCheck(
setBranchWeights(BI, Weights, /*IsExpected=*/false);
}
ReplaceInstWithInst(Insert->getTerminator(), &BI);

LoopBypassBlocks.push_back(Insert);
return Insert;
}
Expand Down Expand Up @@ -8552,9 +8494,17 @@ LoopVectorizationPlanner::tryToBuildVPlanWithVPRecipes(VFRange &Range) {
// modified; a basic block for the vector pre-header, followed by a region for
// the vector loop, followed by the middle basic block. The skeleton vector
// loop region contains a header and latch basic blocks.

bool RequiresScalarEpilogueCheck =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

More consistent to set RequiresScalarEpilogue retaining the original condition rather than flipping it? A scalar epilog check is required in general unless the scalar epilog is surely to be reached or is surely to be avoided, this condition ensures the former only.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changing the logic leads to a few regressions, better to change this as follow up?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agree to have a follow up patch deal with regressions, but does retaining the original condition alone leads to any? I.e., RequiresScalarEpilogue = ... return CM.requiresScalarEpilogue(VF.isVector()) ...? The range should be clamped the same.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Now after updating all tests, there appears to be some changes only in the VPlan-native outer-loop vectorization path. Need to take a closer look.

LoopVectorizationPlanner::getDecisionAndClampRange(
[this](ElementCount VF) {
return !CM.requiresScalarEpilogue(VF.isVector());
},
Range);
VPlanPtr Plan = VPlan::createInitialVPlan(
createTripCountSCEV(Legal->getWidestInductionType(), PSE, OrigLoop),
*PSE.getSE(), OrigLoop->getLoopPreheader());
*PSE.getSE(), RequiresScalarEpilogueCheck, CM.foldTailByMasking(),
OrigLoop);
VPBasicBlock *HeaderVPBB = new VPBasicBlock("vector.body");
VPBasicBlock *LatchVPBB = new VPBasicBlock("vector.latch");
VPBlockUtils::insertBlockAfter(LatchVPBB, HeaderVPBB);
Expand Down Expand Up @@ -8802,7 +8752,7 @@ VPlanPtr LoopVectorizationPlanner::buildVPlan(VFRange &Range) {
// Create new empty VPlan
auto Plan = VPlan::createInitialVPlan(
createTripCountSCEV(Legal->getWidestInductionType(), PSE, OrigLoop),
*PSE.getSE(), OrigLoop->getLoopPreheader());
*PSE.getSE(), true, false, OrigLoop);

// Build hierarchical CFG
VPlanHCFGBuilder HCFGBuilder(OrigLoop, LI, *Plan);
Expand Down Expand Up @@ -10163,6 +10113,8 @@ bool LoopVectorizePass::processLoop(Loop *L) {
cast<VPHeaderPHIRecipe>(&R)->setStartValue(StartVal);
}

assert(DT->verify(DominatorTree::VerificationLevel::Fast) &&
"DT not preserved correctly");
LVP.executePlan(EPI.EpilogueVF, EPI.EpilogueUF, BestEpiPlan, EpilogILV,
DT, true, &ExpandedSCEVs);
++LoopsEpilogueVectorized;
Expand Down
Loading
Loading