Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

AMDGPU/NewPM: Fill out addPreISelPasses #102814

Merged
merged 1 commit into from
Aug 14, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
55 changes: 53 additions & 2 deletions llvm/lib/Target/AMDGPU/AMDGPUCodeGenPassBuilder.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -9,9 +9,17 @@
#include "AMDGPUCodeGenPassBuilder.h"
#include "AMDGPU.h"
#include "AMDGPUISelDAGToDAG.h"
#include "AMDGPUPerfHintAnalysis.h"
#include "AMDGPUTargetMachine.h"
#include "AMDGPUUnifyDivergentExitNodes.h"
#include "SIFixSGPRCopies.h"
#include "llvm/Analysis/UniformityAnalysis.h"
#include "llvm/Transforms/Scalar/FlattenCFG.h"
#include "llvm/Transforms/Scalar/Sink.h"
#include "llvm/Transforms/Scalar/StructurizeCFG.h"
#include "llvm/Transforms/Utils/FixIrreducible.h"
#include "llvm/Transforms/Utils/LCSSA.h"
#include "llvm/Transforms/Utils/UnifyLoopExits.h"

using namespace llvm;

Expand All @@ -28,8 +36,51 @@ AMDGPUCodeGenPassBuilder::AMDGPUCodeGenPassBuilder(
}

void AMDGPUCodeGenPassBuilder::addPreISel(AddIRPass &addPass) const {
// TODO: Add passes pre instruction selection.
// Test only, convert to real IR passes in future.
const bool LateCFGStructurize = AMDGPUTargetMachine::EnableLateStructurizeCFG;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this function run yet, or is this just preparatory work/NFC?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does run. We have a handful of tests running new PM already. I'm trying to avoid test failures caused by not running the same set of passes in the future

const bool DisableStructurizer = AMDGPUTargetMachine::DisableStructurizer;
const bool EnableStructurizerWorkarounds =
AMDGPUTargetMachine::EnableStructurizerWorkarounds;

if (TM.getOptLevel() > CodeGenOptLevel::None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

tiny nit: put the opt level in a variable to avoid repeating the call?

addPass(FlattenCFGPass());

if (TM.getOptLevel() > CodeGenOptLevel::None)
addPass(SinkingPass());

addPass(AMDGPULateCodeGenPreparePass(TM));

// Merge divergent exit nodes. StructurizeCFG won't recognize the multi-exit
// regions formed by them.

addPass(AMDGPUUnifyDivergentExitNodesPass());

if (!LateCFGStructurize && !DisableStructurizer) {
if (EnableStructurizerWorkarounds) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this option matter anymore?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Which one? The LateCFGStructurize was never enabled (and I'd need to look if the underlying pass is still there anymore)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry about that! I meant the option to disable the workarounds. They have been on by default for a very long time.

addPass(FixIrreduciblePass());
addPass(UnifyLoopExitsPass());
}

addPass(StructurizeCFGPass(/*SkipUniformRegions=*/false));
}

addPass(AMDGPUAnnotateUniformValuesPass());

if (!LateCFGStructurize && !DisableStructurizer) {
addPass(SIAnnotateControlFlowPass(TM));

// TODO: Move this right after structurizeCFG to avoid extra divergence
// analysis. This depends on stopping SIAnnotateControlFlow from making
// control flow modifications.
addPass(AMDGPURewriteUndefForPHIPass());
}

addPass(LCSSAPass());

if (TM.getOptLevel() > CodeGenOptLevel::Less)
addPass(AMDGPUPerfHintAnalysisPass(TM));

// FIXME: Why isn't this queried as required from AMDGPUISelDAGToDAG, and why
// isn't this in addInstSelector?
addPass(RequireAnalysisPass<UniformityInfoAnalysis, Function>());
}

Expand Down
8 changes: 5 additions & 3 deletions llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.cpp
Original file line number Diff line number Diff line change
Expand Up @@ -338,10 +338,11 @@ static cl::opt<bool> EnableScalarIRPasses(
cl::init(true),
cl::Hidden);

static cl::opt<bool> EnableStructurizerWorkarounds(
static cl::opt<bool, true> EnableStructurizerWorkarounds(
"amdgpu-enable-structurizer-workarounds",
cl::desc("Enable workarounds for the StructurizeCFG pass"), cl::init(true),
cl::Hidden);
cl::desc("Enable workarounds for the StructurizeCFG pass"),
cl::location(AMDGPUTargetMachine::EnableStructurizerWorkarounds),
cl::init(true), cl::Hidden);

static cl::opt<bool, true> EnableLowerModuleLDS(
"amdgpu-enable-lower-module-lds", cl::desc("Enable lower module lds pass"),
Expand Down Expand Up @@ -616,6 +617,7 @@ bool AMDGPUTargetMachine::EnableLateStructurizeCFG = false;
bool AMDGPUTargetMachine::EnableFunctionCalls = false;
bool AMDGPUTargetMachine::EnableLowerModuleLDS = true;
bool AMDGPUTargetMachine::DisableStructurizer = false;
bool AMDGPUTargetMachine::EnableStructurizerWorkarounds = true;

AMDGPUTargetMachine::~AMDGPUTargetMachine() = default;

Expand Down
1 change: 1 addition & 0 deletions llvm/lib/Target/AMDGPU/AMDGPUTargetMachine.h
Original file line number Diff line number Diff line change
Expand Up @@ -38,6 +38,7 @@ class AMDGPUTargetMachine : public LLVMTargetMachine {
static bool EnableFunctionCalls;
static bool EnableLowerModuleLDS;
static bool DisableStructurizer;
static bool EnableStructurizerWorkarounds;

AMDGPUTargetMachine(const Target &T, const Triple &TT, StringRef CPU,
StringRef FS, const TargetOptions &Options,
Expand Down
2 changes: 1 addition & 1 deletion llvm/test/CodeGen/AMDGPU/bug-v4f64-subvector.ll
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
; RUN: llc < %s -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -start-before=amdgpu-isel -stop-after=amdgpu-isel -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK
; RUN: llc < %s -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -stop-after=amdgpu-isel -enable-new-pm | FileCheck %s --check-prefixes=CHECK
; RUN: llc < %s -mtriple=amdgcn-amd-amdhsa -mcpu=gfx900 -start-before=amdgpu-isel -stop-after=amdgpu-isel -enable-new-pm -verify-machineinstrs | FileCheck %s --check-prefixes=CHECK

; This caused failure in infinite cycle in Selection DAG (combine) due to missing insert_subvector.
;
Expand Down
Loading