[Codegen][LLVM] Add ability to turn on fast math flags #9223
Changes from 11 commits
@@ -106,10 +106,23 @@ void ParseLLVMTargetOptions(const Target& target, std::string* triple, std::stri
 #if TVM_LLVM_VERSION < 50
   opt.LessPreciseFPMADOption = true;
 #endif
-  opt.AllowFPOpFusion = llvm::FPOpFusion::Fast;
-  opt.UnsafeFPMath = false;
-  opt.NoInfsFPMath = false;
+  // We depend on generating IR with proper fast math flags to control fast math
+  // semantics. These just enable these optimizations if the proper IR flags
+  // are set.
+  opt.UnsafeFPMath = true;
+  opt.NoInfsFPMath = true;
+  opt.NoNaNsFPMath = true;
+
+#if TVM_LLVM_VERSION >= 50
+  opt.NoSignedZerosFPMath = true;
+#endif
+
+  // Assume no generated code ever needs to handle floating point exceptions.
+  opt.NoTrappingFPMath = true;
+
+  // TODO(AndrewZhaoLuo): Look into control of setting this flag.
+  opt.AllowFPOpFusion = llvm::FPOpFusion::Fast;

   if (soft_float_abi) {
     opt.FloatABIType = llvm::FloatABI::Soft;
   } else {
@@ -139,8 +152,22 @@ std::unique_ptr<llvm::TargetMachine> GetLLVMTargetMachine(const Target& target,
     ICHECK(allow_null) << err << " target_triple=" << target_triple;
     return nullptr;
   }
-  llvm::TargetMachine* tm =
-      llvm_target->createTargetMachine(target_triple, mcpu, mattr, opt, llvm::Reloc::PIC_);
+
+  Integer llvm_opt_level = target->GetAttr<Integer>("O").value_or(Integer(2));

See tvm/src/target/llvm/codegen_llvm.cc, line 346 in 3229cb3. I don't see why users would want to choose an opt level other than 3. However, internally we may want to prefer faster compile time for the constant folding use case (which currently compiles every subgraph with opt level = 3).

Hmm, good catch. I need to see what the difference between the two optimization settings is. As for the right opt level, I think 3 can lead to slowdowns in some situations (granted, this is about gcc, but the same idea applies). I ran a trial with fast math plus changing the TargetMachine opt level to O2; some models were faster and some were slower. So we should add the flag to make it easy to test.

Hmm, so the link listed is for the PassManager. A higher opt level means more passes, much like the Relay opt level, and it appears to be associated with -O3 in clang. However, CodeGenOpts appears to be a separate thing that is also set in clang: https://github.com/llvm/llvm-project/blob/main/clang/lib/CodeGen/BackendUtil.cpp#L935. It looks like this one controls emitting the assembly, and it is also associated with clang's -O3. So the flag should control everything.

I made it based on the -O flag added. The default is still 2, however (which is the default for clang, I believe).

+  llvm::CodeGenOpt::Level llvm_opt;
+  if (llvm_opt_level <= 0) {
+    llvm_opt = llvm::CodeGenOpt::None;
+  } else if (llvm_opt_level == 1) {
+    llvm_opt = llvm::CodeGenOpt::Less;
+  } else if (llvm_opt_level == 2) {
+    llvm_opt = llvm::CodeGenOpt::Default;
+  } else {
+    // llvm_opt_level >= 3
+    llvm_opt = llvm::CodeGenOpt::Aggressive;
+  }
+
+  llvm::TargetMachine* tm = llvm_target->createTargetMachine(
+      target_triple, mcpu, mattr, opt, llvm::Reloc::PIC_, llvm::CodeModel::Small, llvm_opt);
   return std::unique_ptr<llvm::TargetMachine>(tm);
 }
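
The opt-level handling in the second hunk can be checked in isolation. The following is a minimal sketch, not the PR's code: `CodeGenLevel`, `MapOptLevel`, and `kDefaultOptLevel` are hypothetical names introduced here so the mapping compiles without LLVM headers. The enumerators only mirror the `llvm::CodeGenOpt` names that appear in the diff.

```cpp
// Hypothetical stand-in for llvm::CodeGenOpt::Level so that this sketch
// compiles without LLVM; the enumerator names mirror those used in the diff.
enum class CodeGenLevel { None, Less, Default, Aggressive };

// Default level assumed when the target carries no "O" attribute.
// The diff uses 2 for this.
const int kDefaultOptLevel = 2;

// Map an integer "-O"-style level onto the four code generation levels.
// Values at or below 0 clamp to None, values at or above 3 clamp to Aggressive.
inline CodeGenLevel MapOptLevel(int opt_level) {
  if (opt_level <= 0) {
    return CodeGenLevel::None;
  } else if (opt_level == 1) {
    return CodeGenLevel::Less;
  } else if (opt_level == 2) {
    return CodeGenLevel::Default;
  }
  // opt_level >= 3
  return CodeGenLevel::Aggressive;
}
```

Clamping out-of-range values instead of rejecting them matches the branch structure in the diff, where any level above 3 is treated the same as 3.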

Better not to change the default values unless there is a good reason.

I need to look deeper at the LLVM code, but I think these optimizations respect "fastmath" flags. So if we turn these optimizations on and run them on generated IR without fastmath flags, the behavior should be the same as before. But yes, let me double check.

Hmm, yes, in Clang these settings are passed in from LangOpts, which describes the dialect of C or C++ that is accepted. I don't understand it fully and don't want to change the specification here, so I turned it off.
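
The caution about changing defaults comes down to the fact that the transformations these options permit are not value-preserving under IEEE 754. The sketch below is plain C++ and does not touch LLVM; the function names are hypothetical, and it assumes a default strict compilation (no `-ffast-math` or similar). It shows two results that reassociation and no-signed-zeros style rewrites would be allowed to change.

```cpp
#include <cmath>

// Evaluates (a + b) + c in the order written.
double SumLeftFirst(double a, double b, double c) { return (a + b) + c; }

// Evaluates a + (b + c): the order a compiler may choose once
// reassociation of floating point additions is permitted.
double SumRightFirst(double a, double b, double c) { return a + (b + c); }

// Under IEEE 754 round-to-nearest, x + 0.0 is not an identity:
// -0.0 + 0.0 yields +0.0. A no-signed-zeros assumption would allow
// folding this expression to x, which keeps the negative sign.
double AddPositiveZero(double x) { return x + 0.0; }

// True only for the value -0.0.
bool IsNegativeZero(double x) { return x == 0.0 && std::signbit(x); }
```

With strict semantics, `SumLeftFirst(1e30, -1e30, 1.0)` is `1.0` while `SumRightFirst(1e30, -1e30, 1.0)` is `0.0`, because `1.0` is lost when added to `-1e30` first. Likewise `AddPositiveZero(-0.0)` is `+0.0`, not `-0.0`.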