Sanity check profiler atomics #113448
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) soon. Please see the contribution instructions for more information.

This PR changes how LLVM is built. Consider updating src/bootstrap/download-ci-llvm-stamp.
I've built a toolchain with this change for x86_64, aarch64, and mipsel (the one I need that doesn't support 64-bit atomics) and tested outputs of the toolchain. They all work as intended.
Did you try linking the resulting binary against libatomic (which should provide the necessary symbols)? If not, you can try adding it. If it works, then it might be better to link against libatomic than to provide a sub-par experience.
@Urgau I tried it for the sake of completeness. Same error. My take is: the atomics aren't function calls, they're LLVM intrinsics. They shouldn't ever leave the backend of the compiler, but since it doesn't know how to lower the operation, it assumes it should just emit it as a function call. It puts the call in the output and we get our error. FYI, `__sync_fetch_and_add_8` isn't in GNU libatomic.
```cpp
// So, don't do it on 32 bit platforms
if (TargetTriple.isArch64Bit()) {
  Options.Atomic = true;
}
```
It does work on i686-unknown-linux-gnu, at least, so this must be more nuanced than just being 64-bit.
Yes and no. 32-bit architectures can't do them in a single operation because their registers aren't large enough for the 64-bit value used by the counter.
Some 32-bit targets (x86, and Arm subtargets that support sync) have library implementations of the operation.
But those library implementations are slow, which means a significant slowdown and a change of timing for the program.
So it's a trade-off: either better accuracy in counts, at the cost of a slow program whose timing is influenced by the profiler; or a program that runs more like normal and works on all 32-bit platforms, at the cost of potential undercounts in the profiler. It can't undercount to 0, though.
The primary use here, code coverage measurement, is unaffected by the potential inaccuracy.
Are we introducing UB if the instrumentation itself creates data races via non-atomic updates?
That's not a rhetorical question - I really don't know. Maybe it's done low enough in the stack that such formal UB doesn't exist, but I think we should be very sure about it.
UB = undefined behavior?
The variables being incremented are distinct per function, so the race would be between threads calling the same function. The worst case is that they both read and increment, then the thread changes and they end up overwriting each other's increment.
Net result is an undercount by 1 any time there is a collision.
LLVM's profiler ran this way exclusively for more than a decade. The option for atomics was added but still defaults to off.
We could also check for x86 and arm and turn it on for them. I don’t really have a stake either way.
My personal opinion is that I'd rather trade a little accuracy in counts for more normal performance. That makes analysis of races and other timing-critical things more useful.
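The lost-update behavior described above can be sketched in ordinary Rust. This is a minimal, illustrative model, not the profiler's actual codegen: the split load/store stands in for a plain (non-atomic) counter increment, and relaxed atomics are used only to keep the demonstration within safe Rust, since a genuinely non-atomic race would be UB at the language level.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Shared counter, updated with a split read/modify/write to model a
// plain (non-atomic) profile-counter increment.
static COUNTER: AtomicU64 = AtomicU64::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..100_000 {
                    // Read, increment, write back: two racing threads can
                    // overwrite each other's increment, losing one count.
                    let v = COUNTER.load(Ordering::Relaxed);
                    COUNTER.store(v + 1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Typically prints somewhat less than 400000, but never 0: collisions
    // only drop individual increments.
    println!("counted {} (exact would be 400000)", COUNTER.load(Ordering::Relaxed));
}
```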
We have target information about this support at the Rust level, `cfg(target_has_atomic = "64")` and `max_atomic_width()`, so maybe we can just pass that in as yet another parameter? Either making that `bool InstrumentCoverage` a tri-state flag, or adding another `bool` for whether to use atomics.

(`LLVMRustOptimize` is getting so many parameters that we might want a new struct...)
IMHO, this is absolutely the right answer. I'm not sure that I'm well versed enough in all of this to do that, but I'd be willing to give it a shot if I could impose on somebody for a "block diagram" of where it would need to be wired in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition of LLVMRustOptimize here needs to be changed, either to give the InstrumentCoverage parameter the new type, or to add a new parameter.

rust/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp, lines 612 to 623 in 64b932d:
extern "C" LLVMRustResult | |
LLVMRustOptimize( | |
LLVMModuleRef ModuleRef, | |
LLVMTargetMachineRef TMRef, | |
LLVMRustPassBuilderOptLevel OptLevelRust, | |
LLVMRustOptStage OptStage, | |
bool NoPrepopulatePasses, bool VerifyIR, bool UseThinLTOBuffers, | |
bool MergeFunctions, bool UnrollLoops, bool SLPVectorize, bool LoopVectorize, | |
bool DisableSimplifyLibCalls, bool EmitLifetimeMarkers, | |
LLVMRustSanitizerOptions *SanitizerOptions, | |
const char *PGOGenPath, const char *PGOUsePath, | |
bool InstrumentCoverage, const char *InstrProfileOutput, |
Then the binding in cg_llvm to that function needs to be changed to match:
rust/compiler/rustc_codegen_llvm/src/llvm/ffi.rs, line 2329 in 8ca44ef:

```rust
pub fn LLVMRustOptimize<'a>(
```
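For illustration, a trimmed-down sketch of how that binding might grow an extra flag. The `AtomicCounterUpdates` name is hypothetical, the result type is a stand-in, and all but three parameters are omitted; see the real signature above for the full list:

```rust
use std::os::raw::c_char;

// Stand-in for the real LLVMRustResult, just so the sketch is self-contained.
#[repr(C)]
pub enum LLVMRustResult {
    Success,
    Failure,
}

extern "C" {
    // Heavily trimmed: the real binding takes many more parameters.
    #[allow(non_snake_case)]
    pub fn LLVMRustOptimize(
        InstrumentCoverage: bool,
        AtomicCounterUpdates: bool, // hypothetical new flag
        InstrProfileOutput: *const c_char,
    ) -> LLVMRustResult;
}
```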
And the callsite of LLVMRustOptimize needs to be altered appropriately:

```rust
let result = llvm::LLVMRustOptimize(
```
You may have trouble getting the "does our target have atomics?" information at this point; you may need to work back a bit, possibly by making sure the information is accessible via the "god-object" of CodegenContext:
rust/compiler/rustc_codegen_llvm/src/back/write.rs, lines 460 to 461 in 8ca44ef:

```rust
pub(crate) unsafe fn llvm_optimize(
    cgcx: &CodegenContext<LlvmCodegenBackend>,
```
Which is documented here:
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/back/write/struct.CodegenContext.html
And parameterized by this:
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/struct.LlvmCodegenBackend.html
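To make the "does our target have atomics?" decision concrete, here is a standalone model of the check, not rustc's actual API. In rustc_target, Target::max_atomic_width() falls back to the pointer width when the spec leaves it unset; the struct below and the example target values are illustrative only:

```rust
// Simplified stand-in for rustc_target's Target spec.
struct TargetInfo {
    pointer_width: u32,
    max_atomic_width: Option<u64>,
}

impl TargetInfo {
    // Mirrors the fallback behavior: unset means "same as pointer width".
    fn max_atomic_width(&self) -> u64 {
        self.max_atomic_width.unwrap_or(u64::from(self.pointer_width))
    }
}

// The flag that would ultimately be threaded through to LLVMRustOptimize.
fn use_atomic_profile_counters(target: &TargetInfo) -> bool {
    target.max_atomic_width() >= 64
}

fn main() {
    // Illustrative values: mipsel caps out at 32-bit atomics, while i686
    // supports 64-bit atomics despite its 32-bit pointers.
    let mipsel = TargetInfo { pointer_width: 32, max_atomic_width: Some(32) };
    let i686 = TargetInfo { pointer_width: 32, max_atomic_width: Some(64) };
    assert!(!use_atomic_profile_counters(&mipsel));
    assert!(use_atomic_profile_counters(&i686));
}
```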
@workingjubilee Thanks!
I would highly recommend squashing all of these changes down into one simple commit, so that the intermediate commits don't end up obscuring the history of unrelated lines in the file.
(force-pushed from a8217c7 to 64b932d)
@Zalathar Fair. I assumed you guys squashed on PR merge. I assume from your response that you don't, so I've gone ahead and done it.
Isn't this code path also used for PGO data collection, or does that use some other coverage instrumentation? If it is used here, that's a case where having accurate counts may be quite important. I don't think we do PGO on non-64-bit targets in our CI anywhere today, so it's probably not easy to check against our existing benchmarks...

Issue #91092 also cared about accurate counts, which is why we made it atomic in the first place.

I am pretty sure we do not care about libatomic at all.

I think you're right. I was responding to the suggestion of linking with it as a resolution.
Ah, true. I didn't make the connection.
Registers are a convenient illusion provided by the CPU, and CMPXCHG8B dates back to the Pentium, but I mean obviously if we could tell LLVM to use […]
@rustbot author

@SeanMollet any updates?
If you saw my last comment before I deleted it, you'll see how out of touch I am right now: buried trying to get the product out the door. We have a reasonable proposed solution that I'd forgotten about. My work-around works well enough to solve my immediate problem. I'll come back to this after shipping and implement the more thorough fix.
Ping from triage: I'm closing this due to inactivity. Please reopen when you are ready to continue with this. @rustbot label: +S-inactive
This fixes #112313.

Probably the ideal way to fix it would be to add support for 64-bit atomic operations to LLVM on every platform. I don't have time to do that, and I don't think it's really necessary anyway. Atomics improve the accuracy of profiling, but that accuracy isn't really needed for checking code coverage (the primary use here): non-zero values are sufficient to know that something has been called.