Sanity check profiler atomics #113448
Conversation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @cuviper (or someone else) soon. Please see the contribution instructions for more information.

This PR changes how LLVM is built. Consider updating src/bootstrap/download-ci-llvm-stamp.
I've built a toolchain with this change for x86_64, aarch64, and mipsel (the one I need that doesn't support 64-bit atomics) and tested outputs of the toolchain. They all work as intended.
Did you try linking the resulting binary against libatomic (which should provide the necessary symbols)? If not, you can try adding it. If it works, then it might be better to link against libatomic than to provide a sub-par experience.
@Urgau I tried it for the sake of completeness. Same error. My take is: the atomics aren't function calls, they're LLVM intrinsics. They shouldn't ever leave the backend of the compiler, but since it doesn't know how to lower the operation, it assumes it should just emit it as a function call. It puts the call in the output and we get our error. FYI, `__sync_fetch_and_add_8` isn't in GNU libatomic.
```cpp
// So, don't do it on 32 bit platforms
if (TargetTriple.isArch64Bit()) {
  Options.Atomic = true;
}
```
It does work on i686-unknown-linux-gnu, at least, so this must be more nuanced than just being 64-bit.
Yes and no. 32-bit architectures can't do them in a single operation because their registers aren't large enough for the 64-bit value used by the counter.
Some 32-bit targets (x86, and Arm subtargets that support sync) have library implementations of the operation.
But those library implementations are slow, which means a significant slowdown and a change of timing for the program.
So it's a trade-off: either better accuracy in counts, at the cost of a slow program whose timing is influenced by the profiler; or a program that runs more like normal and works on all 32-bit platforms, at the cost of potential undercounts in the profiler. It can't undercount to 0, though.
The primary use here, code coverage measurement, is unaffected by the potential inaccuracy.
Are we introducing UB if the instrumentation itself creates data races via non-atomic updates?
That's not a rhetorical question - I really don't know. Maybe it's done low enough in the stack that such formal UB doesn't exist, but I think we should be very sure about it.
UB = undefined behavior?
The variables being incremented are distinct per function, so the race would be between threads calling the same function. The worst case is that they both read and increment, then the thread changes and they end up overwriting each other's increment.
Net result is an undercount by 1 any time there is a collision.
LLVM's profiler ran this way exclusively for more than a decade. The option for atomics was added but still defaults to off.
We could also check for x86 and arm and turn it on for them. I don’t really have a stake either way.
My personal opinion is that I'd rather trade a little accuracy in counts for more normal performance. That makes analysis of races and other timing-critical things more useful.
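The lost-update behavior described above can be sketched in ordinary Rust. This is a minimal, illustrative model, not the profiler's actual codegen: the split load/store stands in for a plain (non-atomic) counter increment, and relaxed atomics are used only to keep the demonstration within safe Rust, since a genuinely non-atomic race would be UB at the language level.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;

// Shared counter, updated with a split read/modify/write to model a
// plain (non-atomic) profile-counter increment.
static COUNTER: AtomicU64 = AtomicU64::new(0);

fn main() {
    let handles: Vec<_> = (0..4)
        .map(|_| {
            thread::spawn(|| {
                for _ in 0..100_000 {
                    // Read, increment, write back: two racing threads can
                    // overwrite each other's increment, losing one count.
                    let v = COUNTER.load(Ordering::Relaxed);
                    COUNTER.store(v + 1, Ordering::Relaxed);
                }
            })
        })
        .collect();
    for h in handles {
        h.join().unwrap();
    }
    // Typically prints somewhat less than 400000, but never 0: collisions
    // only drop individual increments.
    println!("counted {} (exact would be 400000)", COUNTER.load(Ordering::Relaxed));
}
```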
We have target information about this support at the Rust level, `cfg(target_has_atomic = "64")` and `max_atomic_width()`, so maybe we can just pass that in as yet another parameter? Either making that `bool InstrumentCoverage` a tri-state flag, or adding another `bool` for whether to use atomics.

(`LLVMRustOptimize` is getting so many parameters that we might want a new struct...)
IMHO, this is absolutely the right answer. I'm not sure that I'm well versed enough in all of this to do that, but I'd be willing to give it a shot if I could impose on somebody for a "block diagram" of where it would need to be wired in.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The definition of LLVMRustOptimize here needs to be changed, either to give the InstrumentCoverage parameter the new type, or to add a new parameter.

rust/compiler/rustc_llvm/llvm-wrapper/PassWrapper.cpp, lines 612 to 623 in 64b932d:
extern "C" LLVMRustResult | |
LLVMRustOptimize( | |
LLVMModuleRef ModuleRef, | |
LLVMTargetMachineRef TMRef, | |
LLVMRustPassBuilderOptLevel OptLevelRust, | |
LLVMRustOptStage OptStage, | |
bool NoPrepopulatePasses, bool VerifyIR, bool UseThinLTOBuffers, | |
bool MergeFunctions, bool UnrollLoops, bool SLPVectorize, bool LoopVectorize, | |
bool DisableSimplifyLibCalls, bool EmitLifetimeMarkers, | |
LLVMRustSanitizerOptions *SanitizerOptions, | |
const char *PGOGenPath, const char *PGOUsePath, | |
bool InstrumentCoverage, const char *InstrProfileOutput, |
Then the binding in cg_llvm to that function needs to be changed to match:
rust/compiler/rustc_codegen_llvm/src/llvm/ffi.rs, line 2329 in 8ca44ef:

```rust
pub fn LLVMRustOptimize<'a>(
```
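For illustration, a trimmed-down sketch of how that binding might grow an extra flag. The `AtomicCounterUpdates` name is hypothetical, the result type is a stand-in, and all but three parameters are omitted; see the real signature above for the full list:

```rust
use std::os::raw::c_char;

// Stand-in for the real LLVMRustResult, just so the sketch is self-contained.
#[repr(C)]
pub enum LLVMRustResult {
    Success,
    Failure,
}

extern "C" {
    // Heavily trimmed: the real binding takes many more parameters.
    #[allow(non_snake_case)]
    pub fn LLVMRustOptimize(
        InstrumentCoverage: bool,
        AtomicCounterUpdates: bool, // hypothetical new flag
        InstrProfileOutput: *const c_char,
    ) -> LLVMRustResult;
}
```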
And the callsite of LLVMRustOptimize needs to be altered appropriately:

```rust
let result = llvm::LLVMRustOptimize(
```
You may have trouble getting the "does our target have atomics?" information at this point; you may need to work back a bit, possibly by making sure the information is accessible via the "god-object" of CodegenContext:
rust/compiler/rustc_codegen_llvm/src/back/write.rs, lines 460 to 461 in 8ca44ef:

```rust
pub(crate) unsafe fn llvm_optimize(
    cgcx: &CodegenContext<LlvmCodegenBackend>,
```
Which is documented here:
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_ssa/back/write/struct.CodegenContext.html
And parameterized by this:
https://doc.rust-lang.org/nightly/nightly-rustc/rustc_codegen_llvm/struct.LlvmCodegenBackend.html
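To make the "does our target have atomics?" decision concrete, here is a standalone model of the check, not rustc's actual API. In rustc_target, Target::max_atomic_width() falls back to the pointer width when the spec leaves it unset; the struct below and the example target values are illustrative only:

```rust
// Simplified stand-in for rustc_target's Target spec.
struct TargetInfo {
    pointer_width: u32,
    max_atomic_width: Option<u64>,
}

impl TargetInfo {
    // Mirrors the fallback behavior: unset means "same as pointer width".
    fn max_atomic_width(&self) -> u64 {
        self.max_atomic_width.unwrap_or(u64::from(self.pointer_width))
    }
}

// The flag that would ultimately be threaded through to LLVMRustOptimize.
fn use_atomic_profile_counters(target: &TargetInfo) -> bool {
    target.max_atomic_width() >= 64
}

fn main() {
    // Illustrative values: mipsel caps out at 32-bit atomics, while i686
    // supports 64-bit atomics despite its 32-bit pointers.
    let mipsel = TargetInfo { pointer_width: 32, max_atomic_width: Some(32) };
    let i686 = TargetInfo { pointer_width: 32, max_atomic_width: Some(64) };
    assert!(!use_atomic_profile_counters(&mipsel));
    assert!(use_atomic_profile_counters(&i686));
}
```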
@workingjubilee Thanks!
I would highly recommend squashing all of these changes down into one simple commit, so that the intermediate commits don't end up obscuring the history of unrelated lines in the file.
(force-pushed from a8217c7 to 64b932d)
@Zalathar Fair. I assumed you guys squashed on PR merge. I assume from your response that you don't, so I've gone ahead and done it.
Isn't this code path also used for PGO data collection, or does that use some other coverage instrumentation? If it is used here, that's a case where having accurate counts may be quite important. I don't think we do PGO on non-64-bit targets in our CI anywhere today, so it's probably not easy to check against our existing benchmarks...

Issue #91092 also cared about accurate counts, which is why we made it atomic in the first place.

I am pretty sure we do not care about libatomic at all.

I think you're right. I was responding to the suggestion of linking with it as a resolution.
Ah, true. I didn't make the connection.
Registers are a convenient illusion provided by the CPU, and CMPXCHG8B dates back to the Pentium, but I mean obviously if we could tell LLVM to use […]
@rustbot author

@SeanMollet any updates?
If you saw my last comment before I deleted it, you'll see how out of touch I am right now: buried trying to get the product out the door. We have a reasonable proposed solution that I'd forgotten about. My work-around works well enough to solve my immediate problem. I'll come back to this after shipping and implement the more thorough fix.
Ping from triage: I'm closing this due to inactivity. Please reopen when you are ready to continue with this. @rustbot label: +S-inactive
This fixes #112313.

Probably the ideal way to fix it would be to add support for 64-bit atomic operations to LLVM on every platform. I don't have time to do that, and I don't think it's really necessary anyway. Atomics improve the accuracy of profiling, but that accuracy isn't really needed for checking code coverage (the primary use here): non-zero values are sufficient to know that something has been called.