nondeterministic LLVM-IR with @anon.HASH.number #91168

Closed
fangism opened this issue Nov 24, 2021 · 6 comments

@fangism

fangism commented Nov 24, 2021

I am observing that running the same rustc command twice with --emit=llvm-ir produces different .rlib and .ll outputs.

[I apologize in advance for not providing a full set of repro inputs and commands right now, but I will work with my team on this.]

LLVM-IR differences look like this:

```diff
--- /home/me/run-1/libfoo_rust.ll.local  2021-11-24 01:39:16.232960224 +0000
+++ /home/me/run-2/libfoo_rust.ll.local  2021-11-24 01:41:24.957404136 +0000
@@ -864,8 +864,8 @@
 @alloc4102 = private unnamed_addr constant <{ [32 x i8] }> <{ [32 x i8] c"assertion failed: idx < CAPACITY" }>, align 1
 @alloc4103 = private unnamed_addr constant <{ [62 x i8] }> <{ [62 x i8] c"/b/s/w/ir/x/w/rust/library/alloc/src/collections/btree/node.rs" }>, align 1
 @alloc4104 = private unnamed_addr constant <{ i8*, [16 x i8] }> <{ i8* getelementptr inbounds (<{ [62 x i8] }>, <{ [62 x i8] }>* @alloc4103, i32 0, i32 0, i32 0), [16 x i8] c">\00\00\00\00\00\00\00r\02\00\00\09\00\00\00" }>, align 8
-@anon.15741daeb2a46d45fc5cbefe41b02e6d.0 = private unnamed_addr constant <{ [16 x i8] }> <{ [16 x i8] c"\08\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00" }>, align 8
-@anon.15741daeb2a46d45fc5cbefe41b02e6d.1 = private unnamed_addr constant <{ [16 x i8] }> <{ [16 x i8] c"\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00" }>, align 8
+@anon.95861573866d15199b79d9f6e49b68ab.0 = private unnamed_addr constant <{ [16 x i8] }> <{ [16 x i8] c"\08\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00" }>, align 8
+@anon.95861573866d15199b79d9f6e49b68ab.1 = private unnamed_addr constant <{ [16 x i8] }> <{ [16 x i8] c"\01\00\00\00\00\00\00\00\00\00\00\00\00\00\00\00" }>, align 8
 @alloc868 = private unnamed_addr constant <{ [0 x i8] }> zeroinitializer, align 1
 @alloc4131 = private unnamed_addr constant <{ [4 x i8] }> <{ [4 x i8] c"Some" }>, align 1
 @vtable.y = private unnamed_addr constant <{ i8*, [16 x i8], i8*, [0 x i8] }> <{ i8* bitcast (void (%CapabilityName**)* @"_ZN4core3ptr48drop_in_place$LT$$RF$cm_rust..CapabilityName$GT$17he39a5a06256363e9E" to i8*), [16 x i8] c"\08\00\00\00\00\00\00\00\08\00\00\00\00\00\00\00", i8* bitcast (i1 (%CapabilityName**, %"core::fmt::Formatter"*)* @"_ZN42_$LT$$RF$T$u20$as$u20$core..fmt..Debug$GT$3fmt17h6a74665eeacccadaE" to i8*), [0 x i8] zeroinitializer }>, align 8, !dbg !61
```

Do we know why the hash value in @anon.HASH.n generated by rustc is nondeterministic, and what language construct produces such anonymous symbols?

Meta

rustc --version --verbose:

rustc 1.58.0-nightly (c9c4b5d72 2021-11-17)
@fangism fangism added the C-bug Category: This is a bug. label Nov 24, 2021
@fangism
Author

fangism commented Nov 24, 2021

This should be labeled with A-reproducibility.

@Urgau
Member

Urgau commented Nov 24, 2021

Some reproducibility issues with LLVM have been fixed very recently. Can you check with the latest nightly?

@nagisa nagisa added the A-reproducibility Area: Reproducible / deterministic builds label Nov 25, 2021
@tmandry
Member

tmandry commented Nov 29, 2021

This is separate from #90301, which was fixed recently. That was an issue where LLVM produced nondeterministic output from the same LLVM IR. This is an issue where the LLVM IR itself changes from run to run.

@fangism
Author

fangism commented Nov 30, 2021

I'm still working on reducing a test case but wanted to add one more observation:

Running the same command twice sometimes does produce identical output, so it may take several attempts to observe a difference. For automatic reduction tools like creduce/delta, this means they may hit false negatives and wrongly discard a reduction candidate. For example, I have a test case reduced to a point where its output is repeatable about 50% of the time.
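Because one pair of matching runs proves little, a reducer's interestingness test can run the build several times and treat any mismatch as "still nondeterministic". Below is a minimal sketch, assuming the build step is abstracted as a closure that returns the bytes of the emitted file; `is_reproducible` and `chance_all_agree` are hypothetical names, and a real script would invoke rustc with --emit=llvm-ir and read the .ll file. The probability helper rests on a simplifying assumption: each extra run independently reproduces the first run's output with probability p.

```rust
/// Returns true only if all `runs` invocations of `build` produce
/// byte-identical output. `build` is a stand-in (assumption) for
/// "run the compiler and read the emitted .ll file".
fn is_reproducible<F>(runs: usize, mut build: F) -> bool
where
    F: FnMut() -> Vec<u8>,
{
    assert!(runs >= 2, "need at least two runs to compare");
    let first = build();
    // Stop at the first mismatch: one differing run proves nondeterminism.
    (1..runs).all(|_| build() == first)
}

/// Chance that `runs` invocations all agree even though the build is
/// nondeterministic, assuming each run after the first independently
/// matches the first with probability `p` (a simplification).
fn chance_all_agree(p: f64, runs: u32) -> f64 {
    p.powi(runs as i32 - 1)
}

fn main() {
    // A deterministic "build" always yields the same bytes.
    assert!(is_reproducible(5, || b"define void @f()".to_vec()));

    // A nondeterministic "build" whose output changes on every call.
    let mut counter = 0u32;
    let flaky = is_reproducible(5, || {
        counter += 1;
        format!("@anon.{counter}.0").into_bytes()
    });
    assert!(!flaky);

    // With p = 0.5, eight runs all agree by chance with probability 1/128.
    println!("{}", chance_all_agree(0.5, 8));
}
```

Under that model, with p = 0.5 (roughly the rate reported above), eight runs still agree by chance about 0.8% of the time, so a reducer that counts any mismatch as interesting would rarely discard a good candidate.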

@fangism
Author

fangism commented Dec 3, 2021

Update: We've tracked this down to a procedural macro whose implementation iterates over a HashMap. We confirmed that its expansion, as printed by -Zunpretty=expanded, differs from run to run, which explains the LLVM IR differences.
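The failure mode can be illustrated without a real proc macro. In this hypothetical sketch, `gen_consts` stands in for the macro's code generator and emits source text rather than a `proc_macro::TokenStream`. std's `HashMap` uses a randomly seeded hasher by default, so its iteration order is not guaranteed to be the same between compiler runs; iterating a `BTreeMap`, or sorting the entries before emitting them, makes the generated code order deterministic.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical stand-in for a proc macro's code generator: emits one
/// `const` item per (name, value) pair, in the iteration order it is given.
fn gen_consts<'a, I>(entries: I) -> String
where
    I: IntoIterator<Item = (&'a String, &'a u32)>,
{
    let mut out = String::new();
    for (name, value) in entries {
        out.push_str(&format!("const {name}: u32 = {value};\n"));
    }
    out
}

fn main() {
    let pairs = [("A", 1u32), ("B", 2), ("C", 3), ("D", 4)];

    // Nondeterministic: HashMap's default hasher is randomly seeded, so the
    // emitted item order (and hence the LLVM IR) may change between runs.
    let hash: HashMap<String, u32> =
        pairs.iter().map(|&(k, v)| (k.to_string(), v)).collect();
    let unstable = gen_consts(&hash);

    // Fix 1: BTreeMap always iterates in sorted key order.
    let btree: BTreeMap<String, u32> =
        pairs.iter().map(|&(k, v)| (k.to_string(), v)).collect();
    let stable = gen_consts(&btree);

    // Fix 2: keep the HashMap but sort the entries before emitting.
    let mut sorted: Vec<(&String, &u32)> = hash.iter().collect();
    sorted.sort();
    assert_eq!(gen_consts(sorted), stable);

    // Same items either way; only the order is unstable.
    assert_eq!(unstable.len(), stable.len());
    print!("{stable}");
}
```

Either fix keeps the -Zunpretty=expanded output identical across runs; sorting is handy when the map type is fixed by an API the macro does not control.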

@fangism
Author

fangism commented Dec 6, 2021

Conclusion: The fault lies in the proc macro definition, not in the compiler itself.

That said, it would be nice if there were some way to detect this, be it statically or otherwise, in the interest of deterministic (and security-auditable) builds.

@fangism fangism closed this as completed Dec 6, 2021
@jieyouxu jieyouxu added A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. and removed A-reproducibility Area: Reproducible / deterministic builds A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. C-bug Category: This is a bug. labels Aug 14, 2024