Rust compilations are not reproducible #30330

Closed
glandium opened this issue Dec 11, 2015 · 11 comments
Labels
T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@glandium
Contributor

It is desirable for repeated builds of the same source with the same compiler to produce the same object files. That is something that C/C++ compilers usually do (modulo some randomization that can be overcome with, e.g., the -frandom-seed flag), and that allows for reproducible builds.

While the machine code that rustc emits is apparently consistent, the symbols it creates aren't. For instance, when building Firefox with the rust bits enabled on Mozilla's try server twice in a row, I get many differences in the symbol list like the following:

< 0000000002693700 l     F .text    0000000000000008              _ZN3fmt23_$RF$$u27$a$u20$T.Debug3fmt20h1447159415212165900E

---
> 0000000002693700 l     F .text    0000000000000008              _ZN3fmt23_$RF$$u27$a$u20$T.Debug3fmt20h7964148332310452618E

rustc should either emit the same symbol names or allow seeding the RNG it uses, as gcc does. (The former would be more appreciated.)

Cc @froydnj

@larsbergstrom
Contributor

cc @alexcrichton @nikomatsakis

@sfackler
Member

Probably comes down to the use of the default hasher in HashMaps.
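To illustrate this hypothesis, here is a minimal sketch using only std: `RandomState`, the default hasher builder for `HashMap`, picks random SipHash keys at runtime, so hash values (and therefore iteration order) can differ from one process to the next, whereas `BuildHasherDefault<DefaultHasher>` always starts from the same fixed keys. The helper `hash_with` is hypothetical, and std does not promise that `DefaultHasher`'s algorithm stays the same across Rust releases.

```rust
use std::collections::hash_map::{DefaultHasher, RandomState};
use std::hash::{BuildHasher, BuildHasherDefault, Hash, Hasher};

/// Hash `value` with a fresh hasher produced by `builder` (hypothetical helper).
fn hash_with<B: BuildHasher, T: Hash + ?Sized>(builder: &B, value: &T) -> u64 {
    let mut hasher = builder.build_hasher();
    value.hash(&mut hasher);
    hasher.finish()
}

fn main() {
    let ty = "fmt::Debug for &'a T";

    // RandomState picks random SipHash keys at runtime, so this value
    // generally differs each time the program is run.
    let random = RandomState::new();
    println!("RandomState: {}", hash_with(&random, ty));

    // DefaultHasher::new() (used by BuildHasherDefault) starts from fixed keys,
    // so repeated runs of the same binary print the same value.
    let fixed: BuildHasherDefault<DefaultHasher> = BuildHasherDefault::default();
    println!("fixed keys:  {}", hash_with(&fixed, ty));
}
```

Seeding is the `-frandom-seed` analogue mentioned in the report; a fixed-key hasher corresponds to the preferred option of emitting the same names every time.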

@nikomatsakis
Contributor

rust-lang/rfcs#689 seems related. We're due to simplify our name mangling scheme in any case.

@froydnj
Contributor

froydnj commented Dec 14, 2015

For clarification's sake, @glandium, is the problem that the symbol names themselves differ from one run to the next (that sounds a bit terrifying) or that the symbol table of the object file contains the same symbols, but differently ordered from one run to the next?
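One way to answer this question mechanically, sketched under the assumption that each build's symbol names have already been extracted (for instance from the third column of `nm` output): sort both lists to separate a pure reordering from names that actually changed. `classify` and `SymbolDiff` are hypothetical names.

```rust
use std::collections::BTreeSet;

#[derive(Debug, PartialEq)]
enum SymbolDiff {
    /// Same names in the same order.
    Identical,
    /// Same names (counting duplicates), emitted in a different order.
    Reordered,
    /// The sorted lists differ; these names occur in only one of them.
    /// (Both vectors can be empty if only the number of duplicates changed.)
    Different { only_before: Vec<String>, only_after: Vec<String> },
}

fn classify<'a>(before: &[&'a str], after: &[&'a str]) -> SymbolDiff {
    if before == after {
        return SymbolDiff::Identical;
    }
    // Sorting removes any ordering difference, like `sort` before `md5sum`.
    let mut sorted_before = before.to_vec();
    let mut sorted_after = after.to_vec();
    sorted_before.sort_unstable();
    sorted_after.sort_unstable();
    if sorted_before == sorted_after {
        return SymbolDiff::Reordered;
    }
    let set_before: BTreeSet<&str> = before.iter().copied().collect();
    let set_after: BTreeSet<&str> = after.iter().copied().collect();
    SymbolDiff::Different {
        only_before: set_before.difference(&set_after).map(|s| s.to_string()).collect(),
        only_after: set_after.difference(&set_before).map(|s| s.to_string()).collect(),
    }
}

fn main() {
    // The symbol pair from the report: only the hash suffix changed between builds.
    let run1 = ["_ZN3fmt23_$RF$$u27$a$u20$T.Debug3fmt20h1447159415212165900E"];
    let run2 = ["_ZN3fmt23_$RF$$u27$a$u20$T.Debug3fmt20h7964148332310452618E"];
    println!("{:?}", classify(&run1, &run2));
}
```

For the symbols in the original report this yields `Different`, i.e. the names themselves change, not just their order.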

@alexcrichton
Member

I believe this is definitely a problem with the Rust compiler itself. For example:

fn main() {
    foo::<i32>();
    foo::<u32>();
}

fn foo<T>() {
}

If I compile it and take a look at the symbols:

rustc foo.rs && nm -a ./foo | awk '{print $3}' | sort | md5sum

The hash printed is different each run of the compiler. The diff of what symbols are defined looks like:

--- before  2015-12-14 13:19:44.697749334 -0800
+++ after   2015-12-14 13:19:45.801766515 -0800
@@ -1342,8 +1342,8 @@
 _ZN3fmt8builders13_$LT$impl$GT$6finish20haa6fd0edf5f02227JXVE
 _ZN3fmt8builders13_$LT$impl$GT$9write_str20h1bc8784cdc379274mNVE
 _ZN3fmt8builders15debug_tuple_new20h4c8b35a336fdf8c4oUVE
-_ZN3foo21h15709589760389728117E
-_ZN3foo21h16839530509027187784E
+_ZN3foo21h11472057698419401202E
+_ZN3foo21h13246034627493895292E
 _ZN3num13_$LT$impl$GT$3fmt20h3309cfa0b9d84ea0gojE
 _ZN3num13_$LT$impl$GT$8from_str20h7438669d1b6422efe4iE
 _ZN3num14from_str_radix10_FILE_LINE20h828d2697f7b0d911XijE

So, to be clear, this is nondeterminism in the Rust compiler itself. The source file does not have to change, nor does the compiler itself have to change. Currently when the same compiler is run on the same source it will produce different results each time.

Seems bad!
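The names in the diff above follow a simple pattern: `_ZN`, then each path segment prefixed by its byte length, then a final length-prefixed `h<hash>` segment, then `E`. The sketch below reconstructs that shape from the symbols in this thread; `mangle_legacy` is a hypothetical helper, not rustc's `exported_name`, and it assumes segments are already escaped (rustc writes characters such as `&` as `$RF$`). It makes the point concrete: identical code yields different symbols exactly when the hash value differs.

```rust
/// Build a legacy-style symbol name (hypothetical sketch):
/// `_ZN` + length-prefixed path segments + length-prefixed `h<hash>` + `E`.
/// No escaping is performed; segments must already be escaped.
fn mangle_legacy(path: &[&str], hash: u64) -> String {
    let mut out = String::from("_ZN");
    for segment in path {
        out.push_str(&segment.len().to_string());
        out.push_str(segment);
    }
    // The monomorphization code formats the hash as `h` followed by a decimal u64.
    let hash_segment = format!("h{}", hash);
    out.push_str(&hash_segment.len().to_string());
    out.push_str(&hash_segment);
    out.push('E');
    out
}

fn main() {
    // Two of the `foo` symbols from the diff differ only in the hash segment.
    println!("{}", mangle_legacy(&["foo"], 15709589760389728117));
    println!("{}", mangle_legacy(&["foo"], 11472057698419401202));
}
```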

@huonw huonw added the T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. label Dec 15, 2015
@eddyb
Member

eddyb commented Dec 21, 2015

How did this happen? We don't explicitly use RNG in the compiler and I don't recall seeing this before.

EDIT: Apparently I'm misremembering and hash_crate_independent is only used for TypeId, not symbols. We dump the type to a string for the symbol hash.

@codyps
Contributor

codyps commented Feb 8, 2016

FWIW, meta-rust (which I maintain) carries a few awful hacks to keep builds a bit more reproducible (https://github.com/jmesmon/meta-rust/pull/33, there is also a patch that futzes with symbol hashing) as bitbake does not like builds changing unexpectedly.

@michaelwoerister
Member

Well, it seems that the code generating the symbol hash for monomorphized functions is incorporating memory addresses into the hash:

// from librustc_trans/trans/monomorphize.rs
let hash;
let s = {
    let mut state = SipHasher::new();
    hash_id.hash(&mut state);
    mono_ty.hash(&mut state);
    // ^^^^^^^^^^^^^^^^^^^^^^^^^
    // the hash of a ty::Ty is derived from a memory address,
    // hash_id above also contains a vector of ty::Ty

    hash = format!("h{}", state.finish());
    let path = ccx.tcx().map.def_path_from_id(fn_node_id);
    exported_name(path, &hash[..])
};

So, no wonder :)
A simple fix would be to do the same as the code in trans::back::link and hash the encoded types.
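A minimal sketch of the problem and of the suggested fix, under stated assumptions: `TyData` is a hypothetical stand-in for an interned `ty::Ty`, and `DefaultHasher` replaces the deprecated `SipHasher`. Hashing the value's address gives a result that depends on where it happened to be allocated in that run, while hashing a textual encoding gives the same result for equal types. This only shows the principle; rustc's eventual fix was a broader rewrite of symbol-name generation.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Hypothetical stand-in for an interned type.
struct TyData {
    /// A stable textual encoding of the type, e.g. "i32".
    encoded: String,
}

/// Unstable: the result depends on where the allocator placed the type,
/// which can change between runs (e.g. with ASLR).
fn hash_by_address(ty: &TyData) -> u64 {
    let mut state = DefaultHasher::new();
    (ty as *const TyData as usize).hash(&mut state);
    state.finish()
}

/// Stable for a given toolchain: the result depends only on the encoding.
fn hash_by_encoding(ty: &TyData) -> u64 {
    let mut state = DefaultHasher::new();
    ty.encoded.hash(&mut state);
    state.finish()
}

fn main() {
    // Two separately allocated but identical types, standing in for two compiler runs.
    let run1 = Box::new(TyData { encoded: "i32".to_string() });
    let run2 = Box::new(TyData { encoded: "i32".to_string() });
    println!("by address:  {} vs {}", hash_by_address(&run1), hash_by_address(&run2));
    println!("by encoding: {} vs {}", hash_by_encoding(&run1), hash_by_encoding(&run2));
}
```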

@nikomatsakis
Contributor

On Tue, Feb 09, 2016 at 03:17:27AM -0800, Michael Woerister wrote:

Well, it seems that the code generating the symbol hash for monomorphized functions is incorporating memory addresses into the hash:

Oh dear :)

bors added a commit that referenced this issue Mar 15, 2016
WIP: Implement stable symbol-name generation algorithm.

This PR changes the way symbol names are generated by the compiler. The new algorithm reflects the current state of the discussion over at rust-lang/rfcs#689.

Once it is done, it will also fix issue #30330. I want to add a test case for that before closing it though.

I also want to do some performance tests. The new algorithm does a little more work than the previous one due to various reasons, and it might make sense to adapt it in a way that allows it to be implemented more efficiently.

@nikomatsakis: It would be nice if there was a way of finding out if a `DefPath` refers to something in the current crate or in an external one. The information is already there, it's just not accessible at the moment. I'll probably propose some minor changes there, together with some facilities to allow for accessing `DefPaths` without allocating a `Vec` for them.

**TODO**
 - ~~Actually "crate qualify" symbols, as promised in the docs.~~
 - ~~Add a test case showing that symbol names are deterministic~~.
 - Maybe add a test case showing that symbol names are stable against small code changes.

~~One thing that might be interesting to the @rust-lang/compiler team:
I've now used SipHash exclusively for generating symbol hashes. Previously it was only used for monomorphizations, and the rest of the code used a truncated version of SHA-256. Is there any benefit to sticking with SHA? I don't really see one, since we only used 64 bits of the digest anyway, but maybe I'm missing something?~~ ==> Just switched things back to SHA-2 for now.

@tbu-
Contributor

tbu- commented Apr 17, 2016

Is this fixed by #32293?

@alexcrichton
Member

Indeed confirmed fixed on nightly!
