Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

reproducible builds broken in rustc 1.56.0 due to LLVM 13 update #90301

Closed
Fuuzetsu opened this issue Oct 26, 2021 · 22 comments · Fixed by #90759
Closed

reproducible builds broken in rustc 1.56.0 due to LLVM 13 update #90301

Fuuzetsu opened this issue Oct 26, 2021 · 22 comments · Fixed by #90759
Assignees
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-reproducibility Area: Reproducible / deterministic builds C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@Fuuzetsu
Copy link

Fuuzetsu commented Oct 26, 2021

I attach a tarball with a small reproducer (few dependencies from crates.io, lock file, empty lib.rs).

repro.tar.gz

Now run this using rustc 1.56.0:

for i in $(seq 1 10); do cargo clean &&  cargo build --release --quiet && sha256sum target/release/deps/libpalette_derive-*.so; done

I expect the sha256sum to be the same for the libpalette so file but it's often different.

This breaks any binary caches, similarly to what #89904 did but that ticket turned out to be a problem with the crate.

This time it seems problem with rustc itself (or rather LLVM: stay tuned): 1.55.0 produces same binary/crate hash on every build.

Note that in Cargo.toml there's:

[profile.release]
debug = true

Without this, the issue does not present itself. This perhaps makes it look similar to #89911 or #45397 but I think it's different. As far as I saw, it didn't produce a difference in ordering of symbols in the resulting .so though I may be mistaken. Please feel free to close as duplicate if you deem it to be the same issue.

I have ran with RUSTC_LOG=debug and after careful sifting through few GiB of output, I found the differences seem to start in rustc_codegen_ssa.

I have bisected rustc itself and ran it on our original code which exhibited the problem. I started the bisect at common merge point with 1.55.0 though the problem turned out well into the 1.56.0 release:

$ git bisect good
db002a06ae9154a35d410550bc5132df883d7baa is the first bad commit
commit db002a06ae9154a35d410550bc5132df883d7baa
Merge: e7f7fe462a5 306259c6459
Author: bors <bors@rust-lang.org>
Date:   Sat Aug 21 09:25:28 2021 +0000

    Auto merge of #87570 - nikic:llvm-13, r=nagisa
    
    Upgrade to LLVM 13
    
    Work in progress update to LLVM 13. Main changes:
    
     * InlineAsm diagnostics reported using SrcMgr diagnostic kind are now handled. Previously these used a separate diag handler.
     * Codegen tests are updated for additional attributes.
     * Some data layouts have changed.
     * Switch `#[used]` attribute from `llvm.used` to `llvm.compiler.used` to avoid SHF_GNU_RETAIN flag introduced in https://reviews.llvm.org/D97448, which appears to trigger a bug in older versions of gold.
     * Set `LLVM_INCLUDE_TESTS=OFF` to avoid Python 3.6 requirement.
    
    Upstream issues:
    
     * ~~https://bugs.llvm.org/show_bug.cgi?id=51210 (InlineAsm diagnostic reporting for module asm)~~ Fixed by https://github.com/llvm/llvm-project/commit/1558bb80c01b695ce12642527cbfccf16cf54ece.
     * ~~https://bugs.llvm.org/show_bug.cgi?id=51476 (Miscompile on AArch64 due to incorrect comparison elimination)~~ Fixed by https://github.com/llvm/llvm-project/commit/81b106584f2baf33e09be2362c35c1bf2f6bfe94.
     * https://bugs.llvm.org/show_bug.cgi?id=51207 (Can't set custom section flags anymore). Problematic change reverted in our fork, https://reviews.llvm.org/D107216 posted for upstream revert.
     * https://bugs.llvm.org/show_bug.cgi?id=51211 (Regression in codegen for #83623). This is an optimization regression that we may likely have to eat for this release. The fix for #83623 was based on an incorrect premise, and this needs to be properly addressed in the MergeICmps pass.
    
    The [compile-time impact](https://perf.rust-lang.org/compare.html?start=ef9549b6c0efb7525c9b012148689c8d070f9bc0&end=0983094463497eec22d550dad25576a894687002) is mixed, but quite positive as LLVM upgrades go.
    
    The LLVM 13 final release is scheduled for Sep 21st. The current nightly is scheduled for stable release on Oct 21st.
    
    r? `@ghost`

 .gitmodules                                        |  2 +-
 compiler/rustc_codegen_llvm/src/back/write.rs      | 43 +---------
 compiler/rustc_codegen_llvm/src/base.rs            |  8 +-
 compiler/rustc_codegen_llvm/src/consts.rs          | 15 +++-
 compiler/rustc_codegen_llvm/src/context.rs         | 55 +++++++++----
 compiler/rustc_codegen_llvm/src/lib.rs             |  2 +-
 compiler/rustc_codegen_llvm/src/llvm/diagnostic.rs | 94 +++++++++++++++++-----
 compiler/rustc_codegen_llvm/src/llvm/ffi.rs        |  7 +-
 compiler/rustc_codegen_ssa/src/traits/misc.rs      |  2 +
 compiler/rustc_codegen_ssa/src/traits/statics.rs   | 18 ++---
 compiler/rustc_feature/src/accepted.rs             |  2 +-
 compiler/rustc_llvm/llvm-wrapper/RustWrapper.cpp   | 20 ++++-
 .../src/spec/powerpc64_unknown_linux_gnu.rs        |  2 +-
 .../src/spec/powerpc64_unknown_linux_musl.rs       |  2 +-
 .../rustc_target/src/spec/powerpc64_wrs_vxworks.rs |  2 +-
 .../src/spec/powerpc64le_unknown_linux_gnu.rs      |  2 +-
 .../src/spec/powerpc64le_unknown_linux_musl.rs     |  2 +-
 .../src/spec/wasm32_unknown_emscripten.rs          |  2 +-
 .../src/spec/wasm32_unknown_unknown.rs             |  2 +-
 compiler/rustc_target/src/spec/wasm32_wasi.rs      |  2 +-
 .../src/spec/wasm64_unknown_unknown.rs             |  2 +-
 src/bootstrap/native.rs                            |  1 +
 src/llvm-project                                   |  2 +-
 src/test/codegen/array-equality.rs                 |  4 +-
 src/test/codegen/issue-83623-SIMD-PartialEq.rs     | 46 -----------
 src/test/codegen/repeat-trusted-len.rs             |  2 +-
 .../run-make-fulldeps/coverage-llvmir/Makefile     |  2 -
 .../coverage-llvmir/filecheck.testprog.txt         | 18 ++---
 src/test/ui/llvm-asm/issue-69092.rs                |  4 +-
 src/test/ui/llvm-asm/issue-69092.stderr            |  4 +-
 30 files changed, 201 insertions(+), 168 deletions(-)
 delete mode 100644 src/test/codegen/issue-83623-SIMD-PartialEq.rs

cc @nikic who did the update

Meta

rustc --version --verbose:

$ rustc --version --verbose
rustc 1.56.0 (09c42c458 2021-10-18)
binary: rustc
commit-hash: 09c42c45858d5f3aedfa670698275303a3d19afa
commit-date: 2021-10-18
host: x86_64-unknown-linux-gnu
release: 1.56.0
LLVM version: 13.0.0
@Fuuzetsu Fuuzetsu added the C-bug Category: This is a bug. label Oct 26, 2021
@bjorn3 bjorn3 added A-reproducibility Area: Reproducible / deterministic builds T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Oct 26, 2021
@yanok
Copy link
Contributor

yanok commented Oct 29, 2021

One thing I found: if I run it with RUSTFLAGS='--emit=llvm-ir' I can't reproduce it anymore.

@rustbot claim

@yanok
Copy link
Contributor

yanok commented Oct 30, 2021

Well, I was wrong. Still reproducible even with RUSTFLAGS='--emit=llvm-ir'. Produced *.ll files are identical though.

@fangism
Copy link

fangism commented Oct 31, 2021

Well, I was wrong. Still reproducible even with RUSTFLAGS='--emit=llvm-ir'. Produced *.ll files are identical though.

Consistent with my findings with --emit=llvm-ir as well.

@yanok
Copy link
Contributor

yanok commented Nov 8, 2021

Update: this IS a LLVM bug, I'm able to reproduce it with just ll input file using mainline LLVM. Filed https://bugs.llvm.org/show_bug.cgi?id=52441 to LLVM.

@yanok
Copy link
Contributor

yanok commented Nov 9, 2021

I have a fix: https://reviews.llvm.org/D113468

Do we want to apply it to llvm-rust or are we going to wait for LLVM review?

yanok added a commit to yanok/rust that referenced this issue Nov 10, 2021
bors added a commit to rust-lang-ci/rust that referenced this issue Nov 10, 2021
Update llvm submodule

This includes debug info generation fix, that fixes rust-lang#90301.

Also includes WASM backend related fix for https://bugs.llvm.org/show_bug.cgi?id=52352
(I haven't found a corresponding Rust bug).
@bors bors closed this as completed in 8128916 Nov 10, 2021
@Fuuzetsu
Copy link
Author

@yanok Good job, thank you!

@glandium
Copy link
Contributor

I wonder if rust needs https://reviews.llvm.org/D108968 too.

@glandium
Copy link
Contributor

I wonder if rust needs https://reviews.llvm.org/D108968 too.

Answering my own question: it does.

@glandium
Copy link
Contributor

@yanok Could you also do a PR for https://reviews.llvm.org/D108968?

@yanok
Copy link
Contributor

yanok commented Nov 16, 2021

Sure, will do.

@yanok
Copy link
Contributor

yanok commented Nov 17, 2021

Created #90978

@yanok
Copy link
Contributor

yanok commented Nov 17, 2021

Actually I was too slow, there is already #90954

@glandium
Copy link
Contributor

FWIW, while taking the 1.56.0 source and applying the 2 LLVM reproducibility patches fixes the issues we have with Firefox, the latest nightly doesn't: there's a remaining reproducibility issue on i686 linux. I'll test a patched beta next.

@cuviper
Copy link
Member

cuviper commented Nov 19, 2021

The next beta release should be fixed, per #90938.
(Should go out near 00:00 UTC -- I guess it will be 1.57.0-beta.4.)

@glandium
Copy link
Contributor

the beta branch + the patches seems to fix it, but I've found something new: depending how llvm is built, the result can vary. Specifically, llvm built with a sysroot exhibits reproducibility issues that are not present when not building with a sysroot. I'll wait for the official build for 1.57.0-beta.4 for a definite answer whether the issue is fixed. One thing is sure, though, there's an additional regression in nightly.

@glandium
Copy link
Contributor

So, as far as Firefox is concerned, everything is fixed in the latest betas. I did find a source of non-determinism that is not fixed, but a) it seems to also affects clang (as in, I also have found one in clang, and I think it's the same root cause) b) it goes away with PGO, so it doesn't actually affect Firefox.

@Fuuzetsu
Copy link
Author

Fuuzetsu commented Dec 1, 2021

@glandium is there a ticket about this one somewhere? It's great that it doesn't impact Firefox but it may be impacting other rust software

@glandium
Copy link
Contributor

glandium commented Dec 2, 2021

I haven't filed it because the smallest reproducer I have at the moment is "build firefox with these flags and observe how sometimes some functions have an extra mov", and from experience, those don't lead to any action.

@Fuuzetsu
Copy link
Author

Fuuzetsu commented Dec 3, 2021

I haven't filed it because the smallest reproducer I have at the moment is "build firefox with these flags and observe how sometimes some functions have an extra mov", and from experience, those don't lead to any action.

I see, thanks. It sounds like binary-reproducability thing and ABI should(?) be the same at least.

And this is a regression in new LLVM version (rustc 1.56 and beyond)?

@glandium
Copy link
Contributor

glandium commented Dec 3, 2021

I don't know if this affects rustc, but I narrowed down yet another source of non-determinism in debuginfo, and upstream came up with a patch: https://reviews.llvm.org/D115054

@Fuuzetsu
Copy link
Author

Fuuzetsu commented Dec 6, 2021

We tried using 1.57.0 and it seems broken still. I'll try to get a repro tomorrow. Should I open a new ticket or will someone reopen this one?

@Fuuzetsu
Copy link
Author

Fuuzetsu commented Dec 7, 2021

We tried using 1.57.0 and it seems broken still. I'll try to get a repro tomorrow. Should I open a new ticket or will someone reopen this one?

Actually, seems it comes from a derive crate in our dependency tree, similar to what we saw in #89904 ; So maybe it's fine.

EDIT: JelteF/derive_more#178

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-LLVM Area: Code generation parts specific to LLVM. Both correctness bugs and optimization-related issues. A-reproducibility Area: Reproducible / deterministic builds C-bug Category: This is a bug. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

7 participants