aarch64 musl binaries panic since 2018-02-05 nightly #48967
Comments
I see the libc repo had a similar-sounding issue, but it was back in November 2017, and Rust builds have been working for me until February 5 2018. It may have prevented more visibility, though. See: rust-lang/libc#856 and rust-lang/libc@bea4879eec9a1 |
A colleague found that dynamically linking libgcc (and libc) using "target-feature=-crt-static" works around this issue. |
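To check whether the workaround above actually took effect in a given build, a small sketch like the following could help. It relies on `cfg(target_feature = "crt-static")`, which rustc sets when the C runtime is linked statically; the function name `crt_linkage` is made up for illustration.

```rust
/// Report how the C runtime was linked into this build.
/// `crt_linkage` is a hypothetical helper name, not part of any library.
fn crt_linkage() -> &'static str {
    // rustc sets this cfg when the target links the C runtime statically,
    // which is the default on the musl targets.
    if cfg!(target_feature = "crt-static") {
        "static"
    } else {
        "dynamic"
    }
}

fn main() {
    println!("C runtime linkage: {}", crt_linkage());
}
```

Built with `-C target-feature=-crt-static` in the rustflags, this should report `dynamic` on targets that support dynamic linking.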
I wonder if it's related to #46566 |
If indeed it worked in 2018-02-04 and failed in 2018-02-05, then the suspicious commit range is 3d292b7 .. 0c6091f (commits), presuming I am interpreting this rustup output correctly:
    > rustup install nightly-2018-02-04
    info: syncing channel updates for 'nightly-2018-02-04-x86_64-unknown-linux-gnu'
    nightly-2018-02-04-x86_64-unknown-linux-gnu unchanged - rustc 1.25.0-nightly (3d292b793 2018-02-03)
    > rustup install nightly-2018-02-05
    info: syncing channel updates for 'nightly-2018-02-05-x86_64-unknown-linux-gnu'
    nightly-2018-02-05-x86_64-unknown-linux-gnu unchanged - rustc 1.25.0-nightly (0c6091fbd 2018-02-04) |
@tjkirch can you validate the commits from the two nightlies? That commit range looks a bit suspicious. Just include the |
triage: P-high We should figure out what is happening here. |
@nikomatsakis Sure! Here are the outputs. Working 2018-02-04 nightly:
Non-working 2018-02-05 nightly:
I also bisected with the beta releases using rustup and found that 2018-02-13 (1.24.0-beta.12) works, but the next beta 2018-02-20 (1.25.0-beta.2) crashes. Last working beta:
First crashing beta:
I also confirmed that the just-released 1.25.0-beta.10 and nightly-2018-03-14 still crash. My process is basically to |
@tjkirch Could you check which one of the three PRs is the cause?
|
Thanks! Looks like another trusting-trust issue then 🤷. |
I tried with "-Z", "thinlto=no" and with "-Z", "thinlto=yes" in my ~/.cargo/config's rustflags, but it crashed either way. |
Fascinating. That was our guess from the compiler team meeting, though it seemed unlikely as that PR ought to improve reliability in general. |
I could use any suggestions for how to reproduce this problem =) Some have suggested qemu? That said, I'm not sure where to start debugging this. Seems...likely, possible?...to be an LLVM problem? I'm sort of hoping that one of the LLVM upgrades will make it go away. =) In any case, reproducing it would be a start. |
Yeah, I think qemu-aarch64 is the way to go, and perhaps the "bleeding edge" aarch64/musl toolchain from https://toolchains.bootlin.com/ -- I'll see if I can build a repro using those tools. |
@nikomatsakis Here's how I reproduced from scratch:
|
This appears to be fixed again on nightly-2018-03-16, I guess because of #48892? |
I can confirm it started working for me as well in the latest nightly! This is with nightly-2018-03-17; nightly-2018-03-16 still crashed. To be specific: Crashes:
Works!
I'll work on bisecting to pinpoint the merge that fixed it. Don't want this to reoccur! |
Unfortunately rustup-toolchain-install-master wasn't able to fetch the artifacts for all the intervening commits; I'm guessing the artifacts either aren't uploaded for every build, they're just not uploaded yet, or some commits were built together. Anyway, it seems to work with all of these commits, but I'm not confident without being able to test all of them and confirm the negative case. |
So it seems clear that I am not the best person to investigate this, and I've also got a few other things to look into right now, so I don't really have time. I'm not sure who is, but we've got a lot of data collected. So the plan is that I (or someone else) will summarize the current state:
and we'll put out a call and see if anyone can help us narrow down the problem. I suspect some kind of LLVM bug here still and it'd be nice to know what to pin it on. |
Looking at the backtrace, it seems that this issue is related to thread-local storage. It seems that this function is causing a segfault:
I will have a look at the disassembly & relocations generated for that TLS access, I think something strange might be happening at LTO/link time. |
OK, so I think that I've found the source of the bug:
For some reason, the addresses of all TLS variables are offset by an additional 0x10. This behavior happens in nightly-2018-02-05 (broken) but not in nightly-2018-02-04 (good). I think this may have gone unnoticed in the past since all TLS was shifted by 0x10, and the TLS was zero-initialized. In this specific case, one of the bytes of the TLS data has an initial value of 0x3, but due to the 0x10 shift it is accessed at the wrong offset by the program. |
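The effect described above can be illustrated with a toy model. This is only a sketch: the block size, the variable's offset (`0x08`), and the helper names are assumptions chosen for illustration; only the `0x10` shift and the `0x3` initial value come from the report.

```rust
/// Extra displacement observed on the broken nightly.
const SHIFT: usize = 0x10;

/// Build a toy TLS block: all zeroes except one byte with initial value 0x3.
/// The size (0x40) and the variable's offset (0x08) are hypothetical.
fn example_tls_block() -> [u8; 0x40] {
    let mut block = [0u8; 0x40];
    block[0x08] = 0x3;
    block
}

/// Read a TLS byte at the offset the program should be using.
fn read_correct(block: &[u8], offset: usize) -> u8 {
    block[offset]
}

/// Read a TLS byte the way the broken binary does: 0x10 too far.
fn read_shifted(block: &[u8], offset: usize) -> u8 {
    block[offset + SHIFT]
}

fn main() {
    let zeroed = [0u8; 0x40];
    let block = example_tls_block();

    // With zero-initialized TLS both reads agree, so the shift goes unnoticed.
    println!("zeroed:  {} vs {}", read_correct(&zeroed, 0x08), read_shifted(&zeroed, 0x08));

    // With a non-zero initial value the shifted read returns the wrong byte.
    println!("nonzero: {} vs {}", read_correct(&block, 0x08), read_shifted(&block, 0x08));
}
```

In this model the shifted read only differs from the correct one when some TLS variable has a non-zero initializer, which would be consistent with the bug staying hidden until such a variable appeared.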
Now I'm completely stumped as to what actually caused this bug. It seems like a bug in LLVM rather than rustc, possibly related to LTO since the linker is getting confused about TLS offsets. |
visited for triage. It seems we haven't made progress since the last report. I am wondering whether we can enlist someone to act as a local "LLVM LTO bug identification" expert... |
@Amanieu can we confirm that it is an LTO problem? |
To me it looks like a linker bug: the TLS relocations are being resolved to the wrong value. Since it is very unlikely that the linker has been broken this whole time, I would blame it on LTO somehow interfering with the linker. |
@Amanieu -- question: do you think you can narrow this down to just LLVM IR inputs that reflect the error, so we can open a bug on the LLVM side? |
triage: P-medium Next steps are to diagnose the LLVM problem. Filing under #50422. |
I have a minimal reproduction:
    #![feature(libc, thread_local, asm)]
    #![no_main]
    extern crate libc;

    #[thread_local]
    static mut ASDF: u8 = 74;

    #[inline(never)]
    fn get_tls_val() -> i32 {
        // The asm here is just to prevent the TLS access from being optimized away
        unsafe {
            let out: &u8;
            asm!("" : "=r" (out) : "0" (&ASDF));
            *out as i32
        }
    }

    #[no_mangle]
    pub unsafe extern fn main() -> i32 {
        let val = get_tls_val();
        libc::printf(b"%d\n\0".as_ptr(), val);
        // UNCOMMENT THIS LINE TO TRIGGER THE BUG
        //std::thread::sleep_ms(1);
        0
    }
The bug only seems to trigger when libstd is linked into the final binary. The expected output is
Bad version:
Good version:
|
Switching the linker between bfd, gold and lld doesn't seem to make any difference. |
There's a TLS-related fix in musl that applies to aarch64 and some other architectures; it will probably be in the 1.1.20 release. I wonder if it helps with this! https://git.musl-libc.org/cgit/musl/commit/?id=610c5a8524c3d6cd3ac5a5f1231422e7648a3791 |
That might very well be the solution! I noticed that the So basically, my earlier hypothesis is incorrect: the compiler/linker are calculating the TLS offsets correctly, it's just that musl isn't handling over-aligned TLS sections correctly. |
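A rough sketch of how an over-aligned TLS section could produce exactly a 0x10 discrepancy, assuming the usual AArch64 layout (TLS variant I) in which a 16-byte thread control block sits at the thread pointer and the TLS block follows at the next suitably aligned address. The function names and the "naive runtime" are hypothetical and do not reproduce musl's actual code.

```rust
/// Size of the thread control block at the thread pointer on AArch64:
/// two pointer-sized words (assumption based on the AArch64 TLS ABI).
const TCB_SIZE: usize = 16;

/// Round `value` up to the next multiple of `align` (a power of two).
fn align_up(value: usize, align: usize) -> usize {
    assert!(align.is_power_of_two());
    (value + align - 1) & !(align - 1)
}

/// Offset from the thread pointer at which the static linker assumes the
/// TLS block starts, given the TLS segment alignment `p_align`.
fn linker_tls_start(p_align: usize) -> usize {
    align_up(TCB_SIZE, p_align)
}

/// Hypothetical runtime that ignores over-alignment and always places the
/// TLS block directly after the thread control block.
fn naive_runtime_tls_start(_p_align: usize) -> usize {
    TCB_SIZE
}

fn main() {
    // The alignments below are illustrative values, not taken from the report.
    for p_align in [8usize, 16, 32].iter().copied() {
        let expected = linker_tls_start(p_align);
        let actual = naive_runtime_tls_start(p_align);
        println!(
            "p_align={:>2}: linker expects +{:#x}, runtime places at +{:#x}, skew {:#x}",
            p_align, expected, actual, expected - actual
        );
    }
}
```

Under these assumptions the two sides agree for alignments up to 16 and disagree by 0x10 once the TLS segment asks for 32-byte alignment, which would match the shift seen earlier in the thread.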
When is a new nightly build with this fix expected to appear? I'm trying to use |
@Amanieu - I've confirmed that your minimal reproduction prints |
I have built my project with Rust compiled from source against a fresh musl, and can confirm it works. |
Are you cross compiling? I'm trying to cross compile to test the TLS fix, but I get the error from #46651 and rust-lang/compiler-builtins#201 |
Use this command as a workaround:
|
Update musl to 1.1.19 and add patch to fix tls issue This fixes #48967
aarch64-unknown-linux-musl binaries crash immediately when built using Rust nightly since 2018-02-05, including the current beta, 1.25.0-beta.9.
This happens in debug and release builds.
It works fine with 2018-02-04, or with stable Rust 1.24.1. This is building on an x86_64-unknown-linux-gnu host, which shows no errors or warnings, and running on an embedded Linux 4.9 device.
I tried this code:
A fresh cargo new --bin testme. (Same thing with a completely empty "main {}" function.)
I expected to see this happen:
The binary should run without panicking. With nightly 2018-02-04, it looks like this, through strace:
Instead, with 2018-02-05+, this happened:
With strace:
Here's the backtrace from the core, using gdb 7.1.2:
Meta
rustc --version --verbose: Any nightly Rust since 2018-02-05, installed through rustup. Example:
My ~/.cargo/config:
aarch64-unknown-linux-musl-gcc is from GCC 7.2.0 via Buildroot 2017.08.
Note: I've tried adding -C llvm-args=-fast-isel per #48673 but it made no difference.