-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Miscompilation of Bevy (and some wgpu) apps resulting in segfault on macOS. #117902
Comments
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
This comment was marked as off-topic.
It would be useful if you could use |
I was not able to reproduce this on Windows. Marking as O-MacOS, but I suspect that the problem is platform-independent and just only manifests in platform-specific bevy code. |
OK. TIL about Edit: I created a simple bevy project that should automatically exit after the first frame is rendered to the screen. This should be scriptable. It will exit successfully on older rustc, or segfault on new rustc. Here is the bevy code if anyone else wants to try it: use bevy::prelude::*;
fn main() {
App::new()
.add_plugins(DefaultPlugins)
.add_systems(Update, exit)
.run();
}
fn exit(mut evw: EventWriter<bevy::app::AppExit>) {
evw.send(bevy::app::AppExit);
} |
I get
On MacOS 14.0 |
Bisected down to: |
oh well, that is sadly not a very useful bisection. it enabled more inlining, causing more optimizations to be applied overall. @saethlin anyways cause you may be interested in this, but I can't imagine that your PR is at fault |
That makes me wonder if Bevy is doing this thing that Embassy is (was?) doing: #117047 i.e. if you are relying on the address of an empty function being the same every time it is inspected, it must be one of |
@Nilstrieb Just tested it. LTO does not cause a segfault in earlier versions, and does cause a segfault in new versions. It does not seem to affect the results. Also, like I said, setting The issue occurs only with optimized builds with >1 codegen units, regardless of LTO. |
Ran in lldb:
Seems to be an invalid memory access in Included a backtrace, and disassembly of the function, if that helps. |
Bevy does something similar for its |
It looks like disabling cross-crate inlining "solves" the issue, i.e.:
|
WG-prioritization assigning priority (Zulip discussion). @rustbot label -I-prioritize +P-high |
|
Oh ... great ... so this is a miscompilation bug that also exists in stable, which could have been triggered if some specific code was inlined. Now rustc just inlines it automatically. So it isn't a codegen regression, but an old/preexisting bug exposed by new rustc enabling more aggressive optimizations. Damn. I'm so glad I reported this. Never could have imagined that it would go so deep... |
I am not yet convinced that this is a miscompilation as opposed to UB that is now exploited. |
These 2 functions are the closest to the segfault in the backtrace, so the problem surely lies around there. We could likely extract them both and hopefully repro outside of wgpu. |
I have not managed to separate the strangely-behaving code out from wgpu, but I have learned some things.
Which is exactly what you'd expect. But if I remove the attribute and leave cross-crate-inlining enabled, or add
So LLVM has done some kind of cross-function optimization here. The IR for the function bodies looks exactly the same to me, except for the fact that one does two gepis and the other just uses its arguments. And of course they are lowered to slightly different assembly. Here is the
And the
Directly after both of these is a small jump table. But I think what's down there is irrelevant, because in the So I think whatever is going on here relies on lldb has been supremely anti-helpful. It says breakpoints are set then blows past them (realizing this was happening took me a long time), and if I do too much instruction stepping it segfaults. |
I think what's happening with the signature is argument promotion, where it replaces an indirect argument with a direct one. |
I've removed 99% of
It's indeed way harder than one would hope. It's quite fiddly, maybe there's a magic value of the inlining threshold for different shapes, combined with various inline never/always attributes, but I didn't manage to easily merge more of the crates together without making the issue disappear; there are still 2 There are still a bunch of backends/API/etc wgpu abstractions in the repro so it can still be reduced more, but I'm not sure I'll have much more time to dedicate to this. At the very least, there are still two dependencies out of three that shouldn't be hard to remove ( I still don't know if this is a miscompile or UB (maybe slightly leaning towards the former):
|
I have some ideas on how to shrink this further and I'm starting to make progress... |
My latest progress brings the line count down to 221 lines, and there is precious little logic left, it's mostly bitflags and enum definitions. It's in a PR to the above repo. It got way more consistent when I found the path to remove the locks. I'm sure the remaining bits of At this point I'm quite sure this is a miscompile. There's no unsafe code left in the repo, and Miri doesn't complain. I've tried running this on aarch64-unknown-linux-gnu and x86_64-unknown-linux-gnu, and neither segfault. I doubt this is an OS issue, perhaps there is something different between the default aarch64-apple-darwin target and the aarch64-unknown-linux-gnu target that would be enlightening. |
Also somehow the MIR inliner is still required.
Though it's not clear if that's because the bug is in the MIR inliner or if only post-inlining do we generate the exact LLVM IR pattern that gets miscompiled. |
The repro is now at around 150 lines, with 1 bin and 1 lib. It's minimal enough e.g. for a test or LLVM bug report.
@saethlin Doesn't In any case, the LLVM optimization pass that seems to start introducing the segfault is cc @DianQK |
Yes that should turn off the MIR inliner. I was aiming for a reproducer that has |
Maybe the smaller repro behaves slightly differently now, |
Nice reproduction. But I can't reproduce it on macOS x86_64. Purchasing M1. |
Upstream issue: llvm/llvm-project#74680. Base on https://github.com/lqd/repro-117902, I got fn main() {
call_from_main();
}
#[inline(never)]
#[no_mangle]
fn call_from_main() {
let desc = TextureDescriptor(TextureFormat::Bc6hRgbUfloat);
device_create_texture(desc);
}
#[inline(never)]
#[no_mangle]
pub fn device_create_texture(desc: TextureDescriptor) {
let format = desc.0;
format.required_features().used();
format.used();
}
enum Feature {
A,
B,
}
impl Feature {
#[inline(never)]
fn used(self) {
core::hint::black_box(self);
}
}
#[repr(C)]
pub enum AstcBlock {
B12x12,
}
pub enum AstcChannel {
Unorm,
Hdr,
}
#[repr(C)]
pub enum TextureFormat {
Rgba8UnormSrgb,
Bc6hRgbUfloat,
Key,
Astc {
channel: AstcChannel,
block: AstcBlock,
},
}
impl TextureFormat {
#[inline(never)]
fn used(&self) {
core::hint::black_box(self);
}
#[inline(never)]
// #[no_mangle]
fn required_features(&self) -> Feature {
match self {
Self::Rgba8UnormSrgb => Feature::A,
Self::Bc6hRgbUfloat => Feature::B,
Self::Key => Feature::B,
Self::Astc { channel, .. } => match channel {
AstcChannel::Hdr => Feature::A,
AstcChannel::Unorm => Feature::B,
},
}
}
}
pub struct TextureDescriptor(pub TextureFormat); |
Hmm, I can only use nightly to reproduce, and I can't use local stage1/stage2 to reproduce. This could be a difference in build configuration. So I think this minimal reproduction is unstable. If we want to use this as a test case, I would expect a more stable test case. I added a more detailed explanation of this issue at llvm/llvm-project#74682 (comment). We should be able to create a must-be-dirty stack to reproduce. |
I got a "perfect" test case. I can reproduce it with nightly, 1.74.1 and even 1.69.0. fn main() {
let desc = TextureDescriptor(TextureFormat::A, 1, 1);
device_create_texture(desc);
}
#[inline(never)]
#[no_mangle]
pub fn device_create_texture(desc: TextureDescriptor) {
let format = desc.0;
assert_eq!(format.required_features(desc.1), Feature::A);
}
#[derive(PartialEq, Eq, Debug)]
enum Feature {
A,
B,
}
#[repr(i32)]
pub enum TextureFormat {
A,
B,
C,
D,
}
impl TextureFormat {
#[inline(never)]
fn required_features(&self, v: i8) -> Feature {
match self {
Self::A => Feature::A,
Self::B => Feature::B,
Self::C => Feature::B,
Self::D => match v {
0 => Feature::A,
_ => Feature::B,
},
}
}
}
pub struct TextureDescriptor(pub TextureFormat, i8, i16); By examining the generated instructions, the following instruction sets mov x0, #4294967296
movk x0, #1, lsl #48 godbolt: https://rust.godbolt.org/z/vnnvEszoP |
Reopening to track beta backport in #118994. |
Recent rustc seems to miscompile something somewhere in the dependency tree of Bevy, causing apps to crash with a segfault.
The issue seems to only manifest when optimizations are enabled and codegen units are > 1. The default cargo release configuration triggers it. Unoptimized builds are fine. Setting
codegen-units = 1
withopt-level = 3
also seems to be fine.I am on macOS 14.1 Sonoma, on Apple M1 Pro hardware.
Code
You can reproduce with any Bevy application. It is a huge code base, I know, I'm sorry I can't find a more minimal example.
Clone the Bevy repo: https://github.com/bevyengine/bevy
all these revs have the issue:
Run some example in release mode:
Version it worked on
The previous 1.74.0-beta.7 did not have the issue. I am not sure which nightly introduced the regression.
Version with regression
I first noticed the issue when I installed the 2023-11-11 nightly, though it might also be present in earlier nightlies. I just installed 1.75.0-beta.1 and updated my nightly, and can confirm the issue is present there, too.
The text was updated successfully, but these errors were encountered: