Using the `backtrace` crate worsens codegen when building with opt-level='s' #111959

nmathewson · 2023-05-25T20:30:19Z

My apologies if this is a known issue. I wanted to see whether the compiler would
optimize the following code in the way that I expected:

// lib.rs
pub struct Data { body: Box<[u8;509]> }

impl Data {
    pub fn some_bytes(&self) -> u16 {
	u16::from_be_bytes(
		self.body[3..5]
			.try_into()
			.expect("math fail")
	)
    }
}

It worked just fine, and compiled it down to a few simple instructions (according to cargo show-asm):

	mov rax, qword ptr [rdi]
	movzx eax, word ptr [rax + 3]
	rol ax, 8
	ret

But when I tried the same code in my own project, it generated a much worse implementation.

After some experimentation, I found that I could reproduce the issue by changing my test code as follows:

// lib.rs

use backtrace::Backtrace; // <--- ADDED THIS LINE

pub struct Data { body: Box<[u8;509]> }

impl Data {
    pub fn some_bytes(&self) -> u16 {
	u16::from_be_bytes(
		self.body[3..5]
			.try_into()
			.expect("math fail")
	)
    }
}

// Cargo.toml
[package]
name = "slice-playground"
version = "0.1.0"
edition = "2021"

[profile.release]
opt-level = 's'

[dependencies]
backtrace = "0.3.67"

With these changes, I got significantly worse code:

	.cfi_startproc
	push rax
	.cfi_def_cfa_offset 16
	mov rdi, qword ptr [rdi]
	add rdi, 3
	mov esi, 2
	call qword ptr [rip + core::array::<impl core::convert::TryFrom<&[T]> for [T; N]>::try_from@GOTPCREL]
	test al, 1
	jne .LBB1_1
	shr eax, 8
	rol ax, 8
	pop rcx
	.cfi_def_cfa_offset 8
	ret

.LBB1_1:
	.cfi_def_cfa_offset 16
	lea rdi, [rip + .L__unnamed_2]
	lea rcx, [rip + .L__unnamed_1]
	lea r8, [rip + .L__unnamed_3]
	mov rdx, rsp
	mov esi, 9
	call qword ptr [rip + core::result::unwrap_failed@GOTPCREL]

	ud2

Note that ALL of the changes are necessary to see the problem: I had to add the backtrace crate, I had to use it in the code, and I had to set opt-level = 's'.

I've reproduced this behavior with the following versions of Rust:

[2223]$ rustc +nightly --version --verbose 
rustc 1.71.0-nightly (c373194cb 2023-05-24)
binary: rustc
commit-hash: c373194cb6d882dc455a588bcc29c92a96b50252
commit-date: 2023-05-24
host: x86_64-unknown-linux-gnu
release: 1.71.0-nightly
LLVM version: 16.0.4
[2224]$ rustc --version --verbose
rustc 1.69.0 (84c898d65 2023-04-16)
binary: rustc
commit-hash: 84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc
commit-date: 2023-04-16
host: x86_64-unknown-linux-gnu
release: 1.69.0
LLVM version: 15.0.7

Thanks for taking a look at this!

The text was updated successfully, but these errors were encountered:

nmathewson · 2023-05-26T20:15:44Z

I've done some further investigation, trying to track this down. I'm not much practiced at compiler development, so this information might not actually be helpful. Here's what I found:

While trying to minimize the issue further, I have found that in place of using backtrace, the bad code generation issue also occurs if I say use miniz_oxide.
I can make the code generation issue go away again if instead I use a copy of miniz_oxide with the body of the function inflate::core::decompress commented out. (This function is very large.)
When I use cargo show-asm to look at the MIR of compiling some_bytes in both cases, the MIR outputs are identical.
When I use cargo show-asm to look at the "llvm-input" of some_bytes both cases, there is only one line difference:

-  %_9 = load ptr, ptr %self, align 8, !nonnull !2, !align !4, !noundef !2
+  %_9 = load ptr, ptr %self, align 8, !nonnull !2, !align !3, !noundef !2

I have no idea why the presence of miniz_oxide's decompress function would change the alignment on anything in the example some_bytes function, or why it would influence the behavior of LLVM passes on other code.

I hope this information is helpful somehow, or at least interesting.

nikic · 2023-05-26T20:21:17Z

Does -Z share-generics=off fix this?

nmathewson · 2023-05-26T21:38:40Z

Does -Z share-generics=off fix this?

It does!

…homcc Add #[inline] to array TryFrom impls I was looking into rust-lang#111959 and I realized we don't have these. They seem like an uncontroversial addition. IMO this PR does not fix that issue. I think the bad codegen is being caused by some underlying deeper problem but this change might cause the MIR inliner to paper over it in this specific case. r? `@thomcc`

Add #[inline] to array TryFrom impls I was looking into rust-lang/rust#111959 and I realized we don't have these. They seem like an uncontroversial addition. IMO this PR does not fix that issue. I think the bad codegen is being caused by some underlying deeper problem but this change might cause the MIR inliner to paper over it in this specific case. r? `@thomcc`

nmathewson added the C-bug Category: This is a bug. label May 25, 2023

jyn514 added the I-slow Issue: Problems and improvements with respect to performance of generated code. label May 25, 2023

saethlin mentioned this issue May 25, 2023

Add #[inline] to array TryFrom impls #111966

Merged

Dylan-DPC added the A-codegen Area: Code generation label May 26, 2023

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Using the `backtrace` crate worsens codegen when building with opt-level='s' #111959

Using the `backtrace` crate worsens codegen when building with opt-level='s' #111959

nmathewson commented May 25, 2023

nmathewson commented May 26, 2023

nikic commented May 26, 2023

nmathewson commented May 26, 2023

Using the backtrace crate worsens codegen when building with opt-level='s' #111959

Using the backtrace crate worsens codegen when building with opt-level='s' #111959

Comments

nmathewson commented May 25, 2023

nmathewson commented May 26, 2023

nikic commented May 26, 2023

nmathewson commented May 26, 2023

Using the `backtrace` crate worsens codegen when building with opt-level='s' #111959

Using the `backtrace` crate worsens codegen when building with opt-level='s' #111959