Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using the backtrace crate worsens codegen when building with opt-level='s' #111959

Open
nmathewson opened this issue May 25, 2023 · 3 comments
Open
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.

Comments

@nmathewson
Copy link
Contributor

My apologies if this is a known issue. I wanted to see whether the compiler would
optimize the following code in the way that I expected:

// lib.rs
pub struct Data { body: Box<[u8;509]> }

impl Data {
    pub fn some_bytes(&self) -> u16 {
	u16::from_be_bytes(
		self.body[3..5]
			.try_into()
			.expect("math fail")
	)
    }
}

It worked just fine, and compiled it down to a few simple instructions (according to cargo show-asm):

	mov rax, qword ptr [rdi]
	movzx eax, word ptr [rax + 3]
	rol ax, 8
	ret

But when I tried the same code in my own project, it generated a much worse implementation.

After some experimentation, I found that I could reproduce the issue by changing my test code as follows:

// lib.rs

use backtrace::Backtrace; // <--- ADDED THIS LINE

pub struct Data { body: Box<[u8;509]> }

impl Data {
    pub fn some_bytes(&self) -> u16 {
	u16::from_be_bytes(
		self.body[3..5]
			.try_into()
			.expect("math fail")
	)
    }
}
// Cargo.toml
[package]
name = "slice-playground"
version = "0.1.0"
edition = "2021"

[profile.release]
opt-level = 's'

[dependencies]
backtrace = "0.3.67"

With these changes, I got significantly worse code:

	.cfi_startproc
	push rax
	.cfi_def_cfa_offset 16
	mov rdi, qword ptr [rdi]
	add rdi, 3
	mov esi, 2
	call qword ptr [rip + core::array::<impl core::convert::TryFrom<&[T]> for [T; N]>::try_from@GOTPCREL]
	test al, 1
	jne .LBB1_1
	shr eax, 8
	rol ax, 8
	pop rcx
	.cfi_def_cfa_offset 8
	ret

.LBB1_1:
	.cfi_def_cfa_offset 16
	lea rdi, [rip + .L__unnamed_2]
	lea rcx, [rip + .L__unnamed_1]
	lea r8, [rip + .L__unnamed_3]
	mov rdx, rsp
	mov esi, 9
	call qword ptr [rip + core::result::unwrap_failed@GOTPCREL]

	ud2

Note that ALL of the changes are necessary to see the problem: I had to add the backtrace crate, I had to use it in the code, and I had to set opt-level = 's'.

I've reproduced this behavior with the following versions of Rust:

[2223]$ rustc +nightly --version --verbose 
rustc 1.71.0-nightly (c373194cb 2023-05-24)
binary: rustc
commit-hash: c373194cb6d882dc455a588bcc29c92a96b50252
commit-date: 2023-05-24
host: x86_64-unknown-linux-gnu
release: 1.71.0-nightly
LLVM version: 16.0.4
[2224]$ rustc --version --verbose
rustc 1.69.0 (84c898d65 2023-04-16)
binary: rustc
commit-hash: 84c898d65adf2f39a5a98507f1fe0ce10a2b8dbc
commit-date: 2023-04-16
host: x86_64-unknown-linux-gnu
release: 1.69.0
LLVM version: 15.0.7

Thanks for taking a look at this!

@nmathewson nmathewson added the C-bug Category: This is a bug. label May 25, 2023
@jyn514 jyn514 added the I-slow Issue: Problems and improvements with respect to performance of generated code. label May 25, 2023
@Dylan-DPC Dylan-DPC added the A-codegen Area: Code generation label May 26, 2023
@nmathewson
Copy link
Contributor Author

I've done some further investigation, trying to track this down. I'm not much practiced at compiler development, so this information might not actually be helpful. Here's what I found:

  • While trying to minimize the issue further, I have found that in place of using backtrace, the bad code generation issue also occurs if I say use miniz_oxide.
  • I can make the code generation issue go away again if instead I use a copy of miniz_oxide with the body of the function inflate::core::decompress commented out. (This function is very large.)
  • When I use cargo show-asm to look at the MIR of compiling some_bytes in both cases, the MIR outputs are identical.
  • When I use cargo show-asm to look at the "llvm-input" of some_bytes both cases, there is only one line difference:
-  %_9 = load ptr, ptr %self, align 8, !nonnull !2, !align !4, !noundef !2
+  %_9 = load ptr, ptr %self, align 8, !nonnull !2, !align !3, !noundef !2

I have no idea why the presence of miniz_oxide's decompress function would change the alignment on anything in the example some_bytes function, or why it would influence the behavior of LLVM passes on other code.

I hope this information is helpful somehow, or at least interesting.

@nikic
Copy link
Contributor

nikic commented May 26, 2023

Does -Z share-generics=off fix this?

@nmathewson
Copy link
Contributor Author

Does -Z share-generics=off fix this?

It does!

GuillaumeGomez added a commit to GuillaumeGomez/rust that referenced this issue May 27, 2023
…homcc

Add #[inline] to array TryFrom impls

I was looking into rust-lang#111959 and I realized we don't have these. They seem like an uncontroversial addition.

IMO this PR does not fix that issue. I think the bad codegen is being caused by some underlying deeper problem but this change might cause the MIR inliner to paper over it in this specific case.

r? `@thomcc`
thomcc pushed a commit to tcdi/postgrestd that referenced this issue Jul 18, 2023
Add #[inline] to array TryFrom impls

I was looking into rust-lang/rust#111959 and I realized we don't have these. They seem like an uncontroversial addition.

IMO this PR does not fix that issue. I think the bad codegen is being caused by some underlying deeper problem but this change might cause the MIR inliner to paper over it in this specific case.

r? `@thomcc`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-codegen Area: Code generation C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code.
Projects
None yet
Development

No branches or pull requests

4 participants