Optimize layout of function arguments in the Rust ABI - take 2 #97559
Closed
+347 −16
12 commits
374bc27 Simplify the code of fixup by making it's code flow more natural (Urgau)
bf97e79 Don't aggregate homogeneous floats in the Rust ABI (Urgau)
dcc75bf Test that target feature mix up with homogeneous floats is sound (Urgau)
ec16a32 Fix some codegen tests (Urgau)
9ed05ed Use simpler heuristic for determining if a layout only floats (Urgau)
f1c72be Use nbdd0121 suggestion for reducing the perf impact (Urgau)
1be1d4a Remove undefined unlikely! macro (Urgau)
b2fba9a Revert "Use nbdd0121 suggestion for reducing the perf impact" (Urgau)
a84f4c9 Let LLVM also handle small aggregate (Urgau)
0c1451c Revert "Revert "Use nbdd0121 suggestion for reducing the perf impact"" (Urgau)
c7e8880 Retry with the homogeneous aggregate concept (Urgau)
683e13f Revert max_by_val_size * 2 (Urgau)
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,48 @@ | ||
// assembly-output: emit-asm | ||
// needs-llvm-components: x86 | ||
// compile-flags: --target x86_64-unknown-linux-gnu | ||
// compile-flags: -C llvm-args=--x86-asm-syntax=intel | ||
// compile-flags: -C opt-level=3 | ||
|
||
#![crate_type = "rlib"] | ||
#![no_std] | ||
|
||
// CHECK-LABEL: sum_f32: | ||
// CHECK: addss xmm0, xmm1 | ||
// CHECK-NEXT: ret | ||
#[no_mangle] | ||
pub fn sum_f32(a: f32, b: f32) -> f32 { | ||
a + b | ||
} | ||
|
||
// CHECK-LABEL: sum_f64x2: | ||
// CHECK: mov rax, [[PTR_IN:.*]] | ||
// CHECK-NEXT: movupd [[XMMA:xmm[0-9]]], xmmword ptr [rsi] | ||
// CHECK-NEXT: movupd [[XMMB:xmm[0-9]]], xmmword ptr [rdx] | ||
// CHECK-NEXT: addpd [[XMMB]], [[XMMA]] | ||
// CHECK-NEXT: movupd xmmword ptr {{\[}}[[PTR_IN]]{{\]}}, [[XMMB]] | ||
// CHECK-NEXT: ret | ||
#[no_mangle] | ||
pub fn sum_f64x2(a: [f64; 2], b: [f64; 2]) -> [f64; 2] { | ||
[ | ||
a[0] + b[0], | ||
a[1] + b[1], | ||
] | ||
} | ||
|
||
// CHECK-LABEL: sum_f32x4: | ||
// CHECK: mov rax, [[PTR_IN:.*]] | ||
// CHECK-NEXT: movups [[XMMA:xmm[0-9]]], xmmword ptr [rsi] | ||
// CHECK-NEXT: movups [[XMMB:xmm[0-9]]], xmmword ptr [rdx] | ||
// CHECK-NEXT: addps [[XMMB]], [[XMMA]] | ||
// CHECK-NEXT: movups xmmword ptr {{\[}}[[PTR_IN]]{{\]}}, [[XMMB]] | ||
// CHECK-NEXT: ret | ||
#[no_mangle] | ||
pub fn sum_f32x4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] { | ||
[ | ||
a[0] + b[0], | ||
a[1] + b[1], | ||
a[2] + b[2], | ||
a[3] + b[3], | ||
] | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
//! Check that small (less than 128 bits on x86_64) homogeneous floats are passed either as an array | ||
//! or by pointer | ||
|
||
// compile-flags: -C no-prepopulate-passes -O | ||
// only-x86_64 | ||
|
||
#![crate_type = "lib"] | ||
|
||
pub struct Foo { | ||
bar1: f32, | ||
bar2: f32, | ||
bar3: f32, | ||
bar4: f32, | ||
} | ||
|
||
// CHECK: define [2 x float] @array_f32x2([2 x float] %0, [2 x float] %1) | ||
#[no_mangle] | ||
pub fn array_f32x2(a: [f32; 2], b: [f32; 2]) -> [f32; 2] { | ||
todo!() | ||
} | ||
|
||
// CHECK: define void @array_f32x4([4 x float]* {{.*}} sret([4 x float]) {{.*}} %0, [4 x float]* {{.*}} %a, [4 x float]* {{.*}} %b) | ||
#[no_mangle] | ||
pub fn array_f32x4(a: [f32; 4], b: [f32; 4]) -> [f32; 4] { | ||
todo!() | ||
} | ||
|
||
// CHECK: define void @array_f32x4_nested(%Foo* {{.*}} sret(%Foo) {{.*}} %0, %Foo* {{.*}} %a, %Foo* {{.*}} %b) | ||
#[no_mangle] | ||
pub fn array_f32x4_nested(a: Foo, b: Foo) -> Foo { | ||
todo!() | ||
} |
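
To make the sizes behind these checks concrete, here is a small sketch (not part of the PR) that prints the bit width of each argument type used in the test above. `Foo` is redeclared locally with the same four `f32` fields, `bits` is a hypothetical helper, and the 128-bit figure is taken from the test's doc comment rather than from the compiler source.

```rust
use std::mem::size_of;

// Local copy of the test's struct: four f32 fields, default representation.
#[allow(dead_code)]
pub struct Foo {
    bar1: f32,
    bar2: f32,
    bar3: f32,
    bar4: f32,
}

/// Hypothetical helper: size of `T` in bits.
fn bits<T>() -> usize {
    size_of::<T>() * 8
}

fn main() {
    // [f32; 2] is 64 bits, below the 128-bit figure quoted in the doc comment.
    println!("[f32; 2]: {} bits", bits::<[f32; 2]>());
    // [f32; 4] and Foo are both exactly 128 bits, so neither is "less than" 128.
    println!("[f32; 4]: {} bits", bits::<[f32; 4]>());
    println!("Foo:      {} bits", bits::<Foo>());
}
```

This lines up with the expected IR in the test: the 64-bit array appears as a by-value `[2 x float]`, while the two 128-bit types appear behind pointers with an `sret` return slot.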
184 changes: 184 additions & 0 deletions
src/test/ui/abi/homogenous-floats-target-feature-mixup.rs
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,184 @@ | ||
// This test checks that even if we mix up the target features of functions with homogeneous floats, | ||
// the ABI is sound and still produces the right answer. | ||
// | ||
// This is basically the same test as src/test/ui/simd/target-feature-mixup.rs but for floats and | ||
// without #[repr(simd)] | ||
|
||
// run-pass | ||
// ignore-emscripten | ||
// ignore-sgx no processes | ||
|
||
#![feature(target_feature, cfg_target_feature)] | ||
#![feature(avx512_target_feature)] | ||
|
||
#![allow(overflowing_literals)] | ||
#![allow(unused_variables)] | ||
#![allow(stable_features)] | ||
|
||
use std::process::{Command, ExitStatus}; | ||
use std::env; | ||
|
||
fn main() { | ||
if let Some(level) = env::args().nth(1) { | ||
return test::main(&level) | ||
} | ||
|
||
let me = env::current_exe().unwrap(); | ||
for level in ["sse", "avx", "avx512"].iter() { | ||
let status = Command::new(&me).arg(level).status().unwrap(); | ||
if status.success() { | ||
println!("success with {}", level); | ||
continue | ||
} | ||
|
||
// We don't actually know if our computer has the requisite target features | ||
// for the test below. Testing for that will get added to libstd later so | ||
// for now just assume sigill means this is a machine that can't run this test. | ||
if is_sigill(status) { | ||
println!("sigill with {}, assuming spurious", level); | ||
continue | ||
} | ||
panic!("invalid status at {}: {}", level, status); | ||
} | ||
} | ||
|
||
#[cfg(unix)] | ||
fn is_sigill(status: ExitStatus) -> bool { | ||
use std::os::unix::prelude::*; | ||
status.signal() == Some(4) | ||
} | ||
|
||
#[cfg(windows)] | ||
fn is_sigill(status: ExitStatus) -> bool { | ||
status.code() == Some(0xc000001d) | ||
} | ||
|
||
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))] | ||
#[allow(nonstandard_style)] | ||
mod test { | ||
#[derive(PartialEq, Debug, Clone, Copy)] | ||
struct f32x2(f32, f32); | ||
|
||
#[derive(PartialEq, Debug, Clone, Copy)] | ||
struct f32x4(f32, f32, f32, f32); | ||
|
||
#[derive(PartialEq, Debug, Clone, Copy)] | ||
struct f32x8(f32, f32, f32, f32, f32, f32, f32, f32); | ||
|
||
pub fn main(level: &str) { | ||
unsafe { | ||
main_normal(level); | ||
main_sse(level); | ||
if level == "sse" { | ||
return | ||
} | ||
main_avx(level); | ||
if level == "avx" { | ||
return | ||
} | ||
main_avx512(level); | ||
} | ||
} | ||
|
||
macro_rules! mains { | ||
($( | ||
$(#[$attr:meta])* | ||
unsafe fn $main:ident(level: &str) { | ||
... | ||
} | ||
)*) => ($( | ||
$(#[$attr])* | ||
unsafe fn $main(level: &str) { | ||
let m128 = f32x2(1., 2.); | ||
let m256 = f32x4(3., 4., 5., 6.); | ||
let m512 = f32x8(7., 8., 9., 10., 11., 12., 13., 14.); | ||
assert_eq!(id_sse_128(m128), m128); | ||
assert_eq!(id_sse_256(m256), m256); | ||
assert_eq!(id_sse_512(m512), m512); | ||
|
||
if level == "sse" { | ||
return | ||
} | ||
assert_eq!(id_avx_128(m128), m128); | ||
assert_eq!(id_avx_256(m256), m256); | ||
assert_eq!(id_avx_512(m512), m512); | ||
|
||
if level == "avx" { | ||
return | ||
} | ||
assert_eq!(id_avx512_128(m128), m128); | ||
assert_eq!(id_avx512_256(m256), m256); | ||
assert_eq!(id_avx512_512(m512), m512); | ||
} | ||
)*) | ||
} | ||
|
||
mains! { | ||
unsafe fn main_normal(level: &str) { ... } | ||
#[target_feature(enable = "sse2")] | ||
unsafe fn main_sse(level: &str) { ... } | ||
#[target_feature(enable = "avx")] | ||
unsafe fn main_avx(level: &str) { ... } | ||
#[target_feature(enable = "avx512bw")] | ||
unsafe fn main_avx512(level: &str) { ... } | ||
} | ||
|
||
#[target_feature(enable = "sse2")] | ||
unsafe fn id_sse_128(a: f32x2) -> f32x2 { | ||
assert_eq!(a, f32x2(1., 2.)); | ||
a.clone() | ||
} | ||
|
||
#[target_feature(enable = "sse2")] | ||
unsafe fn id_sse_256(a: f32x4) -> f32x4 { | ||
assert_eq!(a, f32x4(3., 4., 5., 6.)); | ||
a.clone() | ||
} | ||
|
||
#[target_feature(enable = "sse2")] | ||
unsafe fn id_sse_512(a: f32x8) -> f32x8 { | ||
assert_eq!(a, f32x8(7., 8., 9., 10., 11., 12., 13., 14.)); | ||
a.clone() | ||
} | ||
|
||
#[target_feature(enable = "avx")] | ||
unsafe fn id_avx_128(a: f32x2) -> f32x2 { | ||
assert_eq!(a, f32x2(1., 2.)); | ||
a.clone() | ||
} | ||
|
||
#[target_feature(enable = "avx")] | ||
unsafe fn id_avx_256(a: f32x4) -> f32x4 { | ||
assert_eq!(a, f32x4(3., 4., 5., 6.)); | ||
a.clone() | ||
} | ||
|
||
#[target_feature(enable = "avx")] | ||
unsafe fn id_avx_512(a: f32x8) -> f32x8 { | ||
assert_eq!(a, f32x8(7., 8., 9., 10., 11., 12., 13., 14.)); | ||
a.clone() | ||
} | ||
|
||
#[target_feature(enable = "avx512bw")] | ||
unsafe fn id_avx512_128(a: f32x2) -> f32x2 { | ||
assert_eq!(a, f32x2(1., 2.)); | ||
a.clone() | ||
} | ||
|
||
#[target_feature(enable = "avx512bw")] | ||
unsafe fn id_avx512_256(a: f32x4) -> f32x4 { | ||
assert_eq!(a, f32x4(3., 4., 5., 6.)); | ||
a.clone() | ||
} | ||
|
||
#[target_feature(enable = "avx512bw")] | ||
unsafe fn id_avx512_512(a: f32x8) -> f32x8 { | ||
assert_eq!(a, f32x8(7., 8., 9., 10., 11., 12., 13., 14.)); | ||
a.clone() | ||
} | ||
} | ||
|
||
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))] | ||
mod test { | ||
pub fn main(level: &str) {} | ||
} |
The old code used `size > ptr_size`. You changed it to `size > ptr_size * 2`.
Oops, you're right, I forgot about this. It explains the performance regressions.
I will revert this change. Thanks for finding this out.
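
The effect of the threshold change discussed in this thread can be illustrated with a simplified sketch. This is not the rustc code: `passed_by_pointer` is a hypothetical name, the sketch assumes the comparison decides whether an aggregate is passed indirectly, and the real decision involves more conditions than a single size check.

```rust
/// Hypothetical, simplified stand-in for the size check discussed above:
/// returns true when an aggregate of `size` bytes exceeds the by-value limit
/// and would therefore be passed by pointer (assumption, see lead-in).
fn passed_by_pointer(size: u64, max_by_val_size: u64) -> bool {
    size > max_by_val_size
}

fn main() {
    let ptr_size = 8; // bytes, assuming a 64-bit target
    for size in [8u64, 12, 16, 24] {
        let old = passed_by_pointer(size, ptr_size);
        let doubled = passed_by_pointer(size, ptr_size * 2);
        println!("{size:>2} bytes: `size > ptr_size` = {old}, `size > ptr_size * 2` = {doubled}");
    }
}
```

Under these assumptions, only aggregates larger than one pointer and no larger than two pointers (12 and 16 bytes here) change category when the limit is doubled, which is the range the reviewer's comment points at.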