x86: make SSE2 required for i686 hardfloat targets and use it to pass SIMD types #135408

RalfJung · 2025-01-12T14:10:09Z

The primary goal of this is to make SSE2 required for our i686 targets (at least for the ones that use Pentium 4 as their baseline), to ensure they cannot be affected by #114479. This has been MCPd in rust-lang/compiler-team#808, and is tracked in #133611.

We do this by defining a new ABI that these targets select, and making SSE2 required by the ABI (that's the first commit). That's kind of a hack, but (a) it is the easiest way to make a target feature required via the target spec, and (b) we actually can use SSE2 for the Rust ABI now that it is required, so the second commit goes ahead and does that. Specifically, we use it in two ways: to return f64 values in a register rather than by-ptr, and to pass vectors of size up to 128bit in a register (or, well, whatever LLVM does when passing <4 x float> by-val, I don't actually know if this ends up in a register).

Cc @workingjubilee
Fixes #133611

rustbot · 2025-01-12T14:10:18Z

r? @SparrowLii

rustbot has assigned @SparrowLii.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

rustbot · 2025-01-12T14:10:21Z

Some changes occurred in compiler/rustc_codegen_gcc

cc @antoyo, @GuillaumeGomez

These commits modify compiler targets.
(See the Target Tier Policy.)

compiler/rustc_target/src/callconv/mod.rs

RalfJung · 2025-01-12T15:04:17Z

tests/assembly/x86-return-float.rs

+//@[sse] needs-llvm-components: x86
+// We make SSE available but don't use it for the ABI.
+//@[nosse] compile-flags: --target i586-unknown-linux-gnu -Ctarget-feature=+sse2 -Ctarget-cpu=pentium4
+//@[nosse] needs-llvm-components: x86


Tidy is being silly and doesn't let us set the needs-llvm-components: x86 uniformly for all revisions.

RalfJung · 2025-01-12T15:05:52Z

tests/codegen/intrinsics/transmute-x64.rs

+    // FIXME: the MIR opt still works, but the ABI logic now introduces
+    // an alloca here.
+    // CHECK: alloca
+    // CHECK: store <4 x float> %x, ptr %_0, align 16


I have no idea what this was trying to test, but it probably doesn't test that any more. The alloca is emitted by the ABI handling, and this test disables LLVM optimizations, so there's no way we can avoid the alloca.

It seems like this is intended to test mir-opts, but then why is it not a mir-opt test...?

Cc @scottmcm (who added the test in d757c4b)

This is a regression test for an ICE in cg_ssa: d757c4b

But why would that be a codegen test...?

hmm @scottmcm can you explain?

RalfJung · 2025-01-12T15:06:54Z

tests/codegen/simd/packed-simd.rs

-    // CHECK-NEXT: store <3 x float> [[VREG]], ptr [[RET_VREG]], [[RET_ALIGN]]
-    // CHECK-NEXT: ret void
+    // opt3-NEXT: ret <3 x float> [[VREG:%[a-z0-9_]+]]
+    // noopt: ret <3 x float> [[VREG:%[a-z0-9_]+]]


I have no idea if this test still makes any sense for the "noopt" revision... it seems like for some reason the call to load does not get inlined any more or so?

OTOH the "noopt" test was already very odd before... the load <3 x float> there referred to loading the return value of load() which was returned into the alloca.

So I made this care only about opt3 for the square_packed_full part of the test.

RalfJung · 2025-01-12T15:09:13Z

tests/codegen/simd/packed-simd.rs

@@ -1,4 +1,5 @@
 //@ revisions:opt3 noopt
+//@ only-x86_64


This test is checking our LLVM ABI lowering as much as it is checking anything packed-simd specific, so it will be very hard to make this work uniformly across targets that use a different ABI lowering.

compiler/rustc_target/src/spec/mod.rs

compiler/rustc_target/src/callconv/mod.rs

bjorn3 · 2025-01-12T15:47:31Z

compiler/rustc_target/src/callconv/x86.rs

            {
                // This is a single scalar that fits into an SSE register.
+                // FIXME: We cannot return 128-bit-floats this way since scalars larger than
+                // 64bit must be returned indirectly to make cranelift happy. See the comment
+                // in `adjust_for_rust_abi`.


f128 floats are only partially supported by Cranelift, but even so returning them in a vector register should work just fine I think. Returning f128 in a vector register doesn't have the same issue that returning i128 in integer registers has as f128 fits in a single vector register, while i128 doesn't fit in a single integer register.

The problem is that the way the "return large things indirectly" is implemented is not great, it leads to ICEs if other adjustments have already decided the ABI for one of these return types: make_indirect cannot be called if someone already called cast_to.

IMO this is a backend bug, backends should support all scalar types as return types.

RalfJung · 2025-01-12T17:27:17Z

tests/ui-fulldeps/codegen-backend/hotplug.rs

@@ -6,6 +6,9 @@
 //@ normalize-stdout: "libthe_backend.dylib" -> "libthe_backend.so"
 //@ normalize-stdout: "the_backend.dll" -> "libthe_backend.so"

+// Pick a target that requires no target features, so that no warning is shown
+// about missing target features.
+//@ compile-flags: --target arm-unknown-linux-gnueabi


@bjorn3 does the cranelift backend return anything for target_features_cfg? If not, there might be warnings now about missing target features, depending on the ABI info for the current target.

Yes, it currently hard codes sse and sse2 for all x86_64 targets that are not bare-metal:

rust/compiler/rustc_codegen_cranelift/src/lib.rs

Lines 179 to 200 in 7bb9888

fn target_features_cfg(

&self,

sess: &Session,

_allow_unstable: bool,

) -> Vec<rustc_span::Symbol> {

// FIXME return the actually used target features. this is necessary for #[cfg(target_feature)]

if sess.target.arch == "x86_64" && sess.target.os != "none" {

// x86_64 mandates SSE2 support

vec![sym::fsxr, sym::sse, sym::sse2]

} else if sess.target.arch == "aarch64" {

match &*sess.target.os {

"none" => vec![],

// On macOS the aes, sha2 and sha3 features are enabled by default and ring

// fails to compile on macOS when they are not present.

"macos" => vec![sym::neon, sym::aes, sym::sha2, sym::sha3],

// AArch64 mandates Neon support

_ => vec![sym::neon],

}

} else {

vec![]

}

}

tests/ui-fulldeps/codegen-backend/ doesn't actually use cg_clif. It uses the backend in tests/ui-fulldeps/codegen-backend/auxiliary/the_backend.rs. It is fine to implement target_features_cfg there as always returning sse and sse2. It doesn't compile anything to machine code anyway. It is just a test that -Zcodegen-backend with an external codegen backend functions.

Yeah I get that. But it still seemed easiest to just use a target that doesn't require any features.

compiler/rustc_target/src/spec/targets/i686_unknown_linux_gnu.rs

RalfJung · 2025-02-03T15:44:06Z

I'm not sure that this is a good justification for creating a new ABI for it.

The "Rust" ABI is entirely an internal implementation detail of the compiler. It seems silly to pass SSE values (or f64 return values) by-reference when we require SSE for a given target. So I am not sure why we'd keep applying our existing less efficient non-standard ABI adjustments when we don't have to.

briansmith · 2025-02-03T15:55:44Z

Here is the RedoxOS target definition for its i686-unknown-kernel: https://gitlab.redox-os.org/redox-os/kernel/-/blob/master/targets/i686-unknown-kernel.json

`"llvm-target": "i686-unknown-none",`
...
`"features": "-mmx,-sse,-sse2,-sse3,-ssse3,-sse4.1,-sse4.2,-3dnow,-3dnowa,-avx,-avx2,+soft-float",`
...

I guess this would be very similar to what one for the Linux kernel would be.

/cc @jackpot51

RalfJung · 2025-02-03T16:00:50Z

That is clearly a softfloat target and as such unaffected by this PR. Linux also uses a softfloat target.

briansmith · 2025-02-03T16:13:45Z

So, in a softfloat i686-* target, or an i586- target, when I have this:

#[target_feature(enable = "sse2")]
fn foo(....) {
}

When I call foo(), will it also use the new calling convention?

RalfJung · 2025-02-03T16:27:05Z

No, softfloat targets always use the softfloat ABI. i586 hardfloat targets use the basic x86-32 non-SSE ABI. #[target_feature] does not affect the ABI (except in cases tracked by existing soundness issues: #116344 and #116558).

(FYI, someone recently told me that LLVM basically entirely ignores +sse2 when +soft-float is set; I did not check that, but I suspect your attribute already does basically nothing on a soft-float target. But that's a separate issue.)

briansmith · 2025-02-03T17:04:40Z

No, softfloat targets always use the softfloat ABI. i586 hardfloat targets use the basic x86-32 non-SSE ABI. #[target_feature] does not affect the ABI (except in cases tracked by existing soundness issues: #116344 and #116558).

(FYI, someone recently told me that LLVM basically entirely ignores +sse2 when +soft-float is set; I did not check that, but I suspect your attribute already does basically nothing on a soft-float target. But that's a separate issue.)

That will work very poorly for x86_64-unknown-none and also any similar softfloat i686 target. We really want our SIMD-enabled functions to be fully optimized (including between calls to themselves) even on targets where SIMD can only be used in specific contexts, like within OS kernels.

And the reason it's relevant to this PR is that, if we made that work, then we wouldn't need a new ABI for hardfloat x86 because the existing ABI would automatically do the right thing on a per-call basis.

RalfJung · 2025-02-03T17:18:11Z

That will work very poorly for x86_64-unknown-none and also any similar softfloat i686 target. We really want our SIMD-enabled functions to be fully optimized (including between calls to themselves) even on targets where SIMD can only be used in specific contexts, like within OS kernels.

That (LLVM not using +sse2 when +soft-float is set) is a pre-existing issue and has nothing to do with this PR. I agree it's a bug, though it's mostly an LLVM bug, and it's related to LLVM generally having very poor support for softfloat ABI targets (except on arm-32, there it has excellent support).

And the reason it's relevant to this PR is that, if we made that work, then we wouldn't need a new ABI for hardfloat x86 because the existing ABI would automatically do the right thing on a per-call basis.

I don't follow. ABI can never depend on #[target_feature], that would be unsound.

bjorn3 · 2025-02-03T17:20:00Z

(FYI, someone recently told me that LLVM basically entirely ignores +sse2 when +soft-float is set; I did not check that, but I suspect your attribute already does basically nothing on a soft-float target. But that's a separate issue.)

Looks like that is indeed the case: https://rust.godbolt.org/z/PGGv3n9M6

RalfJung · 2025-02-03T17:25:38Z

wouldn't need a new ABI

As I already explained, we don't need a new ABI for anything. What we need is to prevent people from disabling SSE on targets with a Pentium 4 baseline, as doing so would bring them outside what is supported for these targets. (See #114479, which the compiler team decided to fix by requiring SSE on the affected targets.) That's the first commit. The second commit (the only one that actually changes the ABI -- and only the entirely unstable internal "Rust" ABI) is entirely "because we can". The second commit does nothing for softfloat targets.

I have no idea what you are trying to prevent or achieve here, but I recommend you read the issues linked in the OP -- this is just the last PR in a long line of work on cleaning up our handling of target features, and there is a lot of context here which you are not accounting for.

Explicitly choose x86 softfloat/hardfloat ABI Part of rust-lang#135408: Instead of choosing this based on the target features listed in the target spec, make that choice explicit. All built-in targets are being updated here; custom (JSON-defined) x86 (32bit and 64bit) softfloat targets need to explicitly set `rustc-abi` to `x86-softfloat`.

bors · 2025-02-03T23:35:15Z

☔ The latest upstream changes (presumably #136146) made this pull request unmergeable. Please resolve the merge conflicts.

workingjubilee · 2025-02-04T00:57:38Z

@briansmith function pointers exist?

Explicitly choose x86 softfloat/hardfloat ABI Part of rust-lang/rust#135408: Instead of choosing this based on the target features listed in the target spec, make that choice explicit. All built-in targets are being updated here; custom (JSON-defined) x86 (32bit and 64bit) softfloat targets need to explicitly set `rustc-abi` to `x86-softfloat`.

RalfJung · 2025-02-04T10:57:30Z

@workingjubilee all the prerequisite PRs landed so this should be good to go. :)

briansmith · 2025-02-04T16:25:07Z

I have no idea what you are trying to prevent or achieve here, but I recommend you read the issues linked in the OP -- this is just the last PR in a long line of work on cleaning up our handling of target features, and there is a lot of context here which you are not accounting for.

I have read the other issues. My point is that we need a better story for using SIMD usage (including optimized calling conventions) within targets that don't have SIMD target features enabled by default. Not just for x86, but for all targets. If we can solve that problem then having a new ABI for x86 makes less sense because ideally an optimized calling convention would automatically be chosen based on the type of the function being called (whether it contains FP/SIMD types in its parameters or return type).

Regardless, I don't object to this change.

RalfJung · 2025-02-04T16:37:26Z

That's a much wider discussion; there are several pretty tricky issues in the proposal you sketched that I don't know how to solve or how you imagine them being solved -- but please let's move this to Zulip or IRLO if you want to continue that discussion.

My personal main focus is fixing the soundness issues in our existing treatment of target features; I won't be pushing for anything beyond that myself, but I am happy to help evaluate the soundness of any proposals that are being made to improve control or performance. Given that everything in this PR is an unstable implementation detail, if it turns out that those proposals obviate the need for this separate ABI, we can just take it back -- but I don't see how that could possibly work, and even if it did work I doubt anything will actually materialize in the near future.

bors · 2025-02-07T17:10:58Z

☔ The latest upstream changes (presumably #136684) made this pull request unmergeable. Please resolve the merge conflicts.

workingjubilee

some preliminary comments, mostly positive noises. looks good. I need to reread the actual compiler logic again.

workingjubilee · 2025-02-07T22:10:22Z

tests/ui/sse-abi-checks.rs

@@ -1,6 +1,6 @@
 //! Ensure we trigger abi_unsupported_vector_types for target features that are usually enabled
 //! on a target, but disabled in this file via a `-C` flag.
-//@ compile-flags: --crate-type=rlib --target=i686-unknown-linux-gnu -C target-feature=-sse,-sse2
+//@ compile-flags: --crate-type=rlib --target=i586-unknown-linux-gnu -C target-feature=-sse,-sse2


Huh. Do we need to disable SSE and SSE2 on this target anymore?

Ah, right, we do not... and the point here was to check that -Ctarget-feature disabling features is correctly checked. I guess we can't do that on x86 any more since we can't disable the relevant target features any more. Do we even still have any target that has some vector size by default which can then be disabled?

For now I am mirroring the old test by also setting -Ctarget-cpu.

workingjubilee · 2025-02-07T22:13:56Z

compiler/rustc_target/src/spec/targets/i686_unknown_linux_gnu.rs

+    // Also note that x86 without SSE2 is *not* considered a Tier 1 target by the Rust project, and
+    // it has some known floating-point correctness issues that can lead to unsoundness.


"unsoundness" sounds vague, and the alternative can also be "unsoundness" as seen by a distro packager (i.e. programs that incorporate Rust crashing due to SIGILL on CPUs they used to run on... usually a bad sign!)

I think we can specify the double-rounding issue and sometimes value truncation, because LLVM can incorrectly handle spilling a value to the stack and then pulling it back into an FP register? So float-using code may now have corrupted values, which may lead to incorrect math or memory unsoundness if code uses a float as a key to a data structure.

I extended the comment.

workingjubilee · 2025-02-07T22:20:00Z

tests/assembly/closure-inherit-target-feature.rs

@@ -15,9 +15,8 @@ pub unsafe fn sse41_blend_nofeature(x: __m128, y: __m128) -> __m128 {
        // check that _mm_blend_ps is not being inlined into the closure
        // CHECK-LABEL: {{sse41_blend_nofeature.*closure.*:}}
        // CHECK-NOT: blendps
-        // CHECK: {{call .*_mm_blend_ps.*}}
+        // CHECK: {{jmp .*_mm_blend_ps.*}}


huh, that's interesting...

workingjubilee · 2025-02-07T22:20:45Z

tests/codegen/simd-intrinsic/simd-intrinsic-transmute-array.rs

@@ -38,7 +38,7 @@ pub fn build_array_s(x: [f32; 4]) -> S<4> {
 #[no_mangle]
 pub fn build_array_transmute_s(x: [f32; 4]) -> S<4> {
    // CHECK: %[[VAL:.+]] = load <4 x float>, ptr %x, align [[ARRAY_ALIGN]]
-    // CHECK: store <4 x float> %[[VAL:.+]], ptr %_0, align [[VECTOR_ALIGN]]
+    // CHECK: ret <4 x float> %[[VAL:.+]]


rustbot assigned SparrowLii Jan 12, 2025

rustbot added O-apple Operating system: Apple (macOS, iOS, tvOS, visionOS, watchOS) S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels Jan 12, 2025

RalfJung commented Jan 12, 2025

View reviewed changes

compiler/rustc_target/src/callconv/mod.rs Show resolved Hide resolved

This comment has been minimized.

Sign in to view

RalfJung force-pushed the x86-sse2 branch 2 times, most recently from 1c0655d to 1e6dbb8 Compare January 12, 2025 15:03

RalfJung commented Jan 12, 2025

View reviewed changes

Noratrieb reviewed Jan 12, 2025

View reviewed changes

compiler/rustc_target/src/spec/mod.rs Outdated Show resolved Hide resolved

bjorn3 reviewed Jan 12, 2025

View reviewed changes

compiler/rustc_target/src/callconv/mod.rs Outdated Show resolved Hide resolved

bjorn3 reviewed Jan 12, 2025

View reviewed changes

compiler/rustc_target/src/callconv/mod.rs Outdated Show resolved Hide resolved

RalfJung force-pushed the x86-sse2 branch 2 times, most recently from 8bfa689 to d7b63a3 Compare January 12, 2025 15:29

bjorn3 reviewed Jan 12, 2025

View reviewed changes

RalfJung force-pushed the x86-sse2 branch from a52a670 to 2789b51 Compare January 12, 2025 15:54

This comment has been minimized.

Sign in to view

RalfJung force-pushed the x86-sse2 branch 2 times, most recently from e3efac5 to e082978 Compare January 12, 2025 16:33

This comment has been minimized.

Sign in to view

RalfJung force-pushed the x86-sse2 branch from e082978 to 3eb1f47 Compare January 12, 2025 17:24

RalfJung commented Jan 12, 2025

View reviewed changes

This comment has been minimized.

Sign in to view

workingjubilee reviewed Jan 12, 2025

View reviewed changes

compiler/rustc_target/src/spec/targets/i686_unknown_linux_gnu.rs Outdated Show resolved Hide resolved

RalfJung changed the title ~~x86: make SSE2 required for i686 targets and use it to pass SIMD types~~ x86: make SSE2 required for i686 hardfloat targets and use it to pass SIMD types Feb 3, 2025

RalfJung force-pushed the x86-sse2 branch from aaa4b7a to 08d30d4 Compare February 4, 2025 10:18

RalfJung marked this pull request as ready for review February 4, 2025 10:56

RalfJung unassigned estebank Feb 4, 2025

RalfJung force-pushed the x86-sse2 branch from 08d30d4 to 41a90fb Compare February 4, 2025 11:03

workingjubilee reviewed Feb 7, 2025

View reviewed changes

RalfJung force-pushed the x86-sse2 branch 3 times, most recently from c588f51 to f6c93a2 Compare February 8, 2025 07:38

RalfJung added 2 commits February 8, 2025 09:11

add x86-sse2 (32bit) ABI that requires SSE2 target feature

80afc75

x86-sse2 ABI: use SSE registers for floats and SIMD

198b87f

RalfJung force-pushed the x86-sse2 branch from f6c93a2 to 198b87f Compare February 8, 2025 08:11

	fn target_features_cfg(
	&self,
	sess: &Session,
	_allow_unstable: bool,
	) -> Vec<rustc_span::Symbol> {
	// FIXME return the actually used target features. this is necessary for #[cfg(target_feature)]
	if sess.target.arch == "x86_64" && sess.target.os != "none" {
	// x86_64 mandates SSE2 support
	vec![sym::fsxr, sym::sse, sym::sse2]
	} else if sess.target.arch == "aarch64" {
	match &*sess.target.os {
	"none" => vec![],
	// On macOS the aes, sha2 and sha3 features are enabled by default and ring
	// fails to compile on macOS when they are not present.
	"macos" => vec![sym::neon, sym::aes, sym::sha2, sym::sha3],
	// AArch64 mandates Neon support
	_ => vec![sym::neon],
	}
	} else {
	vec![]
	}
	}

		// Also note that x86 without SSE2 is not considered a Tier 1 target by the Rust project, and
		// it has some known floating-point correctness issues that can lead to unsoundness.

x86: make SSE2 required for i686 hardfloat targets and use it to pass SIMD types #135408

Are you sure you want to change the base?

x86: make SSE2 required for i686 hardfloat targets and use it to pass SIMD types #135408

Conversation

RalfJung commented Jan 12, 2025 • edited Loading

rustbot commented Jan 12, 2025

rustbot commented Jan 12, 2025

This comment has been minimized.

This comment has been minimized.

Choose a reason for hiding this comment

RalfJung Jan 12, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RalfJung Jan 12, 2025 • edited Loading

Choose a reason for hiding this comment

RalfJung Jan 12, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

bjorn3 Jan 12, 2025 • edited Loading

Choose a reason for hiding this comment

RalfJung Jan 12, 2025 • edited Loading

Choose a reason for hiding this comment

This comment has been minimized.

This comment has been minimized.

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

This comment has been minimized.

RalfJung commented Feb 3, 2025 • edited Loading

briansmith commented Feb 3, 2025

RalfJung commented Feb 3, 2025 • edited Loading

briansmith commented Feb 3, 2025 • edited Loading

RalfJung commented Feb 3, 2025 • edited Loading

briansmith commented Feb 3, 2025

RalfJung commented Feb 3, 2025 • edited Loading

bjorn3 commented Feb 3, 2025

RalfJung commented Feb 3, 2025 • edited Loading

bors commented Feb 3, 2025

workingjubilee commented Feb 4, 2025

RalfJung commented Feb 4, 2025

briansmith commented Feb 4, 2025

RalfJung commented Feb 4, 2025 • edited Loading

bors commented Feb 7, 2025

workingjubilee left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RalfJung Feb 8, 2025 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

RalfJung commented Jan 12, 2025 •

edited

Loading

RalfJung Jan 12, 2025 •

edited

Loading

RalfJung Jan 12, 2025 •

edited

Loading

RalfJung Jan 12, 2025 •

edited

Loading

bjorn3 Jan 12, 2025 •

edited

Loading

RalfJung Jan 12, 2025 •

edited

Loading

RalfJung commented Feb 3, 2025 •

edited

Loading

RalfJung commented Feb 3, 2025 •

edited

Loading

briansmith commented Feb 3, 2025 •

edited

Loading

RalfJung commented Feb 3, 2025 •

edited

Loading

RalfJung commented Feb 3, 2025 •

edited

Loading

RalfJung commented Feb 3, 2025 •

edited

Loading

RalfJung commented Feb 4, 2025 •

edited

Loading

RalfJung Feb 8, 2025 •

edited

Loading