Stop generating `alloca`s & `memcmp` for simple short array equality #85828

scottmcm · 2021-05-30T06:45:56Z

Example:

pub fn demo(x: [u16; 6], y: [u16; 6]) -> bool { x == y }

Before:

define zeroext i1 @_ZN10playground4demo17h48537f7eac23948fE(i96 %0, i96 %1) unnamed_addr #0 {
start:
  %y = alloca [6 x i16], align 8
  %x = alloca [6 x i16], align 8
  %.0..sroa_cast = bitcast [6 x i16]* %x to i96*
  store i96 %0, i96* %.0..sroa_cast, align 8
  %.0..sroa_cast3 = bitcast [6 x i16]* %y to i96*
  store i96 %1, i96* %.0..sroa_cast3, align 8
  %_11.i.i.i = bitcast [6 x i16]* %x to i8*
  %_14.i.i.i = bitcast [6 x i16]* %y to i8*
  %bcmp.i.i.i = call i32 @bcmp(i8* nonnull dereferenceable(12) %_11.i.i.i, i8* nonnull dereferenceable(12) %_14.i.i.i, i64 12) #2, !alias.scope !2
  %2 = icmp eq i32 %bcmp.i.i.i, 0
  ret i1 %2
}

playground::demo: # @playground::demo
	sub	rsp, 32
	mov	qword ptr [rsp], rdi
	mov	dword ptr [rsp + 8], esi
	mov	qword ptr [rsp + 16], rdx
	mov	dword ptr [rsp + 24], ecx
	xor	rdi, rdx
	xor	esi, ecx
	or	rsi, rdi
	sete	al
	add	rsp, 32
	ret

After:

define zeroext i1 @_ZN4mini4demo17h7a8994aaa314c981E(i96 %0, i96 %1) unnamed_addr #0 {
start:
  %2 = icmp eq i96 %0, %1
  ret i1 %2
}

_ZN4mini4demo17h7a8994aaa314c981E:
	xor	rcx, r8
	xor	edx, r9d
	or	rdx, rcx
	sete	al
	ret
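
For readers less used to LLVM IR, the `icmp eq i96` in the "After" output amounts to treating each 12-byte array as one wide integer. The following is a minimal sketch of that equivalence in safe Rust, not code from the PR: `pack` and `demo_packed` are hypothetical names, and the packing goes through `u128` only because Rust has no 96-bit integer type.

```rust
/// The function from the PR description.
pub fn demo(x: [u16; 6], y: [u16; 6]) -> bool {
    x == y
}

/// Hypothetical helper: pack six u16 lanes into the low 96 bits of a u128.
fn pack(a: [u16; 6]) -> u128 {
    a.iter()
        .enumerate()
        .fold(0u128, |acc, (i, &lane)| acc | (u128::from(lane) << (16 * i)))
}

/// Hypothetical equivalent of `demo`: a single wide integer comparison.
/// Packing is injective, so the packed values are equal exactly when
/// the arrays are equal.
pub fn demo_packed(x: [u16; 6], y: [u16; 6]) -> bool {
    pack(x) == pack(y)
}
```

Both functions should agree on every input; the point of the PR is that the compiler now reaches the second form on its own, without the stack round trip.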

rust-highfive · 2021-05-30T06:45:58Z

Some changes occurred to the CTFE / Miri engine

cc @rust-lang/miri

rust-highfive · 2021-05-30T06:45:59Z

r? @dtolnay

(rust-highfive has picked a reviewer for you, use r? to override)

scottmcm · 2021-05-30T06:57:21Z

@bors try @rust-timer queue

rust-timer · 2021-05-30T06:57:22Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

bors · 2021-05-30T06:57:30Z

⌛ Trying commit a6546252f35553da72f213514eb1ea3f26c67b0a with merge 4fc737828e7a2f978c30875c76c93b2f0e5d6562...

bors · 2021-05-30T07:44:05Z

☀️ Try build successful - checks-actions
Build commit: 4fc737828e7a2f978c30875c76c93b2f0e5d6562

rust-timer · 2021-05-30T07:44:06Z

Queued 4fc737828e7a2f978c30875c76c93b2f0e5d6562 with parent bff138d, future comparison URL.

library/core/src/array/eq.rs

oli-obk · 2021-05-30T08:53:40Z

For future archeologists, it may be helpful to create a separate commit that moves the eq logic to the eq module, and to base the logic changes on top of that.

library/core/src/intrinsics.rs

RalfJung · 2021-05-30T09:46:32Z

compiler/rustc_mir/src/interpret/intrinsics.rs

+        let rhs = self.read_scalar(rhs)?.check_init()?;
+        let lhs_bytes = self.memory.read_bytes(lhs, layout.size)?;
+        let rhs_bytes = self.memory.read_bytes(rhs, layout.size)?;
+        Ok(Scalar::Int((lhs_bytes == rhs_bytes).into()))


Please use Scalar::from_bool.

Way better! Thanks.

By the way, this will raise errors when there are uninit bytes or pointers anywhere in this memory. For CTFE that makes sense; for Miri-the-tool we might want to properly support comparing pointers...
Nothing we have to resolve now, just pointing this out.
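
The behaviour described in this comment can be modelled outside the compiler. The sketch below is a toy, not rustc's interpreter: `Byte`, `RawEqError`, `read_bytes`, and `raw_eq_model` are all hypothetical names chosen for illustration. It only shows the shape of the rule that reading raw bytes fails as soon as one of them is uninitialized or belongs to a pointer.

```rust
/// Toy model of one byte of interpreter memory (hypothetical, not rustc's types).
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Byte {
    Init(u8),
    Uninit,
    /// A byte that is part of a pointer and therefore has no plain integer value.
    PointerFragment,
}

#[derive(Debug, PartialEq)]
pub enum RawEqError {
    UninitByte { offset: usize },
    PointerByte { offset: usize },
}

/// Read plain bytes, failing at the first byte that is not a plain integer.
fn read_bytes(mem: &[Byte]) -> Result<Vec<u8>, RawEqError> {
    mem.iter()
        .enumerate()
        .map(|(offset, byte)| match byte {
            Byte::Init(b) => Ok(*b),
            Byte::Uninit => Err(RawEqError::UninitByte { offset }),
            Byte::PointerFragment => Err(RawEqError::PointerByte { offset }),
        })
        .collect()
}

/// Compare two regions bytewise; any unreadable byte is an error, not `false`.
pub fn raw_eq_model(lhs: &[Byte], rhs: &[Byte]) -> Result<bool, RawEqError> {
    let lhs_bytes = read_bytes(lhs)?;
    let rhs_bytes = read_bytes(rhs)?;
    Ok(lhs_bytes == rhs_bytes)
}
```

In this model a pointer byte is always an error, which corresponds to the CTFE behaviour; supporting pointer comparison for Miri-the-tool would need a richer `Byte` representation than this sketch has.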

compiler/rustc_mir/src/interpret/intrinsics.rs

rust-timer · 2021-05-30T10:14:40Z

Finished benchmarking try commit (4fc737828e7a2f978c30875c76c93b2f0e5d6562): comparison url.

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying rollup- to bors.

Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up.

@bors rollup=never
@rustbot label: +S-waiting-on-review -S-waiting-on-perf

Mark-Simulacrum · 2021-05-30T12:49:06Z

compiler/rustc_typeck/src/check/intrinsic.rs

@@ -367,6 +367,14 @@ pub fn check_intrinsic_type(tcx: TyCtxt<'_>, it: &hir::ForeignItem<'_>) {

            sym::nontemporal_store => (1, vec![tcx.mk_mut_ptr(param(0)), param(0)], tcx.mk_unit()),

+            sym::raw_eq => {
+                let param_count = if intrinsic_name == sym::raw_eq { 2 } else { 1 };


This is matching on the intrinsic name already?

Oops! Good catch. (Remnant of a previous attempt which included a second intrinsic.)

still there, did you forget to push?

It's fixed -- https://github.com/rust-lang/rust/pull/85828/files#diff-b2a7c31c785c36fc43ebd3ba40a8a1571af0cf6fb3852594055305ac5bc58e88R384 -- GitHub just isn't showing it on the conversation page.

compiler/rustc_codegen_llvm/src/intrinsic.rs

scottmcm · 2021-05-31T01:10:12Z

Added the implementation for cranelift.

I didn't bother doing anything too fancy, but it does trigger for the IPv6 case ([u16; 8]):

@0009                               v9 = load.i128 notrap v8
@0009                               v10 = load.i128 notrap v7
@0009                               v11 = icmp eq v9, v10
@0009                               v12 = bint.i8 v11

Otherwise it lib_calls out to memcmp still (here for the [u16; 6] case):

@0009                               v9 = iconst.i64 12
@0009                               v10 = call fn2(v8, v7, v9)
@0009                               v16 = iconst.i32 0
@0009                               v11 = icmp eq v10, v16
@0009                               v12 = bint.i8 v11
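
The two dumps above suggest a simple decision: compare as one integer when the size matches an integer width, otherwise call `memcmp`. A minimal sketch of that decision follows, under the assumption that only the byte size matters; `Lowering` and `choose_lowering` are hypothetical names, not functions from cg_clif, and zero-sized types and the later threshold adjustment are not modelled.

```rust
/// How a backend might lower a raw byte-equality check (hypothetical model).
#[derive(Debug, PartialEq)]
pub enum Lowering {
    /// Load both operands as one integer of `bits` width and compare them.
    IntCompare { bits: u16 },
    /// Call `memcmp` with the byte count and test the result against zero.
    MemcmpCall { bytes: u64 },
}

/// Assumed rule: sizes that match an integer type (8 to 128 bits) get a
/// single integer comparison; everything else falls back to `memcmp`.
pub fn choose_lowering(size_in_bytes: u64) -> Lowering {
    match size_in_bytes {
        1 | 2 | 4 | 8 | 16 => Lowering::IntCompare {
            bits: (size_in_bytes * 8) as u16,
        },
        _ => Lowering::MemcmpCall {
            bytes: size_in_bytes,
        },
    }
}
```

With this rule `choose_lowering(16)` picks a 128-bit compare, matching the `[u16; 8]` dump, and `choose_lowering(12)` picks the call, matching the `[u16; 6]` dump.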

Added another nice codegen test example.

pub fn array_eq_zero(x: [u16; 8]) -> bool { x == [0; 8] }

Before:

  %x = alloca i128, align 8
  store i128 %0, i128* %x, align 8
  %_11.i.i.i = bitcast i128* %x to i8*
  %bcmp.i.i.i = call i32 @bcmp(i8* nonnull dereferenceable(16) %_11.i.i.i, i8* nonnull dereferenceable(16) getelementptr inbounds (<{ [16 x i8] }>, <{ [16 x i8] }>* @alloc2, i64 0, i32 0, i64 0), i64 16) #2, !alias.scope !2
  %1 = icmp eq i32 %bcmp.i.i.i, 0
  ret i1 %1

	sub	rsp, 16
	mov	qword ptr [rsp + 8], rsi
	mov	qword ptr [rsp], rdi
	or	rdi, rsi
	sete	al
	add	rsp, 16
	ret

After:

  %1 = icmp eq i128 %0, 0
  ret i1 %1

	or	rcx, rdx
	sete	al
	ret
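
The `or` followed by `sete` in the assembly is the machine-level form of "OR everything together and test once". As a rough illustration only, the same idea can be written by hand in Rust; `array_eq_zero_folded` is a hypothetical name and is not part of the codegen test.

```rust
/// The function from the codegen test example.
pub fn array_eq_zero(x: [u16; 8]) -> bool {
    x == [0; 8]
}

/// Hypothetical hand-written equivalent: OR all lanes together, then
/// test the result once. The OR is zero only if every lane is zero.
pub fn array_eq_zero_folded(x: [u16; 8]) -> bool {
    x.iter().fold(0u16, |acc, &lane| acc | lane) == 0
}
```

The two functions should return the same answer for every input; the compiler output above works on two 64-bit halves rather than eight 16-bit lanes, but the logic is the same.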

compiler/rustc_codegen_cranelift/src/value_and_place.rs

Mark-Simulacrum · 2021-05-31T14:19:28Z

Did some investigation into the regressions in the perf run here - https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/.2385828

bjorn3

cg_clif changes LGTM

src/test/codegen/array-equality.rs

scottmcm · 2021-06-03T17:42:41Z

Changing the threshold means it's probably worth re-running perf

@bors try @rust-timer queue

rust-timer · 2021-06-03T17:42:43Z

Awaiting bors try build completion.

@rustbot label: +S-waiting-on-perf

the8472 · 2021-07-05T18:57:09Z

compiler/rustc_codegen_llvm/src/context.rs

@@ -712,6 +713,10 @@ impl CodegenCx<'b, 'tcx> {
        ifn!("llvm.assume", fn(i1) -> void);
        ifn!("llvm.prefetch", fn(i8p, t_i32, t_i32, t_i32) -> void);

+        // This isn't an "LLVM intrinsic", but LLVM's optimization passes
+        // recognize it like one and we assume it exists in `core::slice::cmp`
+        ifn!("memcmp", fn(i8p, i8p, t_isize) -> t_i32);


The memcmp in slice::cmp has a FIXME for the return type, should that go here too?


scottmcm · 2021-07-08T22:53:34Z

@oli-obk I've re-based and blessed this.

oli-obk · 2021-07-09T08:06:19Z

@bors r+

bors · 2021-07-09T08:06:20Z

📌 Commit d064494 has been approved by oli-obk

bors · 2021-07-09T09:16:33Z

⌛ Testing commit d064494 with merge ee86f96...

bors · 2021-07-09T11:57:19Z

☀️ Test successful - checks-actions
Approved by: oli-obk
Pushing ee86f96 to master...

…t-slices, r=dtolnay

Do array-slice equality via array equality, rather than always via slices

~~Draft because it needs a rebase after rust-lang#91766 eventually gets through bors.~~

This enables the optimizations from rust-lang#85828 to be used for array-to-slice comparisons too, not just array-to-array.

For example, <https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=5f9ba69b3d5825a782f897c830d3a6aa>

```rust
pub fn demo(x: &[u8], y: [u8; 4]) -> bool {
    *x == y
}
```

Currently writes the array to stack for no reason:

```nasm
	sub	rsp, 4
	mov	dword ptr [rsp], edx
	cmp	rsi, 4
	jne	.LBB0_1
	mov	eax, dword ptr [rdi]
	cmp	eax, dword ptr [rsp]
	sete	al
	add	rsp, 4
	ret
.LBB0_1:
	xor	eax, eax
	add	rsp, 4
	ret
```

Whereas with the change in this PR it just compares it directly:

```nasm
	cmp	rsi, 4
	jne	.LBB1_1
	cmp	dword ptr [rdi], edx
	sete	al
	ret
.LBB1_1:
	xor	eax, eax
	ret
```
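
The shape of the generated code (check the length, then do one fixed-size compare) can be mirrored by hand. The sketch below is an illustration under assumptions, not the library change itself: `demo_via_array` is a hypothetical name, and it uses the standard `TryFrom<&[T]> for &[T; N]` conversion to do the length check.

```rust
use std::convert::TryFrom;

/// The function from the example above: slice compared against an array.
pub fn demo(x: &[u8], y: [u8; 4]) -> bool {
    *x == y
}

/// Hypothetical hand-written equivalent: view the slice as a fixed-size
/// array first (this fails if the length is not 4), then use array equality.
pub fn demo_via_array(x: &[u8], y: [u8; 4]) -> bool {
    match <&[u8; 4]>::try_from(x) {
        Ok(a) => *a == y,
        Err(_) => false,
    }
}
```

A slice of the wrong length takes the `Err` branch and compares unequal, which corresponds to the `jne` to the `xor eax, eax` path in the assembly.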

rust-highfive assigned dtolnay May 30, 2021

rust-highfive added the S-waiting-on-review label May 30, 2021

rustbot added the S-waiting-on-perf label May 30, 2021

oli-obk reviewed May 30, 2021

library/core/src/array/eq.rs (outdated)

RalfJung reviewed May 30, 2021

library/core/src/intrinsics.rs (outdated)

RalfJung reviewed May 30, 2021

compiler/rustc_mir/src/interpret/intrinsics.rs

rustbot removed the S-waiting-on-perf label May 30, 2021

Mark-Simulacrum reviewed May 30, 2021


scottmcm force-pushed the raw-eq branch from a654625 to a8a54f6 Compare May 30, 2021 17:26

scottmcm mentioned this pull request May 30, 2021

dead-code optimize if const { expr } even in opt-level=0 #85836

Closed

bjorn3 reviewed May 30, 2021

compiler/rustc_codegen_llvm/src/intrinsic.rs

joshtriplett mentioned this pull request May 31, 2021

Change Ipv6Addr::is_loopback to include IPv4-mapped loopback addresses #85655

Closed

bjorn3 reviewed May 31, 2021

compiler/rustc_codegen_cranelift/src/value_and_place.rs (outdated)

bjorn3 approved these changes May 31, 2021


erikdesjardins reviewed Jun 2, 2021

src/test/codegen/array-equality.rs

the8472 reviewed Jul 5, 2021


scottmcm added 8 commits July 8, 2021 14:53

Move the PartialEq and Eq impls for arrays to a separate file

d05eafa

Stop generating allocas+memcmp for simple array equality

2456495

PR feedback

b63b2f1

- Add `:Sized` assertion in interpreter impl
- Use `Scalar::from_bool` instead of `ScalarInt: From<bool>`
- Remove unneeded comparison in intrinsic typeck
- Make this UB to call with undef, not just return undef in that case

Implement the raw_eq intrinsic in codegen_cranelift

1216353

Add another codegen test, array_eq_zero

039a3ba

Showing that this avoids an alloca and private constant.

PR Feedback: Don't put SSA-only types in CValues

3d2869c

Use cranelift's Type::int instead of doing the match myself

6444f24

<https://docs.rs/cranelift-codegen/0.74.0/cranelift_codegen/ir/types/struct.Type.html#method.int>

Adjust the threshold to look at the ABI, not just the size

07fb5ee

scottmcm force-pushed the raw-eq branch from ee90bef to 07fb5ee Compare July 8, 2021 21:56


Bless a UI test

d064494

bors added S-waiting-on-bors and removed S-waiting-on-review labels Jul 9, 2021

bors added the merged-by-bors label Jul 9, 2021

bors merged commit ee86f96 into rust-lang:master Jul 9, 2021

rustbot added this to the 1.55.0 milestone Jul 9, 2021

scottmcm deleted the raw-eq branch August 29, 2021 07:03

scottmcm mentioned this pull request Dec 12, 2021

Do array-slice equality via array equality, rather than always via slices #91838

Merged

Urgau mentioned this pull request Jan 28, 2022

Don't over-optimize the abi layout #93405

Closed

This was referenced May 23, 2023

Example does not find panicking input since nightly-2021-12-18 rust-fuzz/libfuzzer#90

Open

Use load+store instead of memcpy for small integer arrays #111999

Merged

