-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Stop generating alloca
s & memcmp
for simple short array equality
#85828
Conversation
Some changes occured to the CTFE / Miri engine cc @rust-lang/miri |
r? @dtolnay (rust-highfive has picked a reviewer for you, use r? to override) |
@bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
⌛ Trying commit a6546252f35553da72f213514eb1ea3f26c67b0a with merge 4fc737828e7a2f978c30875c76c93b2f0e5d6562... |
☀️ Try build successful - checks-actions |
Queued 4fc737828e7a2f978c30875c76c93b2f0e5d6562 with parent bff138d, future comparison URL. |
For future archeologists it may be helpful to create a separate commit for the move of the eq logic to the eq module and base the logic changes on top of that |
let rhs = self.read_scalar(rhs)?.check_init()?; | ||
let lhs_bytes = self.memory.read_bytes(lhs, layout.size)?; | ||
let rhs_bytes = self.memory.read_bytes(rhs, layout.size)?; | ||
Ok(Scalar::Int((lhs_bytes == rhs_bytes).into())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Please use Scalar::from_bool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Way better! Thanks.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
By the way, this will raise errors when there are uninit bytes or pointers anywhere in this memory. For CTFE that makes sense, for Miri-the-tool we might want to properly support comparing pointers...
Nothing we have to resolve now, just pointing this out.
Finished benchmarking try commit (4fc737828e7a2f978c30875c76c93b2f0e5d6562): comparison url. Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. Please note that if the perf results are neutral, you should likely undo the rollup=never given below by specifying Importantly, though, if the results of this run are non-neutral do not roll this PR up -- it will mask other regressions or improvements in the roll up. @bors rollup=never |
@@ -367,6 +367,14 @@ pub fn check_intrinsic_type(tcx: TyCtxt<'_>, it: &hir::ForeignItem<'_>) { | |||
|
|||
sym::nontemporal_store => (1, vec![tcx.mk_mut_ptr(param(0)), param(0)], tcx.mk_unit()), | |||
|
|||
sym::raw_eq => { | |||
let param_count = if intrinsic_name == sym::raw_eq { 2 } else { 1 }; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is matching on the intrinsic name already?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops! Good catch. (Remnant of a previous attempt which included a second intrinsic.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
still there, did you forget to push?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fixed -- https://github.com/rust-lang/rust/pull/85828/files#diff-b2a7c31c785c36fc43ebd3ba40a8a1571af0cf6fb3852594055305ac5bc58e88R384 -- GitHub just isn't showing it on the conversation page.
Added the implementation for cranelift. I didn't bother doing anything too fancy, but it does trigger for the IPv6 case (
Otherwise it
Added another nice codegen test example. pub fn array_eq_zero(x: [u16; 8]) -> bool { x == [0; 8] } Before: %x = alloca i128, align 8
store i128 %0, i128* %x, align 8
%_11.i.i.i = bitcast i128* %x to i8*
%bcmp.i.i.i = call i32 @bcmp(i8* nonnull dereferenceable(16) %_11.i.i.i, i8* nonnull dereferenceable(16) getelementptr inbounds (<{ [16 x i8] }>, <{ [16 x i8] }>* @alloc2, i64 0, i32 0, i64 0), i64 16) #2, !alias.scope !2
%1 = icmp eq i32 %bcmp.i.i.i, 0
ret i1 %1
After: %1 = icmp eq i128 %0, 0
ret i1 %1
|
Did some investigation into the regressions in the perf run here - https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/.2385828 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
cg_clif changes LGTM
Changing the threshold means it's probably worth re-running perf @bors try @rust-timer queue |
Awaiting bors try build completion. @rustbot label: +S-waiting-on-perf |
@@ -712,6 +713,10 @@ impl CodegenCx<'b, 'tcx> { | |||
ifn!("llvm.assume", fn(i1) -> void); | |||
ifn!("llvm.prefetch", fn(i8p, t_i32, t_i32, t_i32) -> void); | |||
|
|||
// This isn't an "LLVM intrinsic", but LLVM's optimization passes | |||
// recognize it like one and we assume it exists in `core::slice::cmp` | |||
ifn!("memcmp", fn(i8p, i8p, t_isize) -> t_i32); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The memcmp in slice::cmp
has a FIXME for the return type, should that go here too?
- Add `:Sized` assertion in interpreter impl - Use `Scalar::from_bool` instead of `ScalarInt: From<bool>` - Remove unneeded comparison in intrinsic typeck - Make this UB to call with undef, not just return undef in that case
Showing that this avoids an alloca and private constant.
This comment has been minimized.
This comment has been minimized.
@oli-obk I've re-based and blessed this. |
@bors r+ |
📌 Commit d064494 has been approved by |
☀️ Test successful - checks-actions |
…t-slices, r=dtolnay Do array-slice equality via array equality, rather than always via slices ~~Draft because it needs a rebase after rust-lang#91766 eventually gets through bors.~~ This enables the optimizations from rust-lang#85828 to be used for array-to-slice comparisons too, not just array-to-array. For example, <https://play.rust-lang.org/?version=nightly&mode=release&edition=2021&gist=5f9ba69b3d5825a782f897c830d3a6aa> ```rust pub fn demo(x: &[u8], y: [u8; 4]) -> bool { *x == y } ``` Currently writes the array to stack for no reason: ```nasm sub rsp, 4 mov dword ptr [rsp], edx cmp rsi, 4 jne .LBB0_1 mov eax, dword ptr [rdi] cmp eax, dword ptr [rsp] sete al add rsp, 4 ret .LBB0_1: xor eax, eax add rsp, 4 ret ``` Whereas with the change in this PR it just compares it directly: ```nasm cmp rsi, 4 jne .LBB1_1 cmp dword ptr [rdi], edx sete al ret .LBB1_1: xor eax, eax ret ```
Example:
Before:
After: