-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Clamp Function for f32 and f64 #100556
Clamp Function for f32 and f64 #100556
Conversation
This comment was marked as resolved.
This comment was marked as resolved.
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon. Please see the contribution instructions for more information. |
These are very susceptible to getting worse for different implementations, since they were defined in the RFC very intentionally to be able to be implemented as But I think we don't have a test for that right now. So could you please add a https://github.com/rust-lang/rust/tree/master/src/test/assembly test for it? I think this should work: // Floating-point clamp is designed to be implementable as max+min,
// so check to make sure that's what it's actually emitting.
// assembly-output: emit-asm
// compile-flags: -O -C "llvm-args=-x86-asm-syntax=intel"
// only-x86_64
// CHECK-LABEL: clamp_demo:
#[no_mangle]
pub fn clamp_demo(a: f32, x: f32, y: f32) -> f32 {
// CHECK: maxss
// CHECK: minss
a.clamp(x, y)
}
// CHECK-LABEL: clamp12_demo:
#[no_mangle]
pub fn clamp12_demo(a: f32) -> f32 {
// CHECK-NEXT: movss xmm1
// CHECK-NEXT: maxss xmm1, xmm0
// CHECK-NEXT: movss xmm0
// CHECK-NEXT: minss xmm0, xmm1
// CHECK-NEXT: ret
a.clamp(1.0, 2.0)
} Then if the new implementation of the method still passes (or produces similarly good ASM), then I'd be happy to use it! @rustbot author |
I checked the codegen in godbolt pub fn clamp_old(this: f32, min: f32, max: f32) -> f32 {
assert!(min <= max);
let mut x = this;
if x < min {
x = min;
}
if x > max {
x = max;
}
x
}
pub fn clamp_new(this: f32, min: f32, max: f32) -> f32 {
assert!(min <= max);
if this <= min {
return min;
}
if this > max {
return max;
}
this
} example::clamp_old:
ucomiss xmm2, xmm1
jb .LBB0_2
maxss xmm1, xmm0
minss xmm2, xmm1
movaps xmm0, xmm2
ret
.LBB0_2:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rdx, [rip + .L__unnamed_2]
mov esi, 28
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2
example::clamp_new:
ucomiss xmm2, xmm1
jb .LBB1_2
minss xmm2, xmm0
cmpnless xmm0, xmm1
andps xmm2, xmm0
andnps xmm0, xmm1
orps xmm0, xmm2
ret
.LBB1_2:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rdx, [rip + .L__unnamed_3]
mov esi, 28
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2 |
Thanks, this is what I was concerned about. For example, on Zen2 https://www.agner.org/optimize/instruction_tables.pdf lists Targeting that architecture specifically doesn't fix it either: https://godbolt.org/z/ET5sEff8q. That can (In general, down at the single instruction level, trying to branch to skip things is often slower than just running them.) So I think you could update this to use a |
(closing and immediately reopening to try to re-kick the PR build, since only 1 of the three jobs ran.) |
@scottmcm Thank you for your advice! I attempted to change the clamp function using example::clamp_old:
ucomiss xmm2, xmm1
jb .LBB0_2
maxss xmm1, xmm0
minss xmm2, xmm1
movaps xmm0, xmm2
ret
.LBB0_2:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rdx, [rip + .L__unnamed_2]
mov esi, 28
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2
example::clamp_new:
ucomiss xmm2, xmm1
jb .LBB1_5
ucomiss xmm1, xmm0
jae .LBB1_4
ucomiss xmm0, xmm2
movaps xmm1, xmm0
jbe .LBB1_4
movaps xmm1, xmm2
.LBB1_4:
movaps xmm0, xmm1
ret
.LBB1_5:
push rax
lea rdi, [rip + .L__unnamed_1]
lea rdx, [rip + .L__unnamed_3]
mov esi, 28
call qword ptr [rip + core::panicking::panic@GOTPCREL]
ud2 This new clamp also has slightly different functionality than the original one, since I think it mutates the original value? and returns a float, as opposed to just returning a new value. fn clamp(mut self, min: f32, max: f32) -> f32 {
assert!(min <= max);
if self <= min {
self = min;
} else if self > max {
self = max;
}
self
} |
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
This comment has been minimized.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hopefully just the one more change and things'll be good 🤞
If you could also squash all the commits, I'd appreciate it.
Simple Clamp Function I thought this was more robust and easier to read. I also allowed this function to return early in order to skip the extra bound check (I'm sure the difference is negligible). I'm not sure if there was a reason for binding `self` to `x`; if so, please correct me. Simple Clamp Function for f64 I thought this was more robust and easier to read. I also allowed this function to return early in order to skip the extra bound check (I'm sure the difference is negligible). I'm not sure if there was a reason for binding `self` to `x`; if so, please correct me. Floating point clamp test f32 clamp using mut self f64 clamp using mut self Update library/core/src/num/f32.rs Update f64.rs Update x86_64-floating-point-clamp.rs Update src/test/assembly/x86_64-floating-point-clamp.rs Update x86_64-floating-point-clamp.rs Co-Authored-By: scottmcm <scottmcm@users.noreply.github.com>
This comment has been minimized.
This comment has been minimized.
Co-authored-by: scottmcm <scottmcm@users.noreply.github.com>
Huzzah! The PR build finally stopped resisting. Thanks for your persistence here -- it's great to get an assembly test for this to record the work from the RFC in a way that's actually checked. @bors r+ |
@bors r- failed in a rollup #100664 (comment) |
@bors r+ |
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
Clamp Function for f32 and f64 I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks. If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
…iaskrgr Rollup of 11 pull requests Successful merges: - rust-lang#100556 (Clamp Function for f32 and f64) - rust-lang#100663 (Make slice::reverse const) - rust-lang#100697 ( Minor syntax and formatting update to doc comment on `find_vtable_types_for_unsizing`) - rust-lang#100760 (update test for LLVM change) - rust-lang#100761 (some general mir typeck cleanup) - rust-lang#100775 (rustdoc: Merge source code pages HTML elements together v2) - rust-lang#100813 (Add `/build-rust-analyzer/` to .gitignore) - rust-lang#100821 (Make some docs nicer wrt pointer offsets) - rust-lang#100822 (Replace most uses of `pointer::offset` with `add` and `sub`) - rust-lang#100839 (Make doc for stdin field of process consistent) - rust-lang#100842 (Add diagnostics lints to `rustc_transmute` module (zero diags)) Failed merges: r? `@ghost` `@rustbot` modify labels: rollup
I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks.
If there was a reason for binding
self
tox
or if this code is incorrect, please correct me :)