Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Clamp Function for f32 and f64 #100556

Merged
merged 3 commits into from
Aug 21, 2022
Merged

Clamp Function for f32 and f64 #100556

merged 3 commits into from
Aug 21, 2022

Conversation

Alex-Velez
Copy link
Contributor

I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks.

If there was a reason for binding self to x or if this code is incorrect, please correct me :)

@rustbot rustbot added the T-libs Relevant to the library team, which will review and decide on the PR/issue. label Aug 15, 2022
@rustbot

This comment was marked as resolved.

@rust-highfive
Copy link
Collaborator

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @scottmcm (or someone else) soon.

Please see the contribution instructions for more information.

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Aug 15, 2022
@scottmcm
Copy link
Member

scottmcm commented Aug 15, 2022

These are very susceptible to getting worse for different implementations, since they were defined in the RFC very intentionally to be able to be implemented as minss+maxss, rather than needing branching.

But I think we don't have a test for that right now. So could you please add a https://github.com/rust-lang/rust/tree/master/src/test/assembly test for it? I think this should work:

// Floating-point clamp is designed to be implementable as max+min,
// so check to make sure that's what it's actually emitting.

// assembly-output: emit-asm
// compile-flags: -O -C "llvm-args=-x86-asm-syntax=intel"
// only-x86_64

// CHECK-LABEL: clamp_demo:
#[no_mangle]
pub fn clamp_demo(a: f32, x: f32, y: f32) -> f32 {
    // CHECK: maxss
    // CHECK: minss
    a.clamp(x, y)
}

// CHECK-LABEL: clamp12_demo:
#[no_mangle]
pub fn clamp12_demo(a: f32) -> f32 {
    // CHECK-NEXT: movss   xmm1
    // CHECK-NEXT: maxss   xmm1, xmm0
    // CHECK-NEXT: movss   xmm0
    // CHECK-NEXT: minss   xmm0, xmm1
    // CHECK-NEXT: ret
    a.clamp(1.0, 2.0)
}

Then if the new implementation of the method still passes (or produces similarly good ASM), then I'd be happy to use it!

@rustbot author

@rustbot rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Aug 15, 2022
@5225225
Copy link
Contributor

5225225 commented Aug 15, 2022

I checked the codegen in godbolt

pub fn clamp_old(this: f32, min: f32, max: f32) -> f32 {
    assert!(min <= max);
    let mut x = this;
    if x < min {
        x = min;
    }
    if x > max {
        x = max;
    }
    x
}

pub fn clamp_new(this: f32, min: f32, max: f32) -> f32 {
    assert!(min <= max);
    if this <= min {
        return min;
    }
    if this > max {
        return max;
    }
    this
}
example::clamp_old:
        ucomiss xmm2, xmm1
        jb      .LBB0_2
        maxss   xmm1, xmm0
        minss   xmm2, xmm1
        movaps  xmm0, xmm2
        ret
.LBB0_2:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_2]
        mov     esi, 28
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2

example::clamp_new:
        ucomiss xmm2, xmm1
        jb      .LBB1_2
        minss   xmm2, xmm0
        cmpnless        xmm0, xmm1
        andps   xmm2, xmm0
        andnps  xmm0, xmm1
        orps    xmm0, xmm2
        ret
.LBB1_2:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_3]
        mov     esi, 28
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2

(https://godbolt.org/z/3hd33cfKW)

@scottmcm
Copy link
Member

        minss   xmm2, xmm0
        cmpnless        xmm0, xmm1
        andps   xmm2, xmm0
        andnps  xmm0, xmm1
        orps    xmm0, xmm2
        ret

Thanks, this is what I was concerned about.

For example, on Zen2 https://www.agner.org/optimize/instruction_tables.pdf lists maxss as latency 1, RThroughput ½ -- with {and(n)|or}ps as latency 1, RThroughput ¼. So it looks like this implementation is slightly more expensive, in addition to being more instructions.

Targeting that architecture specifically doesn't fix it either: https://godbolt.org/z/ET5sEff8q. That can vblendvps instead of and+andn+or, but that's Latency 1 RThroughput ½ just like maxss, and it still needs a vcmpltss first so it's slower.

(In general, down at the single instruction level, trying to branch to skip things is often slower than just running them.)

So I think you could update this to use a mut self if you prefer instead of the separate mut x, but the return approach seems to be suboptimal right now.

@scottmcm
Copy link
Member

(closing and immediately reopening to try to re-kick the PR build, since only 1 of the three jobs ran.)

@scottmcm scottmcm closed this Aug 16, 2022
@scottmcm scottmcm reopened this Aug 16, 2022
@Alex-Velez
Copy link
Contributor Author

Alex-Velez commented Aug 16, 2022

@scottmcm Thank you for your advice! I attempted to change the clamp function using mut self, however, it seems to produce this code according to godbolt.

example::clamp_old:
        ucomiss xmm2, xmm1
        jb      .LBB0_2
        maxss   xmm1, xmm0
        minss   xmm2, xmm1
        movaps  xmm0, xmm2
        ret
.LBB0_2:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_2]
        mov     esi, 28
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2

example::clamp_new:
        ucomiss xmm2, xmm1
        jb      .LBB1_5
        ucomiss xmm1, xmm0
        jae     .LBB1_4
        ucomiss xmm0, xmm2
        movaps  xmm1, xmm0
        jbe     .LBB1_4
        movaps  xmm1, xmm2
.LBB1_4:
        movaps  xmm0, xmm1
        ret
.LBB1_5:
        push    rax
        lea     rdi, [rip + .L__unnamed_1]
        lea     rdx, [rip + .L__unnamed_3]
        mov     esi, 28
        call    qword ptr [rip + core::panicking::panic@GOTPCREL]
        ud2

This new clamp also has slightly different functionality than the original one, since I think it mutates the original value? and returns a float, as opposed to just returning a new value.

fn clamp(mut self, min: f32, max: f32) -> f32 {
    assert!(min <= max);
    if self <= min {
        self = min;
    } else if self > max {
        self = max;
    }
    self
}

@rust-log-analyzer

This comment has been minimized.

@rust-log-analyzer

This comment has been minimized.

Copy link
Member

@scottmcm scottmcm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hopefully just the one more change and things'll be good 🤞

If you could also squash all the commits, I'd appreciate it.

Simple Clamp Function

I thought this was more robust and easier to read. I also allowed this function to return early in order to skip the extra bound check (I'm sure the difference is negligible). I'm not sure if there was a reason for binding `self` to `x`; if so, please correct me.

Simple Clamp Function for f64

I thought this was more robust and easier to read. I also allowed this function to return early in order to skip the extra bound check (I'm sure the difference is negligible). I'm not sure if there was a reason for binding `self` to `x`; if so, please correct me.

Floating point clamp test

f32 clamp using mut self

f64 clamp using mut self

Update library/core/src/num/f32.rs

Update f64.rs

Update x86_64-floating-point-clamp.rs

Update src/test/assembly/x86_64-floating-point-clamp.rs

Update x86_64-floating-point-clamp.rs

Co-Authored-By: scottmcm <scottmcm@users.noreply.github.com>
@rust-log-analyzer

This comment has been minimized.

Co-authored-by: scottmcm <scottmcm@users.noreply.github.com>
@scottmcm
Copy link
Member

Huzzah! The PR build finally stopped resisting. Thanks for your persistence here -- it's great to get an assembly test for this to record the work from the RFC in a way that's actually checked.

@bors r+

@bors
Copy link
Contributor

bors commented Aug 17, 2022

📌 Commit 302689f has been approved by scottmcm

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 17, 2022
@matthiaskrgr
Copy link
Member

@bors r- failed in a rollup #100664 (comment)

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Aug 17, 2022
@scottmcm
Copy link
Member

@bors r+

@bors
Copy link
Contributor

bors commented Aug 20, 2022

📌 Commit 0314647 has been approved by scottmcm

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Aug 20, 2022
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 21, 2022
Clamp Function for f32 and f64

I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks.

If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 21, 2022
Clamp Function for f32 and f64

I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks.

If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 21, 2022
Clamp Function for f32 and f64

I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks.

If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 21, 2022
Clamp Function for f32 and f64

I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks.

If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this pull request Aug 21, 2022
Clamp Function for f32 and f64

I thought the clamp function could use a little improvement for readability purposes. The function now returns early in order to skip the extra bound checks.

If there was a reason for binding `self` to `x` or if this code is incorrect, please correct me :)
bors added a commit to rust-lang-ci/rust that referenced this pull request Aug 21, 2022
…iaskrgr

Rollup of 11 pull requests

Successful merges:

 - rust-lang#100556 (Clamp Function for f32 and f64)
 - rust-lang#100663 (Make slice::reverse const)
 - rust-lang#100697 ( Minor syntax and formatting update to doc comment on `find_vtable_types_for_unsizing`)
 - rust-lang#100760 (update test for LLVM change)
 - rust-lang#100761 (some general mir typeck cleanup)
 - rust-lang#100775 (rustdoc: Merge source code pages HTML elements together v2)
 - rust-lang#100813 (Add `/build-rust-analyzer/` to .gitignore)
 - rust-lang#100821 (Make some docs nicer wrt pointer offsets)
 - rust-lang#100822 (Replace most uses of `pointer::offset` with `add` and `sub`)
 - rust-lang#100839 (Make doc for stdin field of process consistent)
 - rust-lang#100842 (Add diagnostics lints to `rustc_transmute` module (zero diags))

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup
@bors bors merged commit a5c16a5 into rust-lang:master Aug 21, 2022
@rustbot rustbot added this to the 1.65.0 milestone Aug 21, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

8 participants