Optimize Ord trait implementation for bool #66881
Casting the booleans to `i8`s and converting their difference into `Ordering` generates better assembly than casting them to `u8`s and comparing them.
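To make the two approaches concrete, here is a minimal sketch of both comparisons as free functions. The names `cmp_via_u8` and `cmp_via_i8_difference` are hypothetical and chosen only for illustration, and the safe `unreachable!()` fallback is a stand-in rather than the code proposed in this PR.

```rust
use std::cmp::Ordering;

// Older approach described above: cast both booleans to u8 and compare.
fn cmp_via_u8(a: bool, b: bool) -> Ordering {
    (a as u8).cmp(&(b as u8))
}

// Newer approach: the difference of the i8 casts can only be -1, 0, or 1,
// and each value maps directly onto one Ordering variant.
fn cmp_via_i8_difference(a: bool, b: bool) -> Ordering {
    match (a as i8) - (b as i8) {
        -1 => Ordering::Less,
        0 => Ordering::Equal,
        1 => Ordering::Greater,
        // Safe stand-in for illustration only (an assumption of this sketch).
        _ => unreachable!("bool as i8 is always 0 or 1"),
    }
}

fn main() {
    // Exhaustively check that both approaches agree with bool's own Ord.
    for &a in &[false, true] {
        for &b in &[false, true] {
            assert_eq!(cmp_via_u8(a, b), cmp_via_i8_difference(a, b));
            assert_eq!(a.cmp(&b), cmp_via_i8_difference(a, b));
        }
    }
    println!("both approaches agree on all four input pairs");
}
```

Since there are only four input pairs, the equivalence can be checked exhaustively, as `main` does.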
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @sfackler (or someone else) soon. If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes. Please see the contribution instructions for more information.
In what context is the performance of bool's Ord implementation important?
FWIW, using `slice::sort_by_key` with a bool key, we can split a slice into the part where a predicate holds and the part where it doesn't, though `sort_by_key` is not an ideal API for this purpose.
If you're concerned about performance when partitioning a list, it seems like the first step would be to not use a comparison-based sort at all, in favor of a single-pass shuffle.
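As a rough illustration of the two options discussed here, the sketch below partitions a vector once with `sort_by_key` on a bool key and once with a single pass of swaps. `partition_in_place` is a hypothetical helper written for this example, not a standard library function, and unlike the stable sort it does not preserve the relative order of the elements that fail the predicate.

```rust
/// Hypothetical helper: moves every element satisfying `pred` to the front
/// in one pass and returns how many such elements there are.
fn partition_in_place<T>(items: &mut [T], mut pred: impl FnMut(&T) -> bool) -> usize {
    let mut boundary = 0;
    for i in 0..items.len() {
        if pred(&items[i]) {
            items.swap(boundary, i);
            boundary += 1;
        }
    }
    boundary
}

fn main() {
    // Comparison-based approach: false sorts before true, so keying on
    // "is odd" places the even numbers first. Relies on bool's Ord.
    let mut by_sort: Vec<i32> = vec![5, 2, 8, 1, 4, 7];
    by_sort.sort_by_key(|x| x % 2 != 0);
    assert!(by_sort[..3].iter().all(|x| x % 2 == 0));

    // Single-pass approach: no bool comparisons involved at all.
    let mut by_pass: Vec<i32> = vec![5, 2, 8, 1, 4, 7];
    let evens = partition_in_place(&mut by_pass, |x| x % 2 == 0);
    assert_eq!(evens, 3);
    assert!(by_pass[..evens].iter().all(|x| x % 2 == 0));
    assert!(by_pass[evens..].iter().all(|x| x % 2 != 0));
}
```

When two separate collections are acceptable, the standard `Iterator::partition` adapter is another single-pass option.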
I am using bool's implementation of Ord in several places in my code when matching on `some_new_boolean_state.cmp(&some_old_boolean_state)`, but I'm also sure this comparison must take place many times when (as mentioned above) sorting, or when modifying a BTreeMap or BinaryHeap. That said, if this doesn't seem worth the hassle, I'll just use a custom bool_cmp in all relevant places in my code, and that'll work for me.
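The matching pattern described in this comment might look roughly like the sketch below. The `Transition` enum and the `classify` function are hypothetical names invented for this example; the original code is not shown in the thread.

```rust
use std::cmp::Ordering;

// Hypothetical type for illustration: how a boolean flag changed.
#[derive(Debug, PartialEq)]
enum Transition {
    TurnedOn,
    Unchanged,
    TurnedOff,
}

// Compares the new state against the old one using bool's Ord implementation,
// where false < true.
fn classify(new_state: bool, old_state: bool) -> Transition {
    match new_state.cmp(&old_state) {
        Ordering::Greater => Transition::TurnedOn, // false -> true
        Ordering::Equal => Transition::Unchanged,
        Ordering::Less => Transition::TurnedOff, // true -> false
    }
}

fn main() {
    assert_eq!(classify(true, false), Transition::TurnedOn);
    assert_eq!(classify(false, true), Transition::TurnedOff);
    assert_eq!(classify(true, true), Transition::Unchanged);
}
```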
src/libcore/cmp.rs
Outdated
-1 => Less,
0 => Equal,
1 => Greater,
// SAFETY: Unreachable code
This doesn't seem like it's really doing anything. The unsafe code is calling unreachable_unchecked - of course it's unreachable code!
I had to document the unsafe block with a safety comment for the PR checks to pass. I can add a better comment, but that's the reason for the current one. Specifically, the tidy check fails with an undocumented unsafe.
If we're going to require documentation for unsafe blocks, it may as well be useful documentation. e.g. "SAFETY: bool as i8 returns 0 or 1, so the subtraction can't end up with anything else."
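For reference, a standalone sketch of the match with the suggested safety comment in place could look like this. It is written as a free function named `bool_cmp` (borrowing the name mentioned earlier in the thread) rather than as the actual `impl Ord for bool` in `src/libcore/cmp.rs`, so the surrounding structure is an assumption.

```rust
use std::cmp::Ordering;
use std::hint::unreachable_unchecked;

// Hypothetical free-function version of the comparison under review.
fn bool_cmp(a: bool, b: bool) -> Ordering {
    match (a as i8) - (b as i8) {
        -1 => Ordering::Less,
        0 => Ordering::Equal,
        1 => Ordering::Greater,
        // SAFETY: bool as i8 returns 0 or 1, so the difference can't be
        // anything other than -1, 0, or 1.
        _ => unsafe { unreachable_unchecked() },
    }
}

fn main() {
    assert_eq!(bool_cmp(false, true), Ordering::Less);
    assert_eq!(bool_cmp(true, true), Ordering::Equal);
    assert_eq!(bool_cmp(true, false), Ordering::Greater);
}
```

The comment now states the invariant that makes the wildcard arm impossible to reach, which is what a reader auditing the unsafe block needs to verify.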
Force-pushed from c033780 to 1f07aa5
@bors r+
📌 Commit 1f07aa5 has been approved by
…-ord-optimization, r=sfackler

Optimize Ord trait implementation for bool

Casting the booleans to `i8`s and converting their difference into `Ordering` generates better assembly than casting them to `u8`s and comparing them.

Fixes rust-lang#66780

#### Comparison ([Godbolt link](https://rust.godbolt.org/z/PjBpvF))

##### Old assembly:
```asm
example::boolean_cmp:
        mov     ecx, edi
        xor     ecx, esi
        test    esi, esi
        mov     eax, 255
        cmove   eax, ecx
        test    edi, edi
        cmovne  eax, ecx
        ret
```

##### New assembly:
```asm
example::boolean_cmp:
        mov     eax, edi
        sub     al, sil
        ret
```

##### Old LLVM-MCA statistics:
```
Iterations:        100
Instructions:      800
Total Cycles:      234
Total uOps:        1000
Dispatch Width:    6
uOps Per Cycle:    4.27
IPC:               3.42
Block RThroughput: 1.7
```

##### New LLVM-MCA statistics:
```
Iterations:        100
Instructions:      300
Total Cycles:      110
Total uOps:        500
Dispatch Width:    6
uOps Per Cycle:    4.55
IPC:               2.73
Block RThroughput: 1.0
```
Rollup of 6 pull requests

Successful merges:
- #66881 (Optimize Ord trait implementation for bool)
- #67015 (Fix constant propagation for scalar pairs)
- #67074 (Add options to --extern flag.)
- #67164 (Ensure that panicking in constants eventually errors)
- #67174 (Remove `checked_add` in `Layout::repeat`)
- #67205 (Make `publish_toolstate.sh` executable)

Failed merges:

r? @ghost