-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Make checked
ops emit *unchecked* LLVM operations where feasible
#124114
Conversation
rustbot has assigned @workingjubilee. Use |
// Note that `shl` and `shr` in LLVM are already unchecked. So rather than | ||
// looking for no-wrap flags, we just need there to not be any masking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You can see the masking in optimized LLVM IR and optimized MIR today: https://rust.godbolt.org/z/x5Podh9z5
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the same assembly emitted, tho'.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, on x86 the shift instructions are wrapping. To see a difference you'd need to make an example where LLVM can optimize based on knowing that the shift is in-range.
(Just like how sub
and sub nuw
have the same assembly too.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good point! I tried an architecture that actually has a slightly more annoying shift instruction in this regard, PowerPC! (specifically --target=powerpc64le-unknown-linux-gnu
)
old_checked_sub:
clrldi 5, 4, 32
clrlwi 4, 4, 27
addi 5, 5, -32
srw 4, 3, 4
rldicl 5, 5, 1, 63
mr 3, 5
blr
.long 0
.quad 0
new_checked_sub:
clrldi 5, 4, 32
slw 4, 3, 4
addi 5, 5, -32
rldicl 5, 5, 1, 63
mr 3, 5
blr
.long 0
.quad 0
One less instruction!
For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write then checked methods in such a way that we stop emitting wrapping versions of them. For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither ```rust a.checked_sub(b).unwrap() ``` nor ```rust a.checked_sub(b).unwrap_unchecked() ``` actually optimizes to `sub nuw`. After this PR they do.
b5ed099
to
986d9f1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
at first I thought "surely this doesn't need perf, how much could the compiler possibly stress over that?" but then I saw all the const eval machinery for implementing shl/shr does in fact flow through checked_{shl,shr}
, thus gets used to implement it, thus gets squeezed, aiui, every time we compile a const in the shape of
const NAME: i32 = 1 << N;
right?
tests/mir-opt/pre-codegen/checked_ops.checked_shl.PreCodegen.after.panic-unwind.mir
Show resolved
Hide resolved
@@ -1,5 +1,5 @@ | |||
// skip-filecheck | |||
//@ compile-flags: -O -Zmir-opt-level=2 -Cdebuginfo=2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why this change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, first I was going to say for consistency with the other pre-codegen tests, but it looks like they're not as consistent as I thought. The mem-replace and range-iter tests don't ask for debuginfo, but then the slice-filter and chained-comparison ones do :/
I guess it's because I think it's more common for -O
to be without debuginfo, and in general I don't think that the pre-codegen
tests are about looking at debuginfo. Certainly in a normal --debug
build the MIR inliner doesn't run so it'd look nothing like this.
I could put it back as it was if you'd like. It just adds 4 more debug
lines; nothing interesting.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't mind it, either way, tbh, I just wanted to know the reasoning so that I know it wasn't Perfectly Random.
// Note that `shl` and `shr` in LLVM are already unchecked. So rather than | ||
// looking for no-wrap flags, we just need there to not be any masking. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks like the same assembly emitted, tho'.
@bors try @rust-timer queue |
This comment has been minimized.
This comment has been minimized.
would mostly expect the main difference to be from reduced LLVMIR if anything. |
Make `checked` ops emit *unchecked* LLVM operations where feasible For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write then checked methods in such a way that we stop emitting wrapping versions of them. For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither ```rust a.checked_sub(b).unwrap() ``` nor ```rust a.checked_sub(b).unwrap_unchecked() ``` actually optimizes to `sub nuw`. After this PR they do. cc rust-lang#103299
Ah, this PR doesn't change |
☀️ Try build successful - checks-actions |
This comment has been minimized.
This comment has been minimized.
@DianQK Yeah, this is only shifts (which don't have a Addition, Signed Subtraction, and Multiplication I didn't touch here because those overflow conditions are non-trivial and thus using the intrinsic seems like still the correct choice. |
@scottmcm FWIW, for unsigned add overflow the recommended pattern is x + y < x, rather than uadd.with.overflow. We should probably give replacing that a try as well. |
Finished benchmarking commit (928c684): comparison URL. Overall result: ❌ regressions - no action neededBenchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf. @bors rollup=never Instruction countThis is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 673.499s -> 673.605s (0.02%) |
Hmm, I got all excited by those results -- even -20% cycles on a runtime benchmark! -- but I think they're just noise from recovering from a huge spike in the previous runs. Nothing concerning from it, though, so I think this is good to go if people are happy with the code changes. |
Yeah, seems good to me! @bors r+ |
☀️ Test successful - checks-actions |
Finished benchmarking commit (9c7b1f4): comparison URL. Overall result: no relevant changes - no action needed@rustbot label: -perf-regression Instruction countThis benchmark run did not return any relevant results for this metric. Max RSS (memory usage)ResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
CyclesResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary sizeResultsThis is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Bootstrap: 670.134s -> 670.156s (0.00%) |
Ah, yup, those perf results make more sense. |
…strieb Docs: suggest `uN::checked_sub` instead of check-then-unchecked As of rust-lang#124114 it's exactly the same in codegen, so might as well not use `unsafe`. Note that this is only for *unsigned*, since the overflow conditions for `iN::checked_sub` are more complicated.
Rollup merge of rust-lang#124701 - scottmcm:unchecked_sub_docs, r=Nilstrieb Docs: suggest `uN::checked_sub` instead of check-then-unchecked As of rust-lang#124114 it's exactly the same in codegen, so might as well not use `unsafe`. Note that this is only for *unsigned*, since the overflow conditions for `iN::checked_sub` are more complicated.
Invert comparison in `uN::checked_sub` After rust-lang#124114, LLVM no longer combines the comparison and subtraction in `uN::checked_sub` when either operand is a constant (demo: https://rust.godbolt.org/z/MaeoYbsP1). The difference is more pronounced when the expression is slightly more complex (https://rust.godbolt.org/z/4rPavsYdc). This is due to the use of `>=` here: https://github.com/rust-lang/rust/blob/ee97564e3a9f9ac8c65103abb37c6aa48d95bfa2/library/core/src/num/uint_macros.rs#L581-L593 For constant `C`, LLVM eagerly converts `a >= C` into `a > C - 1`, but the backend can only combine `a < C` with `a - C`, not `C - 1 < a` and `a - C`: https://github.com/llvm/llvm-project/blob/e586556e375fc5c4f7e76b5c299cb981f2016108/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1697-L1742 This PR[^1] simply inverts the `>=` into `<` to restore the LLVM magic, and somewhat align this with the implementation of `uN::overflowing_sub` from rust-lang#103299. When the result is stored as an `Option` (rather than being branched/cmoved on), the discriminant is `self >= rhs`. This PR doesn't affect the codegen (and relevant tests) of that since LLVM will negate `self < rhs` to `self >= rhs` when necessary. [^1]: Note to `self`: My very first contribution to publicly-used code. Hopefully like what I should learn to always be, tiny and humble.
Invert comparison in `uN::checked_sub` After rust-lang#124114, LLVM no longer combines the comparison and subtraction in `uN::checked_sub` when either operand is a constant (demo: https://rust.godbolt.org/z/MaeoYbsP1). The difference is more pronounced when the expression is slightly more complex (https://rust.godbolt.org/z/4rPavsYdc). This is due to the use of `>=` here: https://github.com/rust-lang/rust/blob/ee97564e3a9f9ac8c65103abb37c6aa48d95bfa2/library/core/src/num/uint_macros.rs#L581-L593 For constant `C`, LLVM eagerly converts `a >= C` into `a > C - 1`, but the backend can only combine `a < C` with `a - C`, not `C - 1 < a` and `a - C`: https://github.com/llvm/llvm-project/blob/e586556e375fc5c4f7e76b5c299cb981f2016108/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1697-L1742 This PR[^1] simply inverts the `>=` into `<` to restore the LLVM magic, and somewhat align this with the implementation of `uN::overflowing_sub` from rust-lang#103299. When the result is stored as an `Option` (rather than being branched/cmoved on), the discriminant is `self >= rhs`. This PR doesn't affect the codegen (and relevant tests) of that since LLVM will negate `self < rhs` to `self >= rhs` when necessary. [^1]: Note to `self`: My very first contribution to publicly-used code. Hopefully like what I should learn to always be, tiny and humble.
Rollup merge of rust-lang#125038 - ivan-shrimp:checked_sub, r=joboet Invert comparison in `uN::checked_sub` After rust-lang#124114, LLVM no longer combines the comparison and subtraction in `uN::checked_sub` when either operand is a constant (demo: https://rust.godbolt.org/z/MaeoYbsP1). The difference is more pronounced when the expression is slightly more complex (https://rust.godbolt.org/z/4rPavsYdc). This is due to the use of `>=` here: https://github.com/rust-lang/rust/blob/ee97564e3a9f9ac8c65103abb37c6aa48d95bfa2/library/core/src/num/uint_macros.rs#L581-L593 For constant `C`, LLVM eagerly converts `a >= C` into `a > C - 1`, but the backend can only combine `a < C` with `a - C`, not `C - 1 < a` and `a - C`: https://github.com/llvm/llvm-project/blob/e586556e375fc5c4f7e76b5c299cb981f2016108/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1697-L1742 This PR[^1] simply inverts the `>=` into `<` to restore the LLVM magic, and somewhat align this with the implementation of `uN::overflowing_sub` from rust-lang#103299. When the result is stored as an `Option` (rather than being branched/cmoved on), the discriminant is `self >= rhs`. This PR doesn't affect the codegen (and relevant tests) of that since LLVM will negate `self < rhs` to `self >= rhs` when necessary. [^1]: Note to `self`: My very first contribution to publicly-used code. Hopefully like what I should learn to always be, tiny and humble.
…=<try> Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
…=<try> Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
…=Amanieu Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
…=Amanieu Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself
For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write the checked methods in such a way that we stop emitting wrapping versions of them.
For example, today https://rust.godbolt.org/z/qM9YK8Txb neither
nor
actually optimizes to
sub nuw
. After this PR they do.cc #103299