Make `checked` ops emit unchecked LLVM operations where feasible #124114

scottmcm · 2024-04-18T07:16:59Z

For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write the checked methods in such a way that we stop emitting wrapping versions of them.

For example, today https://rust.godbolt.org/z/qM9YK8Txb neither

a.checked_sub(b).unwrap()

nor

a.checked_sub(b).unwrap_unchecked()

actually optimizes to sub nuw. After this PR they do.

cc #103299

rustbot · 2024-04-18T07:17:07Z

r? @workingjubilee

rustbot has assigned @workingjubilee.
They will have a look at your PR within the next two weeks and either review your PR or reassign to another reviewer.

Use r? to explicitly pick a reviewer

scottmcm · 2024-04-18T07:18:21Z

tests/codegen/checked_math.rs

+// Note that `shl` and `shr` in LLVM are already unchecked. So rather than
+// looking for no-wrap flags, we just need there to not be any masking.


You can see the masking in optimized LLVM IR and optimized MIR today: https://rust.godbolt.org/z/x5Podh9z5

looks like the same assembly emitted, tho'.

Well, on x86 the shift instructions are wrapping. To see a difference you'd need to make an example where LLVM can optimize based on knowing that the shift is in-range.

(Just like how sub and sub nuw have the same assembly too.)

That's a good point! I tried an architecture that actually has a slightly more annoying shift instruction in this regard, PowerPC! (specifically --target=powerpc64le-unknown-linux-gnu)

old_checked_sub: clrldi 5, 4, 32 clrlwi 4, 4, 27 addi 5, 5, -32 srw 4, 3, 4 rldicl 5, 5, 1, 63 mr 3, 5 blr .long 0 .quad 0 new_checked_sub: clrldi 5, 4, 32 slw 4, 3, 4 addi 5, 5, -32 rldicl 5, 5, 1, 63 mr 3, 5 blr .long 0 .quad 0

One less instruction!

For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write then checked methods in such a way that we stop emitting wrapping versions of them. For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither ```rust a.checked_sub(b).unwrap() ``` nor ```rust a.checked_sub(b).unwrap_unchecked() ``` actually optimizes to `sub nuw`. After this PR they do.

workingjubilee

at first I thought "surely this doesn't need perf, how much could the compiler possibly stress over that?" but then I saw all the const eval machinery for implementing shl/shr does in fact flow through checked_{shl,shr}, thus gets used to implement it, thus gets squeezed, aiui, every time we compile a const in the shape of

const NAME: i32 = 1 << N;

right?

library/core/src/num/int_macros.rs

tests/mir-opt/pre-codegen/checked_ops.checked_shl.PreCodegen.after.panic-unwind.mir

workingjubilee · 2024-04-19T06:46:07Z

tests/mir-opt/pre-codegen/checked_ops.rs

@@ -1,5 +1,5 @@
 // skip-filecheck
-//@ compile-flags: -O -Zmir-opt-level=2 -Cdebuginfo=2


why this change?

Well, first I was going to say for consistency with the other pre-codegen tests, but it looks like they're not as consistent as I thought. The mem-replace and range-iter tests don't ask for debuginfo, but then the slice-filter and chained-comparison ones do :/

I guess it's because I think it's more common for -O to be without debuginfo, and in general I don't think that the pre-codegen tests are about looking at debuginfo. Certainly in a normal --debug build the MIR inliner doesn't run so it'd look nothing like this.

I could put it back as it was if you'd like. It just adds 4 more debug lines; nothing interesting.

I don't mind it, either way, tbh, I just wanted to know the reasoning so that I know it wasn't Perfectly Random.

workingjubilee · 2024-04-19T06:47:18Z

tests/codegen/checked_math.rs

+// Note that `shl` and `shr` in LLVM are already unchecked. So rather than
+// looking for no-wrap flags, we just need there to not be any masking.


looks like the same assembly emitted, tho'.

workingjubilee · 2024-04-19T06:55:20Z

@bors try @rust-timer queue

workingjubilee · 2024-04-19T06:55:54Z

would mostly expect the main difference to be from reduced LLVMIR if anything.

Make `checked` ops emit *unchecked* LLVM operations where feasible For things with easily pre-checked overflow conditions -- shifts and unsigned subtraction -- write then checked methods in such a way that we stop emitting wrapping versions of them. For example, today <https://rust.godbolt.org/z/qM9YK8Txb> neither ```rust a.checked_sub(b).unwrap() ``` nor ```rust a.checked_sub(b).unwrap_unchecked() ``` actually optimizes to `sub nuw`. After this PR they do. cc rust-lang#103299

bors · 2024-04-19T06:56:30Z

⌛ Trying commit 986d9f1 with merge 928c684...

DianQK · 2024-04-19T07:05:14Z

~~Could this solve rust-lang/hashbrown#509? Looks fine, I'll verify it later.~~

Ah, this PR doesn't change checked_add.

bors · 2024-04-19T08:29:08Z

☀️ Try build successful - checks-actions
Build commit: 928c684 (928c6843828d26201cb676a7cc0ab4cf9b6d518f)

scottmcm · 2024-04-19T08:39:02Z

@DianQK Yeah, this is only shifts (which don't have a with.overflow intrinsic in LLVM) and checked unsigned subtraction (where nikic said in #103299 that we shouldn't use usub.with.overflow even though it exists).

Addition, Signed Subtraction, and Multiplication I didn't touch here because those overflow conditions are non-trivial and thus using the intrinsic seems like still the correct choice.

nikic · 2024-04-19T09:19:04Z

@scottmcm FWIW, for unsigned add overflow the recommended pattern is x + y < x, rather than uadd.with.overflow. We should probably give replacing that a try as well.

rust-timer · 2024-04-19T10:04:09Z

Finished benchmarking commit (928c684): comparison URL.

Overall result: ❌ regressions - no action needed

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

@bors rollup=never
@rustbot label: -S-waiting-on-perf -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	1.6%	[1.6%, 1.6%]	1
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-	-	0

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	6.6%	[2.0%, 11.1%]	2
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.7%	[-3.3%, -0.0%]	2
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.4%	[-3.3%, 11.1%]	4

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-3.0%	[-4.0%, -2.4%]	7
Improvements ✅ (secondary)	-2.8%	[-3.6%, -2.2%]	12
All ❌✅ (primary)	-3.0%	[-4.0%, -2.4%]	7

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.1%, 0.4%]	4
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.0%	[-0.1%, -0.0%]	13
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.0%	[-0.1%, 0.4%]	17

Bootstrap: 673.499s -> 673.605s (0.02%)
Artifact size: 315.24 MiB -> 315.24 MiB (0.00%)

scottmcm · 2024-04-19T16:57:40Z

Hmm, I got all excited by those results -- even -20% cycles on a runtime benchmark! -- but I think they're just noise from recovering from a huge spike in the previous runs.

Nothing concerning from it, though, so I think this is good to go if people are happy with the code changes.

workingjubilee · 2024-04-20T03:26:44Z

Yeah, seems good to me!

@bors r+

bors · 2024-04-20T03:26:46Z

📌 Commit 986d9f1 has been approved by workingjubilee

It is now in the queue for this repository.

bors · 2024-04-20T04:11:20Z

⌛ Testing commit 986d9f1 with merge 9c7b1f4...

bors · 2024-04-20T06:36:43Z

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 9c7b1f4 to master...

rust-timer · 2024-04-20T08:19:25Z

Finished benchmarking commit (9c7b1f4): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	10.1%	[10.1%, 10.1%]	1
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-3.0%	[-3.0%, -3.0%]	1
Improvements ✅ (secondary)	-2.1%	[-2.1%, -2.1%]	1
All ❌✅ (primary)	3.5%	[-3.0%, 10.1%]	2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	-	-	0
Regressions ❌ (secondary)	2.5%	[2.5%, 2.5%]	1
Improvements ✅ (primary)	-1.4%	[-1.4%, -1.4%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	-1.4%	[-1.4%, -1.4%]	1

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.2%	[0.1%, 0.4%]	4
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.0%]	4
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-0.1%, 0.4%]	8

Bootstrap: 670.134s -> 670.156s (0.00%)
Artifact size: 315.30 MiB -> 315.30 MiB (0.00%)

scottmcm · 2024-04-20T14:19:34Z

Ah, yup, those perf results make more sense.

…strieb Docs: suggest `uN::checked_sub` instead of check-then-unchecked As of rust-lang#124114 it's exactly the same in codegen, so might as well not use `unsafe`. Note that this is only for *unsigned*, since the overflow conditions for `iN::checked_sub` are more complicated.

Rollup merge of rust-lang#124701 - scottmcm:unchecked_sub_docs, r=Nilstrieb Docs: suggest `uN::checked_sub` instead of check-then-unchecked As of rust-lang#124114 it's exactly the same in codegen, so might as well not use `unsafe`. Note that this is only for *unsigned*, since the overflow conditions for `iN::checked_sub` are more complicated.

Invert comparison in `uN::checked_sub` After rust-lang#124114, LLVM no longer combines the comparison and subtraction in `uN::checked_sub` when either operand is a constant (demo: https://rust.godbolt.org/z/MaeoYbsP1). The difference is more pronounced when the expression is slightly more complex (https://rust.godbolt.org/z/4rPavsYdc). This is due to the use of `>=` here: https://github.com/rust-lang/rust/blob/ee97564e3a9f9ac8c65103abb37c6aa48d95bfa2/library/core/src/num/uint_macros.rs#L581-L593 For constant `C`, LLVM eagerly converts `a >= C` into `a > C - 1`, but the backend can only combine `a < C` with `a - C`, not `C - 1 < a` and `a - C`: https://github.com/llvm/llvm-project/blob/e586556e375fc5c4f7e76b5c299cb981f2016108/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1697-L1742 This PR[^1] simply inverts the `>=` into `<` to restore the LLVM magic, and somewhat align this with the implementation of `uN::overflowing_sub` from rust-lang#103299. When the result is stored as an `Option` (rather than being branched/cmoved on), the discriminant is `self >= rhs`. This PR doesn't affect the codegen (and relevant tests) of that since LLVM will negate `self < rhs` to `self >= rhs` when necessary. [^1]: Note to `self`: My very first contribution to publicly-used code. Hopefully like what I should learn to always be, tiny and humble.

Rollup merge of rust-lang#125038 - ivan-shrimp:checked_sub, r=joboet Invert comparison in `uN::checked_sub` After rust-lang#124114, LLVM no longer combines the comparison and subtraction in `uN::checked_sub` when either operand is a constant (demo: https://rust.godbolt.org/z/MaeoYbsP1). The difference is more pronounced when the expression is slightly more complex (https://rust.godbolt.org/z/4rPavsYdc). This is due to the use of `>=` here: https://github.com/rust-lang/rust/blob/ee97564e3a9f9ac8c65103abb37c6aa48d95bfa2/library/core/src/num/uint_macros.rs#L581-L593 For constant `C`, LLVM eagerly converts `a >= C` into `a > C - 1`, but the backend can only combine `a < C` with `a - C`, not `C - 1 < a` and `a - C`: https://github.com/llvm/llvm-project/blob/e586556e375fc5c4f7e76b5c299cb981f2016108/llvm/lib/CodeGen/CodeGenPrepare.cpp#L1697-L1742 This PR[^1] simply inverts the `>=` into `<` to restore the LLVM magic, and somewhat align this with the implementation of `uN::overflowing_sub` from rust-lang#103299. When the result is stored as an `Option` (rather than being branched/cmoved on), the discriminant is `self >= rhs`. This PR doesn't affect the codegen (and relevant tests) of that since LLVM will negate `self < rhs` to `self >= rhs` when necessary. [^1]: Note to `self`: My very first contribution to publicly-used code. Hopefully like what I should learn to always be, tiny and humble.

…=<try> Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself

…=Amanieu Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself

Also get `add nuw` from `uN::checked_add` When I was doing this for `checked_{sub,shl,shr}`, it was mentioned rust-lang/rust#124114 (comment) that it'd be worth trying for `checked_add` too. It makes a particularly-big difference for `x.checked_add(C)`, as doing this means that LLVM removes the intrinsic and does it as a normal `x <= MAX - C` instead. cc `@DianQK` who had commented about `checked_add` related to rust-lang/hashbrown#509 before cc llvm/llvm-project#80637 for how LLVM is unlikely to do this itself

rustbot assigned workingjubilee Apr 18, 2024

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Apr 18, 2024

scottmcm commented Apr 18, 2024

View reviewed changes

scottmcm force-pushed the better-checked branch from b5ed099 to 986d9f1 Compare April 19, 2024 01:12

workingjubilee reviewed Apr 19, 2024

View reviewed changes

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 19, 2024

This comment has been minimized.

Sign in to view

scottmcm mentioned this pull request Apr 19, 2024

LLVM failed to use the knowledge from a never-overflow assumption rust-lang/hashbrown#509

Open

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Apr 19, 2024

bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 20, 2024

bors added the merged-by-bors This PR was explicitly merged by bors. label Apr 20, 2024

bors merged commit 9c7b1f4 into rust-lang:master Apr 20, 2024
13 checks passed

rustbot added this to the 1.79.0 milestone Apr 20, 2024

scottmcm deleted the better-checked branch April 20, 2024 14:19

scottmcm mentioned this pull request May 4, 2024

Docs: suggest uN::checked_sub instead of check-then-unchecked #124701

Merged

ivan-shrimp mentioned this pull request May 12, 2024

Invert comparison in uN::checked_sub #125038

Merged

scottmcm mentioned this pull request Jun 23, 2024

Also get add nuw from uN::checked_add #126852

Merged

scottmcm mentioned this pull request Aug 21, 2024

Add implementations for unbounded_shl/unbounded_shr #129377

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Make `checked` ops emit unchecked LLVM operations where feasible #124114

Make `checked` ops emit unchecked LLVM operations where feasible #124114

scottmcm commented Apr 18, 2024 •

edited

Loading

rustbot commented Apr 18, 2024

scottmcm Apr 18, 2024

workingjubilee Apr 19, 2024

scottmcm Apr 19, 2024 •

edited

Loading

workingjubilee Apr 20, 2024 •

edited

Loading

workingjubilee left a comment

workingjubilee Apr 19, 2024

scottmcm Apr 19, 2024

workingjubilee Apr 20, 2024

workingjubilee Apr 19, 2024

workingjubilee commented Apr 19, 2024

This comment has been minimized.

workingjubilee commented Apr 19, 2024

bors commented Apr 19, 2024

DianQK commented Apr 19, 2024 •

edited

Loading

bors commented Apr 19, 2024

This comment has been minimized.

scottmcm commented Apr 19, 2024

nikic commented Apr 19, 2024

rust-timer commented Apr 19, 2024

scottmcm commented Apr 19, 2024

workingjubilee commented Apr 20, 2024

bors commented Apr 20, 2024

bors commented Apr 20, 2024

bors commented Apr 20, 2024

rust-timer commented Apr 20, 2024

scottmcm commented Apr 20, 2024

		// Note that `shl` and `shr` in LLVM are already unchecked. So rather than
		// looking for no-wrap flags, we just need there to not be any masking.

		@@ -1,5 +1,5 @@
		// skip-filecheck
		//@ compile-flags: -O -Zmir-opt-level=2 -Cdebuginfo=2

Make checked ops emit *unchecked* LLVM operations where feasible #124114

Make checked ops emit *unchecked* LLVM operations where feasible #124114

Conversation

scottmcm commented Apr 18, 2024 • edited Loading

rustbot commented Apr 18, 2024

Choose a reason for hiding this comment

Choose a reason for hiding this comment

scottmcm Apr 19, 2024 • edited Loading

Choose a reason for hiding this comment

workingjubilee Apr 20, 2024 • edited Loading

Choose a reason for hiding this comment

workingjubilee left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

workingjubilee commented Apr 19, 2024

This comment has been minimized.

workingjubilee commented Apr 19, 2024

bors commented Apr 19, 2024

DianQK commented Apr 19, 2024 • edited Loading

bors commented Apr 19, 2024

This comment has been minimized.

scottmcm commented Apr 19, 2024

nikic commented Apr 19, 2024

rust-timer commented Apr 19, 2024

Overall result: ❌ regressions - no action needed

scottmcm commented Apr 19, 2024

workingjubilee commented Apr 20, 2024

bors commented Apr 20, 2024

bors commented Apr 20, 2024

bors commented Apr 20, 2024

rust-timer commented Apr 20, 2024

Overall result: no relevant changes - no action needed

scottmcm commented Apr 20, 2024

Make `checked` ops emit unchecked LLVM operations where feasible #124114

Make `checked` ops emit unchecked LLVM operations where feasible #124114

scottmcm commented Apr 18, 2024 •

edited

Loading

scottmcm Apr 19, 2024 •

edited

Loading

workingjubilee Apr 20, 2024 •

edited

Loading

DianQK commented Apr 19, 2024 •

edited

Loading