Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Tune the inlinability of unwrap #119878

Merged
merged 1 commit into from
Jan 15, 2024
Merged

Conversation

scottmcm
Copy link
Member

@scottmcm scottmcm commented Jan 12, 2024

Fixes #115463
cc @thomcc

This tweaks unwrap on Option & Result to be two parts:

  • #[inline(always)] for checking the discriminant
  • #[cold] for actually panicking

The idea here is that checking the discriminant on a Result or Option should always be trivial enough to be worth inlining, even in opt-level=z, especially compared to passing it to a function.

As seen in the issue and codegen test, this will hopefully help particularly for things like .try_into().unwrap()s that are actually infallible, but in a way that's only visible with the inlining.

EDIT: I've restricted this to Result to avoid combining effects

@rustbot
Copy link
Collaborator

rustbot commented Jan 12, 2024

r? @m-ou-se

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Jan 12, 2024
@scottmcm
Copy link
Member Author

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 12, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 12, 2024
Tune the inlinability of `unwrap`

Fixes rust-lang#115463
cc `@thomcc`

This tweaks `unwrap` on `Option` & `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking

The idea here is that checking the discriminant on a `Result` or `Option` should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.

As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.
@bors
Copy link
Contributor

bors commented Jan 12, 2024

⌛ Trying commit b996061 with merge 69b1ad3...

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Jan 12, 2024

☀️ Try build successful - checks-actions
Build commit: 69b1ad3 (69b1ad359005f2b807d029a0ddc00476ee89ad07)

@rust-timer

This comment has been minimized.

@NCGThompson
Copy link
Contributor

I wonder if we can generalize this optimization with static references to PanicInfos.

@lqd
Copy link
Member

lqd commented Jan 12, 2024

diesel-1.4.8 has taken 48 minutes to compile so far, seems high

(Note: The collector is not currently benchmarking this PR -- as it is 3rd in the queue right now -- and diesel took the expected amount of time there.)

image

@rust-timer

This comment was marked as outdated.

@rustbot rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 12, 2024
@Kobzol
Copy link
Contributor

Kobzol commented Jan 12, 2024

This benchmark run was quite weird, it didn't run bootstrap and didn't measure all the benchmarks. I'll investigate.

@Kobzol

This comment was marked as outdated.

@Kobzol
Copy link
Contributor

Kobzol commented Jan 12, 2024

@bors try @rust-timer queue

@rust-timer

This comment has been minimized.

@rustbot rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Jan 12, 2024
@bors
Copy link
Contributor

bors commented Jan 12, 2024

⌛ Trying commit b996061 with merge b03798d...

bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 12, 2024
Tune the inlinability of `unwrap`

Fixes rust-lang#115463
cc `@thomcc`

This tweaks `unwrap` on `Option` & `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking

The idea here is that checking the discriminant on a `Result` or `Option` should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.

As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.
@bors
Copy link
Contributor

bors commented Jan 12, 2024

☀️ Try build successful - checks-actions
Build commit: b03798d (b03798d9e197877b0318ced1bcb1d11ffb82a32a)

@rust-timer

This comment has been minimized.

@rust-timer
Copy link
Collaborator

Finished benchmarking commit (b03798d): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.8% [0.8%, 0.8%] 2
Regressions ❌
(secondary)
1.2% [1.2%, 1.2%] 1
Improvements ✅
(primary)
-0.6% [-0.6%, -0.6%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.3% [-0.6%, 0.8%] 3

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
2.3% [1.7%, 3.0%] 3
Regressions ❌
(secondary)
2.8% [2.6%, 3.0%] 2
Improvements ✅
(primary)
-2.9% [-4.4%, -0.2%] 4
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.7% [-4.4%, 3.0%] 7

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.8% [0.4%, 1.1%] 2
Regressions ❌
(secondary)
2.0% [2.0%, 2.0%] 1
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
-2.5% [-2.5%, -2.5%] 1
All ❌✅ (primary) 0.8% [0.4%, 1.1%] 2

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 665.086s -> 667.401s (0.35%)
Artifact size: 308.40 MiB -> 308.31 MiB (-0.03%)

@rustbot rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Jan 12, 2024
@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 14, 2024
bors added a commit to rust-lang-ci/rust that referenced this pull request Jan 14, 2024
…kingjubilee

Tune the inlinability of `unwrap`

Fixes rust-lang#115463
cc `@thomcc`

This tweaks `unwrap` on ~~`Option` &~~ `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking

The idea here is that checking the discriminant on a `Result` ~~or `Option`~~ should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.

As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.

EDIT: I've restricted this to `Result` to avoid combining effects
@bors
Copy link
Contributor

bors commented Jan 14, 2024

⌛ Testing commit b858c59 with merge 6b0a633...

@rust-log-analyzer
Copy link
Collaborator

The job aarch64-gnu failed! Check out the build log: (web) (plain)

Click to see the possible cause of the failure (guessed by this bot)
    Checking rustc-hash v1.1.0
[RUSTC-TIMING] build_script_build test:false 0.265
[RUSTC-TIMING] rustc_hash test:false 0.077
    Checking log v0.4.19
##[error]The runner has received a shutdown signal. This can happen when the runner service is stopped, or a manually started runner is canceled.
    Checking either v1.8.1
[RUSTC-TIMING] memchr test:false 0.873
    Checking ryu v1.0.13
[RUSTC-TIMING] linux_raw_sys test:false 1.303
[RUSTC-TIMING] linux_raw_sys test:false 1.303
##[group]Clock drift check

Session terminated, killing shell... ...killed.
    Checking tracing-log v0.1.3
  local time: 
##[error]The operation was canceled.
Cleaning up orphan processes

@bors
Copy link
Contributor

bors commented Jan 15, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 15, 2024
@scottmcm
Copy link
Member Author

auto - aarch64-gnu "received a shutdown signal", but no other jobs show as having an issue

@bors retry

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Jan 15, 2024
@scottmcm
Copy link
Member Author

This is at the top of the queue, but it's not running. Let's try to poke things...

@bors r-

@bors bors added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Jan 15, 2024
@scottmcm
Copy link
Member Author

@bors r=workingjubilee

(Unchanged since #119878 (comment); just trying to unstick bors.)

@bors
Copy link
Contributor

bors commented Jan 15, 2024

📌 Commit b858c59 has been approved by workingjubilee

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. labels Jan 15, 2024
@bors
Copy link
Contributor

bors commented Jan 15, 2024

⌛ Testing commit b858c59 with merge 1ead476...

@bors
Copy link
Contributor

bors commented Jan 15, 2024

☀️ Test successful - checks-actions
Approved by: workingjubilee
Pushing 1ead476 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Jan 15, 2024
@bors bors merged commit 1ead476 into rust-lang:master Jan 15, 2024
12 checks passed
@rustbot rustbot added this to the 1.77.0 milestone Jan 15, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (1ead476): comparison URL.

Overall result: ❌✅ regressions and improvements - no action needed

@rustbot label: -perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
0.9% [0.9%, 0.9%] 1
Improvements ✅
(primary)
-0.9% [-0.9%, -0.9%] 1
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) -0.9% [-0.9%, -0.9%] 1

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.0% [3.0%, 3.0%] 1
Regressions ❌
(secondary)
3.0% [3.0%, 3.0%] 1
Improvements ✅
(primary)
-1.0% [-2.0%, -0.1%] 2
Improvements ✅
(secondary)
-4.4% [-4.4%, -4.4%] 1
All ❌✅ (primary) 0.3% [-2.0%, 3.0%] 3

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
- - 0
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-1.8% [-1.8%, -1.8%] 1
Improvements ✅
(secondary)
-1.9% [-2.2%, -1.6%] 2
All ❌✅ (primary) -1.8% [-1.8%, -1.8%] 1

Binary size

This benchmark run did not return any relevant results for this metric.

Bootstrap: 669.714s -> 668.966s (-0.11%)
Artifact size: 308.18 MiB -> 308.17 MiB (-0.00%)

@saethlin
Copy link
Member

After I'd written this up I realized that the godbolt link is going to rot within 24 hours or so. I've pasted the relevant IR below.

In #120863 I added inline(always) to the slice indexing machinery because adding delayed debug assertions to slice indexing broke the codegen test that was added in this PR.

I am now trying to re-evaluate that decision, but while looking more into that choice, I've concluded that this codegen test is deceitful. I can break the codegen test (on 2024-02-19) by adding more code to the crate but not touching the function which is being tested: https://godbolt.org/z/Pr34Mno46 (the break is that read_up_to_8 now contains a call to unwrap_failed).

I think the problem here is that IPSCCP (interprocedural sparse conditional constant propagation) has a strong synergistic effect with small crates and #[inline] functions... which is a good description of most of our codegen tests. In this case, it's able to rewrite all the slice indexing machinery to propagate the constant length 4 through it. And that enables the unwrap check to be optimized out, but the key is that he unwrap check was not optimized out by inlining away the slice indexing machinery, but by rewriting it. Adding too much additional use of slice indexing in the same crate causes the IPSCCP optimization to not fire, and the codegen becomes awful again. (I do not have any idea how many of our codegen tests are afflicted by this sort of problem)

This surprising behavior of IPSCCP in small examples has been observed elsewhere: #119923


Current IR for read_up_to_8:

define noundef i64 @read_up_to_8(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !39 {
start:
  %_2 = icmp ult i64 %buf.1, 4, !dbg !42
  br i1 %_2, label %bb12, label %bb2, !dbg !42

bb2:                                              ; preds = %start
  %0 = tail call fastcc ptr @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h49ef79cc6f68a95eE"(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_f34be9391f743b06d38618d13cbcb273), !dbg !43
  %1 = load i32, ptr %0, align 1, !dbg !44, !alias.scope !70
  %lo = zext i32 %1 to i64, !dbg !75
  %_16 = add i64 %buf.1, -4, !dbg !76
  %data.i.i = getelementptr inbounds i8, ptr %buf.0, i64 %_16, !dbg !78
  %2 = tail call fastcc ptr @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h49ef79cc6f68a95eE"(ptr noalias noundef nonnull readonly align 1 %data.i.i, i64 noundef 4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_d2f049aead5c842897e2e27e600065c4), !dbg !102
  %3 = load i32, ptr %2, align 1, !dbg !103, !alias.scope !109
  %hi = zext i32 %3 to i64, !dbg !114
  %_18 = shl i64 %_16, 3, !dbg !115
  %4 = and i64 %_18, 56, !dbg !117
  %_17 = shl i64 %hi, %4, !dbg !117
  %5 = or i64 %_17, %lo, !dbg !118
  br label %bb12, !dbg !119

bb12:                                             ; preds = %start, %bb2
  %_0.0 = phi i64 [ %5, %bb2 ], [ 0, %start ], !dbg !120
  ret i64 %_0.0, !dbg !119
}

But when I add read_up_to_4:

define noundef i64 @read_up_to_8(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !83 {
start:
  %e.i4 = alloca %"core::array::TryFromSliceError", align 1
  %_2 = icmp ult i64 %buf.1, 4, !dbg !86
  br i1 %_2, label %bb12, label %bb2, !dbg !86

bb2:
  %0 = tail call fastcc { ptr, i64 } @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h49ef79cc6f68a95eE"(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1, i64 noundef 4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_f34be9391f743b06d38618d13cbcb273), !dbg !87
  %_8.1 = extractvalue { ptr, i64 } %0, 1, !dbg !87
  %_3.i.i.not = icmp eq i64 %_8.1, 4, !dbg !88
  br i1 %_3.i.i.not, label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit10", label %bb2.i7, !dbg !88

bb2.i7:
  call void @llvm.lifetime.start.p0(i64 0, ptr nonnull %e.i4), !dbg !101
  call void @core::result::unwrap_failed(ptr noalias noundef nonnull readonly align 1 @alloc_00ae4b301f7fab8ac9617c03fcbd7274, i64 noundef 43, ptr noundef nonnull align 1 %e.i4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @vtable.0, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_370cde08944f8943831ca8b1deca1e93) #5, !dbg !107
  unreachable

"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit10":
  %_8.0 = extractvalue { ptr, i64 } %0, 0, !dbg !87
  %1 = load i32, ptr %_8.0, align 1, !dbg !109
  %_16 = add i64 %buf.1, -4, !dbg !127
  %2 = tail call fastcc { ptr, i64 } @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h80f63821a23bc3bfE"(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1, i64 noundef %_16, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_e4adcb4d8250fddd1b37b00bce08fe63), !dbg !129
  %_14.0 = extractvalue { ptr, i64 } %2, 0, !dbg !129
  %_14.1 = extractvalue { ptr, i64 } %2, 1, !dbg !129
  %3 = tail call fastcc { ptr, i64 } @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h49ef79cc6f68a95eE"(ptr noalias noundef nonnull readonly align 1 %_14.0, i64 noundef %_14.1, i64 noundef 4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_d2f049aead5c842897e2e27e600065c4), !dbg !130
  %_13.1 = extractvalue { ptr, i64 } %3, 1, !dbg !130
  %_3.i.i11.not = icmp eq i64 %_13.1, 4, !dbg !131
  br i1 %_3.i.i11.not, label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit", label %bb2.i, !dbg !131

bb2.i:
  call void @llvm.lifetime.start.p0(i64 0, ptr nonnull %e.i4), !dbg !135
  call void @core::result::unwrap_failed(ptr noalias noundef nonnull readonly align 1 @alloc_00ae4b301f7fab8ac9617c03fcbd7274, i64 noundef 43, ptr noundef nonnull align 1 %e.i4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @vtable.0, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_ac9535fe735cb77bb32d5ff848a81403) #5, !dbg !137
  unreachable

"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit":
  %4 = zext i32 %1 to i64, !dbg !138
  %_13.0 = extractvalue { ptr, i64 } %3, 0, !dbg !130
  %5 = load i32, ptr %_13.0, align 1, !dbg !139
  %6 = zext i32 %5 to i64, !dbg !148
  %_18 = shl i64 %_16, 3, !dbg !149
  %7 = and i64 %_18, 56, !dbg !151
  %_17 = shl i64 %6, %7, !dbg !151
  %8 = or i64 %_17, %4, !dbg !152
  br label %bb12, !dbg !153

bb12:
  %_0.0 = phi i64 [ %8, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit" ], [ 0, %start ], !dbg !154
  ret i64 %_0.0, !dbg !153
}

And with the incoming nightly which has put #[inline(always)] all over, but with some odd compile-time regressions:

define noundef i64 @read_up_to_8(ptr noalias nocapture noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1) unnamed_addr #1 personality ptr @rust_eh_personality {
start:
  %_2 = icmp ult i64 %buf.1, 4
  br i1 %_2, label %bb12, label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h6c548023a09f6572E.exit"

"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h6c548023a09f6572E.exit": ; preds = %start
  %_16 = add i64 %buf.1, -4
  %0 = load i32, ptr %buf.0, align 1, !alias.scope !3
  %lo = zext i32 %0 to i64 
  %data.i.i = getelementptr inbounds i8, ptr %buf.0, i64 %_16
  %1 = load i32, ptr %data.i.i, align 1, !alias.scope !8
  %hi = zext i32 %1 to i64 
  %_18 = shl i64 %_16, 3
  %2 = and i64 %_18, 56
  %_17 = shl i64 %hi, %2
  %3 = or i64 %_17, %lo 
  br label %bb12

bb12:                                             ; preds = %start, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h6c548023a09f6572E.exit"
  %_0.0 = phi i64 [ %3, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h6c548023a09f6572E.exit" ], [ 0, %start ]
  ret i64 %_0.0
}

bors added a commit to rust-lang-ci/rust that referenced this pull request Feb 20, 2024
Tweak inlining attributes for slice indexing

Doing some experiments in response to this unexpected regression: rust-lang#120863 (comment)

I expect the opt changes to be addressed by something like reviving rust-lang#91222. The debug changes are what I'm interested in.

Codegen tests will probably fail from time to time in this PR, I will fix them up later but also I don't trust the opt-level-z one: rust-lang#119878 (comment)

r? `@ghost`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs Relevant to the library team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

opt-level=z fails to remove bounds checks/panics even if every other opt-level succeeds