Tune the inlinability of unwrap
#119878
r? @m-ou-se (rustbot has picked a reviewer for you, use r? to override) |
@bors try @rust-timer queue |
Tune the inlinability of `unwrap`
Fixes rust-lang#115463
cc `@thomcc`
This tweaks `unwrap` on `Option` & `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking
The idea here is that checking the discriminant on a `Result` or `Option` should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.
As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.
☀️ Try build successful - checks-actions |
I wonder if we can generalize this optimization with static references to PanicInfos. |
This benchmark run was quite weird, it didn't run bootstrap and didn't measure all the benchmarks. I'll investigate. |
@bors try @rust-timer queue |
☀️ Try build successful - checks-actions |
Finished benchmarking commit (b03798d): comparison URL.
Overall result: ❌✅ regressions and improvements - ACTION NEEDED
Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.
Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @bors rollup=never
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 665.086s -> 667.401s (0.35%)
b996061 to b858c59
…kingjubilee
Tune the inlinability of `unwrap`
Fixes rust-lang#115463
cc `@thomcc`
This tweaks `unwrap` on ~~`Option` &~~ `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking
The idea here is that checking the discriminant on a `Result` ~~or `Option`~~ should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.
As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.
EDIT: I've restricted this to `Result` to avoid combining effects
💔 Test failed - checks-actions |
auto - aarch64-gnu "received a shutdown signal", but no other jobs show as having an issue @bors retry |
This is at the top of the queue, but it's not running. Let's try to poke things... @bors r- |
@bors r=workingjubilee (Unchanged since #119878 (comment); just trying to unstick bors.) |
☀️ Test successful - checks-actions |
Finished benchmarking commit (1ead476): comparison URL.
Overall result: ❌✅ regressions and improvements - no action needed
@rustbot label: -perf-regression
Instruction count: This is a highly reliable metric that was used to determine the overall result at the top of this comment.
Max RSS (memory usage): This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Cycles: This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.
Binary size: This benchmark run did not return any relevant results for this metric.
Bootstrap: 669.714s -> 668.966s (-0.11%)
After I'd written this up I realized that the godbolt link is going to rot within 24 hours or so. I've pasted the relevant IR below.
In #120863 I added I am now trying to re-evaluate that decision, but while looking more into that choice, I've concluded that this codegen test is deceitful. I can break the codegen test (on
I think the problem here is that IPSCCP (interprocedural sparse conditional constant propagation) has a strong synergistic effect with small crates and
This surprising behavior of IPSCCP in small examples has been observed elsewhere: #119923
Current IR for
define noundef i64 @read_up_to_8(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !39 {
start:
%_2 = icmp ult i64 %buf.1, 4, !dbg !42
br i1 %_2, label %bb12, label %bb2, !dbg !42
bb2: ; preds = %start
%0 = tail call fastcc ptr @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h49ef79cc6f68a95eE"(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_f34be9391f743b06d38618d13cbcb273), !dbg !43
%1 = load i32, ptr %0, align 1, !dbg !44, !alias.scope !70
%lo = zext i32 %1 to i64, !dbg !75
%_16 = add i64 %buf.1, -4, !dbg !76
%data.i.i = getelementptr inbounds i8, ptr %buf.0, i64 %_16, !dbg !78
%2 = tail call fastcc ptr @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h49ef79cc6f68a95eE"(ptr noalias noundef nonnull readonly align 1 %data.i.i, i64 noundef 4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_d2f049aead5c842897e2e27e600065c4), !dbg !102
%3 = load i32, ptr %2, align 1, !dbg !103, !alias.scope !109
%hi = zext i32 %3 to i64, !dbg !114
%_18 = shl i64 %_16, 3, !dbg !115
%4 = and i64 %_18, 56, !dbg !117
%_17 = shl i64 %hi, %4, !dbg !117
%5 = or i64 %_17, %lo, !dbg !118
br label %bb12, !dbg !119
bb12: ; preds = %start, %bb2
%_0.0 = phi i64 [ %5, %bb2 ], [ 0, %start ], !dbg !120
ret i64 %_0.0, !dbg !119
}
But when I add
define noundef i64 @read_up_to_8(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1) unnamed_addr #0 personality ptr @rust_eh_personality !dbg !83 {
start:
%e.i4 = alloca %"core::array::TryFromSliceError", align 1
%_2 = icmp ult i64 %buf.1, 4, !dbg !86
br i1 %_2, label %bb12, label %bb2, !dbg !86
bb2:
%0 = tail call fastcc { ptr, i64 } @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h49ef79cc6f68a95eE"(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1, i64 noundef 4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_f34be9391f743b06d38618d13cbcb273), !dbg !87
%_8.1 = extractvalue { ptr, i64 } %0, 1, !dbg !87
%_3.i.i.not = icmp eq i64 %_8.1, 4, !dbg !88
br i1 %_3.i.i.not, label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit10", label %bb2.i7, !dbg !88
bb2.i7:
call void @llvm.lifetime.start.p0(i64 0, ptr nonnull %e.i4), !dbg !101
call void @core::result::unwrap_failed(ptr noalias noundef nonnull readonly align 1 @alloc_00ae4b301f7fab8ac9617c03fcbd7274, i64 noundef 43, ptr noundef nonnull align 1 %e.i4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @vtable.0, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_370cde08944f8943831ca8b1deca1e93) #5, !dbg !107
unreachable
"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit10":
%_8.0 = extractvalue { ptr, i64 } %0, 0, !dbg !87
%1 = load i32, ptr %_8.0, align 1, !dbg !109
%_16 = add i64 %buf.1, -4, !dbg !127
%2 = tail call fastcc { ptr, i64 } @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h80f63821a23bc3bfE"(ptr noalias noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1, i64 noundef %_16, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_e4adcb4d8250fddd1b37b00bce08fe63), !dbg !129
%_14.0 = extractvalue { ptr, i64 } %2, 0, !dbg !129
%_14.1 = extractvalue { ptr, i64 } %2, 1, !dbg !129
%3 = tail call fastcc { ptr, i64 } @"_ZN4core5slice5index74_$LT$impl$u20$core..ops..index..Index$LT$I$GT$$u20$for$u20$$u5b$T$u5d$$GT$5index17h49ef79cc6f68a95eE"(ptr noalias noundef nonnull readonly align 1 %_14.0, i64 noundef %_14.1, i64 noundef 4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_d2f049aead5c842897e2e27e600065c4), !dbg !130
%_13.1 = extractvalue { ptr, i64 } %3, 1, !dbg !130
%_3.i.i11.not = icmp eq i64 %_13.1, 4, !dbg !131
br i1 %_3.i.i11.not, label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit", label %bb2.i, !dbg !131
bb2.i:
call void @llvm.lifetime.start.p0(i64 0, ptr nonnull %e.i4), !dbg !135
call void @core::result::unwrap_failed(ptr noalias noundef nonnull readonly align 1 @alloc_00ae4b301f7fab8ac9617c03fcbd7274, i64 noundef 43, ptr noundef nonnull align 1 %e.i4, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @vtable.0, ptr noalias noundef nonnull readonly align 8 dereferenceable(24) @alloc_ac9535fe735cb77bb32d5ff848a81403) #5, !dbg !137
unreachable
"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit":
%4 = zext i32 %1 to i64, !dbg !138
%_13.0 = extractvalue { ptr, i64 } %3, 0, !dbg !130
%5 = load i32, ptr %_13.0, align 1, !dbg !139
%6 = zext i32 %5 to i64, !dbg !148
%_18 = shl i64 %_16, 3, !dbg !149
%7 = and i64 %_18, 56, !dbg !151
%_17 = shl i64 %6, %7, !dbg !151
%8 = or i64 %_17, %4, !dbg !152
br label %bb12, !dbg !153
bb12:
%_0.0 = phi i64 [ %8, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h426e44bc45e636f4E.exit" ], [ 0, %start ], !dbg !154
ret i64 %_0.0, !dbg !153
}
And with the incoming nightly which has put
define noundef i64 @read_up_to_8(ptr noalias nocapture noundef nonnull readonly align 1 %buf.0, i64 noundef %buf.1) unnamed_addr #1 personality ptr @rust_eh_personality {
start:
%_2 = icmp ult i64 %buf.1, 4
br i1 %_2, label %bb12, label %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h6c548023a09f6572E.exit"
"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h6c548023a09f6572E.exit": ; preds = %start
%_16 = add i64 %buf.1, -4
%0 = load i32, ptr %buf.0, align 1, !alias.scope !3
%lo = zext i32 %0 to i64
%data.i.i = getelementptr inbounds i8, ptr %buf.0, i64 %_16
%1 = load i32, ptr %data.i.i, align 1, !alias.scope !8
%hi = zext i32 %1 to i64
%_18 = shl i64 %_16, 3
%2 = and i64 %_18, 56
%_17 = shl i64 %hi, %2
%3 = or i64 %_17, %lo
br label %bb12
bb12: ; preds = %start, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h6c548023a09f6572E.exit"
%_0.0 = phi i64 [ %3, %"_ZN4core6result19Result$LT$T$C$E$GT$6unwrap17h6c548023a09f6572E.exit" ], [ 0, %start ]
ret i64 %_0.0
}
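For readers without the original godbolt link, the three IR listings above appear to come from a function along the following lines. This is a reconstruction inferred from the IR (the `icmp ult ... 4` early return, the two 4-byte loads, and the shift by `8 * (len - 4)`), not a verbatim copy of the codegen test, so treat the exact body as an assumption.

```rust
/// Sketch reconstructed from the IR above; the exact source of the codegen
/// test is an assumption. Intended for slices of at most 8 bytes.
pub fn read_up_to_8(buf: &[u8]) -> u64 {
    if buf.len() < 4 {
        // The IR returns 0 on this path.
        return 0;
    }
    // `buf[..4]` is always exactly 4 bytes long, so `try_into()` to `[u8; 4]`
    // cannot fail here. The optimizer can only see that if `unwrap` is inlined.
    let lo = u32::from_le_bytes(buf[..4].try_into().unwrap()) as u64;
    let hi = u32::from_le_bytes(buf[buf.len() - 4..][..4].try_into().unwrap()) as u64;
    // Matches the `shl` / `and 56` / `shl` / `or` sequence at the end of the IR.
    lo | (hi << (8 * (buf.len() - 4)))
}
```

Both `unwrap()` calls are infallible in practice, which is why the first and third listings contain no call to `unwrap_failed` while the second one does.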
Tweak inlining attributes for slice indexing Doing some experiments in response to this unexpected regression: rust-lang#120863 (comment) I expect the opt changes to be addressed by something like reviving rust-lang#91222. The debug changes are what I'm interested in. Codegen tests will probably fail from time to time in this PR, I will fix them up later but also I don't trust the opt-level-z one: rust-lang#119878 (comment) r? `@ghost`
Fixes #115463
cc @thomcc
This tweaks `unwrap` on ~~`Option` &~~ `Result` to be two parts:
- `#[inline(always)]` for checking the discriminant
- `#[cold]` for actually panicking
The idea here is that checking the discriminant on a `Result` ~~or `Option`~~ should always be trivial enough to be worth inlining, even in `opt-level=z`, especially compared to passing it to a function.
As seen in the issue and codegen test, this will hopefully help particularly for things like `.try_into().unwrap()`s that are actually infallible, but in a way that's only visible with the inlining.
EDIT: I've restricted this to `Result` to avoid combining effects
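The two-part split described above can be sketched outside the standard library roughly as follows. The names `unwrap_split` and `unwrap_failed_sketch` are hypothetical stand-ins for illustration only; this is not the actual diff from this PR, just a minimal example of the same pattern using the stable `#[inline(always)]`, `#[inline(never)]`, `#[cold]`, and `#[track_caller]` attributes.

```rust
use std::fmt::Debug;

/// Hypothetical stand-in for `Result::unwrap`: only the discriminant check
/// is forced inline, so callers can fold it away when the `Err` arm is
/// provably unreachable.
#[inline(always)]
#[track_caller]
pub fn unwrap_split<T, E: Debug>(r: Result<T, E>) -> T {
    match r {
        Ok(t) => t,
        Err(e) => unwrap_failed_sketch("called `Result::unwrap()` on an `Err` value", &e),
    }
}

/// Hypothetical cold path: kept out of line so the formatting and panic
/// machinery is not duplicated at every call site.
#[cold]
#[inline(never)]
#[track_caller]
fn unwrap_failed_sketch(msg: &str, error: &dyn Debug) -> ! {
    panic!("{msg}: {error:?}")
}
```

Calling `unwrap_split(Ok::<i32, String>(7))` returns `7`, while an `Err` value takes the out-of-line path and panics. Passing the error as `&dyn Debug` keeps the cold function non-generic, so one copy can serve every `E`.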