Hint optimizer about reserved capacity #116568

kornelski · 2023-10-09T14:12:20Z

The fact that Vec had capacity increased is not known to the optimizer, because functions like do_reserve_and_handle are not inlined. (#82801) Because of that, code such as:

vec.try_reserve(123)?;
vec.extend_from_slice(&s[..123]);

Tries to reserve the capacity twice. This is unnecessary code bloat.

https://rust.godbolt.org/z/YWY16Ezej

Adding a hint after reserve optimizes out the next check of self.needs_to_grow(len, additional).

rustbot · 2023-10-09T14:12:30Z

r? @m-ou-se

(rustbot has picked a reviewer for you, use r? to override)

CryZe · 2023-10-09T15:44:12Z

I think this is what core::intrinsics::assume is for, which should probably be preferred. Seems to generate the same assembly too.

workingjubilee · 2023-10-10T04:27:30Z

Let's give this a whirl.
@bors try @rust-timer queue

bors · 2023-10-10T04:28:39Z

⌛ Trying commit 9a55d7a with merge 5137d8a...

Hint optimizer about reserved capacity The fact that `Vec` had capacity increased is not known to the optimizer, because functions like `do_reserve_and_handle` are not inlined. Because of that, code such as: ```rust vec.try_reserve(123)?; vec.extend_from_slice(&s[..123]); ``` Tries to reserve the capacity **twice**. This is unnecessary code bloat. https://rust.godbolt.org/z/YWY16Ezej Adding a hint after reserve optimizes out the next check of `self.needs_to_grow(len, additional)`.

bors · 2023-10-10T05:42:20Z

☀️ Try build successful - checks-actions
Build commit: 5137d8a (5137d8ae3f0ffd6ebe1f4667505311a29e3c56a6)

rust-timer · 2023-10-10T07:03:14Z

Finished benchmarking commit (5137d8a): comparison URL.

Overall result: ❌ regressions - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.3%, 0.8%]	6
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-	-	0
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.5%	[0.3%, 0.8%]	6

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	4.9%	[4.9%, 4.9%]	1
Regressions ❌ (secondary)	4.2%	[4.2%, 4.2%]	1
Improvements ✅ (primary)	-2.6%	[-2.6%, -2.6%]	1
Improvements ✅ (secondary)	-1.1%	[-1.1%, -1.1%]	1
All ❌✅ (primary)	1.2%	[-2.6%, 4.9%]	2

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.4%	[0.8%, 8.0%]	6
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-1.9%	[-1.9%, -1.9%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	2.7%	[-1.9%, 8.0%]	7

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.0%, 0.2%]	49
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.1%	[-0.1%, -0.0%]	3
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-0.1%, 0.2%]	52

Bootstrap: 626.006s -> 627.675s (0.27%)
Artifact size: 270.80 MiB -> 270.90 MiB (0.04%)

workingjubilee · 2023-10-10T08:13:17Z

slightly concerned that at larger scales like rustc, we're somehow seeing even larger artifacts, despite the minimal example's well-demonstrated changes. with no concomitant perf gains, either. I think we should have a codegen test that unquestionably demonstrates the improvement here before we commit this.

and maybe a more comprehensive change that makes the results slightly less middling at the larger scale.

kornelski · 2023-10-10T13:49:22Z

Unfortunately, the hint in std doesn't seem to have the same effect as a hint in user code ;(

I've even tried using #[inline(always)] on the Vec::reserve methods:

    #[inline(always)]
    pub fn reserve(&mut self, additional: usize) {
        self.buf.reserve(self.len, additional);
        unsafe {
            core::intrinsics::assume(additional <= self.buf.capacity().wrapping_sub(self.len));
        }
    }

The unexpected speed regression happens also in #114370

library/alloc/src/raw_vec.rs

kornelski · 2023-10-22T16:10:44Z

I think the regressions are expected here:

Mixed-bag results in optimization cost could be because the hint is added always, but not always has redundant allocation code to optimize out. The hint adds additional (dead) code to inlined functions, and extra info for the optimizer to work with.
binary size increases are almost all in debug profile. I would not be surprised if Rust generated debug info for the extra assume lines. Rust's debug info is already really large.
the only non-debug binary that has increased in size is the image library (10046380 to 10065324), but that's .rlib size where 54% of the rlib size is still the debug info. I have tested that locally, and rlib does increase, but when compiling a stripped binary that uses the image crate, I see no difference in size.

So I think paths forward:

Accept that's the cost of hints, and maybe investigate faster/more efficient debug info for std separately.
I can limit the hint to only try_reserve functions, since .try_reserve(); extend() is an important pattern, the rest was a nice to have.

workingjubilee · 2023-10-28T05:53:49Z

This does look more promising. The binary size remarks earlier were more of a "hrm, apparent cost without benefit" kind of remark.

I think we'd still want a codegen test either way, and then if you feel it would be worth testing the hint on try_reserve specifically, that might be good to see. Otherwise I think we could accept this with the test.

Hint optimizer about try-reserved capacity This is rust-lang#116568, but limited only to the less-common `try_reserve` functions to reduce bloat in debug binaries from debug info, while still addressing the main use-case rust-lang#116570

…ngjubilee Hint optimizer about try-reserved capacity This is rust-lang#116568, but limited only to the less-common `try_reserve` functions to reduce bloat in debug binaries from debug info, while still addressing the main use-case rust-lang#116570

bors · 2023-11-05T02:40:13Z

☔ The latest upstream changes (presumably #117503) made this pull request unmergeable. Please resolve the merge conflicts.

workingjubilee · 2023-11-06T00:48:53Z

Well, that's surprising.

workingjubilee · 2023-11-06T00:53:04Z

I wonder if that's just noise or if something changed in the meantime or the smaller change is genuinely that much better on its own. Let's see what we can see.

@bors try @rust-timer queue

bors · 2023-11-06T00:54:14Z

⌛ Trying commit 3e1cf5b with merge 361898b...

Hint optimizer about reserved capacity The fact that `Vec` had capacity increased is not known to the optimizer, because functions like `do_reserve_and_handle` are not inlined. (rust-lang#82801) Because of that, code such as: ```rust vec.try_reserve(123)?; vec.extend_from_slice(&s[..123]); ``` Tries to reserve the capacity **twice**. This is unnecessary code bloat. https://rust.godbolt.org/z/YWY16Ezej Adding a hint after reserve optimizes out the next check of `self.needs_to_grow(len, additional)`.

bors · 2023-11-06T02:21:18Z

☀️ Try build successful - checks-actions
Build commit: 361898b (361898b6c86fdf12241aea3b6d338064507e220a)

rust-timer · 2023-11-06T14:48:07Z

Finished benchmarking commit (361898b): comparison URL.

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

Benchmarking this pull request likely means that it is perf-sensitive, so we're automatically marking it as not fit for rolling up. While you can manually mark this PR as fit for rollup, we strongly recommend not doing so since this PR may lead to changes in compiler perf.

Next Steps: If you can justify the regressions found in this try perf run, please indicate this with @rustbot label: +perf-regression-triaged along with sufficient written justification. If you cannot justify the regressions please fix the regressions and do another perf run. If the next run shows neutral or positive results, the label will be automatically removed.

@bors rollup=never
@rustbot label: -S-waiting-on-perf +perf-regression

Instruction count

This is a highly reliable metric that was used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.5%	[0.3%, 0.8%]	9
Regressions ❌ (secondary)	-	-	0
Improvements ✅ (primary)	-0.2%	[-0.2%, -0.2%]	1
Improvements ✅ (secondary)	-0.3%	[-0.4%, -0.3%]	3
All ❌✅ (primary)	0.4%	[-0.2%, 0.8%]	10

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	3.9%	[0.6%, 12.0%]	9
Regressions ❌ (secondary)	3.3%	[2.3%, 4.3%]	2
Improvements ✅ (primary)	-1.7%	[-2.8%, -0.6%]	5
Improvements ✅ (secondary)	-0.6%	[-0.6%, -0.6%]	1
All ❌✅ (primary)	1.9%	[-2.8%, 12.0%]	14

Cycles

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.6%	[0.4%, 0.7%]	4
Regressions ❌ (secondary)	0.7%	[0.7%, 0.7%]	1
Improvements ✅ (primary)	-0.5%	[-0.5%, -0.4%]	2
Improvements ✅ (secondary)	-2.4%	[-2.4%, -2.4%]	1
All ❌✅ (primary)	0.2%	[-0.5%, 0.7%]	6

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

	mean	range	count
Regressions ❌ (primary)	0.1%	[0.0%, 0.3%]	45
Regressions ❌ (secondary)	0.2%	[0.2%, 0.2%]	1
Improvements ✅ (primary)	-0.4%	[-0.4%, -0.4%]	1
Improvements ✅ (secondary)	-	-	0
All ❌✅ (primary)	0.1%	[-0.4%, 0.3%]	46

Bootstrap: 636.185s -> 634.65s (-0.24%)
Artifact size: 304.50 MiB -> 304.39 MiB (-0.04%)

kornelski · 2023-11-06T20:26:16Z

It looks like a mixed bag again. With try_reserve() hint merged I think it's fine to leave this one out. Users of reserve() can get this benefit by switching to try_reserve().

Maybe eventually debug info/assume in llvm or whatever is costing here will get optimized and this can be revisited.

workingjubilee · 2023-11-06T23:07:15Z

Yeah, that's a much less beneficial diff, whereas the try-reserve-only version was much better on its own (even if it was mostly noise, it had no regressions), so this seems like the right balance.

mati865 · 2023-11-06T23:24:29Z

Well, that's surprising.

This is likely mostly noise, look at PR that changed only a comment:

#115333 (comment)

Similar things has happened to other PRs as well.

workingjubilee · 2023-11-06T23:36:17Z

@mati865 It would be nice to have a better idea somehow of how noisy a statistical output is so that we can make better-informed decisions, I feel the current metrics don't adequately capture this thing. I feel like the decisions here were correct anyways, though, because it seems to be that this PR was consistently mixed and the try-reserve version was consistently mildly-positive-or-not-bad.

mati865 · 2023-11-07T01:02:53Z

It's not noisy typically but lately something odd happened, take a look at this graph:

I feel like the decisions here were correct anyways, though, because it seems to be that this PR was consistently mixed and the try-reserve version was consistently mildly-positive-or-not-bad.

I do agree with that, previous runs had no significant noise.
The noise that we observed is "all or nothing", so either there is this two digit variation meaning the result is not meaningful or there is virtually no noise.

One more PR with similar noise: #116412 (comment)

EDIT: Created Zulip thread: https://rust-lang.zulipchat.com/#narrow/stream/247081-t-compiler.2Fperformance/topic/Recent.20noise.20spikes

rustbot assigned m-ou-se Oct 9, 2023

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 9, 2023

kornelski force-pushed the hint-reserved branch from 128562c to 6767186 Compare October 9, 2023 14:14

This comment has been minimized.

Sign in to view

kornelski force-pushed the hint-reserved branch from 6767186 to d4c78ff Compare October 9, 2023 15:00

kornelski force-pushed the hint-reserved branch from d4c78ff to 9a55d7a Compare October 9, 2023 17:12

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 10, 2023

This comment has been minimized.

Sign in to view

rustbot added perf-regression Performance regression. and removed S-waiting-on-perf Status: Waiting on a perf run to be completed. labels Oct 10, 2023

kornelski mentioned this pull request Oct 13, 2023

Handle out of memory errors in io::Read::read_to_end #116570

Closed

scottmcm reviewed Oct 14, 2023

View reviewed changes

library/alloc/src/raw_vec.rs Outdated Show resolved Hide resolved

krtab mentioned this pull request Oct 17, 2023

The compiler looses vector capacity information #82801

Open

kornelski force-pushed the hint-reserved branch from 0191b93 to 471ec70 Compare October 20, 2023 02:52

This comment has been minimized.

Sign in to view

kornelski force-pushed the hint-reserved branch 2 times, most recently from 048c423 to aa25a20 Compare October 20, 2023 03:15

This comment has been minimized.

Sign in to view

kornelski force-pushed the hint-reserved branch from aa25a20 to b90a3af Compare October 20, 2023 14:28

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Oct 22, 2023

kornelski force-pushed the hint-reserved branch from 36d4f56 to c03a3af Compare October 30, 2023 01:01

kornelski mentioned this pull request Nov 2, 2023

Hint optimizer about try-reserved capacity #117503

Merged

Hint optimizer about reserved capacity

3e1cf5b

kornelski force-pushed the hint-reserved branch from c03a3af to 3e1cf5b Compare November 6, 2023 00:41

This comment has been minimized.

Sign in to view

rustbot added the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 6, 2023

workingjubilee assigned workingjubilee and unassigned m-ou-se Nov 6, 2023

bors mentioned this pull request Nov 6, 2023

Couple of small changes #117615

Merged

This comment has been minimized.

Sign in to view

rustbot removed the S-waiting-on-perf Status: Waiting on a perf run to be completed. label Nov 6, 2023

kornelski closed this Nov 6, 2023

kornelski deleted the hint-reserved branch February 3, 2024 11:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Hint optimizer about reserved capacity #116568

Hint optimizer about reserved capacity #116568

kornelski commented Oct 9, 2023 •

edited

Loading

rustbot commented Oct 9, 2023

This comment has been minimized.

CryZe commented Oct 9, 2023 •

edited

Loading

workingjubilee commented Oct 10, 2023

This comment has been minimized.

bors commented Oct 10, 2023

bors commented Oct 10, 2023

This comment has been minimized.

rust-timer commented Oct 10, 2023

workingjubilee commented Oct 10, 2023 •

edited

Loading

kornelski commented Oct 10, 2023 •

edited

Loading

This comment has been minimized.

This comment has been minimized.

kornelski commented Oct 22, 2023 •

edited

Loading

workingjubilee commented Oct 28, 2023 •

edited

Loading

bors commented Nov 5, 2023

workingjubilee commented Nov 6, 2023

workingjubilee commented Nov 6, 2023

This comment has been minimized.

bors commented Nov 6, 2023

bors commented Nov 6, 2023

This comment has been minimized.

rust-timer commented Nov 6, 2023

kornelski commented Nov 6, 2023

workingjubilee commented Nov 6, 2023

mati865 commented Nov 6, 2023 •

edited

Loading

workingjubilee commented Nov 6, 2023 •

edited

Loading

mati865 commented Nov 7, 2023 •

edited

Loading

Hint optimizer about reserved capacity #116568

Hint optimizer about reserved capacity #116568

Conversation

kornelski commented Oct 9, 2023 • edited Loading

rustbot commented Oct 9, 2023

This comment has been minimized.

CryZe commented Oct 9, 2023 • edited Loading

workingjubilee commented Oct 10, 2023

This comment has been minimized.

bors commented Oct 10, 2023

bors commented Oct 10, 2023

This comment has been minimized.

rust-timer commented Oct 10, 2023

Overall result: ❌ regressions - ACTION NEEDED

workingjubilee commented Oct 10, 2023 • edited Loading

kornelski commented Oct 10, 2023 • edited Loading

This comment has been minimized.

This comment has been minimized.

kornelski commented Oct 22, 2023 • edited Loading

workingjubilee commented Oct 28, 2023 • edited Loading

bors commented Nov 5, 2023

workingjubilee commented Nov 6, 2023

workingjubilee commented Nov 6, 2023

This comment has been minimized.

bors commented Nov 6, 2023

bors commented Nov 6, 2023

This comment has been minimized.

rust-timer commented Nov 6, 2023

Overall result: ❌✅ regressions and improvements - ACTION NEEDED

kornelski commented Nov 6, 2023

workingjubilee commented Nov 6, 2023

mati865 commented Nov 6, 2023 • edited Loading

workingjubilee commented Nov 6, 2023 • edited Loading

mati865 commented Nov 7, 2023 • edited Loading

kornelski commented Oct 9, 2023 •

edited

Loading

CryZe commented Oct 9, 2023 •

edited

Loading

workingjubilee commented Oct 10, 2023 •

edited

Loading

kornelski commented Oct 10, 2023 •

edited

Loading

kornelski commented Oct 22, 2023 •

edited

Loading

workingjubilee commented Oct 28, 2023 •

edited

Loading

mati865 commented Nov 6, 2023 •

edited

Loading

workingjubilee commented Nov 6, 2023 •

edited

Loading

mati865 commented Nov 7, 2023 •

edited

Loading