Remove most `#[inline]` annotations #119

alexcrichton · 2019-10-01T13:03:39Z

This commit goes through and deletes almost all #[inline] annotations
in this crate. It looks like before this commit basically every single
function is #[inline], but this is generally not necessary for
performance and can have a severe impact on compile times in both debug
and release modes, most severely in release mode.

Some #[inline] annotations are definitely necessary, however. Most
functions in this crate are already candidates for inlining because
they're generic, but functions like Group and BitMask aren't
candidates for inlining without #[inline]. Additionally LLVM is by no
means perfect, so some #[inline] may still be necessary to get some
further speedups.

The procedure used to generate this commit looked like:

Remove all #[inline] annotations.
Run cargo bench, comparing against the master branch, and add
#[inline] to hot spots as necessary.
A PR was made against rust-lang/rust to evaluate the impact
on the compiler for more performance data.
Using this data, perf diff was used locally to determine further hot
spots and more #[inline] annotations were added.
A second round of benchmarking was done

The numbers are at the point where I think this should land in the crate
and get published to move into the standard library. There are up to 20%
wins in compile time for hashmap-heavy crates (like Cargo) and milder
wins (up to 10%) for a number of other large crates. The regressions are
all in the 1-3% range and are largely on benchmarks taking a few handful
of milliseconds anyway, which I'd personally say is a worthwhile
tradeoff.

For comparison, the benchmarks of this crate before and after this
commit look like so:

   name                         baseline ns/iter  new ns/iter  diff ns/iter   diff %  speedup
   insert_ahash_highbits        7,137             9,044               1,907   26.72%   x 0.79
   insert_ahash_random          7,575             9,789               2,214   29.23%   x 0.77
   insert_ahash_serial          9,833             9,476                -357   -3.63%   x 1.04
   insert_erase_ahash_highbits  15,824            19,164              3,340   21.11%   x 0.83
   insert_erase_ahash_random    16,933            20,353              3,420   20.20%   x 0.83
   insert_erase_ahash_serial    20,857            27,675              6,818   32.69%   x 0.75
   insert_erase_std_highbits    35,117            38,385              3,268    9.31%   x 0.91
   insert_erase_std_random      35,357            37,236              1,879    5.31%   x 0.95
   insert_erase_std_serial      30,617            34,136              3,519   11.49%   x 0.90
   insert_std_highbits          15,675            18,180              2,505   15.98%   x 0.86
   insert_std_random            16,566            17,803              1,237    7.47%   x 0.93
   insert_std_serial            14,612            16,025              1,413    9.67%   x 0.91
   iter_ahash_highbits          1,715             1,640                 -75   -4.37%   x 1.05
   iter_ahash_random            1,721             1,634                 -87   -5.06%   x 1.05
   iter_ahash_serial            1,723             1,636                 -87   -5.05%   x 1.05
   iter_std_highbits            1,715             1,634                 -81   -4.72%   x 1.05
   iter_std_random              1,715             1,637                 -78   -4.55%   x 1.05
   iter_std_serial              1,722             1,637                 -85   -4.94%   x 1.05
   lookup_ahash_highbits        4,565             5,809               1,244   27.25%   x 0.79
   lookup_ahash_random          4,632             4,047                -585  -12.63%   x 1.14
   lookup_ahash_serial          4,612             4,906                 294    6.37%   x 0.94
   lookup_fail_ahash_highbits   4,206             3,976                -230   -5.47%   x 1.06
   lookup_fail_ahash_random     4,327             4,211                -116   -2.68%   x 1.03
   lookup_fail_ahash_serial     8,999             4,386              -4,613  -51.26%   x 2.05
   lookup_fail_std_highbits     13,284            13,342                 58    0.44%   x 1.00
   lookup_fail_std_random       13,172            13,614                442    3.36%   x 0.97
   lookup_fail_std_serial       11,240            11,539                299    2.66%   x 0.97
   lookup_std_highbits          13,075            13,333                258    1.97%   x 0.98
   lookup_std_random            13,257            13,193                -64   -0.48%   x 1.00
   lookup_std_serial            10,782            10,917                135    1.25%   x 0.99

The summary of this from what I can tell is that the microbenchmarks are
sort of all over the place, but they're neither consistently regressing
nor improving, as expected. In general I would be surprised if there's
much of a significant performance regression attributed to this commit,
and #[inline] can always be selectively added back in easily without
adding it to every function in the crate.

build.rs

nnethercote · 2019-10-01T19:50:53Z

the microbenchmarks are sort of all over the place, but they're neither consistently regressing nor improving

I see more regressions than improvements, and with the exception of lookup_fail_ahash_serial, the regressions are mostly larger than the improvements. This becomes clearer if you sort the table by diff %.

   name                         baseline ns/iter  new ns/iter  diff ns/iter   diff %  speedup
   insert_erase_ahash_serial    20,857            27,675              6,818   32.69%   x 0.75
   insert_ahash_random          7,575             9,789               2,214   29.23%   x 0.77
   lookup_ahash_highbits        4,565             5,809               1,244   27.25%   x 0.79
   insert_ahash_highbits        7,137             9,044               1,907   26.72%   x 0.79
   insert_erase_ahash_highbits  15,824            19,164              3,340   21.11%   x 0.83
   insert_erase_ahash_random    16,933            20,353              3,420   20.20%   x 0.83
   insert_std_highbits          15,675            18,180              2,505   15.98%   x 0.86
   insert_erase_std_serial      30,617            34,136              3,519   11.49%   x 0.90
   insert_std_serial            14,612            16,025              1,413    9.67%   x 0.91
   insert_erase_std_highbits    35,117            38,385              3,268    9.31%   x 0.91
   insert_std_random            16,566            17,803              1,237    7.47%   x 0.93
   lookup_ahash_serial          4,612             4,906                 294    6.37%   x 0.94
   insert_erase_std_random      35,357            37,236              1,879    5.31%   x 0.95
   lookup_fail_std_random       13,172            13,614                442    3.36%   x 0.97
   lookup_fail_std_serial       11,240            11,539                299    2.66%   x 0.97
   lookup_std_highbits          13,075            13,333                258    1.97%   x 0.98
   lookup_std_serial            10,782            10,917                135    1.25%   x 0.99
   lookup_fail_std_highbits     13,284            13,342                 58    0.44%   x 1.00
   lookup_std_random            13,257            13,193                -64   -0.48%   x 1.00
   lookup_fail_ahash_random     4,327             4,211                -116   -2.68%   x 1.03
   insert_ahash_serial          9,833             9,476                -357   -3.63%   x 1.04
   iter_ahash_highbits          1,715             1,640                 -75   -4.37%   x 1.05
   iter_std_random              1,715             1,637                 -78   -4.55%   x 1.05
   iter_std_highbits            1,715             1,634                 -81   -4.72%   x 1.05
   iter_std_serial              1,722             1,637                 -85   -4.94%   x 1.05
   iter_ahash_serial            1,723             1,636                 -87   -5.05%   x 1.05
   iter_ahash_random            1,721             1,634                 -87   -5.06%   x 1.05
   lookup_fail_ahash_highbits   4,206             3,976                -230   -5.47%   x 1.06
   lookup_ahash_random          4,632             4,047                -585  -12.63%   x 1.14
   lookup_fail_ahash_serial     8,999             4,386              -4,613  -51.26%   x 2.05

I don't know anything about the relative importance of each benchmark, though.

Amanieu · 2019-10-01T20:13:10Z

The general order of importance for hash table operations is (from most important to least):

successful lookup
unsuccessful lookup
insert (which implies an unsuccessful lookup)
remove
iteration

I am particularly worried about the regression in insertion benchmarks. Looking at the disassembly shows that there are 2 out-of-line functions: HashMap::insert (which checks whether the key exists) and RawTable::insert (which doesn't).

As a point of comparison, in the C++ version of SwissTables, every function is inline except for prepare_for_insert which roughly maps to RawTable::insert.

alexcrichton · 2019-10-01T21:58:30Z

I'm personally very wary to consider these microbenchmarks serious regressions and/or grounds for skipping this PR entirely. One benchmark got 100% faster by removing #[inline] which shows that these I think are sort of all over the place and extremely susceptible to decisions in LLVM, and penalizing all users with more codegen does not seem like a fair tradeoff. I've also seen that when enabling LTO on master most of these benchmarks 'regress'. I think they've just got a good deal of variation.

I am particularly worried about the regression in insertion benchmarks.

One thing I've tried to emphasize with this PR is drawing from data. Data sources like perf.r-l.o and these local benchmarks are showing that 90% of the wins are just inlining the functions which otherwise would not be candidates for inlining (like non-generic functions). There's an extremely long tail of "regressions" elsewhere because I think we should make an explicit decision to trade off a miniscule amount of perf in microbenchmarks for 20% compile time in hashmap-heavy crates.

This is a balancing act, and I think it's fine to use concrete data to guide insertion of #[inline], but I want to push back very hard against the idea that everything needs #[inline]. With what I mentioned above, I'm very wary of using these benchmarks in this repository to guide the insertion of #[inline] beyond the "90% of the perf matters" case. Getting a few percent on these benchmarks I don't think actually translates to real-world wins anywhere else.

As a point of comparison, in the C++ version of SwissTables, every function is inline except for prepare_for_insert which roughly maps to RawTable::insert.

I'd want to be clear though, Rust's compilation model has no parallel in C++. What C++ does with headers does not at all match Rust generics and #[inline]. While it's similar there are some subtle crucial differences.

The entire crate is "inlined" anyway since it's generic. Using #[inline] caues causes the compiler to do extra work, such as codegen'ing into every single codegen unit which references it as well as adding inlinehint to LLVM. Those latter two are the source of quite large compile-time slowdowns when using HashMap heavily (as seen on perf.r-l.o). The latter two are also almost always "fixed" via ThinLTO, just like all other non-#[inline] function which may or may not be generic in Rust.

Amanieu · 2019-10-01T23:43:52Z

Using #[inline] caues causes the compiler to do extra work, such as codegen'ing into every single codegen unit which references it as well as adding inlinehint to LLVM.

Isn't the codegen'ing done anyways for generic functions? This means that effectively in hashbrown all we are doing is adding the inlinehint attribute to a few functions, which in turn causes LLVM to inline more aggressively.

One thing I've tried to emphasize with this PR is drawing from data.

I disagree with your interpretation of the perf.r-l.o data: if you filter the results to only look at the check results, you can see that this change is a 1%-3% regression across the board. I would argue that this is a very significant regression considering that introducing hashbrown only achieved an average speedup of 5%.

Mark-Simulacrum · 2019-10-02T00:27:01Z

Looking at the wall time measurements it's pretty clear to me that most of the check regressions, while theoretically 1-3% are actually less than ~100ms longer in terms of compiletime. I agree with @alexcrichton here that the trade-off in compile time on optimized/debug LLVM builds is more than worth the possibly tiny losses in performance.

alexcrichton · 2019-10-02T15:10:09Z

No, #[inline] is very different than simply just an inline hint. As I mentioned before, there's no equivalent in C++ for what #[inline] does. In debug mode rustc basically ignores #[inline], pretending you didn't even write it. In release mode the compiler will, by default, codegen an #[inline] function into every single referencing codegen unit, and then it will also add inlinehint. This means that if you have 16 CGUs and they all reference a hash map, every single one is getting the entire hash map implementation inlined into it.

Instead the behavior of this PR is that only one CGU has a hash map (because it must be monomorphized), all 16 cgus reference it, and then ThinLTO will inline across codegen units as necessary.

I will again, as I usually do, strongly push back against religiously adhering to the numbers provided by perf.rust-lang.org. What you're looking at is instruction counts which does not guarantee any sort of correlation with respect to runtime. It can, and often is, an indicator that when instruction counts change something about the wall-time changes. Moving around a few percent of instructions here or there doesn't mean anything though in terms of a meaningful number, it simply means "please take the time to investigate more to understand what this change means".

As @Mark-Simulacrum points out the "regressions" here are on the order of milliseconds. I don't think anyone's going to lament that rustc is a few milliseconds slower on each crate, no one is even close to the scale where that matters at all. What people do care about is shaving 20% of their compile time when using hash maps. That's actually a significant win, and has real-world impacts on any "big" crate.

nnethercote · 2019-10-03T00:11:01Z

I made some comments about the rustc perf effects yesterday, here.

More generally, this change has two effects.

It hurts somewhat the performance of code that uses HashMap/HashSet.
It improves compile times of code that uses HashMap/HashSet, sometimes significantly. (This is for debug and opt builds, but not for check builds).

So the question is: what's the right balance between performance and compile times? Different people will have different opinions about this. @Amanieu worked hard to get hashbrown as fast as possible, and so will naturally be reluctant to make changes that compromise that. @alexcrichton works on Cargo, which stands to gain 18% compile time speedups, and so will naturally have a different opinion.

I don't know what the right answer is here, but having #[inline] on every (or almost every) function does seem excessive. Looking at executable sizes might be instructive, too.

BurntSushi · 2019-10-03T00:13:58Z

As a small note here, you could put inlining behind a feature that is enabled by default. That's what I did for regex (among other things): https://docs.rs/regex/1.3.1/regex/#performance-features

Of course, if you depend on anything that depends on hashbrown that enables the feature, then I don't think it can be turned off.

novacrazy · 2019-10-03T00:16:05Z

Just my two-cents, but I would gladly wait multiple minutes extra on compile times if it improves final runtime performance by even 5%. If you need faster debug iteration, why not just do something like:

#[cfg_attr(not(debug_assertions), inline)]?

nnethercote · 2019-10-03T00:18:08Z

I guess one way to do this is to measure multiple versions: no inlining, full inlining, and several points in between. If we had data on, say, five different versions, it might show that there is a sweet spot where we can get a big chunk of the compile-time wins with very little runtime performance cost. (Or it might show that there is no such sweet spot.)

I can see that @alexcrichton did some of that already. It would be instructive to have more data points; I understand that this would take a significant amount of time.

alexcrichton · 2019-10-03T14:25:23Z

I sort of get the impression that very few folks are ok admitting that getting compile times under control will require changing code we send to rustc. I feel that most of the discussion here is "look at the red number on perf.r-l.o, that means we can't land anything right?" That line of reasoning I find pretty unproductive and also misses the point of what perf.r-l.o even is, which I'll say again is purely instruction counts which may correlate with wall-time performance, but don't always.

I don't think it's the case that 100% of Rust users want the absolute fastest code at all costs no matter how long it takes. I'm sure that's the case for some but I think there's a very large chunk of users that want to also be able to reasonably iterate fast as well (cue everyone who's ever thought that Rust compiles slowly). I feel like our job in the standard library is to strike a balance, and adding #[inline] on every function we can find is a case of gross overuse and feels entirely driven by fear that someone might eventually show a benchmark that's slower. The hashbrown crate is just that, a crate on crates.io. If @Amanieu you really want to keep #[inline] everywhere then @BurntSushi's idea seems reasonable, but I would personally have blocked hashbrown landing in the standard library had I seen that #[inline] was applied literally everywhere.

I personally find it extremely difficult and frustrating to make these sorts of changes. As I mentioned above I feel that few want to admit that these sorts of changes are necessary for getting compile times under control. This has been true for all of Rust's history, for example I was quite frustrated that parallel codegen was stymied due to the lack of ThinLTO originally. This later ended up being the only major dip in compile times in Rust's history when we finally got it enabled with ThinLTO. This is a way of saying that I'm running out of steam for making these kinds of changes since for years basically no one seems to "be on my side". That's a sign that I'm one of the only who cares enough about this to put energy into it, and it's not really worth my time if I'm always the sole advocate.

bluss · 2019-10-03T19:19:35Z

@alexcrichton Your explanation of #[inline] here is hugely helpful! It would be great if we could spread some up to date information in the community about how inlining works in Rust in the landscape of codegen units, debug vs release compiles etc.

I think what you are doing always sets an example (it does for me at least, I started working on a de-inlining PR for a crate yesterday, though it's possible I'm too optimistic about inline(always) too, for small methods)

Amanieu · 2019-10-03T21:22:37Z

Thanks @alexcrichton for your explanation of #[inline], it makes the problem much clearer. I agree that we should aim to reduce compile times (18% reduction in compile time for cargo is huge and definitely worth the performance cost).

I am happy to accept what @BurntSushi suggested, which is to put the inlining behind an inline feature, with the exception of some hand-picked hot functions which are always marked #[inline].

However I would also like to get a better understanding of how #[inline] interacts with codegen units in Rust. If my understanding is correct, the main performance cost is that we need to generate LLVM IR multiple times if a function is referenced by multiple codegen units. What if we only marked internal methods with #[inline] but not the public API of HashMap? If my understanding is correct (which it probably isn't) the public methods will only be monomorphized in one codegen unit, and all the inlined internal methods will only be referenced from that codegen unit. Would this avoid the issue of generating LLVM IR multiple times for the same method?

nnethercote · 2019-10-03T22:27:41Z

It's clear to me that rust-lang/rust#64600 and this PR have identified that excessive inlining of library functions can have a shockingly large effect on debug/opt compile times. As someone who has put a lot of energy into improving compile times, I'm taking this as good news -- there's a whole new area of potential improvements that I previously didn't know about. But every one of those potential improvements could involve a compile time vs runtime trade-off. New tools to identify which inlined functions cause the most bloat will be very valuable.

Avoids unnecessary rebuilds when locally developing the crate.

Helps when debugging and looking at symbols to see what we got.

This commit goes through and deletes almost all `#[inline]` annotations in this crate. It looks like before this commit basically every single function is `#[inline]`, but this is generally not necessary for performance and can have a severe impact on compile times in both debug and release modes, most severely in release mode. Some `#[inline]` annotations are definitely necessary, however. Most functions in this crate are already candidates for inlining because they're generic, but functions like `Group` and `BitMask` aren't candidates for inlining without `#[inline]`. Additionally LLVM is by no means perfect, so some `#[inline]` may still be necessary to get some further speedups. The procedure used to generate this commit looked like: * Remove all `#[inline]` annotations. * Run `cargo bench`, comparing against the `master` branch, and add `#[inline]` to hot spots as necessary. * A [PR] was made against rust-lang/rust to [evaluate the impact][run1] on the compiler for more performance data. * Using this data, `perf diff` was used locally to determine further hot spots and more `#[inline]` annotations were added. * A [second round of benchmarking][run2] was done The numbers are at the point where I think this should land in the crate and get published to move into the standard library. There are up to 20% wins in compile time for hashmap-heavy crates (like Cargo) and milder wins (up to 10%) for a number of other large crates. The regressions are all in the 1-3% range and are largely on benchmarks taking a few handful of milliseconds anyway, which I'd personally say is a worthwhile tradeoff. For comparison, the benchmarks of this crate before and after this commit look like so: name baseline ns/iter new ns/iter diff ns/iter diff % speedup insert_ahash_highbits 7,137 9,044 1,907 26.72% x 0.79 insert_ahash_random 7,575 9,789 2,214 29.23% x 0.77 insert_ahash_serial 9,833 9,476 -357 -3.63% x 1.04 insert_erase_ahash_highbits 15,824 19,164 3,340 21.11% x 0.83 insert_erase_ahash_random 16,933 20,353 3,420 20.20% x 0.83 insert_erase_ahash_serial 20,857 27,675 6,818 32.69% x 0.75 insert_erase_std_highbits 35,117 38,385 3,268 9.31% x 0.91 insert_erase_std_random 35,357 37,236 1,879 5.31% x 0.95 insert_erase_std_serial 30,617 34,136 3,519 11.49% x 0.90 insert_std_highbits 15,675 18,180 2,505 15.98% x 0.86 insert_std_random 16,566 17,803 1,237 7.47% x 0.93 insert_std_serial 14,612 16,025 1,413 9.67% x 0.91 iter_ahash_highbits 1,715 1,640 -75 -4.37% x 1.05 iter_ahash_random 1,721 1,634 -87 -5.06% x 1.05 iter_ahash_serial 1,723 1,636 -87 -5.05% x 1.05 iter_std_highbits 1,715 1,634 -81 -4.72% x 1.05 iter_std_random 1,715 1,637 -78 -4.55% x 1.05 iter_std_serial 1,722 1,637 -85 -4.94% x 1.05 lookup_ahash_highbits 4,565 5,809 1,244 27.25% x 0.79 lookup_ahash_random 4,632 4,047 -585 -12.63% x 1.14 lookup_ahash_serial 4,612 4,906 294 6.37% x 0.94 lookup_fail_ahash_highbits 4,206 3,976 -230 -5.47% x 1.06 lookup_fail_ahash_random 4,327 4,211 -116 -2.68% x 1.03 lookup_fail_ahash_serial 8,999 4,386 -4,613 -51.26% x 2.05 lookup_fail_std_highbits 13,284 13,342 58 0.44% x 1.00 lookup_fail_std_random 13,172 13,614 442 3.36% x 0.97 lookup_fail_std_serial 11,240 11,539 299 2.66% x 0.97 lookup_std_highbits 13,075 13,333 258 1.97% x 0.98 lookup_std_random 13,257 13,193 -64 -0.48% x 1.00 lookup_std_serial 10,782 10,917 135 1.25% x 0.99 The summary of this from what I can tell is that the microbenchmarks are sort of all over the place, but they're neither consistently regressing nor improving, as expected. In general I would be surprised if there's much of a significant performance regression attributed to this commit, and `#[inline]` can always be selectively added back in easily without adding it to every function in the crate. [PR]: rust-lang/rust#64846 [run1]: rust-lang/rust#64846 (comment) [run2]: rust-lang/rust#64846 (comment)

alexcrichton · 2019-10-09T05:17:16Z

I've pushed up a version which adds back #[inline] behind a #[cfg_attr] for all "removed" #[inline] annotations

Amanieu · 2019-10-09T08:52:12Z

Thanks @alexcrichton!

Just to satisfy my curiosity (and check that I understand #[inline] correctly), could you tell me if the following would work in theory? I'm not asking you to change the PR, I'm happy with it as it is.

However I would also like to get a better understanding of how #[inline] interacts with codegen units in Rust. If my understanding is correct, the main performance cost is that we need to generate LLVM IR multiple times if a function is referenced by multiple codegen units. What if we only marked internal methods with #[inline] but not the public API of HashMap? If my understanding is correct (which it probably isn't) the public methods will only be monomorphized in one codegen unit, and all the inlined internal methods will only be referenced from that codegen unit. Would this avoid the issue of generating LLVM IR multiple times for the same method?

alexcrichton · 2019-10-09T16:57:54Z

I would need to verify, but I think your understanding is correct and that would have the same effect of causing std::collections::HashMap to not forcibly get inlined into all CGUs.

Amanieu · 2019-10-13T11:55:57Z

Sorry about the delay, I'm dealing with some CI issues in #121.

Amanieu · 2019-10-15T13:05:30Z

@bors r+

bors · 2019-10-15T13:05:31Z

📌 Commit 4e9e27d has been approved by Amanieu

bors · 2019-10-15T13:05:37Z

⌛ Testing commit 4e9e27d with merge b8c34c9...

Remove most `#[inline]` annotations This commit goes through and deletes almost all `#[inline]` annotations in this crate. It looks like before this commit basically every single function is `#[inline]`, but this is generally not necessary for performance and can have a severe impact on compile times in both debug and release modes, most severely in release mode. Some `#[inline]` annotations are definitely necessary, however. Most functions in this crate are already candidates for inlining because they're generic, but functions like `Group` and `BitMask` aren't candidates for inlining without `#[inline]`. Additionally LLVM is by no means perfect, so some `#[inline]` may still be necessary to get some further speedups. The procedure used to generate this commit looked like: * Remove all `#[inline]` annotations. * Run `cargo bench`, comparing against the `master` branch, and add `#[inline]` to hot spots as necessary. * A [PR] was made against rust-lang/rust to [evaluate the impact][run1] on the compiler for more performance data. * Using this data, `perf diff` was used locally to determine further hot spots and more `#[inline]` annotations were added. * A [second round of benchmarking][run2] was done The numbers are at the point where I think this should land in the crate and get published to move into the standard library. There are up to 20% wins in compile time for hashmap-heavy crates (like Cargo) and milder wins (up to 10%) for a number of other large crates. The regressions are all in the 1-3% range and are largely on benchmarks taking a few handful of milliseconds anyway, which I'd personally say is a worthwhile tradeoff. For comparison, the benchmarks of this crate before and after this commit look like so: ``` name baseline ns/iter new ns/iter diff ns/iter diff % speedup insert_ahash_highbits 7,137 9,044 1,907 26.72% x 0.79 insert_ahash_random 7,575 9,789 2,214 29.23% x 0.77 insert_ahash_serial 9,833 9,476 -357 -3.63% x 1.04 insert_erase_ahash_highbits 15,824 19,164 3,340 21.11% x 0.83 insert_erase_ahash_random 16,933 20,353 3,420 20.20% x 0.83 insert_erase_ahash_serial 20,857 27,675 6,818 32.69% x 0.75 insert_erase_std_highbits 35,117 38,385 3,268 9.31% x 0.91 insert_erase_std_random 35,357 37,236 1,879 5.31% x 0.95 insert_erase_std_serial 30,617 34,136 3,519 11.49% x 0.90 insert_std_highbits 15,675 18,180 2,505 15.98% x 0.86 insert_std_random 16,566 17,803 1,237 7.47% x 0.93 insert_std_serial 14,612 16,025 1,413 9.67% x 0.91 iter_ahash_highbits 1,715 1,640 -75 -4.37% x 1.05 iter_ahash_random 1,721 1,634 -87 -5.06% x 1.05 iter_ahash_serial 1,723 1,636 -87 -5.05% x 1.05 iter_std_highbits 1,715 1,634 -81 -4.72% x 1.05 iter_std_random 1,715 1,637 -78 -4.55% x 1.05 iter_std_serial 1,722 1,637 -85 -4.94% x 1.05 lookup_ahash_highbits 4,565 5,809 1,244 27.25% x 0.79 lookup_ahash_random 4,632 4,047 -585 -12.63% x 1.14 lookup_ahash_serial 4,612 4,906 294 6.37% x 0.94 lookup_fail_ahash_highbits 4,206 3,976 -230 -5.47% x 1.06 lookup_fail_ahash_random 4,327 4,211 -116 -2.68% x 1.03 lookup_fail_ahash_serial 8,999 4,386 -4,613 -51.26% x 2.05 lookup_fail_std_highbits 13,284 13,342 58 0.44% x 1.00 lookup_fail_std_random 13,172 13,614 442 3.36% x 0.97 lookup_fail_std_serial 11,240 11,539 299 2.66% x 0.97 lookup_std_highbits 13,075 13,333 258 1.97% x 0.98 lookup_std_random 13,257 13,193 -64 -0.48% x 1.00 lookup_std_serial 10,782 10,917 135 1.25% x 0.99 ``` The summary of this from what I can tell is that the microbenchmarks are sort of all over the place, but they're neither consistently regressing nor improving, as expected. In general I would be surprised if there's much of a significant performance regression attributed to this commit, and `#[inline]` can always be selectively added back in easily without adding it to every function in the crate. [PR]: rust-lang/rust#64846 [run1]: rust-lang/rust#64846 (comment) [run2]: rust-lang/rust#64846 (comment)

bors · 2019-10-15T13:27:40Z

☀️ Test successful - checks-travis
Approved by: Amanieu
Pushing b8c34c9 to master...

alexcrichton · 2019-10-22T15:17:13Z

Thanks @Amanieu! Mind publishing so I can include this in rust-lang/rust as well?

Amanieu · 2019-10-23T19:50:35Z

I've just published hashbrown 0.6.2. I made the unwind-more feature enabled by default since I don't want to regress performance for anyone using the hashbrown crate directly. However libstd uses hashbrown with default features disabled, and shouldn't be affected.

Pulls in rust-lang/hashbrown#119 which should be a good improvement for compile times of hashmap-heavy crates.

alexcrichton · 2019-10-24T15:08:14Z

Thanks! I've opened rust-lang/rust#65766 to merge this into libstd

… r=Mark-Simulacrum Update hashbrown to 0.6.2 Pulls in rust-lang/hashbrown#119 which should be a good improvement for compile times of hashmap-heavy crates.

alexcrichton mentioned this pull request Oct 1, 2019

[WIP] Test out removing lots of #[inline] from hashbrown rust-lang/rust#64846

Closed

Amanieu reviewed Oct 1, 2019

View reviewed changes

build.rs Show resolved Hide resolved

alexcrichton added 4 commits October 8, 2019 22:15

Only rerun build script if build script changes

7a548a5

Avoids unnecessary rebuilds when locally developing the crate.

Give modules native name instead of imp

94ca780

Helps when debugging and looking at symbols to see what we got.

Add back many #[inline] behind a #[cfg]

4e9e27d

alexcrichton force-pushed the less-generics branch from f1666de to 4e9e27d Compare October 9, 2019 05:16

bors merged commit 4e9e27d into rust-lang:master Oct 15, 2019

alexcrichton deleted the less-generics branch October 22, 2019 15:16

alexcrichton added a commit to alexcrichton/rust that referenced this pull request Oct 24, 2019

Update hashbrown to 0.6.2

060b6cb

Pulls in rust-lang/hashbrown#119 which should be a good improvement for compile times of hashmap-heavy crates.

alexcrichton mentioned this pull request Oct 24, 2019

Update hashbrown to 0.6.2 rust-lang/rust#65766

Merged

xStrom mentioned this pull request May 28, 2020

Statically typed selectors, take two. linebender/druid#993

Merged

cuviper mentioned this pull request Jun 25, 2020

Switch to hashbrown's RawTable internally indexmap-rs/indexmap#131

Merged

cbeck88 mentioned this pull request Sep 2, 2020

MCC-697 Digestible rework to use merlin mobilecoinfoundation/mobilecoin#389

Merged

Zoxc mentioned this pull request Jul 21, 2021

Make rehashing and resizing less generic #282

Merged

saethlin mentioned this pull request Jan 19, 2022

Documentation for #[inline] in the reference doesn't align with external resources rust-lang/reference#1140

Open

CAGS295 mentioned this pull request Jul 11, 2022

CSUB-202 OCW Lock gluwa/creditcoin#388

Merged

9 tasks

SUPERCILEX mentioned this pull request Oct 17, 2022

Make transpose const and inline rust-lang/rust#103127

Merged

taiki-e mentioned this pull request Mar 1, 2023

Inline project methods taiki-e/pin-project-lite#74

Merged

JohnBobbo96 mentioned this pull request Apr 16, 2023

Add the inlined_generics lint. rust-lang/rust-clippy#10650

Closed

cmyr mentioned this pull request Jun 9, 2023

Consistently use #[inline] in font-types googlefonts/fontations#490

Open

daxpedda mentioned this pull request Aug 14, 2024

Remove unnecessary #[inline] hints rust-windowing/winit#3867

Open

trbritt mentioned this pull request Aug 26, 2024

Tristan/war 540 sylow initial performance optimization warlock-labs/sylow#27

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Remove most `#[inline]` annotations #119

Remove most `#[inline]` annotations #119

alexcrichton commented Oct 1, 2019

nnethercote commented Oct 1, 2019

Amanieu commented Oct 1, 2019

alexcrichton commented Oct 1, 2019

Amanieu commented Oct 1, 2019

Mark-Simulacrum commented Oct 2, 2019

alexcrichton commented Oct 2, 2019

nnethercote commented Oct 3, 2019

BurntSushi commented Oct 3, 2019 •

edited

Loading

novacrazy commented Oct 3, 2019

nnethercote commented Oct 3, 2019

alexcrichton commented Oct 3, 2019

bluss commented Oct 3, 2019

Amanieu commented Oct 3, 2019

nnethercote commented Oct 3, 2019

alexcrichton commented Oct 9, 2019

Amanieu commented Oct 9, 2019

alexcrichton commented Oct 9, 2019

Amanieu commented Oct 13, 2019

Amanieu commented Oct 15, 2019

bors commented Oct 15, 2019

bors commented Oct 15, 2019

bors commented Oct 15, 2019

alexcrichton commented Oct 22, 2019

Amanieu commented Oct 23, 2019

alexcrichton commented Oct 24, 2019

Remove most #[inline] annotations #119

Remove most #[inline] annotations #119

Conversation

alexcrichton commented Oct 1, 2019

nnethercote commented Oct 1, 2019

Amanieu commented Oct 1, 2019

alexcrichton commented Oct 1, 2019

Amanieu commented Oct 1, 2019

Mark-Simulacrum commented Oct 2, 2019

alexcrichton commented Oct 2, 2019

nnethercote commented Oct 3, 2019

BurntSushi commented Oct 3, 2019 • edited Loading

novacrazy commented Oct 3, 2019

nnethercote commented Oct 3, 2019

alexcrichton commented Oct 3, 2019

bluss commented Oct 3, 2019

Amanieu commented Oct 3, 2019

nnethercote commented Oct 3, 2019

alexcrichton commented Oct 9, 2019

Amanieu commented Oct 9, 2019

alexcrichton commented Oct 9, 2019

Amanieu commented Oct 13, 2019

Amanieu commented Oct 15, 2019

bors commented Oct 15, 2019

bors commented Oct 15, 2019

bors commented Oct 15, 2019

alexcrichton commented Oct 22, 2019

Amanieu commented Oct 23, 2019

alexcrichton commented Oct 24, 2019

Remove most `#[inline]` annotations #119

Remove most `#[inline]` annotations #119

BurntSushi commented Oct 3, 2019 •

edited

Loading