Clean up Vec's benchmarks #83458

Merged: 1 commit into rust-lang:master on Mar 30, 2021

Conversation

@saethlin (Member) commented Mar 25, 2021

The Vec benchmarks need a lot of love. I sort of noticed this in #83357 but the overall situation is much less awesome than I thought at the time. The first commit just removes a lot of asserts and does a touch of other cleanup.

A number of these benchmarks are poorly named. For example, bench_map_fast is not in fact fast, bench_rev_1 and bench_rev_2 are vague, bench_in_place_zip_iter_mut doesn't call zip, and the bench_in_place* benchmarks don't do anything in-place... Should I fix these, or is there tooling that depends on the names not changing?

I've also noticed that bench_rev_1 and bench_rev_2 are remarkably fragile. It looks like poking other code in Vec can cause the codegen of this benchmark to switch to a version that has almost exactly half its current throughput and I have absolutely no idea why.

Here's the fast version:

  0.69 │ 110:   movdqu -0x20(%rbx,%rdx,4),%xmm0
  1.76 │        movdqu -0x10(%rbx,%rdx,4),%xmm1
  0.71 │        pshufd $0x1b,%xmm1,%xmm1
  0.60 │        pshufd $0x1b,%xmm0,%xmm0
  3.68 │        movdqu %xmm1,-0x30(%rcx)
 14.36 │        movdqu %xmm0,-0x20(%rcx)
 13.88 │        movdqu -0x40(%rbx,%rdx,4),%xmm0
  6.64 │        movdqu -0x30(%rbx,%rdx,4),%xmm1
  0.76 │        pshufd $0x1b,%xmm1,%xmm1
  0.77 │        pshufd $0x1b,%xmm0,%xmm0
  1.87 │        movdqu %xmm1,-0x10(%rcx)
 13.01 │        movdqu %xmm0,(%rcx)
 38.81 │        add    $0x40,%rcx
  0.92 │        add    $0xfffffffffffffff0,%rdx
  1.22 │      ↑ jne    110

And the slow one:

  0.42 │9a880:   movdqa     %xmm2,%xmm1
  4.03 │9a884:   movq       -0x8(%rbx,%rsi,4),%xmm4
  8.49 │9a88a:   pshufd     $0xe1,%xmm4,%xmm4
  2.58 │9a88f:   movq       -0x10(%rbx,%rsi,4),%xmm5
  7.02 │9a895:   pshufd     $0xe1,%xmm5,%xmm5
  4.79 │9a89a:   punpcklqdq %xmm5,%xmm4
  5.77 │9a89e:   movdqu     %xmm4,-0x18(%rdx)
 15.74 │9a8a3:   movq       -0x18(%rbx,%rsi,4),%xmm4
  3.91 │9a8a9:   pshufd     $0xe1,%xmm4,%xmm4
  5.04 │9a8ae:   movq       -0x20(%rbx,%rsi,4),%xmm5
  5.29 │9a8b4:   pshufd     $0xe1,%xmm5,%xmm5
  4.60 │9a8b9:   punpcklqdq %xmm5,%xmm4
  9.81 │9a8bd:   movdqu     %xmm4,-0x8(%rdx)
 11.05 │9a8c2:   paddq      %xmm3,%xmm0
  0.86 │9a8c6:   paddq      %xmm3,%xmm2
  5.89 │9a8ca:   add        $0x20,%rdx
  0.12 │9a8ce:   add        $0xfffffffffffffff8,%rsi
  1.16 │9a8d2:   add        $0x2,%rdi
  2.96 │9a8d6: → jne        9a880 <<alloc::vec::Vec<T,A> as core::iter::traits::collect::Extend<&T>>::extend+0xd0>

Many of the Vec benchmarks assert what values should be produced by the
benchmarked code. In some cases, these asserts dominate the runtime of
the benchmarks they are in, causing the benchmarks to understate the
impact of an optimization or regression.
@rust-highfive (Collaborator)

r? @dtolnay

(rust-highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 25, 2021
@the8472 (Member) left a comment

bench_in_place_zip_iter_mut doesn't call zip

My bad, I could have picked a better name. If I recall correctly, it was the baseline to which bench_in_place_zip_recycle gets compared.

bench_in_place* don't do anything in-place...

There's a specialization for vec.into_iter()....collect() which reuses the allocation, it should be exercising that.

b.iter(|| {
    let v: Vec<u32> = Vec::with_capacity(src_len);
    assert_eq!(v.len(), 0);
    assert_eq!(v.capacity(), src_len);
(Member, inline review comment on the code above)

Note that benchmarks are also executed when you run ./x.py test, i.e. they can double as test cases. If you remove those asserts here because they get in the way of measuring things, you'll have to check whether there are roughly equivalent proper tests somewhere.

They seem to be quite trivial, so hopefully this isn't the only place where this stuff is tested, but I'd still check.

@saethlin (Member, Author)

There's a specialization for vec.into_iter()....collect() which reuses the allocation, it should be exercising that.

When I read the names of these benchmarks, I expected to find an implementation that doesn't allocate or free; in my mind, the whole point of an operation being in-place is that it doesn't allocate or free. So my concern/confusion is that the bench_in_place* benchmarks' runtime is dominated by malloc and free, because every iteration creates a new Vec, even though the whole point of the benchmarks seems to be exercising a specialization that reduces allocator activity. I don't think there's another benchmark to compare against to see the impact of the specialization.

Do we want to demonstrate the impact of the into_iter() -> collect specialization? Maybe we should have benchmarks that do zero allocation because of this specialization (like bench_in_place_recycle) and another set of benchmarks that aren't eligible for the specialization for some reason that has minimal impact on the runtime of the iteration (maybe alignment of the new T?).

@the8472 (Member) commented Mar 25, 2021

There's a specialization for vec.into_iter()....collect() which reuses the allocation, it should be exercising that.

When I read the name of these benchmarks, I was expecting to find an implementation that doesn't allocate or free. In my mind, the whole point of an operation being in-place is that it doesn't allocate or free. So my concern/confusion is that the bench_in_place* benchmarks runtime is dominated by malloc and free because every iteration creates a new Vec, even though the whole point of the benchmarks seems to be exercising a specialization that reduce allocator activity. I don't think there's another benchmark to compare against to see the impact of the specialization.

It should still be half the number of malloc/free calls compared to the naive approach, which was visible when I ran before/after benchmarks during development. Granted, it took quite some noise reduction on my system to even see the effects; the _recycle approach makes it more obvious.
The benchmarks also helped when checking the assembly to see whether the xor vectorized. In retrospect I should have written codegen tests for that instead.

Do we want to demonstrate the impact of the into_iter() -> collect specialization? Maybe we should have benchmarks that do zero allocation because of this specialization (like bench_in_place_recycle) and another set of benchmarks that aren't eligible for the specialization for some reason that has minimal impact on the runtime of the iteration (maybe alignment of the new T?).

Well, the iter_mut-based one should already be doing the same, but sure, something based on alignment might be a more obvious baseline.

@the8472 (Member) commented Mar 25, 2021

I've also noticed that bench_rev_1 and bench_rev_2 are remarkably fragile. It looks like poking other code in Vec can cause the codegen of this benchmark to switch to a version that has almost exactly half its current throughput and I have absolutely no idea why.

Try benchmarking with incremental compilation off and codegen-units = 1, if that's not already the case. Without that, the codegen partitioning ruins some inlining opportunities in easily perturbed ways.

@saethlin (Member, Author)

Try benchmarking with incremental compilation off and codegen-units = 1, if that's not already the case. Without that, the codegen partitioning ruins some inlining opportunities in easily perturbed ways.

I believe I already have those settings? They're the defaults, as far as I can tell. Though honestly, the whole x.py thing is deeply mysterious to me; I can't, for example, figure out how to get debug symbols into the benchmark binaries.

@the8472 (Member) commented Mar 28, 2021

x.py delegates to src/bootstrap, which pieces together the build procedure. I think the debuginfo is controlled by debuginfo-level-std or debuginfo-level-tests in config.toml
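As a sketch, the relevant config.toml knobs might look like the following. The key names are from memory of the 2021-era bootstrap; verify them against config.example.toml in your checkout before relying on them.

```toml
# config.toml (sketch; verify key names against config.example.toml)
[rust]
codegen-units = 1          # avoid CGU partitioning perturbing inlining
incremental = false        # incremental compilation off for benchmarking
debuginfo-level-std = 1    # debug symbols for the standard library
```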

@dtolnay (Member) left a comment

Looks good! I am happy you are willing to get these in better shape.

I am on board with removing asserts from the benchmarks. Trying to make the benchmark suite serve double duty as tests and benchmarks seems misguided: too much test-only code inline obscures what is actually being benchmarked. The fact that cargo test / x.py test also runs one iteration of each benchmark is just to smoke-test that they don't crash; it isn't meant to be where the main exhaustive testing of library functionality happens.

A number of these benchmarks are poorly named. Should I fix these, or is there tooling that depends on the names not changing?

I do not know of anything like that. Feel free to fix the names.

@dtolnay (Member) commented Mar 30, 2021

@bors r+

@bors (Contributor) commented Mar 30, 2021

📌 Commit 8c88418 has been approved by dtolnay

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Mar 30, 2021
@bors (Contributor) commented Mar 30, 2021

⌛ Testing commit 8c88418 with merge 689e847...

@bors (Contributor) commented Mar 30, 2021

☀️ Test successful - checks-actions
Approved by: dtolnay
Pushing 689e847 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Mar 30, 2021
@bors bors merged commit 689e847 into rust-lang:master Mar 30, 2021
@rustbot rustbot added this to the 1.53.0 milestone Mar 30, 2021
@saethlin saethlin deleted the improve-vec-benches branch May 16, 2022 04:36