Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Iterator::unzip not zero cost #72085

Closed
ThouCheese opened this issue May 10, 2020 · 11 comments
Closed

Iterator::unzip not zero cost #72085

ThouCheese opened this issue May 10, 2020 · 11 comments
Labels
A-iterators Area: Iterators C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.

Comments

@ThouCheese
Copy link
Contributor

In my attempt to move some code to a more functional style I found that the performance regressed significantly. I have narrowed this regression down to the Iterator::unzip function.

I tried this code:

extern crate test;
use test::{Bencher, black_box};

fn run_functional(l: &Vec<(usize, usize)>) -> (Vec<usize>, Vec<usize>) {
    l.iter().copied().unzip()
}

fn run_imperative(l: &Vec<(usize, usize)>) -> (Vec<usize>, Vec<usize>) {
    let len = l.len();
    let (mut result1, mut result2) = (Vec::with_capacity(len), Vec::with_capacity(len));
    for item in l.iter().copied() {
        result1.push(item.0);
        result2.push(item.1);
    }
    (result1, result2)
}

#[bench]
fn bench_functional(b: &mut Bencher) {
    let list = black_box(vec![(1, 2); 256]);
    b.iter(|| run_functional(&list));
}

#[bench]
fn bench_imperative(b: &mut Bencher) {
    let list = black_box(vec![(1, 2); 256]);
    b.iter(|| run_imperative(&list));
}

I expected to see this happen:
I expect these benchmarks to yield the same results, since they both attempt to do the same thing.

Instead, this happened:
The benchmark results are:

test bench_functional ... bench:       1,440 ns/iter (+/- 66)
test bench_imperative ... bench:         443 ns/iter (+/- 43)

From what I can tell from reading the code in the standard library, there are two reasons why the imperative style loop is so much faster. The imperative version uses Vec::push instead of Extend::extend. Here push is significantly faster. The rest of the difference comes from the fact that the vectors are initialized using with_capacity, whereas the functional variant seems to reallocate. I get the same performance when I change my imperative code to this:

fn run_imperative(l: &Vec<(usize, usize)>) -> (Vec<usize>, Vec<usize>) {
    let (mut result1, mut result2) = (Vec::new(), Vec::new());
    for item in l.iter().copied() {
        result1.extend(Some(item.0));
        result2.extend(Some(item.1));
    }
    (result1, result2)
}

Link to the Reddit thread, with a faster but hacky implementation from u/SkiFire13

Meta

I have reproduced this issue on the latest nightly compiler (as of writing):

$ rustc --version --verbose
rustc 1.45.0-nightly (bad3bf622 2020-05-09)
binary: rustc
commit-hash: bad3bf622bded50a97c0a54e29350eada2a3a169
commit-date: 2020-05-09
host: x86_64-unknown-linux-gnu
release: 1.45.0-nightly
LLVM version: 9.0
@ThouCheese ThouCheese added the C-bug Category: This is a bug. label May 10, 2020
@jonas-schievink jonas-schievink added A-iterators Area: Iterators I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. labels May 10, 2020
@leo60228
Copy link
Contributor

Usage of extend is irrelevant here, vec.extend(Some(x)) has the same performance as vec.push(x) when it's benchmarked on its own.

@leo60228
Copy link
Contributor

Wait, what? extend seems to be only sometimes worse than push...

@ThouCheese
Copy link
Contributor Author

ThouCheese commented May 13, 2020

In my experiments i got some significantly worse performance by just replacing .push(a) with .extend(Some(a)), so I think that usage of extend is definitely relevant here.

@cuviper
Copy link
Member

cuviper commented May 13, 2020

There's no common push/with_capacity methods -- unzip has to work with its constraints:

fn unzip<A, B, FromA, FromB>(self) -> (FromA, FromB)
where
    FromA: Default + Extend<A>,
    FromB: Default + Extend<B>,
    Self: Iterator<Item = (A, B)>, 

However, I'm playing with adding those as optional methods on Extend...

@leo60228
Copy link
Contributor

@cuviper The root cause seems to be that vec.extend(Some(x)) has very poor performance in some scenarios. I think we should try to fix that instead of working around it.

@cuviper
Copy link
Member

cuviper commented May 13, 2020

@leo60228 Another possibility is to add SpecExtend for option::IntoIter. That won't help with pre-allocating capacity, but it can at least be streamlined to act like push. However, specialization is still unstable, while new Extend methods could help any user's collection type.

Quick preview doing it my way, before:

test bench_functional ... bench:       1,522 ns/iter (+/- 57)
test bench_imperative ... bench:         408 ns/iter (+/- 15)

After:

test bench_functional ... bench:         423 ns/iter (+/- 8)
test bench_imperative ... bench:         414 ns/iter (+/- 14)

leo60228 added a commit to leo60228/rust that referenced this issue May 13, 2020
Closes rust-lang#72085

This consists of the following optimizations:

* Adds a `with_capacity` function to `Extend`. This definitely needs more thought if it's going to be stabilized, so I'm not writing an RFC yet. This takes off most of the performance gap.
* Optimizes `Vec`'s `Extend` implementation for the case where `size_hint` is 1. This shaves off the remaining performance gap.
@leo60228
Copy link
Contributor

Another possibility is to add SpecExtend for option::IntoIter.

I don't believe this is possible. You can't currently specialize on trait generics, only the type the trait is implemented for.

@cuviper
Copy link
Member

cuviper commented May 13, 2020

You can't currently specialize on trait generics, only the type the trait is implemented for.

I don't know why you say that -- if you look at each of the implementations below, they're all for Vec<T>, just specializing based on the I type parameter.

rust/src/liballoc/vec.rs

Lines 2059 to 2061 in 4802f09

impl<T, I> SpecExtend<T, I> for Vec<T>
where
I: Iterator<Item = T>,

rust/src/liballoc/vec.rs

Lines 2090 to 2092 in 4802f09

impl<T, I> SpecExtend<T, I> for Vec<T>
where
I: TrustedLen<Item = T>,

impl<T> SpecExtend<T, IntoIter<T>> for Vec<T> {

So I'm suggesting:

impl<T> SpecExtend<T, option::IntoIter<T>> for Vec<T> {

@leo60228
Copy link
Contributor

I tried that and got coherence errors, and I asked on Discord and that was the answer I got. Perhaps it's just a diagnostics issue: I was using Option, not option::IntoIter.

@cuviper
Copy link
Member

cuviper commented May 13, 2020

Hmm, for coherence I'm guessing that it couldn't guarantee whether or not Option overlaps those generic I specializations, whereas option::IntoIter should be a strict subset.

@ThouCheese
Copy link
Contributor Author

This is resolved on the latest nightly, that you for your efforts @cuviper and others!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators C-bug Category: This is a bug. I-slow Issue: Problems and improvements with respect to performance of generated code. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging a pull request may close this issue.

4 participants