Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Iterator::collect_into<E: Extend>(e: E)? #45840

Closed
stuhood opened this issue Nov 7, 2017 · 13 comments
Closed

Add Iterator::collect_into<E: Extend>(e: E)? #45840

stuhood opened this issue Nov 7, 2017 · 13 comments
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@stuhood
Copy link

stuhood commented Nov 7, 2017

Hey gang.

As originally determined in https://twitter.com/kot_2010/status/927119253324619776, building a pre-size-hinted mutable Vec as the output of a loop (even for very small n) can occasionally be more than twice as fast as collecting into a Vec. As demonstrated here https://github.com/stuhood/rust-vec-bench, using extend into a pre-size-hinted Vec closes most of the gap.

Focusing on improving Iterator::size_hint implementations so that they suggest better sizes for collect would likely help reduce this gap. But there are diminishing returns to heuristics, and in many cases users should be able to give better explicit hints.

This issue initially proposed adding an Iterator::collect_with_size_hint method, but @scottmcm 's suggestion to add Iterator::collect_into<E: Extend>(e: E) -> E seems more general, and suggests a useful analogue in unzip_into<E1: Extend, E2: Extend>(e1: E1, e2: E2) -> (E1, E2).

@durka
Copy link
Contributor

durka commented Nov 8, 2017

The problem is that Filter et al can't return a "better" size hint because the trait contract is that the lower bound must actually be a lower bound. Maybe that should be relaxed?

Another possibility is collect could look at the upper bound and be smarter about what capacity to use? This wouldn't always be advantageous because you have no idea how many elements will be filtered out.

You can also do this -- but note that OverrideSizeHint is explicitly violating the trait contract:

type SizeHint = (usize, Option<usize>);
struct OverrideSizeHint<I>(I, SizeHint);
extension_trait! {
    <I: Iterator> SizeHintExt<I> for I {
        fn override_size_hint(self, hint: SizeHint) -> OverrideSizeHint<I> {
            OverrideSizeHint(self, hint)
        }
    }
}
impl<I: Iterator> Iterator for OverrideSizeHint<I> {
    type Item = I::Item;

    fn next(&mut self) -> Option<I::Item> {
        self.0.next()
    }

    fn size_hint(&self) -> SizeHint {
        self.1
    }
}

fn collect_hint(ratio: u8, chars: &[char]) -> Vec<char> {
    chars.iter().enumerate().filter_map(|(i, c)| {
        if filter(ratio, i) {
            Some(*c)
        } else {
            None
        }
    }).override_size_hint((chars.len(), Some(chars.len()))).collect()
}

@scottmcm
Copy link
Member

Hmm, what about .collect_into(Vec::with_capacity(5)), or something? (Name TBD of course)

collect is the method-chain-syntax for FromIterator; this would be the equivalent for Extend.

@stuhood
Copy link
Author

stuhood commented Nov 27, 2017

@scottmcm : I like that alternative as well.

@kennytm kennytm added C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. labels Nov 27, 2017
@scottmcm
Copy link
Member

I noticed the other day that unzip has a similar problem. It always just extends with Some(x), so the size_hint seen by the extend code is always exactly 1, and thus useless for pre-allocating.

@stuhood stuhood changed the title collect_with_size_hint? Add Iterator::collect_into<E: Extend>(e: E)? Nov 29, 2017
@stuhood
Copy link
Author

stuhood commented Nov 29, 2017

Updated description/title to apply @scottmcm 's collect_into approach instead.

@dtolnay
Copy link
Member

dtolnay commented Dec 18, 2017

Do any Iterator impls need to override the default collect_into or would the default always be optimal?

trait Iterator {
    /* ... */

    fn collect_into<E: Extend<Self::Item>>(self, e: E) -> E {
        e.extend(self);
        e
    }
}

As I understand it, the following are always equivalent right?

let vec = whatever.into_iter().etc(...).collect_into(Vec::with_capacity(N));
let mut vec = Vec::with_capacity(N);
vec.extend(whatever.into_iter().etc(...));

Is the advantage of collect_into that the method nesting works out nicer?

@stuhood
Copy link
Author

stuhood commented Dec 18, 2017

Is the advantage of collect_into that the method nesting works out nicer?

Yes... and avoiding the mutable local variable is nice as well. Finally, the symmetry with collect is appealing.

@CodeSandwich
Copy link

CodeSandwich commented Jan 27, 2018

I love the idea of collecting into an existing collection, but for another reason: ergonomics. I've proposed something similar on Rust user and internals forums.

I think, that the function (extend_into?) should accept collections as BorrowMut to allow passing them as references. Also it could return the passed collection to allow nice chaining.

@cuviper
Copy link
Member

cuviper commented Jan 29, 2018

We might also consider partition_into and unzip_into for symmetry. Iterator::partition and unzip already use Extend, but they create new Default collections to extend into.

Also, FWIW rayon already has collect_into and unzip_into with different semantics than those proposed here, but we could consider renaming in rayon before it reaches 1.0 -- rayon-rs/rayon#519.

@CodeSandwich
Copy link

I feel that extend_into was received quite well. I want to make a next step and propose an official RFC for it. Here's what I want to issue.

@stuhood I've included your motivation, are you OK with that?

bors added a commit that referenced this issue Aug 5, 2018
Try to guess a smarter initial capacity in Vec::from_iter

> Another possibility is collect could look at the upper bound and be smarter about what capacity to use?  ~ #45840 (comment)

This is obviously good for hints like `(60, Some(61))` where today we allocate space for 60, then double it should the additional element show up, and it'd be much better to just always allocate 61.

More nuanced are hints like `(0, Some(150))`, where today the code uses just the `0`, and thus starts at a capacity of `1`, but with this change will start at `10` instead.

This can undeniably increase memory pressure over where it is today, so I expect at least some controversy 🙂  It does use `try_reserve` for the allocation that's more than the lower bound, and thus shouldn't introduce new aborts, at least.  And the starting point grows by the root, keeping things fairly contained: even with an hint of `(0, Some(1_000_000_000))` it'll only start at `30_517`.

cc @ljedrz
cc #48994
@Kinrany
Copy link
Contributor

Kinrany commented Sep 10, 2021

Were there any further developments?

@CodeSandwich your RFC link seems to be out of date.

@CodeSandwich
Copy link

As of now it's dead 😆 It was too small for an RFC, so I was asked to implement it, but it was rejected #48597

@rust-lang rust-lang deleted a comment from MatrixDev Oct 22, 2021
@cuviper
Copy link
Member

cuviper commented Mar 30, 2023

Iterator::collect_into was added in #93057, now tracked in #94780.

@cuviper cuviper closed this as completed Mar 30, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

8 participants