Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inclusive version of take_while #62208

Open
jonhoo opened this issue Jun 28, 2019 · 17 comments
Open

Inclusive version of take_while #62208

jonhoo opened this issue Jun 28, 2019 · 17 comments
Labels
A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@jonhoo
Copy link
Contributor

jonhoo commented Jun 28, 2019

Iterator::take_while stops before the first element where the condition evaluates to false. That is the correct behavior given its name, but it makes it difficult to write an iterator that should include the boundary element. In particular consider something like iterating over a histogram, trimming off the tails:

for v in histogram
    .iter_linear(1_000)
    .skip_while(|v| v.quantile() < 0.01)
    .take_while(|v| v.quantile() < 0.99)

This may seem right, but consider an iterator where the elements are such that v.quantile() yields this sequence: [0, 0.02, 0.99]. Here, the bin that spans <0.02-0.99] will be dropped, even though it includes the majority of the samples. What we really want to do is take from the iterator until we have received an element whose v.quantile() > 0.99. To write that with take_while, we need something like:

let mut got_true = true;
for v in h
    .iter_linear(1_000)
    .skip_while(|v| v.quantile() < 0.01)
    .take_while(move |v| {
        if got_true {
            // we've already yielded when condition was true
            return false;
        }
        if v.quantile() > 0.99 {
            // this must be the first time condition returns true
            // we should yield i, and then no more
            got_true = true;
        }
        // we should keep yielding
        true
    })

Which isn't exactly obvious.

So, I propose we add Iterator::take_until_inclusive, which yields elements up to and including the first element for which its argument returns true. The name isn't great, but I'm also struggling to come up with a better one. In some sense, I want the function to communicate that it yields every item it evaluates. Some other possible names if we're willing to invert the boolean: break_once, break_after, end_at, end_once, end_after.

Thoughts?

@jonas-schievink jonas-schievink added A-iterators Area: Iterators T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. C-feature-request Category: A feature request, i.e: not implemented / a PR. labels Jun 28, 2019
@Lonami
Copy link
Contributor

Lonami commented Jun 28, 2019

How about a take_until?

for v in h
    .iter_linear(1_000)
    .skip_while(|v| v.quantile() < 0.01)
    .take_until(|v| v.quantile() >= 0.99)

@jonhoo
Copy link
Contributor Author

jonhoo commented Jun 28, 2019

@Lonami take_until to me implies that it does not take the first element where the condition is true, which is why I went with take_until_inclusive instead. In some sense, at least to my ears, take_until(f) is the same as take_while(!f).

@Lonami
Copy link
Contributor

Lonami commented Jun 28, 2019

Oh, I completely missed you mentioned take_until_inclusive already, sorry!

@jonhoo
Copy link
Contributor Author

jonhoo commented Jun 28, 2019

No worries at all, happens to us all! I'd be curious to hear your thoughts on whether the _inclusive bit is necessary though :)

@yaahc
Copy link
Member

yaahc commented Jun 28, 2019

take_to maybe?

@clarfonthey
Copy link
Contributor

clarfonthey commented Jun 29, 2019

How about a generic take_while_then? The signature would be:

fn take_while_then<F, T, I>(self, while: F, then: T) -> TakeWhileThen<Self, F, T, I>
while
    for<'a> F: FnOnce(&'a Self::Item) -> bool,
    T: FnOnce(Self) -> I,
    I: IntoIterator;

The resulting adapter would:

  1. Return the elements of self while while returns true.
  2. Then, return the elements of then(self).

The existing take_while would be equivalent to self.take_while(while, |_| empty()), and take_while_inclusive would be equivalent to self.take_while(while, |x| x.take(1)), although presumably more efficient.

Although I'm not sure how useful this kind of method would be, and it may be better to just have some form of take_while_inclusive by itself, as this is probably a common pattern.

@hdevalke
Copy link

hdevalke commented Sep 9, 2019

I also needed this recently to decode some protocol where a stop bit is used to indicate the last byte of a field. The other 7 bits are used to encode the data. I could not use the iterator API because of this.

The implementation should be less or more the same as TakeWhile.

next could be implemented as:

    #[inline]
    fn next(&mut self) -> Option<I::Item> {
        if self.flag {
            None
        } else {
            let x = self.iter.next()?;
            if (self.predicate)(&x) {
                self.flag = true;
            }
            Some(x)
        }
    }

This would, however, take all elements until some condition is true, which is not the same as take_while_inclusive.

The BufRead::read_until exists and reads up to inclusive the delimiter. So take_until seems to be a good name.

@hdevalke
Copy link

I created a crate with this functionality: https://github.com/hdevalke/take-until

@michaelsproul
Copy link
Contributor

Could be a good fit for the itertools crate (@hdevalke you could PR your code upstream)

@hdevalke
Copy link

@michaelsproul itertools already contains take_while_ref.

@michaelsproul
Copy link
Contributor

michaelsproul commented Apr 15, 2020

I don't think take_while_ref or peeking_take_while are well suited to this because they require you to assign the iterator to a local variable, and muck around with mutation, which breaks the flow of the iterator chain. The best I could do with them was this, which isn't as nice as take_until.

If you don't mind, I'd be happy to try and get your code into itertools, giving you credit of course! (Co-authored-by on the commit)

@hdevalke
Copy link

@michaelsproul no problem if you want to do it, most of the code I wrote was inspired by the implementation of TakeWhile.

@JohnScience
Copy link
Contributor

@hdevalke @michaelsproul Did it get into itertools? I didn't find it there at least by this name.

@michaelsproul
Copy link
Contributor

I'm sorry, I never got around to adding it!

@JohnScience
Copy link
Contributor

@michaelsproul That's regrettable. I opened an issue (rust-itertools/itertools#597). I got around in my project using the take_until crate but I might come back only much much later to check the implementation and make a PR.

@quixoticaxis
Copy link

quixoticaxis commented Feb 18, 2022

The implementation in the first post also seems to suffer from the need to take one more element from the underlying iterator when condition has been already met, which is probably fine for iterators working on data, but may become an issue for iterators that hide long operations.
It is not obvious for me from the discussion, what is the current stance on adding take_until to std?

On a side note, is there any idiomatic way to construct impl Iterator<Item = Result<T, E>> that stops after the first error and fetches no more elements based on the infinite iterator impl Iterator<Item = Result<T, E>>? Apart from using a library or rolling your own implementation, of course.

Samyak2 added a commit to psiayn/spressolisp that referenced this issue Feb 21, 2022
Lambda now stores tokens of the params:
- Beacuse we store the params as just a vec of strings, we can't store
  tokens inside each param.
- Impl TokenHoarder and TokenGiver for Lambda, but these must only be
  used for tokens of the parameters, not the body
- Impl constructor for Lambda since the `param_tokens` are private

Better error display in lambda:
- During creation, lambdas now store the tokens of the parameters
- Shows nice error if wrong number of args are given to `lambda` the
  built-in function
- Shows nice error if any of the params are not a symbol
- Shows nice error if number of args and params mismatch - shows both

Fix lambda return value bug:
- When the body of a lambda had an error, it would return `false`
  instead of showing the error
- This was because `take_while` would consume the `Result` which had
  error, discard it and stop taking.
    - when it was the only expr in the body, it would return an empty
      iterator which causes the default value in `unwrap_or` to be
      returned (which was `false`)
- Replaced the take_while with a manual loop
    - returns the first error we see and does not consume further
    - keeps storing the last (not error) result
    - returns the last result or `false` is nothing is available
- Note: this could have been written a little more elegantly using
  something like `take_until`:
    - rust-lang/rust#62208
    - rust-itertools/itertools#597
Jazzinghen added a commit to Jazzinghen/AdventOfCode that referenced this issue Dec 8, 2022
Since rust doest have an inclusive "take_while"
(rust-lang/rust#62208) for iterators I had to
use an extra crate (https://crates.io/crates/take-until), which is
something I don't particularly like as I believe crates should do
something more than just adding a single thing.

Oh, well, they'll add it!
fistons added a commit to fistons/AOC-2022 that referenced this issue Dec 8, 2022
@UnlimitedCookies
Copy link

Would it make sense to extend the PR to allow taking n additional elements after the condition is untrue? Perhaps proposing an even more general api design, where you can get a slice of an Iterator, which you could extend by some number? I also like the approach of take_while_then(…).

let mut got_true = true;
for v in h
    .iter_linear(1_000)
    .skip_while(|v| v.quantile() < 0.01)
    .take_while(move |v| {
        if got_true {
            // we've already yielded when condition was true
            return false;
        }
        if v.quantile() > 0.99 {
            // this must be the first time condition returns true
            // we should yield i, and then no more
            got_true = true;
        }
        // we should keep yielding
        true
    })

Also, shouldn't the example initialize got_true with false? And I guess you can also drop the move from the closure.

bors bot added a commit to rust-itertools/itertools that referenced this issue Jun 19, 2023
616: Add `take_while_inclusive` method r=jswrenn a=e-rhodes

Adds `take_while_inclusive` method that returns elements as long as a predicate is satisfied, but including the first element that doesn't satisfy the predicate.

First discussed addition to std in <rust-lang/rust#62208>.

Closes #597.

Co-authored-by: Erik Rhodes <erik@space-nav.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-iterators Area: Iterators C-feature-request Category: A feature request, i.e: not implemented / a PR. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

No branches or pull requests

10 participants