Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix Itertools::k_smallest on short unfused iterators #900

Merged
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 11 additions & 14 deletions src/k_smallest.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2,7 +2,7 @@ use alloc::vec::Vec;
use core::cmp::Ordering;

/// Consumes a given iterator, returning the minimum elements in **ascending** order.
pub(crate) fn k_smallest_general<I, F>(mut iter: I, k: usize, mut comparator: F) -> Vec<I::Item>
pub(crate) fn k_smallest_general<I, F>(iter: I, k: usize, mut comparator: F) -> Vec<I::Item>
where
I: Iterator,
F: FnMut(&I::Item, &I::Item) -> Ordering,
Expand Down Expand Up @@ -44,6 +44,7 @@ where
if k == 0 {
return Vec::new();
}
let mut iter = iter.fuse();
let mut storage: Vec<I::Item> = iter.by_ref().take(k).collect();

let mut is_less_than = move |a: &_, b: &_| comparator(a, b) == Ordering::Less;
Expand All @@ -55,19 +56,15 @@ where
sift_down(&mut storage, &mut is_less_than, i);
}

if k == storage.len() {
// If we fill the storage, there may still be iterator elements left so feed them into the heap.
// Also avoids unexpected behaviour with restartable iterators.
iter.for_each(|val| {
if is_less_than(&val, &storage[0]) {
// Treating this as an push-and-pop saves having to write a sift-up implementation.
// https://en.wikipedia.org/wiki/Binary_heap#Insert_then_extract
storage[0] = val;
// We retain the smallest items we've seen so far, but ordered largest first so we can drop the largest efficiently.
sift_down(&mut storage, &mut is_less_than, 0);
}
});
}
iter.for_each(|val| {
if is_less_than(&val, &storage[0]) {
// Treating this as an push-and-pop saves having to write a sift-up implementation.
// https://en.wikipedia.org/wiki/Binary_heap#Insert_then_extract
storage[0] = val;
// We retain the smallest items we've seen so far, but ordered largest first so we can drop the largest efficiently.
sift_down(&mut storage, &mut is_less_than, 0);
}
});

// Ultimately the items need to be in least-first, strict order, but the heap is currently largest-first.
// To achieve this, repeatedly,
Expand Down
7 changes: 4 additions & 3 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -2949,7 +2949,7 @@ pub trait Itertools: Iterator {
/// itertools::assert_equal(five_smallest, 0..5);
/// ```
#[cfg(feature = "use_alloc")]
fn k_smallest(mut self, k: usize) -> VecIntoIter<Self::Item>
fn k_smallest(self, k: usize) -> VecIntoIter<Self::Item>
where
Self: Sized,
Self::Item: Ord,
Expand All @@ -2963,9 +2963,10 @@ pub trait Itertools: Iterator {
return Vec::new().into_iter();
}

let mut heap = self.by_ref().take(k).collect::<BinaryHeap<_>>();
let mut iter = self.fuse();
let mut heap: BinaryHeap<_> = iter.by_ref().take(k).collect();

self.for_each(|i| {
iter.for_each(|i| {
debug_assert_eq!(heap.len(), k);
// Equivalent to heap.push(min(i, heap.pop())) but more efficient.
// This should be done with a single `.peek_mut().unwrap()` but
Expand Down