-
Notifications
You must be signed in to change notification settings - Fork 12.7k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
expose rebuild and implement drain_filter for BinaryHeap #104210
Conversation
dis-da-moe
commented
Nov 9, 2022
•
edited
Loading
edited
- fixes Add drain_filter for BinaryHeap #42849
- exposes rebuild to allow for restoration of the heap after interior mutation
Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @Mark-Simulacrum (or someone else) soon. Please see the contribution instructions for more information. |
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
@rustbot label +T-libs-api -T-lib |
BinaryHeap doesn't expose iter_mut today either (see #98522 -- this is largely intentional) -- it seems like there's not an API for mutably accessing arbitrary elements. It seems like the implementation here would not leave a correct heap behind if elements are removed (unless I'm missing something)? If that's the intended API, it doesn't seem like a huge benefit over converting into Vec and then calling drain_filter on that. Can you say more about your use case for this? r? libs-api -- I'm personally not convinced we should have this API, so shifting to libs-api for review. |
Seeing the issues with |
I think &T/&mut T help, but the broader issue is that drain_filter will leave the BinaryHeap in a non-heap state unless we do a full rebuild afterwards, presuming a subset of the elements are removed. (I think this PR may be missing that rebuild). If we do a full rebuild, there's no real advantage to having the API on BinaryHeap vs. on Vec, and it seems like an unusual thing to want to do, so that further decreases in my eyes the desire to add it. The (new) assigned reviewer or libs-api in general will likely weigh in within the next week or two on whether this is an API we want on BinaryHeap or not. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The documentation/example code as currently written in the PR is definitely misleading as to the implemented behavior and the impact on BinaryHeap's heap invariant and all its other methods. In particular it appears as though the initial data in the example code has been cherry-picked so that the heap coincidentally remains a heap after the drain_filter operation (notice the specific omitted integers in the 1..=15 range). The same code with different initial data would not behave in the way that is implied by the PR:
let mut a = BinaryHeap::from(vec![4, 1, 3]);
let mut _evens = a.drain_filter(|x| *x % 2 == 0).collect::<Vec<_>>();
let odds = a.into_sorted_vec();
assert_eq!(odds, vec![1, 3]); // fail, it's actually [3, 1] which is not sorted
I would prefer not to add this method, nor the alternative discussed above where it rebuilds the heap during Drop of the iterator. My counterproposal is that users should use either into_vec()
(if they own the BinaryHeap) or mem::take
(if they have &mut), then do their drain_filter
on the Vec, then use BinaryHeap::from
to rebuild the heap invariant.
I think it could be reasonable for BinaryHeap to expose methods that allow temporarily breaking the heap invariant and later cleaning it up, as long as someone proposes a naming convention or other API approach that makes it clear when a method is destructive toward the heap invariant. The "later cleaning it up" part is already potentially useful even without adding a drain_filter-like method, since heaps that contain interior mutability (like BinaryHeap<Cell<i32>>
) can already end up broken just using the current API. This is described as a logic error here:
rust/library/alloc/src/collections/binary_heap.rs
Lines 165 to 172 in 28a53cd
/// It is a logic error for an item to be modified in such a way that the | |
/// item's ordering relative to any other item, as determined by the [`Ord`] | |
/// trait, changes while it is in the heap. This is normally only possible | |
/// through [`Cell`], [`RefCell`], global state, I/O, or unsafe code. The | |
/// behavior resulting from such a logic error is not specified, but will | |
/// be encapsulated to the `BinaryHeap` that observed the logic error and not | |
/// result in undefined behavior. This could include panics, incorrect results, | |
/// aborts, memory leaks, and non-termination. |
but suppose we document which methods in BinaryHeap's API do not assume the heap invariant as a precondition (like iter
and into_vec
), and then say it's only a logic error to call methods outside of that set while the heap invariant is messed up, and change rebuild()
to public (the same operation used by From<Vec<T>> for BinaryHeap<T>
) so that the user can restore the heap invariant in-place after a sequence of destructive operations, or mutating Cells or whatever they've done.
Once that's figured out I think it would be reasonable to talk about reintroducing a drain_filter operation that is documented as leaving the heap invariant violated.
I didn't mean to cherry-pick the data. The omitted integers were a typo. I also didn't know that the heap was left in an invalid state. Would it be useful to encode whether the BinaryHeap is valid or not in the type system instead of relying on documentation? For example a drain_filter would return an |
An The use case for pub struct Stuff {
heap: BinaryHeap<T>,
}
impl Stuff {
pub fn method(&mut self) {
// temporarily violate heap invariant in place
...
// then
self.heap.rebuild();
}
} |
☔ The latest upstream changes (presumably #106209) made this pull request unmergeable. Please resolve the merge conflicts. |
Leak amplification for peek_mut() to ensure BinaryHeap's invariant is always met In the libs-api team's discussion around rust-lang#104210, some of the team had hesitations around exposing malformed BinaryHeaps of an element type whose Ord and Drop impls are trusted, and which does not contain interior mutability. For example in the context of this kind of code: ```rust use std::collections::BinaryHeap; use std::ops::Range; use std::slice; fn main() { let slice = &mut ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']; let cut_points = BinaryHeap::from(vec![4, 2, 7]); println!("{:?}", chop(slice, cut_points)); } // This is a souped up slice::split_at_mut to split in arbitrary many places. // // usize's Ord impl is trusted, so 1 single bounds check guarantees all those // output slices are non-overlapping and in-bounds fn chop<T>(slice: &mut [T], mut cut_points: BinaryHeap<usize>) -> Vec<&mut [T]> { let mut vec = Vec::with_capacity(cut_points.len() + 1); let max = match cut_points.pop() { Some(max) => max, None => { vec.push(slice); return vec; } }; assert!(max <= slice.len()); let len = slice.len(); let ptr: *mut T = slice.as_mut_ptr(); let get_unchecked_mut = unsafe { |range: Range<usize>| &mut *slice::from_raw_parts_mut(ptr.add(range.start), range.len()) }; vec.push(get_unchecked_mut(max..len)); let mut end = max; while let Some(start) = cut_points.pop() { vec.push(get_unchecked_mut(start..end)); end = start; } vec.push(get_unchecked_mut(0..end)); vec } ``` ```console [['7', '8', '9'], ['4', '5', '6'], ['2', '3'], ['0', '1']] ``` In the current BinaryHeap API, `peek_mut()` is the only thing that makes the above function unsound. ```rust let slice = &mut ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']; let mut cut_points = BinaryHeap::from(vec![4, 2, 7]); { let mut max = cut_points.peek_mut().unwrap(); *max = 0; std::mem::forget(max); } println!("{:?}", chop(slice, cut_points)); ``` ```console [['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], [], ['2', '3'], ['0', '1']] ``` Or worse: ```rust let slice = &mut ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9']; let mut cut_points = BinaryHeap::from(vec![100, 100]); { let mut max = cut_points.peek_mut().unwrap(); *max = 0; std::mem::forget(max); } println!("{:?}", chop(slice, cut_points)); ``` ```console [['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'], [], ['0', '1', '2', '3', '4', '5', '6', '7', '8', '9', '\u{1}', '\0', '?', '翾', '?', '翾', '\0', '\0', '?', '翾', '?', '翾', '?', '啿', '?', '啿', '?', '啿', '?', '啿', '?', '啿', '?', '翾', '\0', '\0', '', '啿', '\u{5}', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\0', '\u{8}', '\0', '`@',` '\0', '\u{1}', '\0', '?', '翾', '?', '翾', '?', '翾', ' thread 'main' panicked at 'index out of bounds: the len is 33 but the index is 33', library/core/src/unicode/unicode_data.rs:319:9 note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace ``` --- This PR makes `peek_mut()` use leak amplification (https://doc.rust-lang.org/1.66.0/nomicon/leaking.html#drain) to preserve the heap's invariant even in the situation that `PeekMut` gets leaked. I'll also follow up in the tracking issue of unstable `drain_sorted()` (rust-lang#59278) and `retain()` (rust-lang#71503).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We discussed this proposal with the library API team, and came away not on board with exposing invalid heaps. The preference was in the opposite direction (#105851) by closing up all the API gaps that currently allow invalid heaps to occur.
A drain_filter
which involves rebuilding the heap in Drop would perhaps be okay (there wasn't consensus about this; some saw drain_filter as just not a good fit for a heap data structure, and the heap rebuild operation is already available via the conversion from Vec) but only if leaking the DrainFilter iterator cannot possibly leave the heap in the invalid state.
Alright. I don't think I'll go through with this since it seems like not the best design. Thanks for the experience. |