Dont segfault if btree range is not in order #39457

bvinc · 2017-02-02T07:32:16Z

This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum.

Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case.

Open questions:

Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible.

The same question for other weird cases:
2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards.
3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5.
4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?

rust-highfive · 2017-02-02T07:32:59Z

Thanks for the pull request, and welcome! The Rust team is excited to review your changes, and you should hear from @BurntSushi (or someone else) soon.

If any changes to this PR are deemed necessary, please add them as extra commits. This ensures that the reviewer can see what has changed since they last reviewed the code. Due to the way GitHub handles out-of-date commits, this should also make it reasonably obvious what issues have or haven't been addressed. Large or tricky changes may require several passes of review and changes.

Please see the contribution instructions for more information.

BurntSushi · 2017-02-02T11:50:28Z

src/libcollections/btree/map.rs

+
+            min_edge = match (min_found, min) {
+                (false, Included(key)) => {
+                    match search::search_linear(&min_node, key) {


Why change the search_tree calls to search_linear? (FYI, I'm not familiar with the btree impl.)

search_tree searches the entire tree and returns a handle anywhere in the tree. I need to search a single node at a time and be able to compare 2 searches. You might think that I want search_node, but that returns a handle. I need to be able to compare 2 edges of the same node, which I can't. search_linear just gives me an index value into the node, which is exactly what I need.

BurntSushi · 2017-02-02T11:54:04Z

cc @rust-lang/libs

Someone more familiar with the btree impl should review this. cc @gereeter

With respect to the panic, I think that's consistent behavior with other collections. Either way, I think this behavior should be documented.

alexcrichton · 2017-02-03T22:05:39Z

also cc @arthurprs and @apasel422

arthurprs · 2017-02-04T11:38:08Z

This is very smart 👍

I wonder if it makes sense to have a fn nodes_diverge(n1, n2) in node.rs that take 2 handles and returns if n1.node == n2.node && n1.idx != n2.idx, I didn't really try it but I guess it can really simplify the range function (with less local vars like the previous version).

alexcrichton · 2017-02-04T22:40:02Z

@bors: delegate=arthurprs

bors · 2017-02-04T22:40:03Z

✌️ @arthurprs can now approve this pull request

bvinc · 2017-02-07T04:41:51Z

I've made some small cleanup changes, implemented range_mut as well, and changed the implementation so that it should never panic, but only return an empty iterator.

@arthurprs That would only get rid of one variable, right? Personally, it doesn't seem worth it, and it seems to obfuscate the way I only care about comparing them until the point that they diverge, after which I don't care anymore.

bvinc · 2017-02-07T04:55:44Z

I'd really like to steer the discussion towards the strange cases of range(..) and what the behavior should be. I've read all of the rfcs and proposals and documentation that I can find, but no one seems to address these cases. If you know of where it's documented, let me know.

If it's not currently documented, who makes these decisions? It might seem like an extremely small detail, but changing behavior later would be a breaking change.

The way I see it, there are 8 interesting cases to think about, and 3 basic ways to handle them:

Weird cases

range(Included(A), Included(A))
range(Included(A), Excluded(A))
range(Excluded(A), Included(A))
range(Excluded(A), Excluded(A))

where B > A:
5. range(Included(B), Included(A))
6. range(Included(B), Excluded(A))
7. range(Excluded(B), Included(A))
8. range(Excluded(B), Excluded(A))

Option 1

Return an empty iterator on cases 2-3. Immediately panic on cases 4-8. During iteration of the tree, if the 2 tree searches "cross directions" in the middle of searching, we know for sure that Ord is ill-defined, and the function will panic.

Pros:

Will catch errors early

Cons:

Might cause panics in programs where an empty iterator would have been fine
Must oddly document case add an IL type checker #4 as a case that panics

Option 2

Return an empty iterator on cases 2-4. Immediately panic on cases 5-8. Case 4 would be a strange case where in order to prevent a seg fault, one side of the Range might not end up where you would expect in order to force an empty iterator. Eg: the range.back handle might be on the right side of A, even though A was excluded.

Pros:

Can easily document that it panics when min_key > max_key

Cons:

Might cause panics in programs where an empty iterator would have been fine
Case add an IL type checker #4 will be a weird special case

Option 3

Return an empty iterator on cases 2-8, never panic. Declare that the range returned will always have range.back equal or to the right of range.front. This will naturally cause 2-8 to all be empty iterators.

Pros:

Never panics
Matches the intuition that you're getting every key where key > min_key && key < max_key. If that's nothing, that's fine.

Cons:

Might not catch programmer errors early

gereeter

Sorry for not getting around to looking at this earlier. This looks good to me, modulo unifying the mutable and immutable cases.

Personally, I think the most logical behavior is to panic on cases 4-8 and return an empty iterator on cases 2 and 3 (Option 1). This corresponds to panicking when the two ends cross, but not when they touch. I don't think documenting case 4 as panicking is that odd - intuitively, it is starting after A and ending before A. That just sounds wrong. In contrast, cases 2 and 3 are starting and ending at the same place.

gereeter · 2017-02-07T17:12:18Z

src/libcollections/btree/map.rs

+
+            if !diverged {
+                // Don't allow max_edge to go to the left of min_edge
+                if max_edge < min_edge { max_edge = min_edge; }


If we do go this route of never panicking, this also could be an optimization - just search for max_edge starting from min_edge. This would never produced "crossed nodes" by construction, and would require fewer comparisons. I don't know whether this is worth it though. Additionally, it can be done later.

gereeter · 2017-02-07T17:15:29Z

src/libcollections/btree/map.rs

-                    }
-                    GoDown(bottom) => bottom,
-                }
+        loop {


This code is identical to the range case and fairly complex, so I think it would be useful to pull this logic out into a function that is generic over mutability (taking two aliasing root pointers and returning two leaf handles).

gereeter · 2017-02-07T17:25:57Z

src/libcollections/btree/map.rs

-                    GoDown(bottom) => bottom,
-                }
+        loop {
+            let min_edge = match (min_found, range.start()) {


I'm not sure that there is an at-all clean way to do this, but I'm curious whether it would be more performant to check Unbounded vs. Included vs. Excluded before the loop. I expect it doesn't matter, though - it would just be removing some predictable branches, I think.

gereeter · 2017-02-07T17:28:58Z

src/libcollections/btree/map.rs

@@ -777,69 +773,65 @@ impl<K: Ord, V> BTreeMap<K, V> {
               reason = "matches collection reform specification, waiting for dust to settle",
               issue = "27787")]
    pub fn range_mut<T: ?Sized, R>(&mut self, range: R) -> RangeMut<K, V>
-        where T: Ord, K: Borrow<T>, R: RangeArgument<T>
+       where T: Ord, K: Borrow<T>, R: RangeArgument<T>


Nit: this indentation change should probably not be here.

aturon · 2017-02-07T17:44:19Z

cc @rust-lang/libs about the behavioral options laid out in #39457 (comment).

Analogous APIs for slices, such as subslicing, tend to panic on nonsensical ranges, which I think corresponds most closely to Option 1. I think this is the right convention to follow, because such ranges are usually produced by a bug.

arthurprs · 2017-02-07T17:46:17Z

I'm not fond of panicking in recoverable errors but Option 1 has precedents from the slice api, so that's probably the way to go.

alexcrichton · 2017-02-07T20:45:50Z

I agree with @aturon that erring on the side of more panics here seems good. That should be the conservative change here and I think we've got room to return empty iterators in the future if we really need to?

bvinc · 2017-02-08T08:06:07Z

I've changed the code to implement option #1 and added unit tests for those cases. I took the advice of @gereeter and pulled out the range search logic into a common function used by range and range_mut.

arthurprs · 2017-02-09T12:01:37Z

@bvinc could you add some docstrings explaining the panic behavior?

alexcrichton · 2017-02-14T00:43:31Z

@bors: r+

bors · 2017-02-14T00:43:32Z

📌 Commit fb91047 has been approved by alexcrichton

@gereeter

Dont segfault if btree range is not in order This is a first attempt to fix issue rust-lang#33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?

@gereeter

Dont segfault if btree range is not in order This is a first attempt to fix issue rust-lang#33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?

bors · 2017-02-14T12:37:03Z

⌛ Testing commit fb91047 with merge ea451ec...

@gereeter

Dont segfault if btree range is not in order This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?

bors · 2017-02-14T14:01:44Z

💔 Test failed - status-appveyor

alexcrichton · 2017-02-14T15:55:04Z

@bors: retry * spurious windows failure #38620

…

On Tue, Feb 14, 2017 at 8:01 AM, bors ***@***.***> wrote: 💔 Test failed - status-appveyor <https://ci.appveyor.com/project/rust-lang/rust/build/1.0.1949> — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#39457 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAD95NxubWXK0gk-yoiytj5A6Bc_vzKRks5rcbPKgaJpZM4L0zQe> .

@gereeter

Dont segfault if btree range is not in order This is a first attempt to fix issue rust-lang#33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?

bors · 2017-02-15T06:09:59Z

⌛ Testing commit fb91047 with merge 7edcd67...

bors · 2017-02-15T07:30:05Z

💔 Test failed - status-travis

alexcrichton · 2017-02-15T15:00:00Z

@bors: retry * spurious network failure w/ crates.io

…

On Wed, Feb 15, 2017 at 1:30 AM, bors ***@***.***> wrote: 💔 Test failed - status-travis <https://travis-ci.org/rust-lang/rust/builds/201763727> — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#39457 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAD95Ia5UeoiZyfzCrHO3J7ArUTzCldeks5rcql_gaJpZM4L0zQe> .

bors · 2017-02-15T17:30:16Z

⌛ Testing commit fb91047 with merge 4c4925f...

@gereeter

Dont segfault if btree range is not in order This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?

bors · 2017-02-15T19:06:05Z

💔 Test failed - status-travis

alexcrichton · 2017-02-15T19:16:06Z

@bors: retry * osx linker segfault #38878

…

On Wed, Feb 15, 2017 at 1:06 PM, bors ***@***.***> wrote: 💔 Test failed - status-travis <https://travis-ci.org/rust-lang/rust/builds/201950532> — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#39457 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAD95E8GboearsXQq-1nJWu3U2h8jxDjks5rc0yfgaJpZM4L0zQe> .

bors · 2017-02-15T19:17:52Z

⌛ Testing commit fb91047 with merge 0d54a22...

bors · 2017-02-15T20:06:07Z

💔 Test failed - status-travis

alexcrichton · 2017-02-15T20:09:12Z

@bors: retry * sccache wedged #39003

…

On Wed, Feb 15, 2017 at 2:06 PM, bors ***@***.***> wrote: 💔 Test failed - status-travis <https://travis-ci.org/rust-lang/rust/builds/201984747> — You are receiving this because you are on a team that was mentioned. Reply to this email directly, view it on GitHub <#39457 (comment)>, or mute the thread <https://github.com/notifications/unsubscribe-auth/AAD95C5GCcWO2QxZqu7w1-gf6XStsUTHks5rc1qwgaJpZM4L0zQe> .

bors · 2017-02-15T22:02:41Z

⌛ Testing commit fb91047 with merge 4d6019d...

@gereeter

Dont segfault if btree range is not in order This is a first attempt to fix issue #33197. The issue is that the BTree iterator uses next_unchecked for fast iteration, but it can be tricked into running off the end of the tree and segfaulting if range is called with a maximum that is less than the minimum. Since a user defined Ord should not determine the safety of BTreeMap, and we still want fast iteration, I've implemented the idea of @gereeter and walk the tree simultaneously searching for both keys to make sure that if our keys diverge, the min key is to the left of our max key. I currently panic if that is not the case. Open questions: 1. Do we want to panic in this error case or do we want to return an empty iterator? The drain API panics if the range is bad, but drain is given a range of index values, while this is a generic key type. Panicking is brittle and returning an empty iterator is probably the most flexible and matches what people would want it to do... but artificially returning a BTreeMap::Range with start==end seems like a pretty weird and unnatural thing to do, although it's doable since those fields are not accessible. The same question for other weird cases: 2. (Included(101), Excluded(100)) on a map that contains [1,2,3]. Both BTree edges end up on the same part of the map, but comparing the keys shows the range is backwards. 3. (Excluded(5), Excluded(5)). The keys are equal but BTree edges end up backwards if the map contains 5. 4. (Included(5), Excluded(5)). Should naturally produce an empty iterator, right?

bors · 2017-02-16T00:35:05Z

☀️ Test successful - status-appveyor, status-travis
Approved by: alexcrichton
Pushing 4d6019d to master...

@Mark-Simulacrum

BtreeMap range_search spruced up #39457 created a lower level entry point for `range_search` to operate on, but it's really not hard to move it up a level of abstraction, making it somewhat shorter and reusing existing unsafe code (`new_edge` is unsafe although it is currently not tagged as such). Benchmark added. Comparison says there's no real difference: ``` >cargo benchcmp old3.txt new3.txt --threshold 5 name old3.txt ns/iter new3.txt ns/iter diff ns/iter diff % speedup btree::map::find_seq_100 19 21 2 10.53% x 0.90 btree::map::range_excluded_unbounded 3,117 2,838 -279 -8.95% x 1.10 btree::map::range_included_unbounded 1,768 1,871 103 5.83% x 0.94 btree::set::intersection_10k_neg_vs_10k_pos 35 37 2 5.71% x 0.95 btree::set::intersection_staggered_100_vs_10k 2,488 2,314 -174 -6.99% x 1.08 btree::set::is_subset_10k_vs_100 3 2 -1 -33.33% x 1.50 ``` r? @Mark-Simulacrum

rust-highfive assigned BurntSushi Feb 2, 2017

BurntSushi reviewed Feb 2, 2017

View reviewed changes

bvinc force-pushed the master branch from 56700cc to 42ffe00 Compare February 7, 2017 04:38

bvinc changed the title ~~Panic if btree range is not in order to avoid segfault~~ Dont segfault if btree range is not in order Feb 7, 2017

gereeter approved these changes Feb 7, 2017

View reviewed changes

bvinc force-pushed the master branch 3 times, most recently from 01e4816 to e8c63d0 Compare February 8, 2017 07:57

Dont segfault if btree range is not in order

fb91047

bvinc force-pushed the master branch from e8c63d0 to fb91047 Compare February 10, 2017 06:19

alexcrichton added the T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. label Feb 13, 2017

frewsxcv mentioned this pull request Feb 14, 2017

Rollup of 10 pull requests #39800

Closed

frewsxcv mentioned this pull request Feb 14, 2017

Rollup of 9 pull requests #39805

Closed

frewsxcv mentioned this pull request Feb 15, 2017

Rollup of 6 pull requests #39833

Closed

bors merged commit fb91047 into rust-lang:master Feb 16, 2017

bors mentioned this pull request Feb 16, 2017

run rustfmt on btree folder #38530

Closed

Manishearth mentioned this pull request Feb 19, 2017

Segfault on Nightly in BTreeSet and BTreeMap #33197

Closed

ssomers mentioned this pull request Jan 24, 2020

BtreeMap range_search spruced up #68499

Merged

Dont segfault if btree range is not in order #39457

Dont segfault if btree range is not in order #39457

Conversation

bvinc commented Feb 2, 2017 • edited Loading

rust-highfive commented Feb 2, 2017

BurntSushi Feb 2, 2017

Choose a reason for hiding this comment

bvinc Feb 2, 2017

Choose a reason for hiding this comment

BurntSushi commented Feb 2, 2017

alexcrichton commented Feb 3, 2017

arthurprs commented Feb 4, 2017 • edited Loading

alexcrichton commented Feb 4, 2017

bors commented Feb 4, 2017

bvinc commented Feb 7, 2017

bvinc commented Feb 7, 2017

Weird cases

Option 1

Pros:

Cons:

Option 2

Pros:

Cons:

Option 3

Pros:

Cons:

gereeter left a comment • edited Loading

Choose a reason for hiding this comment

gereeter Feb 7, 2017

Choose a reason for hiding this comment

gereeter Feb 7, 2017

Choose a reason for hiding this comment

gereeter Feb 7, 2017

Choose a reason for hiding this comment

gereeter Feb 7, 2017

Choose a reason for hiding this comment

aturon commented Feb 7, 2017

arthurprs commented Feb 7, 2017

alexcrichton commented Feb 7, 2017

bvinc commented Feb 8, 2017

arthurprs commented Feb 9, 2017

alexcrichton commented Feb 14, 2017

bors commented Feb 14, 2017

bors commented Feb 14, 2017

bors commented Feb 14, 2017

alexcrichton commented Feb 14, 2017 via email

bors commented Feb 15, 2017

bors commented Feb 15, 2017

alexcrichton commented Feb 15, 2017 via email

bors commented Feb 15, 2017

bors commented Feb 15, 2017

alexcrichton commented Feb 15, 2017 via email

bors commented Feb 15, 2017

bors commented Feb 15, 2017

alexcrichton commented Feb 15, 2017 via email

bors commented Feb 15, 2017

bors commented Feb 16, 2017

bvinc commented Feb 2, 2017 •

edited

Loading

arthurprs commented Feb 4, 2017 •

edited

Loading

gereeter left a comment •

edited

Loading