Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

BTreeMap: respect pointer provenance rules in split_off #79347

Merged
merged 1 commit into from
Dec 25, 2020

Conversation

ssomers
Copy link
Contributor

@ssomers ssomers commented Nov 23, 2020

The test cases for split_off reported a few more violations (now that they support -Zmiri-track-raw-pointers). The functions shift_kv and shift_edges do not fix anything, I think, but if move_kv and move_edges exist, they deserve to live too.

r? @Mark-Simulacrum

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Nov 23, 2020
@ssomers
Copy link
Contributor Author

ssomers commented Nov 23, 2020

I'm pretty sure move_suffix should be failing just as well, but it doesn't. Nevertheless, this PR somehow makes the tests pass, if you know how to run them with -Zmiri-track-raw-pointers. Which I think is still not possible in their normal form as unit tests, so it's probably best to wait.

@ssomers ssomers force-pushed the btree_split_pointer_provenance branch from 188008d to 81f72ac Compare November 24, 2020 19:06
@ssomers
Copy link
Contributor Author

ssomers commented Nov 24, 2020

Let's call the previous pointer-provenance-proofed part of the node.rs API X and the support for split_off, a separate set of functions, Y.

X uses completely generic functions like slice_insert, and therefore often needs to have a slice sized to node length.

  • In a few occasions, the code calls into_key_area_slice and doesn't use the length. Didn't seem like an issue to me.
  • It invokes steps in triplicate, once for the key array, once for the value array, and then (conditionally) for the edge array. Doesn't seem like a problem to me.
  • I added methods like key_area on immutable nodes. But that doesn't make sense. We only need access to the arrays to move a portion of the area to another node in the same tree, when we need exclusive access to the tree, and we need to mutate or deallocate the source node anyway, otherwise we would end up with copied keys, values or edges.

Y uses a separate set of functions based around into_kv_pointers_mut and never (intentionally) considers node length.

  • A function like move_kv (with a body of 2 lines) avoids duplicating the code for the key and the value array. Not sure this is worth it, and apparently it wasn't worth doing for the step I called shift_kv. What I do like is that it expresses intent through the name "move" instead of "copy_overlapping".
  • The difference with edges is bigger, because there's no into_edge_pointer_mut (which is trivial to write).
  • Therefore we sometimes get the pointer to the edge array twice (sequentially). Whereas the result of into_kv_pointers_mut is used both times. Which might save some instructions, but if that genuinely matters, it shouldn't be limited to split_off.
  • But we also need to correct parent links, and (probably) need NodefRef for that, instead of just a raw pointer to the edge array.
  • A reason to have into_kv_pointers_mut is the same as explained in kv_mut: we only access the LeafNode pointer once, then intersperse the treatment of the key array with the treatment of the value array. We never intersperse that with use of the edge array, because that happens under the condition the node is internal, and therefore it's never a problem that getting the edge pointer invalidates the pointers that into_kv_pointers_mut got earlier.
  • into_kv_pointers_mut erases the MaybeUninit-ness of the arrays, while X is explicit about that. No idea which one is better.
  • If we'd change this code to work of slices too, would the slice methods copy_from_slice and copy_within already do the job? No, they require Copy types.

The problem described here is really about the X and Y worlds interfering, so barring there's a way for them to live in harmony, I've committed another fix, that keeps them separate to reduce (future) confusion.

@ssomers ssomers force-pushed the btree_split_pointer_provenance branch 4 times, most recently from 716e6c3 to b0f3584 Compare November 27, 2020 23:05
@ssomers ssomers force-pushed the btree_split_pointer_provenance branch from b0f3584 to 2f39bf3 Compare November 28, 2020 12:05
@ssomers
Copy link
Contributor Author

ssomers commented Nov 28, 2020

Everything uses slices, and they no longer implicitly use node length. Includes #79363.

Copy link
Member

@Mark-Simulacrum Mark-Simulacrum left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't yet have time to review all of the changed code -- I think it's mostly trivial changes but wide-spread so will need to just sit down and review. I am a bit worried about the extent of these changes; they don't look quite like a revert of previous provenance PRs but are certainly changing similar code. I am not sure that I feel completely certain of the framework to a sufficient extent to be able to review the code with an eye towards ensuring provenance is respected in these operations, and I'm not sure if miri is currently checking sufficiently that relying on it makes sense.

I am thinking that the extent of the mangling we seem to require based on this (and previous) patches to support the provenance may point to needing a simpler model of provenance or some way to bypass the requirement.

In any case, that is all mostly not relevant for this PR -- just some musings. cc @RalfJung

I will try to review this later today or tomorrow.

@@ -1796,5 +1694,42 @@ unsafe fn slice_remove<T>(slice: &mut [MaybeUninit<T>], idx: usize) -> T {
}
}

/// Shifts the slice such that the element at `mid` becomes the first.
/// Is similar to `rotate_left`, but assumes the leading elements are
/// uninitialized and leaves the trailing elements uninitialized.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the comparison to rotate_left is confusing. This does not perform any sort of rotation, it just shifts elements to the left. slice_shift_left might be a more appropriate name, and mid could be distance in that case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok, I meant it could be implemented using rotate_left.

///
/// # Safety
/// The `dest` slice is at least as long as the `source` slice.
unsafe fn slice_move<T>(source: &[MaybeUninit<T>], dest: &mut [MaybeUninit<T>]) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we note in the comment here that this is equivalent to copy_from_slice aside from avoiding bounds checking?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and aside from not requiring T to be Copy and not requiring the destination to be the same length.

@ssomers
Copy link
Contributor Author

ssomers commented Nov 28, 2020

they don't look quite like a revert of previous provenance PRs

Well, they do pretty much redo those, more thoroughly.

I am thinking that the extent of the mangling we seem to require based on this (and previous) patches to support the provenance may point to needing a simpler model of provenance or some way to bypass the requirement.

The extent here is purely because I wanted to get rid of the node length sneaking in left and right, when it only mattered considering pointer provenance. Now the slice length is explicit, used prominently in release code, and debug_asserted, and therefore I believe Miri won't be able to find much to complain about anymore.

I will try to review this later today or tomorrow.

I'll chip off a few layers as separate PRs. That's how I started them anyway, but I had to adjust each layer and threw it all in one bowl.

@ssomers
Copy link
Contributor Author

ssomers commented Nov 29, 2020

The extent here is purely because I wanted to get rid of the node length sneaking in left and right

That's not entirely true. The node length snuck in a bit deeper because of the initial attempt at pointer provenance proofing. The issue pointer provenance stumbled over, here and in previous PRs, is that the destination of ptr::copy_nonoverlapping didn't seem big enough. Initially that was because that destination didn't bother to figure out a size and just got a single pointer. Later, as in this PR, pointer provenance complained because it got the wrong size. In hindsight, I think the latter (i.e. what's currently is master) is worse, because it confuses human readers of the code too. With this PR, a debug_assert checks the destination size much earlier, before Miri comes along or before you need to debug a buffer write overrun. Also, using the same slice API for source and destination, I think it's actually quite easy to supply an accurate size, it helps readers, and I doubt it costs a penny in release builds. So, I think we end up with more readable and more maintainable code.

may point to needing a simpler model of provenance or some way to bypass the requirement.

I do agree that I doubt everyone sees it that way, or is prepared to invest in more defensive code. It might help adoption if there's a separate option that doesn't sound the alarm if it sees something like ptr::copy_nonoverlapping with a destination that is a simple pointer (assuming it's able to see the difference with a slice of 1 element), but still sound the alarm the destination is a slice that is too short, like in this PR.

@RalfJung
Copy link
Member

I am thinking that the extent of the mangling we seem to require based on this (and previous) patches to support the provenance may point to needing a simpler model of provenance or some way to bypass the requirement.

Yeah, I am worried about what this entire experience of making BTree compatible with Stacked Borrows says about Stacked Borrows. Once the next aliasing model proposal is on the table, we should evaluate it against some older versions of BTree to check if that model would have declared less UB here, and thus required fewer changes.

FWIW, there is a way to bypass provenance -- just use raw pointers everywhere. But there cannot be a way to bypass provenance when references are being used, as that would subvert the entire point of the aliasing model: if we cannot rely on all aliases of a reference following the rules, then we might a swell have no rules since at that point they are just pointless.

To be fair, I also think that part of the problem is that this code here was written without giving much thought to conforming with a potential aliasing model of Rust. Sure, no concrete model existed, but it is curious that BTree seems to have way more issues than any other code in the standard library.

@RalfJung
Copy link
Member

I'm not sure if miri is currently checking sufficiently that relying on it makes sense.

Miri with -Zmiri-track-raw-pointers is as strict as it gets. If this code fixes errors with that flag, I think that definitely means something.

@bors
Copy link
Contributor

bors commented Dec 13, 2020

☔ The latest upstream changes (presumably #79988) made this pull request unmergeable. Please resolve the merge conflicts.

Note that reviewers usually do not review pull requests until merge conflicts are resolved! Once you resolve the conflicts, you should change the labels applied by bors to indicate that your PR is ready for review. Post this as a comment to change the labels:

@rustbot modify labels: +S-waiting-on-review -S-waiting-on-author

@ssomers ssomers force-pushed the btree_split_pointer_provenance branch 2 times, most recently from f0ed9ac to f8d47fa Compare December 17, 2020 21:03
@ssomers
Copy link
Contributor Author

ssomers commented Dec 18, 2020

Most of the pointer provenance violations were already resolved through #78631. The only ones remaining are (again) due to the node length being updated too soon. So the current commit (based on #79521) only addresses that.

@ssomers ssomers force-pushed the btree_split_pointer_provenance branch from f8d47fa to 8824efd Compare December 24, 2020 11:09
@Mark-Simulacrum
Copy link
Member

@bors r+ rollup=never

@bors
Copy link
Contributor

bors commented Dec 24, 2020

📌 Commit 8824efd has been approved by Mark-Simulacrum

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Dec 24, 2020
@bors
Copy link
Contributor

bors commented Dec 24, 2020

⌛ Testing commit 8824efd with merge 2c308b9...

@bors
Copy link
Contributor

bors commented Dec 25, 2020

☀️ Test successful - checks-actions
Approved by: Mark-Simulacrum
Pushing 2c308b9 to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Dec 25, 2020
@bors bors merged commit 2c308b9 into rust-lang:master Dec 25, 2020
@rustbot rustbot added this to the 1.50.0 milestone Dec 25, 2020
@ssomers ssomers deleted the btree_split_pointer_provenance branch December 25, 2020 08:32
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants