-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
WIP PROOF-OF-CONCEPT: experiment with very strict pointer provenance #95199
Conversation
(rust-highfive has picked a reviewer for you, use r? to override) |
This is based on the "distinguish pointers and addresses" proposal in my recent article: |
Currently I am struggling with how to properly |
This comment has been minimized.
This comment has been minimized.
Would |
Ah, ok I just typoed the name of the lint and was too tired to notice, making progress again... |
I would like to advocate this idea of having explicit pointer to address cast ( I feel it would be great to be explicit when we really want to do this pointer to int casting. |
Yeah i will say that there's a lot of really hairy code where I see "as usize" and I'm like "wait is this just integer stuff or are they doing cute pointer stuff" that is making this conversion feel a lot more compelling. That said as the AVR stuff in core::ptr notes, there are actual representational differences between function pointers and data pointers on some platforms and that has....... implications. Other thing of note: I am WONTFIXing all the hackery in panic_unwind. Not because I think it's impossible, but just because it's brain melty to think about pointer semantics for a part of the runtime that gets to do Magic, and it's just not necessary to figure out for an MVP that miri can mess with. Definitely an interesting case study though. |
@crazyboycjr , as an aside, note that there are variants of this that are even more hilarious: #81686 |
This patch series examines the question: how bad would it be if we adopted an extremely strict pointer provenance model that completely banished all int<->ptr casts. The key insight to making this approach even *vaguely* pallatable is the ptr.with_addr(addr) -> ptr function, which takes a pointer and an address and creates a new pointer with that address and the provenance of the input pointer. In this way the "chain of custody" is completely and dynamically restored, making the model suitable even for dynamic checkers like CHERI and Miri. This is not a formal model, but lots of the docs discussing the model have been updated to try to the *concept* of this design in the hopes that it can be iterated on. Many new methods have been added to ptr to attempt to fill in semantic gaps that this introduces, or to just get the ball rolling on "hey this is a problem that needs to be solved, here's a bad solution as a starting point".
…casts. ALL OF THEM
eb9f2f9
to
a5e2033
Compare
status: completes stage 1 with no warnings (but some WONTFIXes) getting through stage2 involves fixing all the warnings in rustc itself, which, is unfortunately more work than I thought given it loves pointer tagging. Exhausted, done for the day. |
This comment has been minimized.
This comment has been minimized.
Oh also I'm working on windows so Current rustc TODO
|
where | ||
T: Sized, | ||
{ | ||
self as usize | ||
unsafe { core::mem::transmute(self) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Uuh I don't like this part.^^ See rust-lang/unsafe-code-guidelines#286.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You may notice I deprecated its twin -- in this form it is purely for "I really need intptr_t just as a way to talk about a pointer's bytes in an irreversible way" purposes.
But yes I would just deprecate it, imo.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also IMO it should be transmute(self.addr())
to avoid the sketchy ptr-to-int transmute.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I have almost no experience with rust, so I may be mis-parsing this. However, it seems to me that this does indeed need to be self
and not self.addr()
if you want the raw pointer representation. For CHERI self.addr()
will not give the all the pointer bits, only the 32/64 address bits without the metadata.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh I see.
Hm but this produces u8
with provenance on them which I don't think we want, either.^^
/// let val = *my_untagged_ptr; | ||
/// ``` | ||
#[unstable(feature = "strict_provenance", issue = "99999999")] | ||
pub fn with_addr(self, addr: usize) -> Self |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW another kind of API that has been proposed is
pub fn map_addr(self, impl FnOnce(usize) -> usize) -> Self
so that might also be fun to experiment with.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems to be both more restrictive and more verbose?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm no ok it's less verbose in the ideal situation.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ptr.map_addr(|a| a & ~0x1)
is a neat way to express "reset last bit of this pointer" in a single expression, IMO (and works well even if ptr
is a slightly larger expression).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
having both might be reasonable
@@ -20,12 +23,8 @@ | |||
//! be *dereferenceable*: the memory range of the given size starting at the pointer must all be | |||
//! within the bounds of a single allocated object. Note that in Rust, | |||
//! every (stack-allocated) variable is considered a separate allocated object. | |||
//! * Even for operations of [size zero][zst], the pointer must not be pointing to deallocated |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW this was meant to be a paragraph on ZST, not on deallocation, so the rewording kind of defeats its purpose.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I hoisted it out is that I can only rationalize this rule under strict provenance, where you can distinguish "reference to a ZST subfield" (can be invalidated by a deallocation) and "a magical forged allocation" (can't be invalidated by a deallocation).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Wow, that is a daring experiment. :D I am fully onboard! Just left a few comments.
//! *derived* from the One True Handle through operations like `offset` or borrowing. | ||
//! | ||
//! (Unclear detail: taking a subslice is proposed to create a slice that is no longer allowed | ||
//! to access the full range of memory -- is this part of provenance or another system?) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I view this as part of provenance. With Stacked Borrows, provenance is more refined than just an allocation ID; that's what the "tags" are for. (And technically we probably would not even need allocation IDs any more; Stacked Borrows tags could be our only provenance.)
#[must_use] | ||
#[rustc_const_stable(feature = "strict_provenance", since = "1.61.0")] | ||
#[unstable(feature = "strict_provenance", issue = "99999999")] | ||
pub const fn zst_exists<T>(addr: usize) -> *mut T |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this say that NULL is forbidden?
read/write at NULL is still UB, even with a ZST. NULL is the one address that the compiler knows is not part of any allocation. (At least that's how I thought about it, and it's what Miri checks.)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah in spite of how much text I wrote I truly just slapped some of these in here. I basically wrote the "novel" docs and left out the boring parts like "all normal rules for things apply" which obviously should be filled out and made explicit, just, hacking.
Still working on this, but it seems to largely be a lot of `as usize` -> `.addr()`
Why does rustc do oh so many crimes? Oh so many...
a5e2033
to
09be027
Compare
So this basically turns usize from being both size_t and uintptr_t to being just size_t. That leaves the question of whether Rust needs a uintptr_t equivalent; avoiding the type is generally a good thing since it's really a union in disguise and doesn't give you much help when it comes to type-checking, but I could imagine it still being a thing people want in unsafe Rust, as well as FFI things (otherwise what type do you use for a C uintptr_t?). |
This comment has been minimized.
This comment has been minimized.
Yeah uintptr_t is a bit of an open question. I think on principle you don't want it but that something will force our hand eventually. This PR avoids adding it for the sake of "does that work at all"? |
Latest push manages to build stage 2 (basically just marked a dozen files in rustc as TODO and supressed the warnings, because rustc's code doesn't actually execute in a binary it compiles. |
This comment has been minimized.
This comment has been minimized.
hey i wonder if i'm still in charge here |
@bors try |
@Gankra: 🔑 Insufficient privileges: not in try users |
fair bors, fair. |
I mean we can fix that.^^ |
✌️ @Gankra can now approve this pull request |
Small nit: instead of |
yeah the 9999999 was entirely on the assumption that this wasn't ever gonna land and i was just gonna use bors to build some artifacts but I'm down to actually land a subset |
@bors try |
⌛ Trying commit 8643aa4 with merge 776e3314cbfb4507e62fbdc866de866823746bb9... |
The job Click to see the possible cause of the failure (guessed by this bot)
|
💔 Test failed - checks-actions |
The job Click to see the possible cause of the failure (guessed by this bot)
|
☔ The latest upstream changes (presumably #95173) made this pull request unmergeable. Please resolve the merge conflicts. |
Closing in favour of #95228 which is the cleaned up shippable version of the libs changes. However note that there's extra stuff on here like the rustc lint that should be factored out. |
…chaelwoerister Strict provenance lints See rust-lang#95488. This PR introduces two unstable (allow by default) lints to which lint on int2ptr and ptr2int casts, as the former is not possible in the strict provenance model and the latter can be written nicer using the `.addr()` API. Based on an initial version of the lint by `@Gankra` in rust-lang#95199.
…chaelwoerister Strict provenance lints See rust-lang#95488. This PR introduces two unstable (allow by default) lints to which lint on int2ptr and ptr2int casts, as the former is not possible in the strict provenance model and the latter can be written nicer using the `.addr()` API. Based on an initial version of the lint by ``@Gankra`` in rust-lang#95199.
…chaelwoerister Strict provenance lints See rust-lang#95488. This PR introduces two unstable (allow by default) lints to which lint on int2ptr and ptr2int casts, as the former is not possible in the strict provenance model and the latter can be written nicer using the `.addr()` API. Based on an initial version of the lint by ```@Gankra``` in rust-lang#95199.
This patch series examines the question: how bad would it be if we adopted
an extremely strict pointer provenance model that completely banished all
int<->ptr casts.
The key insight to making this approach even vaguely pallatable is the
ptr.with_addr(addr) -> ptr
function, which takes a pointer and an address and creates a new pointer
with that address and the provenance of the input pointer. In this way
the "chain of custody" is completely and dynamically restored, making the
model suitable even for dynamic checkers like CHERI and Miri.
This is not a formal model, but lots of the docs discussing the model
have been updated to try to the concept of this design in the hopes
that it can be iterated on.
Many new methods have been added to ptr to attempt to fill in semantic gaps
that this introduces, or to just get the ball rolling on "hey this is a
problem that needs to be solved, here's a bad solution as a starting point".