explicitly distinguish pointer::addr and pointer::expose_addr #95588
r? @kennytm (rust-highfive has picked a reviewer for you, use r? to override) |
IMO they should always be equivalent. Under a "real model" if it's deemed necessary for
In particular, if you "care" about the subgoal of being able to decouple usize==ptr on CHERI, then the two must be the same, because usize literally isn't big enough to fit provenance info into anymore. If you don't want to "play along" and keep using usize casts, that's fine, your code is just Less Portable. (edit: and you get less "nice things" like "perfect miri asan") |
I don't think so. In my view, Furthermore, by disallowing such casts, some tools (not CHERI, but Miri) can do a better job at enforcing correctness in programs that mix strict provenance APIs with ptr2int2ptr roundtrips via
If most of the code does play along but a single crate does not, then I think we should still say that only pointers explicitly cast to ints via
So, my current thinking is that strict provenance is not a toggle in the memory model, it is a subset of the Rust language. As in, full Rust has permissive provenance (and our stability guarantee covers that), but if you want extras like CHERI or Miri you have to get your program into the strict provenance subset. Under this view, there is no such thing as 'this code does X under strict provenance and Y under permissive provenance'. For programs that are entirely inside the strict provenance subset, indeed this PR does not change anything. But it matters for programs that are gradually moving into that subset, and still have some ptr2int2ptr roundtrips left in some places. For those, I think we should have the rules as set in this PR. |
The way the zulip discussion has been going, I think there seems to be some consensus that we want weak provenance for the actual rust spec, but with strict provenance as a simple to understand subset that we should aspire to live within, and for which Miri et al can work effectively. In that setting, there are definitely two ptr2int operations here: one "broadcasts" / "escapes" the pointer such that it can be used by a subsequent int2ptr, while the other one just extracts the address bits without escaping the pointer, which means that even if you cast it back to a pointer you just get an invalid pointer. For strict-provenance conforming code, you want to maximally use the non-escaping version, because you have no intention of ever doing an int2ptr so it's just performance / optimization left on the table to do otherwise. Hence I think But in far future rust, I think we will want both, because strict provenance everywhere is not possible in the face of FFI and C code that doesn't observe these rules, even if we ignore all the legacy rust code that will not be updated due to MSRV or laziness. |
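To make the "non-escaping" ptr2int operation described above concrete, here is a minimal sketch. It assumes a recent stable Rust toolchain where `pointer::addr` and `pointer::with_addr` are available (they were unstable when this thread was written); the helper names `address_only`, `rebuild_from`, and `read_via_address` are hypothetical and exist only for illustration. The address is treated as a plain integer, and a usable pointer is obtained again only by borrowing provenance from an existing pointer, never by an int2ptr cast.

```rust
/// Non-escaping ptr2int: returns only the address bits.
/// The provenance of `p` is not exposed by this call.
fn address_only(p: *const u32) -> usize {
    p.addr()
}

/// Builds a pointer at `addr` that carries the provenance of `base`.
/// No integer-to-pointer cast is involved.
fn rebuild_from(base: *const u32, addr: usize) -> *const u32 {
    base.with_addr(addr)
}

/// Reads `data[index]` by doing the offset arithmetic on plain integers.
fn read_via_address(data: &[u32; 4], index: usize) -> u32 {
    assert!(index < data.len());
    let base: *const u32 = data.as_ptr();
    let target = address_only(base) + index * std::mem::size_of::<u32>();
    let elem = rebuild_from(base, target);
    // SAFETY: `elem` has the provenance of `base` and points at an
    // in-bounds, properly aligned element of `data`.
    unsafe { *elem }
}
```

Casting `target` back with `target as *const u32` instead of calling `rebuild_from` is exactly the pattern the comment above says yields an invalid pointer when the address came from the non-escaping operation.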
See this comment for a more detailed explanation of how I am currently interpreting strict provenance. |
I think FFI is a red herring and strict provenance everywhere is possible. But we have too much legacy code to mandate that and I'd rather not have two memory models (strict and permissive). But that discussion is orthogonal to this PR so let's continue on Zulip. ;) |
I should also add that my generic argument for why supporting permissive provenance in the optimizer is easy -- as in, the argument for why we can reasonably guarantee that code with ptr2int2ptr cast roundtrips will keep compiling correctly -- relies on the requirement that is added by this PR. :) |
I am pretty uncomfortable with any suggestion in this thread that |
Interesting. To me this is required to achieve the full potential of strict provenance.

    let addr = ptr.addr();
    let ptr = addr as *const u8;
    let _val = *ptr;

EDIT: Hm okay I guess it can still help Miri, but it won't help the optimizer. So see the better arguments below. |
I also think distinguishing |
As one last point, I think we will have to pessimize optimizations around The existence of permissive provenance code in some parts of the program should not come at the cost of optimizations for other parts of the program. I don't think we want a 'global toggle' on a language fork for strict vs permissive provenance. But to achieve this isolation within a single compiled program, we have to distinguish |
@RalfJung you're "allowed" to as far as the compiler and abstract machine are concerned, but if you want better validation (and targeting CHERI) you need to "do better", and we already have tools that can help you do that. I think it's wholly desirable for older projects to be "pressured" to conform by people running miri and getting an explosion and being annoyed. If there is clarification to be had in the docs it's that what is currently written is "the strict provenance interpretation of both this operation and (I must confess I am pessimistic on the ability to define a "true" memory model that everyone would be happy with. Part of my goal is to make more code "boring" to the memory model so that the semantic uncanny valley between "what code thinks can happen" and "what the model actually says" is rarer and conceptually like, double-unsafe-code that people scrutinize and validate with more aggressive means, in the same vein of C code that is "probably UB but works in practice because why would the compiler actually miscompile this".) |
All of this is still true under my proposed change. Why should we allow programs like that?
I have a very concrete vision for defining a memory model that at least a lot of people would be happy with. It would ban ptr2int2ptr transmute roundtrips, and maybe ptr2int transmutes in general, but preserve everything else. But to make that model coherent, I argue we need the change I am proposing here. I agree with having more "boring code", and the APIs help make "more code boring". This change does not even affect "boring code", so if you basically gave up already on code outside the 'boring' fragment you should be agnostic on this change. |
Maybe I'm off-base here but as far as the optimizer goes, Rust code is already Pretty Dang Fast, and is "doomed" to lower to compiler backends that implement something like PNVI-ae-udi anyway, so the promise of Better Codegen is not something I regard as plausible or interesting. |
I am saying we need to regress codegen for code that uses So, what I said about optimizations is not about making anything faster than it is right now. It is about keeping the optimizations that we currently do, and putting them on solid footing. Under the vision I mentioned above, optimizing pointer tagging like we do today is correct if and only if we accept this PR. If we don't accept it, that is almost the equivalent of adding an I am not promising Better Codegen, I am promising Actually Explaining Why Current Codegen Makes Sense (for code that uses strict provenance APIs), and Worse But Correct Codegen (for code that doesn't). |
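The pointer tagging case mentioned above can be sketched without any ptr2int2ptr roundtrip. This is an illustration only: it assumes a recent stable toolchain where `pointer::map_addr` and `pointer::addr` exist, and `TAG_MASK`, `set_tag`, `clear_tag`, `is_tagged`, and `tag_roundtrip` are hypothetical names. Because the tagged pointer never passes through an exposing cast, the compiler can keep tracking which allocation it belongs to.

```rust
/// Lowest address bit, free to use because `u64` is at least 2-byte aligned.
const TAG_MASK: usize = 0b1;

/// Sets the tag bit while keeping the provenance of `p`.
fn set_tag(p: *mut u64) -> *mut u64 {
    p.map_addr(|a| a | TAG_MASK)
}

/// Clears the tag bit while keeping the provenance of `p`.
fn clear_tag(p: *mut u64) -> *mut u64 {
    p.map_addr(|a| a & !TAG_MASK)
}

/// Inspects the tag using only the address bits (nothing is exposed).
fn is_tagged(p: *mut u64) -> bool {
    (p.addr() & TAG_MASK) != 0
}

/// Tags a pointer to a local, checks the tag, untags it, and reads through it.
fn tag_roundtrip(value: u64) -> (bool, u64) {
    let mut slot = value;
    let raw: *mut u64 = &mut slot;
    let tagged = set_tag(raw);
    let was_tagged = is_tagged(tagged);
    let untagged = clear_tag(tagged);
    // SAFETY: `untagged` has the same address and provenance as `raw`,
    // which points at the live local `slot`.
    let read = unsafe { *untagged };
    (was_tagged, read)
}
```

The tagged pointer itself must not be dereferenced; only the untagged one is valid for reads.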
A possible compromise: make it clear that this is making a materially different claim from |
I would also IMO prefer if there was a new method that did have your claimed semantics so that older code could make it clear that it's doing Pointer Crimes and not just completely unaware of the distinction. (and then I guess under CHERI |
I thought that was basically implicit already in the fact that this is (a) unstable and (b) references the strict provenance experiment. I added an even more explicit note, does that work for you? |
Yes I think there is general consensus that such a method should exist, together with an explicit method to cast an int back to a pointer. Then we can deprecate ptr2int and int2ptr |
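A sketch of what such an explicit pair looks like in practice. As far as I know, the operations this thread calls `expose_addr` and its int2ptr counterpart were later stabilized as `pointer::expose_provenance` and `std::ptr::with_exposed_provenance`; the code assumes a toolchain that has those names, and `roundtrip_exposed` is a hypothetical helper.

```rust
use std::ptr;

/// Performs an explicit ptr2int2ptr roundtrip through a plain `usize`.
fn roundtrip_exposed(value: &i32) -> i32 {
    let p: *const i32 = value;
    // Exposing ptr2int: marks the provenance of `p` as available
    // to later integer-to-pointer conversions.
    let addr: usize = p.expose_provenance();
    // Explicit int2ptr: picks up a previously exposed provenance for `addr`.
    let q: *const i32 = ptr::with_exposed_provenance(addr);
    // SAFETY: `addr` came from exposing a pointer to the live `value`,
    // so `q` may be used to read it.
    unsafe { *q }
}
```

Replacing `expose_provenance()` with `addr()` here would produce the same integer, but under the rules proposed in this PR the final read would no longer be permitted.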
So part of the reason I defined everything to have Same As Today semantics is so that:
Like, regardless of models or whatever these are just Nice To Have. |
This has exhausted my remaining spoons to discuss this so I won't be able to review this (and am supposed to be "done"). I've given all the insights I can here. I would prefer the two ops be the same, but if you must then tread carefully because you may scupper the social-engineering of this project. Giving people free toys that are nicer and happen to work better is how you move needles. Making this "more dangerous" makes the toys less free. I also think trying to conceptualize a "mixed mode" miri is dropping the ball on being able to use miri as pressure for code to become more "boring". |
From the social engineering side, I don't think it's a particularly hard sell. You are a Good Rustacean, you want to write Good Code. The rust team is now bringing you a new function that both expresses intent better (this is getting the address of a pointer only, not committing Crimes) and also allows the compiler to make your code faster because you expressed your intent. Win win. The looming spectre of UB will only haunt Bad Rustaceans that lie to the compiler and commit Pointer Crimes using this function, and that's not you, right? |
I don't have a strong opinion but I feel like we could avoid making promises now without removing the possibility of making them later. That is, the docs should probably imply neither of Ideally we'd use maximally-cautionary language that suggests you should pair Or framed a bit differently, maximally portable "strict provenance" code avoids int2ptr/ptr2int entirely, so the distinction doesn't matter much. Not sure how to describe this in docs exactly though. Also, @RalfJung, for miri, could |
Since this proposal just adds some UB, those 2 points remain true after my PR. They would for now be library UB, not language UB, but that doesn't seem like a big deal to me for a transition period. But I think it would be a shame not to use this to also start moving towards a cleaner model while maintaining our current optimization potential (or, rather, to actually unlock the optimization potential that we are currently only having under false premises and "hope it doesn't go wrong"). I thought that was also part of the motivation. Even if it's not at the top of your list, it certainly is a big part of why I got excited about it.
Sorry to hear that. :(
I appreciate that. I think there are also some carrots in the other direction though, which is to say that we can promise people that if their code matches this API then we can actually keep optimizing it like we do currently without speculative advances in formal model technology. Basically, we can have our cake (no odd optimization barriers in pointer tagging code) and eat it, too (still supporting code that does ptr2int2ptr cast roundtrips, in a foundationally-formal-correct kind of way), but only if we let the compiler know when some code actually claims to follow strict provenance. So, my thinking is that this will help make the toy more attractive, overall. But I might be wrong. I wonder what others think.
😂 |
@eddyb isn't the forward-compatible option to make it UB? If people end up writing code like the example above, then for the purpose of optimizations and formalizing Rust behavior we will end up in a situation almost like today where we have legacy code we have to support that doesn't give the compiler the information we require for formally correct high-quality codegen.
In principle, yes -- but we'd first have to actually implement the concept of 'exposed allocations' in Miri, or even 'exposed provenance' (considering Stacked Borrows). So, not any time soon, I think. |
I think we need a Because right now, people are just going to see "oh, there's a method for this" and start using it, probably without checking whether doing so is actually sound. If we can say in the lint (#95488) "hey, make sure you pick carefully between |
Another way to steer people away from these low-level APIs would be to provide higher level functions such as |
explicitly distinguish pointer::addr and pointer::expose_addr `@bgeron` pointed out that the current docs promise that `ptr.addr()` and `ptr as usize` are equivalent. I don't think that is a promise we want to make. (Conceptually, `ptr as usize` might 'escape' the provenance to enable future `usize as ptr` casts, but `ptr.addr()` certainly does not do that.) So I propose we word the docs a bit more carefully here. `@Gankra` what do you think?
Hm, I would have thought people might want warnings against all non-strict-provenance operations, but you are right that there is no loss in analysis precision until |
@bors r- |
e50061d to 7138a32
@bors r=scottmcm |
📌 Commit 7138a322ef76a658ab240c97264c33937cfb815c has been approved by |
7138a32 to 0252fc9
@bors r=scottmcm |
📌 Commit 0252fc9 has been approved by |
Rollup of 7 pull requests

Successful merges:

 - rust-lang#91873 (Mention implementers of unsatisfied trait)
 - rust-lang#95588 (explicitly distinguish pointer::addr and pointer::expose_addr)
 - rust-lang#95603 (Fix late-bound ICE in `dyn` return type suggestion)
 - rust-lang#95620 (interpret: remove MemoryExtra in favor of giving access to the Machine)
 - rust-lang#95630 (Update `NonNull` pointer provenance methods' documentation)
 - rust-lang#95631 (Refactor: remove unnecessary nested blocks)
 - rust-lang#95642 (`CandidateSource::XCandidate` -> `CandidateSource::X`)

Failed merges:

r? `@ghost`
`@rustbot` modify labels: rollup