Differences between *const T and *mut T. Initially *const T pointers are forever read-only? #257

Open
thomcc opened this issue Nov 11, 2020 · 27 comments
Labels
A-aliasing-model Topic: Related to the aliasing model (e.g. Stacked/Tree Borrows)

Comments

@thomcc
Member

thomcc commented Nov 11, 2020

I hadn't seen this before, but it's very surprising and should be documented better. Apparently, *mut T and *const T aren't equivalent: if the raw pointer starts out as a *const T it will always be illegal to write to, even if nothing beyond the pointer's "memory" of its initial state is the reason for this.

See: rust-lang/rust-clippy#4774 (comment)

This is not entirely correct... something like &mut foo as *mut T as *const T as *mut T is entirely harmless. What is relevant is the initial cast, when a reference is turned to a raw pointer. I think of the pointer as "crossing into another domain", that of uncontrolled raw accesses. If that initial transition is a *const, then the entire "domain" gets marked as read-only (modulo UnsafeCell). The raw ptrs basically "remember" the way that the first raw ptr got created.

This is extremely surprising, as lots of documentation and common wisdom indicate that *const T and *mut T are identical, except that the distinction acts as a sort of lint and that their variance differs.

In fact, having correct variance in your types often forces using *const even for mutable data (hence NonNull uses *const). The prevalence of this certainly contributes to programmers' belief that there's no meaningful difference, and that you just have to be sure the data you write to is legal for you to write to.

A common case where this happens is if you write a helper method to return a pointer: you might write it just once for the *const T case, and use it even if you have a &mut self and need a *mut T result. I wouldn't think twice about this, mostly because the myth that they're equivalent is so widespread.
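A minimal sketch of the helper-method situation described above, assuming the read-only and writable accessors are written separately so that the writable pointer is derived from `&mut self`. The names `Buf`, `as_ptr`, `as_mut_ptr`, and `set_first` are hypothetical and only serve the illustration.

```rust
/// Hypothetical container used only for illustration.
pub struct Buf {
    pub data: [u8; 4],
}

impl Buf {
    /// Pointer derived from a shared reference: only read through it.
    pub fn as_ptr(&self) -> *const u8 {
        self.data.as_ptr()
    }

    /// Pointer derived from a unique reference: writes are allowed.
    pub fn as_mut_ptr(&mut self) -> *mut u8 {
        self.data.as_mut_ptr()
    }
}

/// Writes `value` into the first byte via the `&mut`-derived pointer,
/// instead of casting the result of `as_ptr` to `*mut u8`.
pub fn set_first(buf: &mut Buf, value: u8) {
    // SAFETY: the pointer comes from `&mut buf` and points at element 0.
    unsafe { *buf.as_mut_ptr() = value };
}
```

Calling `set_first(&mut b, 9)` on a `Buf` and then reading `b.data[0]` should observe the write; the point of the sketch is only that no `*const` to `*mut` cast of a shared-reference-derived pointer is needed.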

Ralf's comment here even further propagates this myth, in a thread explicitly asking about the differences here... https://internals.rust-lang.org/t/what-is-the-real-difference-between-const-t-and-mut-t-raw-pointers/6127/18 :

I agree. *const T and *mut T are equivalent in terms of UB.

More broadly, nowhere in the thread does the 'once *const, always *const' behavior come up, just that you need to make sure that you maintain the normal rust rules (e.g. the rules which would apply had you started out with a *mut T).

I looked and in none of Rust's reference material could I find any mention of behavior like this. This is very surprising, and I had been under the impression that optimising accesses to raw pointers wasn't beneficial enough for Rust to care strongly about them.

I also think this breaks a lot of existing unsafe code given how widespread the belief that they are equivalent is, and makes non-const-correct C libraries much thornier to bind to :(

@elichai

elichai commented Nov 11, 2020

if the raw pointer starts out as a *const T it will always be illegal to write to

More broadly, nowhere in the thread does the 'once *const, always *const'

These statements aren't true. The only thing that matters is how you got the pointer. For example, this is 100% correct Rust code:

let mut v = 5u8;
let ptr: *const u8 = unsafe {std::mem::transmute(&mut v)};
unsafe {*(ptr as *mut u8) = 7;}
assert_eq!(v, 7); 

So even though it started as a *const u8 you're still allowed to write into it, because you got the pointer from a unique(mut) reference and not a shared reference

@RalfJung
Member

This is, I think, a duplicate of #227. I agree it is a problem. I just do not know a good solution.

So even though it started as a *const u8 you're still allowed to write into it, because you got the pointer from a unique(mut) reference and not a shared reference

The subtle aspect of this is that x as *const _ is basically the same as &*x as *const _, i.e., as *const _ always goes through a shared reference.
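A minimal sketch of the distinction discussed so far, assuming the only thing that matters is how the reference first becomes a raw pointer. The function names are hypothetical. The first function converts the reference with `as *mut` and only then casts between raw pointer types; the second derives its pointer from a shared reference and therefore only reads.

```rust
/// The reference-to-raw conversion happens as `*mut`; the later
/// raw-to-raw casts through `*const` do not affect write permission.
pub fn round_trip_write() -> u8 {
    let mut v = 5u8;
    let p = &mut v as *mut u8; // reference -> raw pointer, as `*mut`
    let c = p as *const u8;    // raw -> raw cast
    let m = c as *mut u8;      // raw -> raw cast back
    // SAFETY: `m` was derived from a unique reference to `v`.
    unsafe { *m = 7 };
    v
}

/// The pointer is derived from a shared reference, so it is only read.
pub fn read_only() -> u8 {
    let v = 5u8;
    let c = &v as *const u8;
    // SAFETY: `c` points at the live local `v`; we only read.
    unsafe { *c }
}
```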

@elichai

elichai commented Nov 11, 2020

The subtle aspect of this is that x as *const _ is basically the same as &*x as *const _, i.e., as *const _ always goes through a shared reference.

Ohhh that's what he was talking about, I'm sorry I misunderstood you @thomcc

@mversic

mversic commented Nov 17, 2020

I hope I'm not off topic. I don't think I am, since NonNull is a wrapper over *const.

NonNull documentation states:

Notice that NonNull has a From instance for &T. However, this does not change the fact that mutating through a (pointer derived from a) shared reference is undefined behavior unless the mutation happens inside an UnsafeCell. The same goes for creating a mutable reference from a shared reference. When using this From instance without an UnsafeCell, it is your responsibility to ensure that as_mut is never called, and as_ptr is never used for mutation.

I've also found this post, which basically asserts that it is entirely OK to use NonNull in FFI. If a pointer is nullable, then I see no special benefit in using Option<NonNull<T>>; I would just use *mut T. However, I'm interested in using NonNull<T> for non-nullable pointers (i.e., pointers for which the C documentation explicitly states that a null value must not be provided), as it would provide additional type safety. This is just for the scenario where Rust code is calling C code, not vice versa.

And, now I'm confused :)

@mversic

mversic commented Nov 17, 2020

hm, we could say that this isn't a NonNull related issue. We can still have the same issue in FFI in this scenario:

let x = [1, 2, 3];
let y = c_fun_which_takes_ptr_and_mutates_it(x.as_ptr() as *mut _)?;

This is said to be UB as well.

Therefore I find that using NonNull<T> is as good as using *mut T considering the risk of UB. Maybe it's a little better, since the documentation states the risk of UB.
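A hedged sketch of how the FFI call above could avoid the problem, assuming the caller is able to make the array mutable. The function `c_fun_which_takes_ptr_and_mutates_it` is written here in Rust purely as a stand-in with an assumed signature; a real binding would be declared in an `extern "C"` block. `bump_all` is a hypothetical name.

```rust
/// Stand-in for the C function; the signature is an assumption.
///
/// # Safety
/// `ptr` must be valid for reads and writes of `len` consecutive `i32`s.
pub unsafe fn c_fun_which_takes_ptr_and_mutates_it(ptr: *mut i32, len: usize) {
    for i in 0..len {
        // SAFETY: guaranteed by the caller, see the function's contract.
        unsafe { *ptr.add(i) += 1 };
    }
}

/// Passes a pointer obtained from `as_mut_ptr`, rather than
/// `x.as_ptr() as *mut _`, so the pointer is derived from `&mut x`.
pub fn bump_all() -> [i32; 3] {
    let mut x = [1, 2, 3];
    let len = x.len();
    let ptr = x.as_mut_ptr();
    // SAFETY: `ptr` is valid for `len` elements and derived from `&mut x`.
    unsafe { c_fun_which_takes_ptr_and_mutates_it(ptr, len) };
    x
}
```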

@Diggsey

Diggsey commented Nov 18, 2020

AIUI, this actually has little to do with *const vs *mut and is about whether the pointer provenance is a & or a &mut (or no provenance).

The only tricky part is the point @RalfJung mentioned when casting directly from a &mut to a *const where a reference is implicitly created.

One option would be to warn on this direct cast (&mut -> *const) (in the next edition if that would be too noisy) and require that the &mut -> *mut -> *const vs &mut -> & -> *const path is explicitly distinguished.

Then you can be safe in treating *const and *mut the same.

@RustyYato

Could we change &mut T as *const T to not go through a shared reference? Getting rid of the implicit footgun.

@RalfJung
Member

RalfJung commented Nov 29, 2020

The tricky bit would be to keep this code working:

fn main() {
    let x = &mut 0;
    let shared = &*x;
    let y = x as *const i32; // if we use *mut here instead, this stops compiling
    let _val = *shared;
}

Currently this works because x as *const _ is considered a read-only access.

OTOH, we do reject the as *mut version of this. If we want to treat as *mut and as *const the same, accepting one and rejecting the other makes little sense.

@GoldsteinE

How does addr_of!() affect this? This code:

let mut x = 0_i32;
let ptr_x: *const i32 = std::ptr::addr_of!(x);
let mut_ptr_x: *mut i32 = ptr_x as _;
unsafe { *mut_ptr_x = 2; }

creates a *const i32 without creating &i32 first and currently triggers Miri. addr_of!() documentation doesn’t mention that resulting pointer can’t be casted to *mut T and used for writes though.

@RalfJung
Member

RalfJung commented Jul 26, 2022

creates a *const i32 without creating &i32 first and currently triggers Miri

Indeed, that's how it currently affects addr_of.

addr_of!() documentation doesn’t mention that resulting pointer can’t be casted to *mut T and used for writes though.

True. It also doesn't say that you can do that. The docs are not exhaustive for what you cannot do. (That would require infinitely large docs.)

There has not been a decision on what the semantics should be here, and that's why the docs basically don't talk about this. It's not great, but absent a decision it's also not clear what else to do. And making the decision without having an entire aliasing model for all the context isn't really a good idea either.

@GoldsteinE

It also doesn't say that you can do that.

It’s true. The way “validity” is currently defined in the standard library docs doesn’t guarantee that any use of pointers from addr_of!() (or addr_of_mut!(), for that matter) is valid.

Is there a rationale for making addr_of!()-produced pointers invalid for writes? I think it’s kind of confusing and doesn’t match the general intuition that *const _ and *mut _ raw pointers are interchangeable.

@bjorn3
Member

bjorn3 commented Jul 27, 2022

Is there a rationale for making addr_of!()-produced pointers invalid for writes? I think it’s kind of confusing and doesn’t match the general intuition that *const _ and *mut _ raw pointers are interchangeable.

If you want to write through the pointer you would use addr_of_mut!(), right? Otherwise what is the point of having two separate macros?

addr_of!() documentation doesn’t mention that resulting pointer can’t be casted to *mut T and used for writes though.

It actually does under the examples section of the addr_of!() documentation:

See addr_of_mut for how to create a pointer to uninitialized data. Doing that with addr_of would not make much sense since one could only read the data, and that would be Undefined Behavior.
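A small sketch of the alternative suggested above: creating the pointer with `addr_of_mut!` when it is going to be written through, rather than casting the result of `addr_of!`. The function name `write_via_addr_of_mut` is hypothetical.

```rust
use std::ptr::addr_of_mut;

/// Creates a `*mut i32` directly from the place `x`, without an
/// intermediate reference and without a `*const` to `*mut` cast.
pub fn write_via_addr_of_mut() -> i32 {
    let mut x = 0_i32;
    let ptr_x: *mut i32 = addr_of_mut!(x);
    // SAFETY: `ptr_x` points at the live local `x` and nothing else
    // accesses `x` while the write happens.
    unsafe { *ptr_x = 2 };
    x
}
```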

@RalfJung
Member

Ralf's comment here even further propagates this myth, in a thread explicitly asking about the differences here... https://internals.rust-lang.org/t/what-is-the-real-difference-between-const-t-and-mut-t-raw-pointers/6127/18 :

I agree. *const T and *mut T are equivalent in terms of UB.

I feel quoted out of context here -- for the question raised in that particular thread, my statement holds true. But specifically when converting a reference to a raw pointer, there is a difference.

Is there a rationale for making addr_of!()-produced pointers invalid for writes?

Basically, because it matches what the borrow checker does -- see here further up this thread.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Feb 12, 2023
avoid mixing accesses of ptrs derived from a mutable ref and parent ptrs

`@Vanille-N` is working on a successor for Stacked Borrows. It will mostly accept strictly more code than Stacked Borrows did, with one exception: the following pattern no longer works.
```rust
let mut root = 6u8;
let mref = &mut root;
let ptr = mref as *mut u8;
*ptr = 0; // Write
assert_eq!(root, 0); // Parent Read
*ptr = 0; // Attempted Write
```
This worked in Stacked Borrows kind of by accident: when doing the "parent read", under SB we Disable `mref`, but the raw ptrs derived from it remain usable. The fact that we can still use the "children" of a reference that is no longer usable is quite nasty and leads to some undesirable effects (in particular it is the major blocker for resolving rust-lang/unsafe-code-guidelines#257). So in Tree Borrows we no longer do that; instead, reading from `root` makes `mref` and all its children read-only.

Due to other improvements in Tree Borrows, the entire Miri test suite still passes with this new behavior, and even the entire libcore and liballoc test suite, except for these 2 cases this PR fixes. Both of these involve code where the programmer wrote `&mut` but then used pointers derived from that reference in ways that alias with the parent pointer, which arguably is violating uniqueness. They are fixed by properly using raw pointers throughout.
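
One way to read "properly using raw pointers throughout" is sketched below. This is an assumption about the shape of such a fix, not the code from the PR: the raw pointer is created once and every access, including the read that previously went through `root`, goes through that pointer. The function name `raw_throughout` is hypothetical.
```rust
use std::ptr::addr_of_mut;

/// All accesses go through the same raw pointer, so no access through the
/// parent place is interleaved with writes through the derived pointer.
pub fn raw_throughout() -> u8 {
    let mut root = 6u8;
    let ptr = addr_of_mut!(root);
    // SAFETY: `ptr` points at the live local `root`, and `root` is not
    // accessed directly until the pointer is no longer used.
    unsafe {
        *ptr = 0;            // Write
        assert_eq!(*ptr, 0); // Read through the pointer, not through `root`
        *ptr = 1;            // Second write
    }
    root
}
```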
@RalfJung
Member

Some updates on this:

  • With Tree Borrows, as *const T and as *mut T behave exactly the same, fixing the surprise that triggered this issue.
  • It turns out that at least one analysis in rustc actually did assume that "initially *const" pointers are not used for mutation; see "Mutating through addr_of produces LLVM IR with UB" (rust#111502).
  • It also turns out some people actually prefer the SB behavior over TB here: having a raw pointer's mutability determined when it initially crosses from safe land to unsafe land. I agree with @thomcc and everyone else who was surprised by this over the years -- while this model can be rationalized well, it is also almost never what people intuitively expect, so IMO we should avoid it unless there are other major reasons to build things like that. Operationally, *const and *mut should not make a difference. If we truly want to make let bindings (without UnsafeCell) immutable, we should achieve that based on the mutability of the binding, not the syntax of the cast. Of course they are not actually immutable since their initial value has to be written in after they are allocated... but I don't think we want that mut in let mut to be any more than a type system hint that prevents bugs.

@CAD97

CAD97 commented May 13, 2023

To make sure it's remembered, there is some practical justification of as *const _/addr_of! and as *mut _/addr_of_mut! behaving differently — they're treated differently by the borrow checker. The *mut version is checked as a mutable access, and the *const version as immutable.

example
let mut x = &mut 5;
let r = &x;

let _ = x as *mut _;
// ^ERROR: cannot borrow as mutable ... also borrowed as immutable

let _ = x as *const _;
// allowed

let _ = addr_of_mut!(x);
// ^ERROR: cannot borrow as mutable ... also borrowed as immutable

let _ = addr_of!(x);
// allowed

dbg!(r);

This doesn't mean that the opsem has to match this and create a pointer with shared provenance for the *const constructions (see footnote 1), but it does provide a potential justification.

As long as providing derived mut provenance to the pointer doesn't impact the validity of extant provenance until the pointer is accessed, though, I agree that the more permissive model of giving the mut provenance when possible is desirable. (If the more permissive semantics are a pessimization to some code, it can probably be rewritten to introduce a shared reborrow and limit the provenance explicitly. Plus, managing two distinct simultaneously valid sibling raw provenances (one mut and one shr) seems like a nightmare.)

Footnotes

  1. At least at some point, the compiler interpreted type_ascribe!(x, &mut _) as *const _ as going through an intermediate coercion to &_ which does limit to shared provenance while that's still the case.

@RalfJung
Member

I opened #400 for the specific question of whether let-bound variables should be UB to mutate.

@JakobDegen
Contributor

JakobDegen commented May 15, 2023

@RalfJung the one thing I will point out here is that it does not a priori have to be the case that, for r: &mut u8, r as *const _ and addr_of!(*r) do the same thing. Maybe they should, but I wouldn't be terribly shocked if the slightly different syntax led people to have different expectations.

@RalfJung
Member

I think it would be very surprising if those two ways of turning a mutable ref into a raw ptr would not do the same thing -- I feel fairly strongly they should be the same.

However I can see the question of mutation of let-bound variables being separate from that of mutating through &mut to *const-cast pointers. Hence the separate issue for the former.

@saethlin
Member

Users seem to have some kind of intuition that the expression inside addr_of{_mut}! is some kind of special context that provides *waves hands* simpler/less-UB semantics. I think this is a UI issue with it being a macro instead of what it expands to. I think it is quite important that we eventually deprecate the macro and have an operator that does the job (like &raw); it would be a great shame if we acquire baggage due to the way we got to a stabilized &raw.


In the indefinite future, we should have a stable #![no_core] and when that is stable, not having access to the addr_of! semantics in it may be acutely painful; addr_of! is exactly the flavor of low level operation I expect to be common in core-less code.

thomcc pushed a commit to tcdi/postgrestd that referenced this issue May 31, 2023
@CAD97

CAD97 commented May 14, 2024

Two small potential arguments for addr_of! not providing write-capable provenance:

  • &raw const place has a bit more of a "don't write through this" feeling than addr_of!(place) does, and definitely more than ref/pointer coercion &mut place as *const _ (which doesn't even cause an unused mut lint).
  • Closure capture rules mean that addr_of!(capture) still captures by-ref, which results in generating a write-incapable pointer as it gets derived from the ref-capture.
    • Prior to edition 2021, addr_of!(place.field) captures (and thus reference retags) the entire place. In edition 2021 and later, each field is captured independently (meaning the rest of place doesn't get retagged by the capture).

For full clarity, I am fully in support of preferring &mut place as *const _ being ptr::from_mut(&mut place).cast_const() and not ptr::from_ref(&mut place). (It is currently more accurately &raw const *&mut place.) This is only about addr_of!/&raw const.
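
The two spellings contrasted in the paragraph above can be sketched as follows. This is only an illustration under the assumption that `ptr::from_mut` and `ptr::from_ref` are available (they were added to the standard library after this thread started); the function names are hypothetical.

```rust
use std::ptr;

/// `*const` pointer whose origin is a unique reference.
pub fn const_from_mut(place: &mut i32) -> *const i32 {
    ptr::from_mut(place).cast_const()
}

/// `*const` pointer whose origin is a shared reference.
pub fn const_from_ref(place: &i32) -> *const i32 {
    ptr::from_ref(place)
}

/// Writes only through the pointer that originated from `&mut`,
/// and only reads through the one that originated from `&`.
pub fn explicit_paths() -> (i32, i32) {
    let mut a = 1;
    let pa = const_from_mut(&mut a);
    // SAFETY: `pa` was derived from a unique reference to `a`.
    unsafe { *pa.cast_mut() = 10 };

    let b = 2;
    let pb = const_from_ref(&b);
    // SAFETY: `pb` points at the live local `b`; we only read.
    let read_b = unsafe { *pb };

    (a, read_b)
}
```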

And I still think I weakly favor &raw const getting write-capable provenance, because all else being equal, more things being DB and a simpler specification is better. I just think that these observations are interesting to consider.


Because OTPT is nowhere near, I think this is an argument for stabilizing &raw const and &raw mut. Once they're actually available we can see how people actually expect them to behave.

@RustyYato

Closure capture rules mean that addr_of!(capture) still captures by-ref, which results in generating a write-incapable pointer as it gets derived from the ref-capture

This sounds like a foot gun. I would have expected it to be captured by raw pointer. But that seems off topic here. I'll open an issue in the main rust repo after investigating this.

@chorman0773
Contributor

chorman0773 commented May 14, 2024

TBH, I wouldn't expect a whole new capture mode here.

Changing &T -> *const T would alter type checking rules in language-visible ways, not just operational semantics. It's almost certainly a breaking change (because of auto traits).

@RalfJung
Member

RalfJung commented May 14, 2024

Closure capture is an interesting one. The consistent capture mode would be &mut, but that's probably also surprising.

But really the main point to me is that &raw const *raw_mut_pointer (and raw_mut_ptr as *const _, which compiles to the same MIR) should not lose an existing write permission -- I assume we have consensus on that? Having &raw const do different things to the permission depending on the shape of the place expression that follows is a non-compositional nightmare (and I've had to spend my share of time just dealing with that nightmare in Miri; it's particularly bad for Box).

@celinval

I'm curious... what is the point of having two types *const T and *mut T if they behave the same way?

As a developer, if I call a function that takes *const T, I expect that function to never change the value of that variable, even if my original variable is mutable.

@chorman0773
Contributor

Well, it's intended as an indicator to programmers mostly.

@chorman0773
Contributor

Using a &mut capture would also alter well-formedness (degrade from Fn() to FnMut(), and also borrow the type mutably).
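
A small sketch of the closure-trait point being discussed, under the assumption (as described in this thread) that `addr_of!` on a captured variable captures by shared reference while `addr_of_mut!` captures by mutable reference. The helper names `call_fn`, `call_fn_mut`, and `closure_captures` are hypothetical.

```rust
use std::ptr::{addr_of, addr_of_mut};

/// Accepts only closures that implement `Fn`.
pub fn call_fn<F: Fn() -> *const i32>(f: F) -> *const i32 {
    f()
}

/// Accepts closures that implement `FnMut` (which includes `Fn` closures).
pub fn call_fn_mut<F: FnMut() -> *mut i32>(mut f: F) -> *mut i32 {
    f()
}

pub fn closure_captures() -> (i32, i32) {
    let x = 5;
    // A shared capture is enough here, so the closure satisfies `Fn`.
    let px = call_fn(|| addr_of!(x));
    // SAFETY: `px` points at the live local `x`; we only read.
    let read_x = unsafe { *px };

    let mut y = 0;
    // This closure is passed where only `FnMut` is required.
    let py = call_fn_mut(|| addr_of_mut!(y));
    // SAFETY: `py` points at the live local `y`, and the closure that
    // borrowed `y` is no longer in use.
    unsafe { *py = 1 };

    (read_x, y)
}
```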

@RalfJung
Member

We also have *const i32 and *const u32 even though they behave in the same way -- or rather, opsem doesn't make a difference between them. Both the pointee type and mutability are hints for the intended use of this pointer, but not hard guarantees/constraints.
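
As a small illustration of the pointee-type half of this comparison, the sketch below casts a `*const i32` to a `*const u32` and reads through it. The function name `reinterpret` is hypothetical; the example relies on `i32` and `u32` having the same size and alignment.

```rust
/// Reads the bits of an `i32` through a `*const u32`.
pub fn reinterpret(x: i32) -> u32 {
    let p: *const i32 = &x;
    let q: *const u32 = p.cast::<u32>();
    // SAFETY: `q` points at the live local `x`; `i32` and `u32` share
    // size and alignment, and every bit pattern is a valid `u32`.
    unsafe { *q }
}
```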

bors added a commit to rust-lang-ci/rust that referenced this issue Aug 18, 2024
Stabilize `raw_ref_op` (RFC 2582)

This stabilizes the syntax `&raw const $expr` and `&raw mut $expr`. It has existed unstably for ~4 years now, and has been exposed on stable via the `addr_of` and `addr_of_mut` macros since Rust 1.51 (released more than 3 years ago). I think it has become clear that these operations are here to stay. So it is about time we give them proper primitive syntax. This has two advantages over the macro:

- Being macros, `addr_of`/`addr_of_mut` could in theory do arbitrary magic with the expression on which they work. The only "magic" they actually do is using the argument as a place expression rather than as a value expression. Place expressions are already a subtle topic and poorly understood by many programmers; having this hidden behind a macro using unstable language features makes this even worse. Conversely, people do have an idea of what happens below `&`/`&mut`, so we can make the subtle topic a lot more approachable by connecting to existing intuition.
- The name `addr_of` is quite unfortunate from today's perspective, given that we have accepted provenance as a reality, which means that a pointer is *not* just an address. Strict provenance has a method, `addr`, which extracts the address of a pointer; using the term `addr` in two different ways is quite unfortunate. That's why this PR soft-deprecates `addr_of` -- we will wait a long time before actually showing any warning here, but we should start telling people that the "addr" part of this name is somewhat misleading, and `&raw` avoids that potential confusion.

In summary, this syntax improves developers' ability to conceptualize the operational semantics of Rust, while making a fundamental operation frequently used in unsafe code feel properly built in.
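
As a minimal sketch of the kind of code this operation exists for, the example below takes a raw pointer to a field of a packed struct without creating a reference. It uses the `addr_of!` macro so that it also builds on compilers that predate the stabilized syntax; with the syntax this PR stabilizes, the same line would be written `&raw const p.value`. The type `Packed` and the function `read_value` are hypothetical.

```rust
use std::ptr::addr_of;

/// Hypothetical packed struct; `value` may sit at an unaligned address.
#[repr(C, packed)]
pub struct Packed {
    pub tag: u8,
    pub value: u32,
}

/// Reads the possibly unaligned field through a raw pointer.
pub fn read_value(p: &Packed) -> u32 {
    // `p.value` is used as a place expression here; no `&u32` is created.
    let field: *const u32 = addr_of!(p.value);
    // SAFETY: `field` points into the live `Packed` behind `p`;
    // `read_unaligned` does not require alignment.
    unsafe { field.read_unaligned() }
}
```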

Possible questions to consider, based on the RFC and [this](rust-lang#64490 (comment)) great summary by `@CAD97`:

- Some questions are entirely about the semantics. The semantics are the same as with the macros so I don't think this should have any impact on this syntax PR. Still, for completeness' sake:
  - Should `&raw const *mut_ref` give a read-only pointer?
    - Tracked at: rust-lang/unsafe-code-guidelines#257
    - I think ideally the answer is "no". Stacked Borrows says that pointer is read-only, but Tree Borrows says it is mutable.
  - What exactly does `&raw const (*ptr).field` require? Answered in [the reference](https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html): the arithmetic to compute the field offset follows the rules of `ptr::offset`, making it UB if it goes out-of-bounds. Making this a safe operation (using `wrapping_offset` rules) is considered too much of a loss for alias analysis.
- Choose a different syntax? I don't want to re-litigate the RFC. The only credible alternative that has been proposed is `&raw $place` instead of `&raw const $place`, which (IIUC) could be achieved by making `raw` a contextual keyword in a new edition. The type is named `*const T`, so the explicit `const` is consistent in that regard. `&raw expr` lacks the explicit indication of immutability. However, `&raw const expr` is quite a bit longer than `addr_of!(expr)`.
- Shouldn't we have a completely new, better raw pointer type instead? Yes we all want to see that happen -- but I don't think we should block stabilization on that, given that such a nicer type is not on the horizon currently and given the issues with `addr_of!` mentioned above. (If we keep the `&raw $place` syntax free for this, we could use it in the future for that new type.)
- What about the lint the RFC talked about? It hasn't been implemented yet.  Given that the problematic code is UB with or without this stabilization, I don't think the lack of the lint should block stabilization.
  - I created an issue to track adding it: rust-lang#127724
- Other points from the "future possibilities" of the RFC
  - "Syntactic sugar" extension: this has not been implemented. I'd argue this is too confusing, we should stick to what the RFC suggested and if we want to do anything about such expressions, add the lint.
  - Encouraging / requiring `&raw` in situations where references are often/definitely incorrect: this has been / is being implemented. On packed fields this already is a hard error, and for `static mut` a lint suggesting raw pointers is being rolled out.
  - Lowering of casts: this has been implemented. (It's also an invisible implementation detail.)
  - `offsetof` woes: we now have native `offset_of` so this is not relevant any more.

To be done before landing:

- [x] Suppress `unused_parens` lint around `&raw {const|mut}` expressions
  - See bottom of rust-lang#127679 (comment) for rationale
  - Implementation: rust-lang#128782
- [ ] Update the Reference.
  - rust-lang/reference#1567

Fixes rust-lang#64490

cc `@rust-lang/lang` `@rust-lang/opsem`

try-job: x86_64-msvc
try-job: i686-mingw
try-job: test-various
try-job: dist-various-1
try-job: armhf-gnu
try-job: aarch64-apple
bors added a commit to rust-lang-ci/rust that referenced this issue Aug 18, 2024
Stabilize `raw_ref_op` (RFC 2582)

This stabilizes the syntax `&raw const $expr` and `&raw mut $expr`. It has existed unstably for ~4 years now, and has been exposed on stable via the `addr_of` and `addr_of_mut` macros since Rust 1.51 (released more than 3 years ago). I think it has become clear that these operations are here to stay. So it is about time we give them proper primitive syntax. This has two advantages over the macro:

- Being macros, `addr_of`/`addr_of_mut` could in theory do arbitrary magic with the expression on which they work. The only "magic" they actually do is using the argument as a place expression rather than as a value expression. Place expressions are already a subtle topic and poorly understood by many programmers; having this hidden behind a macro using unstable language features makes this even worse. Conversely, people do have an idea of what happens below `&`/`&mut`, so we can make the subtle topic a lot more approachable by connecting to existing intuition.
- The name `addr_of` is quite unfortunate from today's perspective, given that we have accepted provenance as a reality, which means that a pointer is *not* just an address. Strict provenance has a method, `addr`, which extracts the address of a pointer; using the term `addr` in two different ways is quite unfortunate. That's why this PR soft-deprecates `addr_of` -- we will wait a long time before actually showing any warning here, but we should start telling people that the "addr" part of this name is somewhat misleading, and `&raw` avoids that potential confusion.

In summary, this syntax improves developers' ability to conceptualize the operational semantics of Rust, while making a fundamental operation frequently used in unsafe code feel properly built in.

Possible questions to consider, based on the RFC and [this](rust-lang#64490 (comment)) great summary by `@CAD97:`

- Some questions are entirely about the semantics. The semantics are the same as with the macros so I don't think this should have any impact on this syntax PR. Still, for completeness' sake:
  - Should `&raw const *mut_ref` give a read-only pointer?
    - Tracked at: rust-lang/unsafe-code-guidelines#257
    - I think ideally the answer is "no". Stacked Borrows says that pointer is read-only, but Tree Borrows says it is mutable.
  - What exactly does `&raw const (*ptr).field` require? Answered in [the reference](https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html): the arithmetic to compute the field offset follows the rules of `ptr::offset`, making it UB if it goes out-of-bounds. Making this a safe operation (using `wrapping_offset` rules) is considered too much of a loss for alias analysis.
- Choose a different syntax? I don't want to re-litigate the RFC. The only credible alternative that has been proposed is `&raw $place` instead of `&raw const $place`, which (IIUC) could be achieved by making `raw` a contextual keyword in a new edition. The type is named `*const T`, so the explicit `const` is consistent in that regard. `&raw expr` lacks the explicit indication of immutability. However, `&raw const expr` is quite a but longer than `addr_of!(expr)`.
- Shouldn't we have a completely new, better raw pointer type instead? Yes we all want to see that happen -- but I don't think we should block stabilization on that, given that such a nicer type is not on the horizon currently and given the issues with `addr_of!` mentioned above. (If we keep the `&raw $place` syntax free for this, we could use it in the future for that new type.)
- What about the lint the RFC talked about? It hasn't been implemented yet.  Given that the problematic code is UB with or without this stabilization, I don't think the lack of the lint should block stabilization.
  - I created an issue to track adding it: rust-lang#127724
- Other points from the "future possibilities" section of the RFC:
  - "Syntactic sugar" extension: this has not been implemented. I'd argue this is too confusing, we should stick to what the RFC suggested and if we want to do anything about such expressions, add the lint.
  - Encouraging / requiring `&raw` in situations where references are often/definitely incorrect: this has been / is being implemented. On packed fields this already is a hard error, and for `static mut` a lint suggesting raw pointers is being rolled out.
  - Lowering of casts: this has been implemented. (It's also an invisible implementation detail.)
  - `offsetof` woes: we now have native `offset_of` so this is not relevant any more.

To be done before landing:

- [x] Suppress `unused_parens` lint around `&raw {const|mut}` expressions
  - See bottom of rust-lang#127679 (comment) for rationale
  - Implementation: rust-lang#128782
- [ ] Update the Reference.
  - rust-lang/reference#1567

Fixes rust-lang#64490

cc `@rust-lang/lang` `@rust-lang/opsem`

// try-job: i686-mingw // `dump-ice-to-disk` is flaky
try-job: x86_64-msvc
try-job: test-various
try-job: dist-various-1
try-job: armhf-gnu
try-job: aarch64-apple
jieyouxu added a commit to jieyouxu/rust that referenced this issue Aug 18, 2024
…,jieyouxu

Stabilize `raw_ref_op` (RFC 2582)

This stabilizes the syntax `&raw const $expr` and `&raw mut $expr`. It has existed unstably for ~4 years now, and has been exposed on stable via the `addr_of` and `addr_of_mut` macros since Rust 1.51 (released more than 3 years ago). I think it has become clear that these operations are here to stay. So it is about time we give them proper primitive syntax. This has two advantages over the macro:

- Being macros, `addr_of`/`addr_of_mut` could in theory do arbitrary magic with the expression on which they work. The only "magic" they actually do is using the argument as a place expression rather than as a value expression. Place expressions are already a subtle topic and poorly understood by many programmers; having this hidden behind a macro using unstable language features makes this even worse. Conversely, people do have an idea of what happens below `&`/`&mut`, so we can make the subtle topic a lot more approachable by connecting to existing intuition.
- The name `addr_of` is quite unfortunate from today's perspective, given that we have accepted provenance as a reality, which means that a pointer is *not* just an address. Strict provenance has a method, `addr`, which extracts the address of a pointer; using the term `addr` in two different ways is quite unfortunate. That's why this PR soft-deprecates `addr_of` -- we will wait a long time before actually showing any warning here, but we should start telling people that the "addr" part of this name is somewhat misleading, and `&raw` avoids that potential confusion.

In summary, this syntax improves developers' ability to conceptualize the operational semantics of Rust, while making a fundamental operation frequently used in unsafe code feel properly built in.
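To make the place-expression point above concrete, here is a minimal sketch, assuming Rust 1.82 or later (where `&raw` is stable). The `Packed` type and the two helper functions are hypothetical names invented for illustration; they are not part of the PR. The sketch takes a pointer to a possibly misaligned field without ever creating a reference to it, once with `&raw const` and once with the `addr_of!` macro.

```rust
use std::ptr;

// Hypothetical type for illustration: `packed` may leave `value` misaligned,
// so creating a `&u32` to it is rejected by the compiler.
#[repr(C, packed)]
struct Packed {
    tag: u8,
    value: u32,
}

/// Reads the field through a raw pointer created with the `&raw const` syntax.
fn read_value(p: &Packed) -> u32 {
    // `p.value` is used as a place expression: nothing is loaded here and
    // no intermediate reference to the field is created.
    let field_ptr: *const u32 = &raw const p.value;
    // SAFETY: `field_ptr` points into `*p`, which is live and initialized;
    // `read_unaligned` does not require the pointer to be aligned.
    unsafe { field_ptr.read_unaligned() }
}

/// Same operation written with the macro.
fn read_value_macro(p: &Packed) -> u32 {
    let field_ptr: *const u32 = ptr::addr_of!(p.value);
    // SAFETY: same reasoning as in `read_value`.
    unsafe { field_ptr.read_unaligned() }
}

fn main() {
    let p = Packed { tag: 7, value: 0xDEAD_BEEF };
    let tag = p.tag; // copy the byte out rather than borrowing the field
    assert_eq!(tag, 7);
    assert_eq!(read_value(&p), 0xDEAD_BEEF);
    assert_eq!(read_value_macro(&p), 0xDEAD_BEEF);
}
```

Both helpers are meant to behave identically; the difference is only that the first one visibly parallels `&`/`&mut`, which is the intuition the description appeals to.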

Possible questions to consider, based on the RFC and [this](rust-lang#64490 (comment)) great summary by `@CAD97`:

- Some questions are entirely about the semantics. The semantics are the same as with the macros so I don't think this should have any impact on this syntax PR. Still, for completeness' sake:
  - Should `&raw const *mut_ref` give a read-only pointer?
    - Tracked at: rust-lang/unsafe-code-guidelines#257
    - I think ideally the answer is "no". Stacked Borrows says that pointer is read-only, but Tree Borrows says it is mutable.
  - What exactly does `&raw const (*ptr).field` require? Answered in [the reference](https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html): the arithmetic to compute the field offset follows the rules of `ptr::offset`, making it UB if it goes out-of-bounds. Making this a safe operation (using `wrapping_offset` rules) is considered too much of a loss for alias analysis.
- Choose a different syntax? I don't want to re-litigate the RFC. The only credible alternative that has been proposed is `&raw $place` instead of `&raw const $place`, which (IIUC) could be achieved by making `raw` a contextual keyword in a new edition. The type is named `*const T`, so the explicit `const` is consistent in that regard. `&raw expr` lacks the explicit indication of immutability. However, `&raw const expr` is quite a bit longer than `addr_of!(expr)`.
- Shouldn't we have a completely new, better raw pointer type instead? Yes we all want to see that happen -- but I don't think we should block stabilization on that, given that such a nicer type is not on the horizon currently and given the issues with `addr_of!` mentioned above. (If we keep the `&raw $place` syntax free for this, we could use it in the future for that new type.)
- What about the lint the RFC talked about? It hasn't been implemented yet. Given that the problematic code is UB with or without this stabilization, I don't think the lack of the lint should block stabilization.
  - I created an issue to track adding it: rust-lang#127724
- Other points from the "future possibilities" section of the RFC:
  - "Syntactic sugar" extension: this has not been implemented. I'd argue this is too confusing, we should stick to what the RFC suggested and if we want to do anything about such expressions, add the lint.
  - Encouraging / requiring `&raw` in situations where references are often/definitely incorrect: this has been / is being implemented. On packed fields this already is a hard error, and for `static mut` a lint suggesting raw pointers is being rolled out.
  - Lowering of casts: this has been implemented. (It's also an invisible implementation detail.)
  - `offsetof` woes: we now have native `offset_of` so this is not relevant any more.
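The two semantic questions in the list above can be sketched as follows, assuming Rust 1.82 or later. All names here (`Pair`, `read_via_const`, `write_via_mut`, `field_b_ptr`) are hypothetical and exist only for illustration. The sketch deliberately avoids the disputed pattern (writing through a pointer obtained from `&raw const *mut_ref`) and only shows it in a comment, since the aliasing models disagree on it.

```rust
// Hypothetical type for illustration.
struct Pair {
    a: u8,
    b: u32,
}

/// Reading through `&raw const *mut_ref` is fine under both aliasing models.
fn read_via_const(x: &mut i32) -> i32 {
    let p: *const i32 = &raw const *x;
    // SAFETY: `p` points to `*x`, which is live and initialized.
    unsafe { p.read() }
}

/// For writes, creating the pointer with `&raw mut` sidesteps the open question.
fn write_via_mut(x: &mut i32, value: i32) {
    let p: *mut i32 = &raw mut *x;
    // SAFETY: `p` points to `*x` and was created for mutation.
    unsafe { p.write(value) }
}

// The disputed pattern, intentionally not executed:
//
//     let p = &raw const *x;
//     unsafe { (p as *mut i32).write(value) }
//
// Per the discussion above, Stacked Borrows treats `p` as read-only,
// while Tree Borrows allows the write.

/// Field projection on a raw pointer.
///
/// # Safety
/// The offset computation follows `ptr::offset` rules, so `ptr` must be
/// in bounds of an allocation large enough to hold a `Pair`.
unsafe fn field_b_ptr(ptr: *const Pair) -> *const u32 {
    // No load happens here; only the field's address is computed.
    unsafe { &raw const (*ptr).b }
}

fn main() {
    let mut v = 1;
    assert_eq!(read_via_const(&mut v), 1);
    write_via_mut(&mut v, 5);
    assert_eq!(v, 5);

    let pair = Pair { a: 0, b: 9 };
    assert_eq!(pair.a, 0);
    // SAFETY: the pointer comes from a live local of type `Pair`.
    let b_ptr = unsafe { field_b_ptr(&raw const pair) };
    // SAFETY: `b_ptr` points to the initialized, aligned field `pair.b`.
    assert_eq!(unsafe { b_ptr.read() }, 9);
}
```

Running a variant that performs the commented-out write under Miri is one way to observe how the two models differ, though the outcome depends on which model Miri is configured to check.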

To be done before landing:

- [x] Suppress `unused_parens` lint around `&raw {const|mut}` expressions
  - See bottom of rust-lang#127679 (comment) for rationale
  - Implementation: rust-lang#128782
- [ ] Update the Reference.
  - rust-lang/reference#1567

Fixes rust-lang#64490

cc `@rust-lang/lang` `@rust-lang/opsem`

try-job: x86_64-msvc
try-job: test-various
try-job: dist-various-1
try-job: armhf-gnu
try-job: aarch64-apple
tgross35 added a commit to tgross35/rust that referenced this issue Aug 19, 2024
Stabilize `raw_ref_op` (RFC 2582)

rust-timer added a commit to rust-lang-ci/rust that referenced this issue Aug 19, 2024
Rollup merge of rust-lang#127679 - RalfJung:raw_ref_op, r=jieyouxu

Stabilize `raw_ref_op` (RFC 2582)

github-actions bot pushed a commit to rust-lang/miri that referenced this issue Aug 20, 2024
Stabilize `raw_ref_op` (RFC 2582)

@RalfJung RalfJung added the A-aliasing-model Topic: Related to the aliasing model (e.g. Stacked/Tree Borrows) label Oct 18, 2024