Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

rename ptr::invalid -> ptr::without_provenance #117658

Merged
merged 1 commit into from
Feb 21, 2024

Conversation

RalfJung
Copy link
Member

@RalfJung RalfJung commented Nov 7, 2023

It has long bothered me that ptr::invalid returns a pointer that is actually valid for zero-sized memory accesses. In general, it doesn't even make sense to ask "is this pointer valid", you have to ask "is this pointer valid for a given memory access". We could say that a pointer is invalid if it is not valid for any memory access, but the way this FCP is going, it looks like all pointers will be valid for zero-sized memory accesses.

Two possible alternative names emerged as people's favorites:

  1. Something involving dangling, in analogy to NonNull::dangling. To avoid inconsistency with the NonNull method, the address-taking method could be called dangling_at(addr: usize) -> *const T.
  2. without_provenance, to be symmetric with the inverse operation ptr.addr_without_provenance() (currently still called ptr.addr() but probably going to be renamed)

I have no idea which one of these is better. I read this comment as expressing a slight preference for something like the second option, so I went for that. I'm happy to go with dangling_at as well.

Cc @rust-lang/opsem

@rustbot
Copy link
Collaborator

rustbot commented Nov 7, 2023

r? @m-ou-se

(rustbot has picked a reviewer for you, use r? to override)

@rustbot rustbot added O-unix Operating system: Unix-like O-windows Operating system: Windows S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Nov 7, 2023
@rust-log-analyzer

This comment has been minimized.

@scottmcm
Copy link
Member

scottmcm commented Nov 8, 2023

Also, ptr::dangling is a lot like NonNull::dangling except that the address is chosen by the user, so it makes sense to use the same name.

Bikeshed: I might expect that ptr::dangling::<T>() worked to get a *const T just like NonNull::<T>::dangling() gives me a NonNull<T>? So I'm not necessarily convinced that this parallel works that well.

Not that I have a better name in mind. dangling_at or something isn't great.

@CAD97
Copy link
Contributor

CAD97 commented Nov 8, 2023

I agree that invalid is an unfortunate name given that we're FCP-merge on considering all pointers valid for zero-sized accesses.

I agree that ptr::dangling() is a reasonable name for the functionality, but also that it seems like that name should be a nullary constructor like ptr::NonNull::dangling().

By analogy with ptr::from_exposed_addr, this function could maybe be from_dangling_addr or dangling_with_addr. Or it could be spelled as a two part dangling().with_addr(_), but a direct function serves as a place to attach docs to and to help avoid people from reaching for the single step addr as *mut _ instead.

And just as a note, calling the function from_addr would be a very poor choice, as then the wrong and UB prone ptr::from_addr(ptr.addr()) has no indication that it's a bad idea while also having the nicer names compared to ptr::from_exposed_addr(ptr.expose_addr()).

@CAD97
Copy link
Contributor

CAD97 commented Nov 8, 2023

CI failure is that hashbrown (cfg(feature = "rustc-dep-of-std")) uses ptr::invalid_mut. Thus when remaining, we'll need to carry a #[deprecated] re-export with the old name for at least long enough to release and update std to using a new version of hashbrown. cc @Amanieu

@RalfJung
Copy link
Member Author

RalfJung commented Nov 8, 2023

My plan was to fix hashbrown before landing this: rust-lang/hashbrown#481

@RalfJung
Copy link
Member Author

RalfJung commented Nov 8, 2023

Regarding the name: yes the inconsistency with NonNull::dangling being nullary is a bit annoying. But I thought this wouldn't actually be confusing enough to justify picking a different name; the compiler will easily point out when there are too many / too few arguments.

If we don't want to overload dangling like that, then dangling_with_addr would work, but it is very verbose. dangling_addr would be a bit less consistent but also less verbose. I don't find dangling_at so bad either...

bors added a commit to rust-lang/hashbrown that referenced this pull request Nov 15, 2023
avoid using unstable ptr::invalid_mut

I want to rename that function in rust-lang/rust#117658, but currently the build fails because hashbrown uses it.
@joboet
Copy link
Member

joboet commented Nov 16, 2023

How about loose as a name, as in: "is not tied to an allocated object"?

@RalfJung
Copy link
Member Author

That sounds like a pointer one could use to access any allocated object. Which is not what this is.

@CAD97
Copy link
Contributor

CAD97 commented Nov 16, 2023

FWIW I've turned around and like dangling as a name just fine again. At worst it's a single added compile-edit cycle to get it wrong, and both ptr::dangling(addr) and ptr::NonNull::dangling() are locally obvious about what they do. layout.dangling() is the odd one out here if any are, honestly.

@RalfJung
Copy link
Member Author

I was considering adding ptr::dangling() and the renaming the variant that takes the address to something longer. But in the end I am fine either way.

Let's see what t-libs-api says.

@RalfJung RalfJung added the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Nov 17, 2023
@RalfJung
Copy link
Member Author

Based on t-libs-api feedback, let's stick to ptr::dangling{,_mut} for now.

@Amanieu this is now blocked on a hashbrown release.

@bors
Copy link
Contributor

bors commented Nov 24, 2023

☔ The latest upstream changes (presumably #118232) made this pull request unmergeable. Please resolve the merge conflicts.

@RalfJung
Copy link
Member Author

@Amanieu would be nice to get a new release of hashbrown that includes rust-lang/hashbrown#481 so that this PR can move to the next stage. :)

@m-ou-se
Copy link
Member

m-ou-se commented Nov 28, 2023

We discussed this in last week's libs-api meeting, but didn't reach a conclusion yet.

It'd be useful to know the common use cases for this function.

Some things that came up in the meeting:

  • Inconsistency with NonNull::dangling() isn't great.
  • If ptr::dangling() took no arguments, the proposed functionality would be ptr::dangling().with_addr().
    • That's equivalent to ptr::null().with_addr().
    • Depending on the expected use cases, that might be too verbose or unclear.
  • Many use cases involve stuffing non-pointer data (some integer or flag) into a pointer or AtomicPtr.
    • For those use cases, the name "invalid" might be fine as is.

@m-ou-se
Copy link
Member

m-ou-se commented Nov 28, 2023

I'll note that the methods to go from a pointer to a usize are called .addr() and .expose_addr(), while the opposites are called ptr::invalid() and ptr::from_exposed_addr().

Purely by looking at these names, from_addr would be the most consistent name for ptr::invalid().

That's not very clear though, which might suggest that .addr() should also have a different name, depending on what we end up calling ptr::invalid.

(Edit March 2024: I now think addr is fine. I see no need for symmetry here. See #121588 (comment))

@m-ou-se m-ou-se removed the I-libs-api-nominated Nominated for discussion during a libs-api team meeting. label Nov 28, 2023
@RalfJung
Copy link
Member Author

That's not very clear though, which might suggest that .addr() should also have a different name, depending on what we end up calling ptr::invalid.

Yeah, addr probably needs a rename. Some alternative names have been mentioned on Zulip, though none of them make me really excited: so far we got lossy_addr/addr_lossy and bare_addr/addr_bare. So the corresponding inverses would be from_lossy_addr (which makes little sense) and from_bare_addr (which makes sense but doesn't warn about the fact that this pointer cannot be used to access any memory).

To me it seems more important to warn about the fact that this pointer cannot be used to access any memory than to be consistent with the inverse function -- ideally we'd do both but I can't think of a way to do that.

@dtolnay
Copy link
Member

dtolnay commented Nov 28, 2023

it seems more important to warn about the fact that this pointer cannot be used to access any memory than to be consistent with the inverse function

👍 to this.

I share the sentiment of not feeling great about either of lossy_addr or bare_addr.

Ralf's comment in rust-lang/unsafe-code-guidelines#478 (comment) caught my eye: I sometimes call them "(raw) integer pointers", but that doesn't really fit the pattern here.

I wonder if there is a way to lean into the interpretation of these being integers dressed as pointers, not so much pointers referring to some (invalid/dangling) "address".

What I mean is going even further than ptr.numerical_addr() / ptr::from_num{ber,erical_addr}, as these names still overly emphasize an address-ness which is not really there. If Ralf calls them an "integer pointer", maybe they're rather a "pointy integer".

pub mod ptr {
    pub fn pointy_integer<T>(value: usize) - > *const T;
    pub fn pointy_integer_mut<T>(value: usize) - > *mut T;
}

The thing to figure out would be whether a "pointy integer" is the thing you get back from ptr::pointy_integer(value) (integer dressed as a pointer), or the thing you get back from ptr.expose_addr() (pointer dressed as an integer). To me, using "address" to describe the latter makes sense, and that leaves "pointy integer" unambiguously describing the former.

@RalfJung
Copy link
Member Author

RalfJung commented Dec 2, 2023

It'd be useful to know the common use cases for this function.

  • implementing the ptr::null and NonNull::dangling functions -- basically this function is the source of all dangling pointers
  • creating pointers that are just sentinel values and will never be dereferenced
  • creating a pointer "out of thin air" that is valid for zero-sized reads, e.g. when implementing an iterator that needs a special code path for the ZST case

@rust-log-analyzer

This comment has been minimized.

@bors
Copy link
Contributor

bors commented Feb 21, 2024

💔 Test failed - checks-actions

@bors bors added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. and removed S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. labels Feb 21, 2024
also introduce ptr::dangling matching NonNull::dangling
@RalfJung
Copy link
Member Author

@bors r=m-ou-se

@bors
Copy link
Contributor

bors commented Feb 21, 2024

📌 Commit b58f647 has been approved by m-ou-se

It is now in the queue for this repository.

@bors bors added S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Feb 21, 2024
@bors
Copy link
Contributor

bors commented Feb 21, 2024

⌛ Testing commit b58f647 with merge 3406ada...

@bors
Copy link
Contributor

bors commented Feb 21, 2024

☀️ Test successful - checks-actions
Approved by: m-ou-se
Pushing 3406ada to master...

@bors bors added the merged-by-bors This PR was explicitly merged by bors. label Feb 21, 2024
@bors bors merged commit 3406ada into rust-lang:master Feb 21, 2024
12 checks passed
@rustbot rustbot added this to the 1.78.0 milestone Feb 21, 2024
@rust-timer
Copy link
Collaborator

Finished benchmarking commit (3406ada): comparison URL.

Overall result: no relevant changes - no action needed

@rustbot label: -perf-regression

Instruction count

This benchmark run did not return any relevant results for this metric.

Max RSS (memory usage)

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
3.5% [2.4%, 4.5%] 2
Regressions ❌
(secondary)
- - 0
Improvements ✅
(primary)
-4.2% [-4.2%, -4.2%] 1
Improvements ✅
(secondary)
-3.4% [-3.5%, -3.4%] 2
All ❌✅ (primary) 0.9% [-4.2%, 4.5%] 3

Cycles

This benchmark run did not return any relevant results for this metric.

Binary size

Results

This is a less reliable metric that may be of interest but was not used to determine the overall result at the top of this comment.

mean range count
Regressions ❌
(primary)
0.0% [0.0%, 0.1%] 38
Regressions ❌
(secondary)
0.0% [0.0%, 0.0%] 4
Improvements ✅
(primary)
- - 0
Improvements ✅
(secondary)
- - 0
All ❌✅ (primary) 0.0% [0.0%, 0.1%] 38

Bootstrap: 649.698s -> 651.719s (0.31%)
Artifact size: 310.95 MiB -> 310.97 MiB (0.01%)

github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Feb 22, 2024
rename ptr::invalid -> ptr::without_provenance

It has long bothered me that `ptr::invalid` returns a pointer that is actually valid for zero-sized memory accesses. In general, it doesn't even make sense to ask "is this pointer valid", you have to ask "is this pointer valid for a given memory access". We could say that a pointer is invalid if it is not valid for *any* memory access, but [the way this FCP is going](rust-lang/unsafe-code-guidelines#472), it looks like *all* pointers will be valid for zero-sized memory accesses.

Two possible alternative names emerged as people's favorites:
1. Something involving `dangling`, in analogy to `NonNull::dangling`. To avoid inconsistency with the `NonNull` method, the address-taking method could be called `dangling_at(addr: usize) -> *const T`.
2. `without_provenance`, to be symmetric with the inverse operation `ptr.addr_without_provenance()` (currently still called `ptr.addr()` but probably going to be renamed)

I have no idea which one of these is better. I read [this comment](rust-lang/rust#117658 (comment)) as expressing a slight preference for something like the second option, so I went for that. I'm happy to go with `dangling_at` as well.

Cc `@rust-lang/opsem`
@RalfJung RalfJung deleted the ptr-dangling branch February 24, 2024 11:14
adpaco-aws added a commit to model-checking/kani that referenced this pull request Feb 29, 2024
Upgrades the Rust toolchain to `nightly-2024-02-25`. The Rust compiler
PRs that triggered changes in this upgrades are:
 * rust-lang/rust#121209
 * rust-lang/rust#121309
 * rust-lang/rust#120863
 * rust-lang/rust#117772
 * rust-lang/rust#117658

With rust-lang/rust#121309 some intrinsics
became inlineable so their names became qualified. This made our `match`
on the intrinsic name to fail in those cases, leaving them as
unsupported constructs as in this example:

```
warning: Found the following unsupported constructs:
             - _RNvNtCscyGW2MM2t5j_4core10intrinsics8unlikelyCs1eohKeNmpdS_5arith (3)
             - caller_location (1)
             - foreign function (1)
         
         Verification will fail if one or more of these constructs is reachable.
         See https://model-checking.github.io/kani/rust-feature-support.html for more details.

[...]

Failed Checks: _RNvNtCscyGW2MM2t5j_4core10intrinsics8unlikelyCs1eohKeNmpdS_5arith is not currently supported by Kani. Please post your example at https://github.com/model-checking/kani/issues/new/choose
 File: "/home/ubuntu/.rustup/toolchains/nightly-2024-02-18-x86_64-unknown-linux-gnu/lib/rustlib/src/rust/library/core/src/num/mod.rs", line 25, in core::num::<impl i8>::checked_add
```

We use `trimmed_name()` to work around this, but that may include type
arguments if the intrinsic is defined on generics. So in those cases, we
just take the first part of the name so we can keep the rest as before.

Resolves #3044
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
merged-by-bors This PR was explicitly merged by bors. O-unix Operating system: Unix-like O-windows Operating system: Windows S-waiting-on-bors Status: Waiting on bors to run and complete tests. Bors will change the label on completion. T-libs-api Relevant to the library API team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.