Implement a lint for implicit autoref of raw pointer dereference #103735

WaffleLapkin · 2022-10-29T15:48:05Z

This PR implements a implicit_unsafe_autorefs lint that checks for implicit auto-refs of pointer dereference. An example:

unsafe {
    let ptr: *mut [u8] = ...;
    (*ptr).len()
    //    ^^^^^^ this implicitly takes a reference,
    //           assuming that `ptr` is a valid pointer, etc
}

I've made the lint deny-by-default, because this seems like an important footgun. However, given that even std had quite a few hits, maybe this should be warn-by-default.

Resolves #99437 (I think?)
r? compiler
cc @RalfJung

rustbot · 2022-10-29T15:48:12Z

Hey! It looks like you've submitted a new PR for the library teams!

If this PR contains changes to any rust-lang/rust public library APIs then please comment with @rustbot label +T-libs-api -T-libs to tag it appropriately. If this PR contains changes to any unstable APIs please edit the PR description to add a link to the relevant API Change Proposal or create one if you haven't already. If you're unsure where your change falls no worries, just leave it as is and the reviewer will take a look and make a decision to forward on if necessary.

Examples of T-libs-api changes:

Stabilizing library features
Introducing insta-stable changes such as new implementations of existing stable traits on existing stable types
Introducing new or changing existing unstable library APIs (excluding permanently unstable features / features without a tracking issue)
Changing public documentation in ways that create new stability guarantees
Changing observable runtime behavior of library APIs

library/alloc/src/collections/btree/node.rs

src/test/ui/lint/implicit_unsafe_autoref.rs

RalfJung · 2022-10-29T15:54:56Z

Haven't looked at the lint implementation yet, but the std changes seem great. :)

And yeah I think this can't be more than warn-by-default as a start.

src/test/ui/lint/implicit_unsafe_autorefs.stderr

WaffleLapkin · 2022-10-29T16:03:26Z

Oh, nice, the compiler has places which trigger this lint too!

petrochenkov · 2022-10-29T19:26:05Z

r? @RalfJung

compiler/rustc_lint/src/implicit_unsafe_autorefs.rs

Co-authored-by: klensy <klensy@users.noreply.github.com> Co-authored-by: Ralf Jung <post@ralfj.de>

RalfJung · 2022-11-12T11:39:13Z

Okay I think this is ready to be brought to the lang team. Also Cc @thomcc from the libs team -- if you could take a loot at the library diff here and give feedback on whether you consider these good changes or not, that would be helpful.

Dear @rust-lang/lang,

This PR proposes to add a lint against code like this

pub fn test(ptr: *mut [u8]) -> *mut [u8] {
    let layout_size = 24;
    unsafe { addr_of_mut!((*ptr)[..layout_size]) }
}

The problem with this code is that it tries to not create a reference (to make sure there are never any aliasing promises), but actually, this code does create a reference, since it desugars to

pub fn test(ptr: *mut [u8]) -> *mut [u8] {
    let layout_size = 24;
    unsafe { addr_of_mut!((&mut *ptr)[..layout_size]) }
}

The lint will propose to make that desugar explicit, so that one can tell by reading the code that a reference is being created -- and one can consider using other methods, such as the unstable get_unchecked_mut on raw slices.

This also helps with situations like

pub struct Test {
    data: [u8],
}

pub fn test_len(t: *const Test) -> usize {
    unsafe { (*t).data.len() }
}

where a raw slice method exists (unstably) that could have been called, but this creates a reference to call the regular slice len method.

Generally speaking, when doing tricky raw pointer manipulation to handle pointer aliasing, it is important to know when references are being created, so I think we need to become better at helping the programmer with that. The lint as currently implemented fires any time some starts with *ptr (where ptr is a raw pointer), followed possibly by some field projections and built-in indexing, and then finally that place is turned into a reference via an implicit &/&mut.

What are your thoughts on this? Are you in favor of such a lint? If the lint as currently implemented is too broad, I could also imagine restrictions of it that are still quite useful, such as only firing when the method being called returns a reference or raw pointer (so whatever aliasing promises are implicitly being made, they actually remain relevant even after this function returns). However that would not help with the situation of accidentally calling the wrong len.

bors · 2022-11-13T15:12:52Z

☔ The latest upstream changes (presumably #93563) made this pull request unmergeable. Please resolve the merge conflicts.

thomcc · 2022-11-13T18:42:21Z

Also Cc @thomcc from the libs team -- if you could take a loot at the library diff here and give feedback on whether you consider these good changes or not, that would be helpful.

I mean, I don't love the look of them, if I'm being honest. The rationale does make a reasonably good case though, so I'll take a closer look tomorrow.

scottmcm · 2022-11-29T20:54:52Z

We discussed this in the lang meeting today, and agreed that the current lint is far more broad than we'd be willing to accept, especially as deny-by-default.

We're interested in seeing less impactful versions, like were mentioned above

I could also imagine restrictions of it that are still quite useful

We were particularly thinking that something around ptr::addr_of(_mut)! might make sense, as that feels more to us kindof like an "opt-in" for "I really wanted to stay in pointer-land". As such, the

addr_of_mut!((*ptr)[..layout_size])

example was particularly persuasive, much more so than the same expression outside that context.

There were some vague ideas around maybe checking against a goal that the stack borrows state not be modified. But also a wish for a translation of our vague mental models into a nice specific writeup from the OpSem.

(@rust-lang/lang, please add more if I forgot or mischaracterized something. There was lots of discussion today.)

Personally, taking a quick look at the libs changes, stuff like this really doesn't seem like an improvement to me:

-                assert!((*tail).value.is_none());
+                assert!((&(*tail).value).is_none());

Though admittedly that's in part due to me knowing what those methods do. I'm not sure if there's some form of that intuition that could be formalized, like noting that they don't return borrows and thus the extra thing atop the borrow stack isn't a problem or something?

(Or it's also possible that I'm wrong and those are a problem that really ought to be assert_matches!((*tail).value, None); to look only at the place without ever making a reference at all?)

RalfJung · 2022-11-30T08:08:02Z

especially as deny-by-default.

Indeed I think this lint should be warn-by-default anyway.

Though admittedly that's in part due to me knowing what those methods do. I'm not sure if there's some form of that intuition that could be formalized, like noting that they don't return borrows and thus the extra thing atop the borrow stack isn't a problem or something?

Yeah, the risky cases are

functions that return a borrow or raw pointer, so how their argument was created matters long-term
functions that don't actually need to read from the reference they are given -- specifically, len, where we can entirely avoid messing with the borrow stack by using raw pointers

saethlin · 2022-11-30T15:57:42Z

There were some vague ideas around maybe checking against a goal that the stack borrows state not be modified.

In current stacked borrows (which I presume you're alluding to here) what you're asking about doesn't exist. All reborrows change state. Reborrows of the topmost tag tend to be inconsequential, but detecting that code is working with the topmost tag at compile time sounds intractable. Perhaps there are a few very specific cases we could exclude from the lint, but at a glance I don't see them in the code above. (at one point I was very interested in certain code patterns that create inconsequential tags, because I hoped dealing with those could be an alternative to adding a GC to Miri)

stuff like this really doesn't seem like an improvement to me

I don't know if Ralf chose not to mention this because he intends to fix this in a new aliasing model, but here is an example of this code causing UB (adapted based on https://crates.io/crates/lathe/0.0.0):

fn main() {
    let mut b = Box::new(Some(0usize));
    let raw = Box::into_raw(b);
    unsafe {
        let r = &*raw;
        let ptr = raw as *const Option<usize>;
        let new_box = Box::from_raw(ptr as *mut usize);
        let z = (*ptr).is_none();
        drop(new_box); // Call drop explicitly to make the error simpler
    }
}

error: Undefined Behavior: trying to retag from <3171> for Unique permission at alloc1601[0x0], but that tag does not exist in the borrow stack for this location
 --> demo.rs:9:14
  |
9 |         drop(new_box); // Call drop explicitly to make the error simpler
  |              ^^^^^^^
  |              |
  |              trying to retag from <3171> for Unique permission at alloc1601[0x0], but that tag does not exist in the borrow stack for this location
  |              this error occurs as part of retag at alloc1601[0x0..0x8]
  |
  = help: this indicates a potential bug in the program: it performed an invalid operation, but the Stacked Borrows rules it violated are still experimental
  = help: see https://github.com/rust-lang/unsafe-code-guidelines/blob/master/wip/stacked-borrows.md for further information
help: <3171> was created by a Unique retag at offsets [0x0..0x8]
 --> demo.rs:7:23
  |
7 |         let new_box = Box::from_raw(ptr as *mut usize);
  |                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
help: <3171> was later invalidated at offsets [0x0..0x10] by a SharedReadOnly retag
 --> demo.rs:8:17
  |
8 |         let z = (*ptr).is_none();
  |                 ^^^^^^^^^^^^^^^^
  = note: BACKTRACE:
  = note: inside `main` at demo.rs:9:14

RalfJung · 2022-11-30T17:15:13Z

For operations that take an &T and actually read that entire T, whether or not we use a reference or raw ptr (same for &mut T and writing the entire T). That's why e.g. MaybeUninit::read/write probably should not get the lint. Now, is_none doesn't read the entrie T so there are cases where a raw ptr version of it might make sense, but that seems like a niche case I would not focus on for a lint.

Mark-Simulacrum · 2023-01-03T20:17:32Z

T-lang briefly discussed this today. We felt that for a proper discussion we want a summary that outlines:

These are the patterns that should be linted on (ideally with an example, if the pattern is broader than a specific method or similar)
Why those patterns are dangerous (e.g., "can cause stacked borrows retagging to occur unintentionally and trigger SB UB")
- Do we have a good sense of whether the danger is inherent or just due to current SB rules, i.e., do we have an expectation that we want to change the rules to make the pattern not dangerous? (If we do, then linting based on current rules may not make sense).
What the alternative for each pattern is (i.e., how are users expected to fix the lint?)
- It seems like several of the examples in this PR rely on unstable functions as the fix, which makes linting on stable have a difficult bar. (But the desire to lint may help justify stabilization of those library functions).

#103735 (comment) laid out two cases, but it's not quite clear how they generalize to me. That comment defines the lint to be:

fires any time some starts with *ptr (where ptr is a raw pointer), followed possibly by some field projections and built-in indexing, and then finally that place is turned into a reference via an implicit &/&mut.

Is that entirely accurate?

You mention (#103735 (comment)) that the risky cases are:

functions that return a borrow or raw pointer, so how their argument was created matters long-term, or

functions that don't actually need to read from the reference they are given -- specifically, len, where we can entirely avoid messing with the borrow stack by using raw pointers

Reflecting a little, I think that part of the difficulty in the prior discussion was that (*ptr) was felt to carry ~all of the same implications that &*ptr/&mut *ptr does, which made it hard to justify having to write out the &mut explicitly. There was also some feeling that perhaps ptr::addr_of{,mut}! change that "equivalency", but it wasn't clear to those in the room the extent to which that's true.

I think the thing that would help most with this is focusing on the 3rd point above - what is the expected delta in user code after the lint? And perhaps some discussion of why a rule of * being dangerous (rather than &) isn't "enough".

JakobDegen · 2023-01-04T02:31:09Z

I agree with comments above that the alternative code that is suggested is often not better. If we had postfix & or &mut, I might feel differently. Fortunately though, I think there is a nice alternative: Explicitly encourage users to allow the lint if it is what they intended. This is nice because the #[allow] acts like a comment, warning other readers that there is something potentially surprising going on here. In other words, while something like this may or may not be an improvement

-   unsafe { addr_of_mut!((*ptr)[..layout_size]) }
+   unsafe { addr_of_mut!((&mut *ptr)[..layout_size]) }

something like this definitely is

+   #[allow(implicit_unsafe_autorefs)]
    unsafe { addr_of_mut!((*ptr)[..layout_size]) }

I also think we can do much better in targeting this lint only at cases that matter the most. Specifically, I'd like to suggest the following algorithm for determining when this should fire: We look for the following sequence:

Deref of a raw pointer
Any number of place projections
An automatically inserted reference.
Any number of automatically inserted deref/derefmut calls.
Either of the following:
a. A deref followed by any non-deref place projection (that intermediate deref will typically be auto-inserted)
b. A method call annotated with #[rustc_no_implicit_refs].
c. Edit: A deref followed by a addr_of! or addr_of_mut!. See bottom of post for details.

The rationale for this is quite simple:

a. In the case of place projections, we want to warn users because they created a &Whole despite thinking they only accessed the part field.
b. In the case of suitably annotated method calls, we want to warn users because something about the method means they are likely to be surprised that calling that method makes aliasing assertions about all of &self (or &mut self).

I think this covers all of the important cases. These two examples from Ralf are covered:

pub fn test(ptr: *mut [u8]) -> *mut [u8] {
    let layout_size = 24;
    unsafe { addr_of_mut!((*ptr)[..layout_size]) }
}

pub struct Test {
    data: [u8],
}

pub fn test_len(t: *const Test) -> usize {
    unsafe { (*t).data.len() }
}

Most of the other cases that this fires on (within this PR) are not covered though, and I think that's probably a good thing. I'd like to explicitly call out the .is_none() example above. The natural way to remove the .is_none() (and autoref) is to instead write:

fn main() {
    let mut b = Box::new(Some(0usize));
    let raw = Box::into_raw(b);
    unsafe {
        let r = &*raw;
        let ptr = raw as *const Option<usize>;
        let new_box = Box::from_raw(ptr as *mut usize);
        let z = match *ptr {
            Some(_) => false,
            None => true,
        };
        drop(new_box); // Call drop explicitly to make the error simpler
    }
}

But that's still UB. The main point here is that the autoref part of the .is_none() method call is not doing anything that the user doesn't know is happening anyway - both are sort of acting like a "read" of the underlying place, and that is enough to trigger the UB. Said differently: The autoref is not at fault. The user thinks they are reading the Option<_>, they are getting the semantics of a read to the Option<_>, and that is UB.

The .len() call on slices differs from this significantly. The autoref that happens as a part of this call acts like a read of the backing memory of the slice, while the user expects .len() to only read the data in the wide pointer. That specific disagreement is what makes .len() so dangerous.

Speaking a little more philosophically, I think the concern from T-lang was actually exactly right: There is very little that you can think you are doing with a *ptr that is weaker than &*ptr. The two big examples are basically: Accessing only part of the place (instead of all of it), or accessing the pointer itself (instead of the data it points to). Those are exactly the cases that I'm suggesting we fire the lint on.

This line of thought even yields another category of methods that we should apply the attribute to: Anything that looks like

fn as_ptr(&self) -> *const Self {
    self
}

Here too, the user is likely to expect that they are only accessing the "pointer value" of self, but by creating a &self they get the semantics of a read to the backing memory.

List of methods in std that I know about that should get the annotation (will expand as I think of more):

get, get_mut, get_unchecked, get_unchecked_mut, len on slices.

Edit: The list was originally missing item c (the addr_of! case). That case is another manifestation of the "thought we were accessing a pointer, accessed the backing memory." For example:

use core::ptr::addr_of;
use core::ops::Deref;

fn main() {
    unsafe {
        struct W<T>(T);
        
        impl<T> Deref for W<T> {
            type Target = T;
            fn deref(&self) -> &T { &self.0 }
        }
        
        let w: W<i32> = W(5);
        let w = addr_of!(w);
        let p: *const i32 = addr_of!(**w); // LINT
    }
}

The user probably expects the line computing p to have semantics of just doing pointer computations, but because there is a &W<i32> (and &i32) for the Deref call, they instead get semantics implying a read of the pointed-to memory.

RalfJung · 2023-01-04T10:30:21Z

@JakobDegen well put, I have nothing to add. :)
(Strangely I didn't get an email notification for your post... not sure if GH just screwed up entirely for your post or just specifically failed to notify me.)

scottmcm · 2023-01-24T19:56:47Z

I'm going to un-nominate, since it looks like this has been discussed twice in the lang meeting without it getting un-nominated.

Personally I things something along the lines of @JakobDegen's sketch sounds good.

Please re-nominate if you'd like a quorum opinion on something here.

pnkfelix · 2023-03-09T15:40:30Z

Visiting for T-compiler triage.

Based on discussion above, we do not think this is waiting-on-team any longer. Instead, the PR author should incorporate the feedback from @JakobDegen above.

@rustbot label: -S-waiting-on-team +S-waiting-on-author

Dylan-DPC · 2023-10-08T06:26:31Z

@WaffleLapkin any updates on this?

WaffleLapkin · 2023-11-26T16:19:49Z

@RalfJung @JakobDegen just to be sure, you still agree with the suggestions from #103735 (comment) (for suggesting #[allow] and changing the detection)?

I think I finally have time to work on this.

RalfJung · 2023-11-26T20:19:31Z

Yes that still sounds like a very good proposal to me.

ghost · 2023-11-26T20:37:24Z

compiler/rustc_lint/src/implicit_unsafe_autorefs.rs

+    ///
+    /// If you are sure, you can soundly take a reference, then you can take it explicitly:
+    /// ```rust
+    /// # use std::ptr::addr_of_mut;


Suggested change

/// # use std::ptr::addr_of_mut;

/// use std::ptr::addr_of_mut;

Dylan-DPC · 2024-02-05T13:15:34Z

@WaffleLapkin any updates on this?

WaffleLapkin · 2024-02-05T15:42:11Z

@Dylan-DPC not much, I'm struggling to find to me work on this (and other PRs) :(

WaffleLapkin · 2024-03-31T17:05:07Z

closing in favor of #123239

rustbot assigned petrochenkov Oct 29, 2022

rustbot added S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-libs Relevant to the library team, which will review and decide on the PR/issue. labels Oct 29, 2022

RalfJung reviewed Oct 29, 2022

View reviewed changes

library/alloc/src/collections/btree/node.rs Outdated Show resolved Hide resolved

WaffleLapkin added 2 commits October 29, 2022 19:53

Implement a lint for implicit autoref of raw pointer dereference

561dfaf

Make some auto-refs in std explicit

493b5fd

RalfJung reviewed Oct 29, 2022

View reviewed changes

src/test/ui/lint/implicit_unsafe_autoref.rs Outdated Show resolved Hide resolved

WaffleLapkin force-pushed the scarry-derefs branch from b1f35e2 to 493b5fd Compare October 29, 2022 15:54

RalfJung reviewed Oct 29, 2022

View reviewed changes

src/test/ui/lint/implicit_unsafe_autorefs.stderr Outdated Show resolved Hide resolved

This comment has been minimized.

Sign in to view

rustbot assigned RalfJung and unassigned petrochenkov Oct 29, 2022

klensy reviewed Oct 29, 2022

View reviewed changes

compiler/rustc_lint/src/implicit_unsafe_autorefs.rs Outdated Show resolved Hide resolved

RalfJung reviewed Oct 31, 2022

View reviewed changes

compiler/rustc_lint/src/implicit_unsafe_autorefs.rs Outdated Show resolved Hide resolved

RalfJung reviewed Oct 31, 2022

View reviewed changes

compiler/rustc_lint/src/implicit_unsafe_autorefs.rs Outdated Show resolved Hide resolved

RalfJung reviewed Oct 31, 2022

View reviewed changes

compiler/rustc_lint/src/implicit_unsafe_autorefs.rs Outdated Show resolved Hide resolved

WaffleLapkin and others added 6 commits November 2, 2022 12:02

Apply suggestions from code review

e1c0b69

Co-authored-by: klensy <klensy@users.noreply.github.com> Co-authored-by: Ralf Jung <post@ralfj.de>

implicit_unsafe_autorefs: make the lint warn by default

4a7791d

implicit_unsafe_autorefs: lint field access too

af4ade4

implicit_unsafe_autorefs: more std fixes

45a6283

implicit_unsafe_autorefs: compiler fixes

0efeae6

strip trailing whitespace

459d6ad

WaffleLapkin force-pushed the scarry-derefs branch from 2371483 to 459d6ad Compare November 2, 2022 18:48

This comment has been minimized.

Sign in to view

implicit_unsafe_autorefs: ui test fixes

ef427e6

rust-cloud-vms bot force-pushed the scarry-derefs branch from 4d3a78b to 6b238e6 Compare November 7, 2022 16:46

WaffleLapkin requested a review from RalfJung November 7, 2022 17:50

RalfJung added T-lang Relevant to the language team, which will review and decide on the PR/issue. I-lang-nominated Nominated for discussion during a lang team meeting. labels Nov 12, 2022

RalfJung added S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Nov 26, 2022

scottmcm removed the I-lang-nominated Nominated for discussion during a lang team meeting. label Jan 31, 2023

rustbot added S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. and removed S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). labels Mar 9, 2023

ghost reviewed Nov 26, 2023

View reviewed changes

Urgau mentioned this pull request Mar 30, 2024

Implement a lint for implicit autoref of raw pointer dereference - take 2 #123239

Open

WaffleLapkin closed this Mar 31, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Implement a lint for implicit autoref of raw pointer dereference #103735

Implement a lint for implicit autoref of raw pointer dereference #103735

WaffleLapkin commented Oct 29, 2022

rustbot commented Oct 29, 2022

RalfJung commented Oct 29, 2022

WaffleLapkin commented Oct 29, 2022

This comment has been minimized.

petrochenkov commented Oct 29, 2022

This comment has been minimized.

RalfJung commented Nov 12, 2022

bors commented Nov 13, 2022

thomcc commented Nov 13, 2022

scottmcm commented Nov 29, 2022

RalfJung commented Nov 30, 2022 •

edited

Loading

saethlin commented Nov 30, 2022 •

edited

Loading

RalfJung commented Nov 30, 2022 via email

Mark-Simulacrum commented Jan 3, 2023

JakobDegen commented Jan 4, 2023 •

edited

Loading

RalfJung commented Jan 4, 2023

scottmcm commented Jan 24, 2023 •

edited

Loading

pnkfelix commented Mar 9, 2023

Dylan-DPC commented Oct 8, 2023

WaffleLapkin commented Nov 26, 2023

RalfJung commented Nov 26, 2023

ghost Nov 26, 2023

Dylan-DPC commented Feb 5, 2024

WaffleLapkin commented Feb 5, 2024

WaffleLapkin commented Mar 31, 2024

	/// # use std::ptr::addr_of_mut;
	/// use std::ptr::addr_of_mut;

Implement a lint for implicit autoref of raw pointer dereference #103735

Implement a lint for implicit autoref of raw pointer dereference #103735

Conversation

WaffleLapkin commented Oct 29, 2022

rustbot commented Oct 29, 2022

RalfJung commented Oct 29, 2022

WaffleLapkin commented Oct 29, 2022

This comment has been minimized.

petrochenkov commented Oct 29, 2022

This comment has been minimized.

RalfJung commented Nov 12, 2022

bors commented Nov 13, 2022

thomcc commented Nov 13, 2022

scottmcm commented Nov 29, 2022

RalfJung commented Nov 30, 2022 • edited Loading

saethlin commented Nov 30, 2022 • edited Loading

RalfJung commented Nov 30, 2022 via email

Mark-Simulacrum commented Jan 3, 2023

JakobDegen commented Jan 4, 2023 • edited Loading

RalfJung commented Jan 4, 2023

scottmcm commented Jan 24, 2023 • edited Loading

pnkfelix commented Mar 9, 2023

Dylan-DPC commented Oct 8, 2023

WaffleLapkin commented Nov 26, 2023

RalfJung commented Nov 26, 2023

ghost Nov 26, 2023

Choose a reason for hiding this comment

Dylan-DPC commented Feb 5, 2024

WaffleLapkin commented Feb 5, 2024

WaffleLapkin commented Mar 31, 2024

RalfJung commented Nov 30, 2022 •

edited

Loading

saethlin commented Nov 30, 2022 •

edited

Loading

JakobDegen commented Jan 4, 2023 •

edited

Loading

scottmcm commented Jan 24, 2023 •

edited

Loading