Implement a lint for implicit autoref of raw pointer dereference #103735
Hey! It looks like you've submitted a new PR for the library teams! If this PR contains changes to any Examples of
|
b1f35e2 to 493b5fd
Haven't looked at the lint implementation yet, but the std changes seem great. :) And yeah I think this can't be more than warn-by-default as a start. |
Oh, nice, the compiler has places which trigger this lint too! |
r? @RalfJung |
Co-authored-by: klensy <klensy@users.noreply.github.com> Co-authored-by: Ralf Jung <post@ralfj.de>
2371483 to 459d6ad
4d3a78b to 6b238e6
Okay I think this is ready to be brought to the lang team. Also Cc @thomcc from the libs team -- if you could take a look at the

Dear @rust-lang/lang,

This PR proposes to add a lint against code like this

pub fn test(ptr: *mut [u8]) -> *mut [u8] {
    let layout_size = 24;
    unsafe { addr_of_mut!((*ptr)[..layout_size]) }
}

The problem with this code is that it tries to not create a reference (to make sure there are never any aliasing promises), but actually, this code does create a reference, since it desugars to

pub fn test(ptr: *mut [u8]) -> *mut [u8] {
    let layout_size = 24;
    unsafe { addr_of_mut!((&mut *ptr)[..layout_size]) }
}

The lint will propose to make that desugaring explicit, so that one can tell by reading the code that a reference is being created -- and one can consider using other methods, such as the unstable

This also helps with situations like

pub struct Test {
    data: [u8],
}
pub fn test_len(t: *const Test) -> usize {
    unsafe { (*t).data.len() }
}

where a raw slice method exists (unstably) that could have been called, but this creates a reference to call the regular slice

Generally speaking, when doing tricky raw pointer manipulation to handle pointer aliasing, it is important to know when references are being created, so I think we need to become better at helping the programmer with that. The lint as currently implemented fires any time some starts with

What are your thoughts on this? Are you in favor of such a lint? If the lint as currently implemented is too broad, I could also imagine restrictions of it that are still quite useful, such as only firing when the method being called returns a reference or raw pointer (so whatever aliasing promises are implicitly being made, they actually remain relevant even after this function returns). However that would not help with the situation of accidentally calling the wrong |
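As a rough illustration of the reference-free alternatives alluded to in the comment above, here is a minimal sketch that stays on raw pointers for both examples. The helper names `prefix_raw` and `field_len` are hypothetical, `Test` is made generic here only so that a value can be built in safe code, and the raw slice `len` method is assumed to be available (it was unstable when the comment was written and has been stable since Rust 1.79).

```rust
use std::ptr::{self, addr_of};

/// Hypothetical helper: raw pointer to the first `len` bytes behind `ptr`.
/// No `&mut [u8]` is created. Unlike indexing, nothing here checks `len`
/// against the slice length, so the caller must do that before dereferencing.
pub fn prefix_raw(ptr: *mut [u8], len: usize) -> *mut [u8] {
    // Thin pointer to element 0; this only discards the length metadata.
    let first: *mut u8 = ptr.cast::<u8>();
    ptr::slice_from_raw_parts_mut(first, len)
}

/// Generic variant of `Test` from the comment (an assumption of this sketch,
/// so that a sized value can be unsized in safe code).
pub struct Test<T: ?Sized> {
    pub data: T,
}

/// Hypothetical helper: length of the `data` field without creating `&[u8]`.
///
/// # Safety
/// `t` must point to a live, properly aligned `Test<[u8]>`.
pub unsafe fn field_len(t: *const Test<[u8]>) -> usize {
    // `addr_of!` projects to the field as a place, so no reference is created.
    let data: *const [u8] = unsafe { addr_of!((*t).data) };
    // The raw slice `len` only reads the pointer metadata.
    data.len()
}
```

The trade-off in `prefix_raw` is that the bounds check done by `[..layout_size]` is gone, so the length has to be validated separately before the result is dereferenced.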
☔ The latest upstream changes (presumably #93563) made this pull request unmergeable. Please resolve the merge conflicts. |
I mean, I don't love the look of them, if I'm being honest. The rationale does make a reasonably good case though, so I'll take a closer look tomorrow. |
We discussed this in the lang meeting today, and agreed that the current lint is far more broad than we'd be willing to accept, especially as deny-by-default. We're interested in seeing less impactful versions, like those mentioned above

We were particularly thinking that something around addr_of_mut!((*ptr)[..layout_size]) example was particularly persuasive, much more so than the same expression outside that context. There were some vague ideas around maybe checking against a goal that the stack borrows state not be modified. But also a wish for a translation of our vague mental models into a nice specific writeup from the OpSem.

(@rust-lang/lang, please add more if I forgot or mischaracterized something. There was lots of discussion today.)

Personally, taking a quick look at the libs changes, stuff like this really doesn't seem like an improvement to me:

- assert!((*tail).value.is_none());
+ assert!((&(*tail).value).is_none());

Though admittedly that's in part due to me knowing what those methods do. I'm not sure if there's some form of that intuition that could be formalized, like noting that they don't return borrows and thus the extra thing atop the borrow stack isn't a problem or something? (Or it's also possible that I'm wrong and those are a problem that really ought to be |
Indeed I think this lint should be warn-by-default anyway.
Yeah, the risky cases are
|
In current stacked borrows (which I presume you're alluding to here) what you're asking about doesn't exist. All reborrows change state. Reborrows of the topmost tag tend to be inconsequential, but detecting that code is working with the topmost tag at compile time sounds intractable. Perhaps there are a few very specific cases we could exclude from the lint, but at a glance I don't see them in the code above. (at one point I was very interested in certain code patterns that create inconsequential tags, because I hoped dealing with those could be an alternative to adding a GC to Miri)
I don't know if Ralf chose not to mention this because he intends to fix this in a new aliasing model, but here is an example of this code causing UB (adapted based on https://crates.io/crates/lathe/0.0.0):

fn main() {
    let mut b = Box::new(Some(0usize));
    let raw = Box::into_raw(b);
    unsafe {
        let r = &*raw;
        let ptr = raw as *const Option<usize>;
        let new_box = Box::from_raw(ptr as *mut usize);
        let z = (*ptr).is_none();
        drop(new_box); // Call drop explicitly to make the error simpler
    }
}
|
For operations that take an &T and actually read that entire T, it makes little difference whether we use a reference or a raw ptr (same for &mut T and writing the entire T). That's why e.g. MaybeUninit::read/write probably should not get the lint.
Now, is_none doesn't read the entire T, so there are cases where a raw ptr version of it might make sense, but that seems like a niche case I would not focus on for a lint.
|
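To make the distinction concrete, here is a small sketch of two ways to ask whether the pointee is `None`. The function names are hypothetical. The first spells out the reference that `(*ptr).is_none()` takes implicitly; the second copies the whole value out through the raw pointer (only possible here because `Option<usize>` is `Copy`) and calls the method on the local copy.

```rust
/// Explicit form of `(*ptr).is_none()`: a shared reference to the pointee
/// is created and passed to the method.
///
/// # Safety
/// `ptr` must be aligned, point to an initialized value, and the aliasing
/// rules must allow a shared reference to the pointee at this point.
pub unsafe fn is_none_via_ref(ptr: *const Option<usize>) -> bool {
    let r: &Option<usize> = unsafe { &*ptr };
    r.is_none()
}

/// Reads the entire value through the raw pointer; the method is then
/// called on a local copy, so no reference to the pointee is created.
///
/// # Safety
/// `ptr` must be valid for reads, aligned, and point to an initialized value.
pub unsafe fn is_none_via_read(ptr: *const Option<usize>) -> bool {
    let copy: Option<usize> = unsafe { ptr.read() };
    copy.is_none()
}
```

Both functions still access the pointee's memory, so neither is a way around a pointer that is no longer valid for reads; they differ only in whether a reference exists during the call.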
T-lang briefly discussed this today. We felt that for a proper discussion we want a summary that outlines:
#103735 (comment) laid out two cases, but it's not quite clear how they generalize to me. That comment defines the lint to be:
Is that entirely accurate? You mention (#103735 (comment)) that the risky cases are:
Reflecting a little, I think that part of the difficulty in the prior discussion was that I think the thing that would help most with this is focusing on the 3rd point above - what is the expected delta in user code after the lint? And perhaps some discussion of why a rule of |
I agree with comments above that the alternative code that is suggested is often not better. If we had postfix

- unsafe { addr_of_mut!((*ptr)[..layout_size]) }
+ unsafe { addr_of_mut!((&mut *ptr)[..layout_size]) }

something like this definitely is

+ #[allow(implicit_unsafe_autorefs)]
unsafe { addr_of_mut!((*ptr)[..layout_size]) }

I also think we can do much better in targeting this lint only at cases that matter the most. Specifically, I'd like to suggest the following algorithm for determining when this should fire: We look for the following sequence:
The rationale for this is quite simple: a. In the case of place projections, we want to warn users because they created a

I think this covers all of the important cases. These two examples from Ralf are covered:

pub fn test(ptr: *mut [u8]) -> *mut [u8] {
    let layout_size = 24;
    unsafe { addr_of_mut!((*ptr)[..layout_size]) }
}

pub struct Test {
    data: [u8],
}
pub fn test_len(t: *const Test) -> usize {
    unsafe { (*t).data.len() }
}

Most of the other cases that this fires on (within this PR) are not covered though, and I think that's probably a good thing. I'd like to explicitly call out the

fn main() {
    let mut b = Box::new(Some(0usize));
    let raw = Box::into_raw(b);
    unsafe {
        let r = &*raw;
        let ptr = raw as *const Option<usize>;
        let new_box = Box::from_raw(ptr as *mut usize);
        let z = match *ptr {
            Some(_) => false,
            None => true,
        };
        drop(new_box); // Call drop explicitly to make the error simpler
    }
}

But that's still UB. The main point here is that the autoref part of the The

Speaking a little more philosophically, I think the concern from T-lang was actually exactly right: There is very little that you can think you are doing with a

This line of thought even yields another category of methods that we should apply the attribute to: Anything that looks like

fn as_ptr(&self) -> *const Self {
    self
}

Here too, the user is likely to expect that they are only accessing the "pointer value" of

List of methods in std that I know about that should get the annotation (will expand as I think of more):

Edit: The list was originally missing item c (the

use core::ptr::addr_of;
use core::ops::Deref;
fn main() {
    unsafe {
        struct W<T>(T);
        impl<T> Deref for W<T> {
            type Target = T;
            fn deref(&self) -> &T { &self.0 }
        }
        let w: W<i32> = W(5);
        let w = addr_of!(w);
        let p: *const i32 = addr_of!(**w); // LINT
    }
}

The user probably expects the line computing |
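A minimal sketch of what the `addr_of!(**w)` line does, with hypothetical function names: since `W` is not a built-in pointer type, the outer `*` goes through `Deref::deref`, which takes `&self` and therefore needs a reference to the wrapper. The second function shows one possible reference-free spelling, assuming the field is accessible from the calling code.

```rust
use std::ops::Deref;
use std::ptr::addr_of;

pub struct W<T>(pub T);

impl<T> Deref for W<T> {
    type Target = T;
    fn deref(&self) -> &T {
        &self.0
    }
}

/// Explicit form of `addr_of!(**w)`: a `&W<i32>` is created so that
/// `Deref::deref` can be called on it.
///
/// # Safety
/// `w` must point to a live, aligned `W<i32>`, and the aliasing rules
/// must allow a shared reference to it.
pub unsafe fn inner_via_deref(w: *const W<i32>) -> *const i32 {
    let r: &W<i32> = unsafe { &*w };
    let inner: &i32 = Deref::deref(r);
    inner as *const i32
}

/// Field projection on the raw pointer: a pure place computation,
/// no reference and no user-defined `deref` call.
///
/// # Safety
/// `w` must point into a live allocation holding a `W<i32>`.
pub unsafe fn inner_via_field(w: *const W<i32>) -> *const i32 {
    unsafe { addr_of!((*w).0) }
}
```

Both functions return the same address; the difference is only whether a reference to the wrapper exists along the way.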
@JakobDegen well put, I have nothing to add. :) |
I'm going to un-nominate, since it looks like this has been discussed twice in the lang meeting without it getting un-nominated. Personally I think something along the lines of @JakobDegen's sketch sounds good. Please re-nominate if you'd like a quorum opinion on something here. |
Visiting for T-compiler triage. Based on discussion above, we do not think this is waiting-on-team any longer. Instead, the PR author should incorporate the feedback from @JakobDegen above. @rustbot label: -S-waiting-on-team +S-waiting-on-author |
@WaffleLapkin any updates on this? |
@RalfJung @JakobDegen just to be sure, you still agree with the suggestions from #103735 (comment) (for suggesting I think I finally have time to work on this. |
Yes that still sounds like a very good proposal to me. |
///
/// If you are sure, you can soundly take a reference, then you can take it explicitly:
/// ```rust
/// # use std::ptr::addr_of_mut;
- /// # use std::ptr::addr_of_mut;
+ /// use std::ptr::addr_of_mut;
@WaffleLapkin any updates on this? |
@Dylan-DPC not much, I'm struggling to find time to work on this (and other PRs) :( |
closing in favor of #123239 |
This PR implements an implicit_unsafe_autorefs lint that checks for implicit auto-refs of pointer dereference. An example:
I've made the lint deny-by-default, because this seems like an important footgun. However, given that even std had quite a few hits, maybe this should be warn-by-default.
Resolves #99437 (I think?)
r? compiler
cc @RalfJung