-
Notifications
You must be signed in to change notification settings - Fork 12.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
addr_of[_mut]!
docs should be more precise about what's sound
#114902
Comments
Maybe @rust-lang/opsem have some thoughts on precise wording that should be used. |
That is unfortunate, since when I read them I find them fairly clearly stating this to be not sound. :/ Our mental models must be quite different here. I wonder, if the code did Sadly there are many inaccurate mental models out there about how place and value expression evaluation interacts. You are by far not the first to be confused by these docs. That's why we added the part about I think the proper model here is beautiful and I'd love to teach it to everyone... but maybe, this one time, we don't actually have to. If rust-lang/reference#1387 gets accepted, your code will become sound. |
I think that any invalid example involving |
Yeah, and I wish I had your mental model! I know what it's like to be on the other side of "I don't see what's confusing about this for you", so I'll try as best as I can to articulate how I interpreted the docs.
Maybe I'm missing something, but you said that there are two loads - doesn't that mean that the final dereference (since there are three of them) isn't performing a load? (And, by implication, that final dereference doesn't need to be properly aligned?) Maybe this is the issue: In my mental model, "dereference" doesn't necessarily mean "load". A few examples of why that's my mental model:
For that reason, my mental model of something like fn main() {
let mut x = (0usize,);
let x_ptr: *mut (usize,) = &mut x;
unsafe { (*x_ptr).0 = 1 };
println!("{}", x.0); // prints 1
} My reading of this example is that, if In your example of
Honestly, I think I read the docs like this:
Yeah I would absolutely love to have the correct model in my head! If I had to speculate, I'd say that part of the disconnect between folks like you - who work on this stuff, and especially its internals - and folks like me - who merely consume these semantics - is that it'd be a lot harder for your mental model to be wrong-but-accidentally-not-causing-any-problems. For me, it's easy to feel like I understand things, and so long as I come to the correct conclusions (where "correct" means something like "write code that other programmers and Miri agree is sound"), I never realize I'm wrong until I come to edge cases like these. By contrast, it'd be a lot harder for you to persist with an incorrect mental model because, when working on the compiler internals, much smaller/finer-grained errors in your model have practical consequences than is the case for me. I think about how confusing it used to be for me to reason about indexing/off-by-one errors/etc when I was first learning to program, and now it's not just something I'm good at, it feels obvious and easy. It's hard for me to empathize with what it'd be like not to have an intuitive grasp of those concepts. I think this is basically a Hard Problem. I don't really know of any way past it besides a) lots of navel gazing about how you came to your current understanding and, b) trying out your explanations on people (and then seeing if they can expand on the model they've acquired from those explanations, and if those expansions are correct). On a more pragmatic note, I think there are two pieces of low-hanging fruit here that would make things a lot better:
|
The first sentence is true, but the "by implication" is not. The final dereference produces a place, exactly as you have described, but constructing a place already adds the requirement of being properly aligned, same as you would get if you put a We found this behavior confusing and foot-gunny as well, which is why rust-lang/reference#1387 documents the new behavior, that constructing a place on its own doesn't add any requirements. |
In my mental model, "dereference" doesn't necessarily mean "load".
Okay so it sounds like our models are closer than I thought! The one missing bit is that according to https://doc.rust-lang.org/reference/behavior-considered-undefined.html it is the deref, not the load, where alignment is checked.
I do agree "the usual rules" without further explanation is terrible docs...
|
Sounds good! A few more clarifying questions: First, do field accesses generate loads? E.g.: let p: *const (u8, (u16, (u32, u64))) = ...;
addr_of!((*p).1.1.1); Does this Second, I assume that indirecting through a reference generates a load unconditionally? E.g.: let p: *const (u8, &'static (u16, (u32, u64))) = ...;
addr_of!((*p).1.1.1); I assume that this generates a single load (namely, it loads the value of the |
Under the rules are currently documented and enforced by Miri, loads are irrelevant for alignment. Aligned is checked on Under rust-lang/reference#1387, it is only loads that generate alignment requirements. Field accesses do not require alignment, but they do require "inbounds" ( For the second point, yes -- that code desugars to |
I don't know a better way to explain this. I feel like we want to avoid the explanation that a Place can only contain one Deref. That's why your example has a load through the reference, it desugars into multiple Places. |
That's not even true for the initially built MIR. (Normalizing to "only one deref" is a MIR pass that runs at some point.) It is certainly not true for something like MiniRust. The thing is that |
I ran into another issue with In particular, I have a use case for out-of-bounds projection: using it to compute type layout information based on synthesized pointers. The playground is having issues right now, so here's the code instead. The idea is that I'm trying to figure out the offset of a field within a struct, and I want that to become part of its type information (by assigning to a trait's associated constant). The type is unsized, so I can't construct a (As an aside, I'm also not sure about whether this abides by strict provenance. A previous attempt used Codeuse core::mem::{align_of, size_of};
use core::num::NonZeroUsize;
use core::ptr::NonNull;
/// A trait which describes the layout of sized types
/// and of slice-based DSTs.
unsafe trait KnownLayout {
const MIN_SIZE: usize;
const ALIGN: NonZeroUsize;
const TRAILING_ELEM_SIZE: Option<usize>;
// # Safety
//
// Implementer promises this is aligned to `LargestSupportedAlign`.
const MAX_ALIGN_DANGLING_PTR: NonNull<Self>;
}
macro_rules! impl_known_layout {
($($t:ty),*) => {
$(
unsafe impl KnownLayout for $t {
const MIN_SIZE: usize = size_of::<Self>();
const ALIGN: NonZeroUsize = const_align_of::<Self>();
const MAX_ALIGN_DANGLING_PTR: NonNull<Self> = {
assert!(Self::ALIGN.get() <= align_of::<LargestSupportedAlign>());
LARGEST_SUPPORTED_ALIGN_RAW.cast::<Self>()
};
const TRAILING_ELEM_SIZE: Option<usize> = None;
}
)*
}
}
impl_known_layout!((), u8, u16, u32);
unsafe impl<const N: usize, T> KnownLayout for [T; N] {
const MIN_SIZE: usize = size_of::<Self>();
const ALIGN: NonZeroUsize = const_align_of::<Self>();
const MAX_ALIGN_DANGLING_PTR: NonNull<Self> = {
assert!(Self::ALIGN.get() <= align_of::<LargestSupportedAlign>());
LARGEST_SUPPORTED_ALIGN_RAW.cast::<Self>()
};
const TRAILING_ELEM_SIZE: Option<usize> = None;
}
unsafe impl<T> KnownLayout for [T] {
const MIN_SIZE: usize = 0;
const ALIGN: NonZeroUsize = const_align_of::<T>();
const MAX_ALIGN_DANGLING_PTR: NonNull<Self> = {
assert!(Self::ALIGN.get() <= align_of::<LargestSupportedAlign>());
let elem = LARGEST_SUPPORTED_ALIGN_RAW.cast::<T>();
let slc = core::ptr::slice_from_raw_parts(elem.as_ptr().cast_const(), 0);
unsafe { NonNull::new_unchecked(slc.cast_mut()) }
};
const TRAILING_ELEM_SIZE: Option<usize> = Some(size_of::<T>());
}
#[repr(C)]
struct Foo<A, B, T: ?Sized + KnownLayout> {
a: A,
b: B,
t: T,
}
unsafe impl<A, B, T: ?Sized + KnownLayout> KnownLayout for Foo<A, B, T> {
const MIN_SIZE: usize = {
let slf = Self::MAX_ALIGN_DANGLING_PTR.as_ptr().cast_const();
// TODO: Provenance issues here?
let t_ptr = unsafe { core::ptr::addr_of!((*slf).t) };
let t_offset = unsafe { t_ptr.cast::<u8>().offset_from(slf.cast::<u8>()) };
t_offset as usize + T::MIN_SIZE
};
const ALIGN: NonZeroUsize = {
let aligns = [const_align_of::<A>(), const_align_of::<B>(), T::ALIGN];
let mut max_align = aligns[0];
let mut i = 0;
while i < aligns.len() {
max_align = if aligns[i].get() > max_align.get() {
aligns[i]
} else {
max_align
};
i += 1;
}
max_align
};
const TRAILING_ELEM_SIZE: Option<usize> = T::TRAILING_ELEM_SIZE;
const MAX_ALIGN_DANGLING_PTR: NonNull<Self> = {
assert!(Self::ALIGN.get() <= align_of::<LargestSupportedAlign>());
// SAFETY: Cannot produce un-aligned pointer due to preceding assert.
let self_raw = T::MAX_ALIGN_DANGLING_PTR.as_ptr() as *mut Self;
// SAFETY: `self_raw` came from a non-null pointer.
unsafe { NonNull::new_unchecked(self_raw) }
};
}
// TODO: Making this really big makes rustc very uhappy.
// Luckily, invalidly-aligned pointers shouldn't be a problem
// once rust-lang/reference#1387 lands.
// #[repr(align(536_870_912))] // 2^29
#[repr(align(4096))] // 2^29
struct LargestSupportedAlign;
const LARGEST_SUPPORTED_ALIGN: &LargestSupportedAlign = &LargestSupportedAlign;
const LARGEST_SUPPORTED_ALIGN_RAW: NonNull<LargestSupportedAlign> = {
let ptr: *const LargestSupportedAlign = LARGEST_SUPPORTED_ALIGN;
unsafe { NonNull::new_unchecked(ptr.cast_mut()) }
};
const fn const_align_of<T>() -> NonZeroUsize {
match NonZeroUsize::new(align_of::<T>()) {
Some(align) => align,
None => unreachable!(),
}
}
fn main() {
macro_rules! print_meta {
($($t:ty),*) => {
println!("{:10}\tmin_size align\ttrailing_size\tdangling", "type");
$(
print_meta!(@inner $t);
print_meta!(@inner [$t]);
print_meta!(@inner Foo<u16, u8, $t>);
print_meta!(@inner Foo<u16, u8, [$t]>);
print_meta!(@inner Foo<u16, u8, Foo<u16, u8, [$t]>>);
)*
};
(@inner $t:ty) => {{
let min_size = <$t as KnownLayout>::MIN_SIZE;
let align = <$t as KnownLayout>::ALIGN;
let trailing_size = <$t as KnownLayout>::TRAILING_ELEM_SIZE;
let dangling = <$t as KnownLayout>::MAX_ALIGN_DANGLING_PTR;
println!("{:10}\t{min_size}\t {align}\t{trailing_size:?}\t\t{dangling:?}", stringify!($t));
}};
}
print_meta!((), u8, u16, u32, [(); 2], [u8; 2], [u16; 2], [u32; 2]);
} |
This came up in the t-opsem meeting about whether we should change the rules for Why is |
Yeah, sorry, I should have clarified. I saw discussions to this effect, I just wanted to put a "real world" out-of-bounds projection use case out there in case it was useful information. I'm not actually advocating for this, as overall it sounds like the wrong tradeoff to support it given LLVM.
Once |
Offsets of the unsized field specifically are not a constant so I don't see how |
I'm referring to the offset to the beginning of an unsized field: #[repr(C)]
struct Foo {
a: u8,
b: [u16], // offset to this field guaranteed to be 2, right?
} |
I think that's ok for slices, but |
Given the repr(C) type layout reference, I would assume that the offset of #[repr(C)]
struct Foo<T: ?Sized>{
a: u8,
b: T,
} |
The offset of b depends on the *dynmic* alignment of T, so for dyn Traiy it can differ depending on the underlying dynamic type. This is necessary to ensure that the field is properly aligned.
|
Hmm that raises an interesting question. IIUC, in practice, the way you would obtain a However, in a Demo codeuse std::fmt::Debug;
struct Foo<T: ?Sized> {
a: u8,
t: T,
}
fn print_foo_t(f: &Foo<dyn Debug>) {
let offset = {
let base: *const Foo<dyn Debug> = f;
let t: *const dyn Debug = core::ptr::addr_of!(f.t);
unsafe { t.cast::<u8>().offset_from(base.cast::<u8>()) }
};
println!("{:?}\tat offset {offset}", &f.t);
}
fn main() {
print_foo_t(&Foo { a: 0, t: 0u8 });
print_foo_t(&Foo {
a: 0,
t: (0u32, 1u32),
});
print_foo_t(&Foo {
a: 0,
t: (0u128, 1u128),
});
print_foo_t(&Foo {
a: 0,
t: "hello",
});
} Prints:
|
The vtable includes the size and alignment of the sole unsized field, which is enough to dynamically compute its offset. |
Yes, the code for projecting to a field of unsized type will, at runtime, run The codegen logic for this is around here. |
update and clarify addr_of docs This updates the docs to match rust-lang/reference#1387. Cc `@rust-lang/opsem` `@chorman0773` not sure if you had anything else you wanted to say here, I'd be happy to get your feedback. :) Fixes rust-lang/rust#114902, so Cc `@joshlf`
This example comes from this URLO thread, which in turn was inspired by this zerocopy issue.
The
addr_of!
andaddr_of_mut!
docs imply that they can handle unaligned pointers so long as a reference is never materialized from that pointer (which would be insta-UB). However, running the following program under Miri fails:Here's the failure:
@cuviper pointed out my issue:
The docs strongly imply that this would be sound. It does this in two ways. First, when contrasting references and raw pointers, it specifically calls out that references need to be aligned:
Second, it seems to enumerate the cases in which
addr_of!
would be unsound, and operating on unaligned pointers isn't on the list:IMO the docs as currently written are both vague and misleading with respect to what's allowed. It would be good if the docs were more precise on exactly what is and isn't allowed. I was pretty confident in the soundness of the code I'd written, and only discovered the issue thanks to Miri.
The text was updated successfully, but these errors were encountered: