Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

RFC: Alignment niches for references types. #3204

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

moulins
Copy link

@moulins moulins commented Dec 6, 2021

This RFC proposes to add niches to reference-like types by exploiting unused bit patterns caused by alignment requirements.

Rendered

Previous discussion on IRLO

@nagisa
Copy link
Member

nagisa commented Dec 6, 2021

This RFC does not explain how references to the reference would be handled. That is, the following code must be well formed:

pub fn banana(val: &mut E) -> Option<&mut &u16> {
    match val {
        E::A(ref mut val) => Some(val),
        _ => None
    }
}

let reference: &mut &u16 = banana(enum).unwrap();
static SOME_REF: u16 = 42;
*reference = &SOME_REF; // How does the compiler know it needs to copy over / remove the discriminant from alignment bits?

Right now the value returned by banana violates the validity invariants for the reference types and I don't see how the alignment niche bits could be masked out anywhere, as the knowledge of the reference being part of an enum is lost.

@moulins
Copy link
Author

moulins commented Dec 6, 2021

Right now the value returned by banana violates the validity invariants for the reference types and I don't see how the alignment niche bits could be masked out anywhere, as the knowledge of the reference being part of an enum is lost.

Under my proposal, either we have a E::A and the value inside is a valid reference, correctly aligned, or we don't have a reference at all, so there's no problem.

It works exactly the same way as the null pointer optimization (take your example and replace &mut E by &mut Option<&u16>).

- Should the `Aligned<T>` type be part of `stdlib`'s public API? This would allow library authors (like `hashbrown`) to also exploit these niches.
- "size niches" could also be introduced: e.g. `&[u8; 100]` has no alignment niches, but has size niches above `100.wrapping_neg()` (thanks scottmcm for the idea!)
- This somewhat complicates the validity invariant of `Aligned<T>`, is it acceptable?
- This forces `Weak<T>` to drop back to using a `NonNull<T>`, because the sentinel value is no longer a valid `Aligned<T>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Weak<T> could just be:

pub struct Weak<T>(WeakInner<T>);
enum WeakInner<T> {
    Sentinal = -1,
    Ptr(Aligned<T>),
}

sidestepping that issue.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd go so far as to say Weak is a perfect use case for this feature!

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh, I didn't know this was possible. (but WeakInner<T> should wrap a RcBox<T>)

However, this relies on the fact that -1 is always an invalid Aligned<RcBox<T>> value, whatever T is. This is true if "size niches" are implemented, but how does the compiler derive this automatically?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The simple to say way is that if an explicit discriminant falls in the niche of the other value(s), it can be niched into it by the layout pass.

(And it'd be enum WeakRepr { Dangling = -1, Real(Aligned<RcBox<T>>) } to ensure it uses the RcBox alignment. (I still wish we could unify the RcBox and ArcInner names to use the same naming convention.))

Additionally, we don't guarantee the value of the dangling weak sentinel, and I don't think we guarantee that None::<Weak<_>> is null, so the discriminant doesn't have to be specified, so long as a niche remains for Option to take.

@scottmcm scottmcm added the T-lang Relevant to the language team, which will review and decide on the RFC. label Dec 6, 2021

## `Aligned<T>`

To handle other smart pointers (`Arc<T>`, `Rc<T>`, `Weak<T>`), a new `std`-internal type `std::ptr::Aligned<T>` is introduced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This really shouldn't be std-internal — There are lots of use cases for this in the ecosystem.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree in principle, but adding a new public type grows the scope of this RFC, because now we have to design an API to go with it. As long as Aligned<T> is internal, we can limit the API to the strict minimum.


To handle other smart pointers (`Arc<T>`, `Rc<T>`, `Weak<T>`), a new `std`-internal type `std::ptr::Aligned<T>` is introduced.

This type behaves mostly like `NonNull<T>`, but also requires that the wrapped pointer is well-aligned, enabling the use of the attributes presented in the previous section.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pondering: Change NonNull<T> to be NonNull<T, ALIGN = 1>. So then you'd have type Aligned<T> = NonNull<T, ALIGN = align_of::<T>()>;.

Hmm, maybe that can't work, though, since there's already NonNull<T>: From<&T>, which would overlap from Aligned<T>: From<&T> for 1-aligned stuff...

I'm just thinking that this could be nice for other stuff, too, since it could have a .read method that would be able to be smart about read or read_unaligned, for example. (Or even just become a new intrinsic that would pass the actual alignment to the LLVM load/stores, rather than needing to pick between the "I promise it's normally-aligned" and "you have to treat it as 1-aligned" versions -- so you could still have a Aligned<u8, 16>, for example, if you got it from casting a simd address.)

Copy link
Author

@moulins moulins Dec 7, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This would be nice, to avoid duplicating the whole NonNull<T> API in a new Aligned<T> type.

Could the From overlap be resolved by having NonNull<T, 0>: From<&T>, and Aligned<T>: From<&T>?
If "size niches" exist, only ALIGN=1 (and greater) would have them, to not break current NonNull<T> usage that uses -1 as a manual sentinel value.

Now, does the compiler support the required features to do this in std? (we need an unstable defaulted const parameter, and some sort of const specialization to support both From implementations)

@programmerjake
Copy link
Member

one possible way to spell Aligned<T> is &'unsafe T: see #3199

@moulins
Copy link
Author

moulins commented Dec 7, 2021

one possible way to spell Aligned<T> is &'unsafe T

That could actually be pretty elegant; it would remove the need for defining the new niches via custom attributes.

@fstirlitz
Copy link

With respect to Aligned<_>: I remember people some time ago floating the idea of adding yet more pointer niches by recognising the fact that under some ABIs, certain pointer ranges are reserved and cannot possibly be allocated addresses (see e.g. #2400). If work on this feature were to resume, we’d have to invent yet another wrapper to express yet another pointer invariant.

So how about having a single std::ptr::WellFormed<_> wrapper that captures all possible pointer well-formedness invariants at once? Meaning, std::ptr::WellFormed<T> wraps a value of a memory address at which a T might possibly be allocated (though it need not be actually possible to dereference).

@Lokathor
Copy link
Contributor

Lokathor commented Dec 8, 2021

Once you set whatever "well formed" means as Stable you can't add more requirements later or you potentially invalidate old code. Even new platforms should likely not add new requirements or the old code will not actually be portable to those platforms and it will silently break. You could gate the effects of WellFormed by edition, and then rustfix would know that it can't touch that code and just give an error that it needs to be adjusted manually to re-check invariants.

Other than that it's a fine plan.

@fstirlitz
Copy link

Good point. But it also applies to modifying the ABI of regular references, which this proposal does without much restraint.

In practice I wouldn’t expect WellFormed<_> to be much different from &_ in terms of ABI, other than the compiler not being allowed to introduce spurious accesses to the former (via loop-invariant code motion and such). I guess this may make it justified to spell the feature as &'unsafe _ outright.

@moulins
Copy link
Author

moulins commented Dec 8, 2021

I remember people some time ago floating the idea of adding yet more pointer niches by recognising the fact that under some ABIs, certain pointer ranges are reserved and cannot possibly be allocated addresses

This isn't possible without a breaking change; for example, the following code is currently sound on all targets, independently of any reserved pointer ranges:

let ptr = 0xFEDCBA9876543210 as usize as *const ();
let zst: &() = unsafe { &* ptr };

This fact is also required for Vec soundness: an empty Vec points to a zero-sized slice constructed from such a dangling pointer.

It may be enough to special-case references to ZSTs to not have these requirements, but this seems pretty ad-hoc.


Another issue is that checking if a *mut T can be cast to a WellFormed<T> is non trivial, and would give different results depending on obscure differences between architectures.
In comparison, the check for Align<T> is simply ptr != 0 && ptr % align_of::<T>() == 0 (or, if size niches exist, the slightly more complicated ptr.wrapping_add(size_of::<T>()) <= size_of::<T>() && ptr % align_of::<T>() == 0), which only depends on a few parameters (size of the address space, and size/alignment of T).

@fstirlitz
Copy link

I don’t think this should be considered sound. If applied to ZSTs used as proof tokens, it would render them toothless, especially if they are Copy. And yes, I have a use case for such.

More philosophically, you should not be allowed to pull references out of thin air just because you don’t need to actually access any memory when dereferencing them. To claim the existence of a value at a given address you have to have sound knowledge that memory has been allocated and a value has been placed there. I don’t think it makes any difference that the extent of the allocation is empty and ‘placing’ the value happens to be a no-op at the machine code level. Someone must have had put it there, at least conceptually. For such a claim to be valid, it has to point to a place where values can ever possibly reside. To put it differently, even if ZSTs can reside anywhere in the address space, it doesn’t mean the address space has to extend throughout the entire range of the usize type. This is no different from singling out the zero address as invalid.

Though on the other hand, singling out ZSTs wouldn’t necessarily be so unprincipled, given that we had proposals like #2040 and #2754.

I suppose Vec<_> could equally well point empty vectors to some kind of static EMPTY_VEC: [(); 0] = [];. Or simply ‘dangle’ pointers somewhere else in the address space. Though if there is more such code in the wild, then it indeed is a concern.

As for the simplicity argument, adding a zero page reservation would add only one parameter as well, the beginning of the address space: this would change ptr != 0 into ptr >= LOWEST_ADDR, and ptr.wrapping_add(size_of::<T>()) > size_of::<T>() (what you meant above, I suppose) into ptr.wrapping_add(size_of::<T>()) >= LOWEST_ADDR + size_of<T>(). Though yes, in the general case, it would have to be an intrinsic with a target-specific implementation.

@thomcc
Copy link
Member

thomcc commented Dec 9, 2021

A concern I'd have with Aligned<T> being &'unsafe T is that &'unsafe T, if I understand it correctly, has the same invariants as T in many cases. Specifically stuff like aliasing, and the range covered by the borrow, and (afaict) initialization of the pointee. This would limit the set of use cases greatly. They do seem similar though, so perhaps there is a way to solve both

Or perhaps not... In truth, I think I see far more uses for this than I see for &'unsafe T (even though I sympathize with the pain of working with pointers in unsafe code)..

@Lokathor
Copy link
Contributor

Lokathor commented Dec 9, 2021

You can pull a reference to a zero-sized Copy data type with a public constructor out of thin air (assuming it's at an aligned address).

If the constructor isn't public, or the type isn't Copy, then you're potentially breaking someone's invariants and that's gonna be a bad time.

@CAD97
Copy link

CAD97 commented Dec 10, 2021

@fstirlitz to be clear, conjuring a ZST via dereferencing an arbitrary nonnull well-aligned pointer is sound, as far as the language is concerned. This isn't changing.

However, it is semantically identical to transmute::<(), ZST>, which is also sound, as far as the language is concerned.

Where the unsoundness enters, is that it's Wrong ("library unsound") to violate privacy and create a type not through the APIs provided to construct it. If someone writes unsafe to violate your library requirements, they've committed an unsound act, and you're free to execute UB in your library with that as proof.


Back on the topic of ptr::WellFormed<_>... the problem is that the invariants of ptr::WellFormed<_> are basically that the pointer points to a place that was valid to dereference previously, but not necessarily now. (IOW, this is what &'unsafe means to me, I think.) The pointer having been previously valid is the only way that you can know for sure that it is well formed for dereferencing, including whatever interesting extra rules the target may have for valid pointers (to nonzero bytes), such as normal form or the zero page.

I'd love to be proven wrong in the future, but I don't really see all that much value for a compiler-known, niche-optimized pointer type between "known nonnull and well aligned" (ptr::NonNull) and "valid reference with programmer-checked lifetime" (&'unsafe). I do see benefit to having both of these.

@Lokathor
Copy link
Contributor

There's certainly some value in having "all the invariants of a reference but stop bugging me about the lifetime".

In other words, it's always aligned, and if the target location is still live then it does contain initialized data in a valid state.

@jonas-schievink
Copy link
Contributor

Last time I tried this, making the layout computation of &T depend on the alignment of T completely broke the compiler due to query cycles. What's the proposed implementation strategy for this RFC?

@the8472
Copy link
Member

the8472 commented Dec 11, 2021

Last time I tried this, making the layout computation of &T depend on the alignment of T completely broke the compiler due to query cycles.

I tried something similar last year, but instead of trying to make it depend on specific Ts I tried to implement

I remember people some time ago floating the idea of adding yet more pointer niches by recognising the fact that under some ABIs, certain pointer ranges are reserved and cannot possibly be allocated addresses

It may be enough to special-case references to ZSTs to not have these requirements, but this seems pretty ad-hoc.

By doing a conservative layout analysis of T to see if we could be certain that it has a non-zero size (e.g. it being an aggregate that contains at least one scalar field, an enum with at least one such variant etc. etc.). Anything that would require another query was considered as potentially zero-sized.

I got as far as compiling stage-1 core and alloc but it blew up when compiling std somewhere, if I recall correctly, in LLVM normalizing a pointer subtraction to a wrapping add (meaning the offset was > isize::MAX) and then failing an LLVM assertion that the normalized version was out of range for the data type (since I redefined references to be only valid for 1..isize::MAX).

I didn't investigate further whether that was a bug in LLVM or my attempt was generating nonsensical IR. I have no guess how easy or difficult it would be to fix that issue.

A similar conservative approach might be possible for alignments by trying to determine the minimum knowable alignment for a type instead of the exact alignment.

@moulins
Copy link
Author

moulins commented Dec 12, 2021

Back on the topic of ptr::WellFormed<_>... the problem is that the invariants of ptr::WellFormed<_> are basically that the pointer points to a place that was valid to dereference previously, but not necessarily now. (IOW, this is what &'unsafe means to me, I think.) The pointer having been previously valid is the only way that you can know for sure that it is well formed for dereferencing, including whatever interesting extra rules the target may have for valid pointers (to nonzero bytes), such as normal form or the zero page.

After some thinking, this actually seems a workable solution: add the WellFormed<T>/&'unsafe T type, with your "must point to a maybe-deallocated memory location valid for a T" invariant. This implies the following:

  • a WellFormed<T> is always non-null;
  • a WellFormed<T> is always well-aligned;
  • ptr.offset(1) is always sound; in particular, it doesn't wrap around the address space.

These three properties are enough to exploit alignment and size niches.

One advantage of WellFormed<T> over Aligned<T> is that because they can only be constructed from previously-valid references, we can transfer over whatever invariants we require of references without having to make them explicit.
This means that we don't need to commit to some specific bit-level validity invariants (alignment, etc) for WellFormed<T>, and that adding more in the future (restricted address ranges, etc...) isn't a breaking change.

WellFormed<ZST> created from thin air are still sound, because the language explicitely guarantees that any non-null, well-aligned pointer is a valid allocation for a ZST, so any such constant pointer can be soundly casted to a WellFormed.

The only issue is NonNull::dangling; a WellFormed::dangling equivalent is desirable, but it doesn't respect the "must be created from a existing allocation" requirement. A way to resolve this would be to define dangling as returning a pointer to a logically preexisting-but-deallocated region, but I don't know if this actually makes sense.


Last time I tried this, making the layout computation of &T depend on the alignment of T completely broke the compiler due to query cycles. What's the proposed implementation strategy for this RFC?

I must admit I didn't think that far yet. 😃 I have never worked on rustc, so I have no idea of the gory details of such a change.
However, I suspect adding niches only for Aligned<T>/Unique<T> to be easier, because they are actual types defined in libcore, not compiler intrinsics. This would have the bizarre consequence that Box<T> has more niches than &T though, which may break some assumptions elsewhere in the compiler.

@Lokathor
Copy link
Contributor

ptr.offset(size_of::<T>())

Minor nit, offset takes units, not bytes

@programmerjake
Copy link
Member

Last time I tried this, making the layout computation of &T depend on the alignment of T completely broke the compiler due to query cycles. What's the proposed implementation strategy for this RFC?

I must admit I didn't think that far yet. 😃 I have never worked on rustc, so I have no idea of the gory details of such a change. However, I suspect adding niches only for Aligned<T>/Unique<T> to be easier, because they are actual types defined in libcore, not compiler intrinsics. This would have the bizarre consequence that Box<T> has more niches than &T though, which may break some assumptions elsewhere in the compiler.

I don't actually know for sure, but from what I've heard about how rustc's internals are structured, I'd expect the issue to not be that &T is builtin and WellFormed<T> is defined in the library, but instead that you could define a type like so:

pub enum MyType {
    A(WellFormed<MyType>),
    B,
    C,
}

For MyType, to calculate the size and alignment, you have to know if B and C can fit in A's niches, which requires calculating WellFormed<MyType>'s niches, which requires calculating MyType's size and alignment all over again, forming a loop.

@programmerjake
Copy link
Member

programmerjake commented Dec 12, 2021

As an extreme example, let's assume that we're on a 16-bit architecture (pointers are 16-bits) for simplicity:

pub enum MyEnum1<'a> {
    Ref(&'a MyEnum2<'a>),
    Z = 0,
    P1 = 1,
    N1 = -1,
    N2 = -2,
    N3 = -3,
}

pub enum MyEnum2<'a> {
    Ref(&'a MyEnum1<'a>),
    Z = 0,
    P1 = 1,
    N1 = -1,
    N2 = -2,
    N3 = -3,
}

There are two possible equally valid sets of layouts:

  • MyEnum1 has size=4 and MyEnum2 has size=2 with MyEnum2's other enumerants fitting into &MyEnum1's niches which extend from -4 to -1 for the size niches as well as -1 to 1 for the alignment niches. MyEnum1 has more enumerants than niches in &MyEnum2 so there's a separate i16 tag field in MyEnum1 making it take 4 bytes rather than 2.

  • The reversed size assignments: MyEnum2 has size=4 and MyEnum1 has size=2 with the rest of the logic being analogous to the above case.

@CAD97
Copy link

CAD97 commented Dec 12, 2021

@programmerjake's example holds for size niches, but can a similar example be created for alignment niches? I don't think so... though this would still require splitting alignment and size calculation, so I think it could be strictly ordered alignment→niches→size, if only alignment and not size niches are considered.

Even in the case of recursive requirements, it's solvable, if not optimally. Niche optimization is an optimization, so it's fine to miss an optimization in some cases (if not ideal). "Just" (yes it's more complicated) break cycles by picking some types and not giving their references any size/align niches. (In order to assure the computation is deterministic and gives the same answer no matter what part of the dependency cycle is queried first, it should be sufficient to pessimize types in a total ordering, or even just to pessimize the whole cycle.) This wouldn't ever give worse results than current, without the extra niches.

@fstirlitz
Copy link

They might as well be both 2 bytes: with variant Z having bit representation 0x0000, P1 being 0x0001, N1 being 0x0003, N2 being 0x0005, N3 being 0x0007. There is no rule that the discriminant must be directly visible in the bit representation.

@fstirlitz
Copy link

As for dangling: I thought the point of the niche optimisation is to remove the need for such a thing. But in the worst case, it could be simply declared valid by fiat.

@CAD97
Copy link

CAD97 commented Dec 12, 2021

N2 being 0x0005, N3 being 0x0007

As currently speced, the niche is only ptr < size_of::<T>() || size_of::<T>() < ptr (off by one on those?). As currently implemented, rustc's niche system only supports valid_scalar_range_start and valid_scalar_range_end to restrict a value to a smaller continuous range. rustc does not currently support more complicated sub-object/sub-byte niching, and adding support for such niches will make niching much more complicated and make the space/time trade-off much less of a clear win.

The example as given exhibits dual-minima with simple continuous range niching. The situation still exists with full alignment niche exploitation, even if it's significantly harder to hit -- just add a lot more variants -- and as such needs to be handled.

@ogoffart
Copy link

I'd like to add a comment here relative to my experiments relative to the efficiency of niches.

It all started because I wanted to implement the use of niche for enum with more data member. (What is mentioned in the "Future possibilities" of this RFC.) The PR was rust-lang/rust#70477 , but the benchmarking saw some perf regressions in the compiler and then i did not have the time to continue this experiment.
Anyway, my theory is that the compilation is slower when using niche because the generated LLVM code when using niche is not optimal, and lead to code that is both slower to compile, and slower.
I made an experiment in here: rust-lang/rust#77816 this disable the niche optimization for all the structures bigger than 16bytes. And the result is that for many benchmark, it if faster not to use the niche.

Anyway, what's relevant for this RFC, is that before enabling more niches, one should probably try to actually make sure that the niche optimization generate decent code (I summarized some possible improvement in rust-lang/rust#75866 (comment) )

@nox

This comment has been minimized.

@CAD97
Copy link

CAD97 commented Feb 23, 2022

Making size|align_of_val return incorrect results for custom DSTs is pretty much a nonoption. It works for extern types because they truly have dynamically unknowable layout (as far as Rust is concerned). Extern types are basically C void*; somebody else knows. You can't allocate or free a pointer to extern type.

Custom DSTs, though, are useful because the language knows how to interact with them (mediated by the custom code of course). In order to be able to have e.g. Box<CustomDst>, it needs to be able to get the layout to drop and deallocate the place.

This isn't the thread for custom potentially thin DSTs, so we needn't hash out how they're supposed to work here. I just want to note that treating custom thin DSTs like extern types removes any benefit of the feature.

The option that's worth mentioning is requiring layout extraction to work even after the object is dropped. It's not the whole solution, but what interacts with ptr::WellFormed w.r.t. method safety.

@programmerjake
Copy link
Member

The option that's worth mentioning is requiring layout extraction to work even after the object is dropped. It's not the whole solution, but what interacts with ptr::WellFormed w.r.t. method safety.

Requiring layout extraction to work after drop isn't enough to make the layout extraction functions safe, it would also have to work after deallocation for the functions to be safe (as in not marked with unsafe). after drop but before deallocation, loads through the pointer would still work (assuming you didn't tell llvm otherwise). after deallocation, loads through the pointer are UB.

@programmerjake
Copy link
Member

I don't think that's enough; if you have a r: &mut T you can always do ptr::copy(r as *mut u8, r as *mut u8, size_of_val(r)) which logically does nothing, but is still a non-atomic write.

If you have a &mut T, then you have exclusive access to that T, so non-atomic writes aren't an issue since, due to the exclusivity, there's guaranteed to be nothing the writes could race with. any reads caused by reading the layout must happen either before or after the non-atomic write, they can't occur simultaneously due to &mut's exclusivity guarantees. This is exactly why AtomicI32::get_mut and UnsafeCell::get_mut are safe.

@moulins
Copy link
Author

moulins commented Feb 23, 2022

This isn't the thread for custom potentially thin DSTs, so we needn't hash out how they're supposed to work here. I just want to note that treating custom thin DSTs like extern types removes any benefit of the feature.

What would be the correct place to discuss this? Should I open a new issue somewhere?


If you have a &mut T, then you have exclusive access to that T, so non-atomic writes aren't an issue [...]

Consider the following:

fn noop_write<T: ?Sized>(v: &mut T) {
  let size = std::mem::size_of_val(v);
  let p = v as *mut T as *mut u8;
  // Safety:
  // - we have exclusive access, so writes are fine
  // - we leave the object in the same state we've found it.
  unsafe { std::ptr::copy(p, p, size) }
}

fn data_race(v: Arc<Mutex<ThinDst>>) {
    let v1 = v.clone();
    let _ = std::thread::spawn(move || {
       // Get exclusive access, and do a non-synchronized write to the whole value...
       let v: &mut ThinDst = &mut *v1.lock().unwrap();
       noop_write(v);
    });

    let _ = std::thread::spawn(move || {
       // ...but the `size_of_val` call races with the write by accessing the data inside the mutex!
       let mutex: &Mutex<ThinDst> = &*v;
       let _ = std::mem::size_of_val(mutex);
    });
}

@programmerjake
Copy link
Member

If you have a &mut T, then you have exclusive access to that T, so non-atomic writes aren't an issue [...]

Consider the following:

hmm, I'm inclined to say that the code that created the Mutex<ThinDST> has library UB, since Mutex assumes its insides don't need to be accessed in order to inspect the Mutex itself through a shared reference. I recognize that that view of things is probably problematic though...

@nox
Copy link
Contributor

nox commented May 30, 2022

I now remember the issue I had last time I tried to make use of alignment for niches.

Consider the following function:

unsafe fn transmute_opt_references<'a, T: 'a, U: 'a>(val: Option<&'a T>) -> &'a U {
    std::mem::transmute(val)
}

This currently typechecks and builds because the layout of Option<&T> and Option<&U> is computable during type check.

Now, if we make Option<&T>'s layout be dependent on T alignment, this will not typecheck or build anymore, and AFAIK even @eddyb's min_align_of suggestion wouldn't help us here.

I can't see a way to just split the layout_of query into more queries to fix that. Instead, it seems to me that we must special-case Option<&T>-looking enums (i.e. enums with exactly 2 variants, where one of the two has a non-nullable field) so that they don't check the alignment of the non-nullable field, to compute their layout, while making Option<Option<&T>> still check the alignment to use the right tag for the outer None.

@eddyb
Copy link
Member

eddyb commented May 31, 2022

Instead, it seems to me that we must special-case Option<&T>-looking enums (i.e. enums with exactly 2 variants, where one of the two has a non-nullable field) so that they don't check the alignment of the non-nullable field, to compute their layout

You're in luck, we have this for T: ?Sized: we allow transmutes between Option and/or newtypes around pointers, as long as the two pointee types can be statically known to have the same metadata.
For reference, here is the SizeSkeleton code.

However, that might not be enough: today you can replace Option in your example to an enum with more dataless variants than just Option's None and it will still work if you wrap both &T and &U in that enum.

But that might be covered by the usual "don't transmute arbitrary Rust types" layout breakage that we'd get anyway in general if we add more optimizations.

@nox
Copy link
Contributor

nox commented May 31, 2022

Just to be sure I understand correctly, @eddyb so you mean that SizeSkeleton today already shortcircuits having to use layout_of on Option<&T> for transmutes?

@eddyb
Copy link
Member

eddyb commented Jun 4, 2022

Just to be sure I understand correctly, @eddyb so you mean that SizeSkeleton today already shortcircuits having to use layout_of on Option<&T> for transmutes?

@nox Yeah, so that it works for T: ?Sized. It's not used right now for T: Sized just because layout_of succeeds but since T: ?Sized is strictly more general, it should still work even in the more restricted cases.

@moulins
Copy link
Author

moulins commented Jun 4, 2022

However, that might not be enough: today you can replace Option in your example to an enum with more dataless variants than just Option's None and it will still work if you wrap both &T and &U in that enum.

IMHO, this is entirely expected; after all, under this proposition we have:

struct E<T> {
  A(T),
  B,
  C,
}

assert_eq!(std::mem::size_of::<E<&u8>>(), 16);
assert_eq!(std::mem::size_of::<E<&u16>>(), 8);

So of course you can't transmute E<&T> to E<&U> generically.

@nox
Copy link
Contributor

nox commented Jun 5, 2022

@eddyb

@nox Yeah, so that it works for T: ?Sized. It's not used right now for T: Sized just because layout_of succeeds but since T: ?Sized is strictly more general, it should still work even in the more restricted cases.

I think we still need to split some queries and not just layout and align.

Consider this:

#[repr(C)]
struct Foo<'a, T: 'a> {
    some_int: usize,
    some_ref: &'a T,
}

We should be able to transmute Foo<'a, X> to Foo<'a, Y> but the layout_of query will fail because the query for some_ref's layout will fail.

Btw you mentioned min_align_of and I'm confused. Wouldn't that alignment-only query be exact anyway? When do we need to compute the full layout to get the exact alignment? I can't think of a single type where the results would be different.

@eddyb
Copy link
Member

eddyb commented Jun 8, 2022

I think we still need to split some queries and not just layout and align.

Consider this:

#[repr(C)]
struct Foo<'a, T: 'a> {
    some_int: usize,
    some_ref: &'a T,
}

We should be able to transmute Foo<'a, X> to Foo<'a, Y> but the layout_of query will fail because the query for some_ref's layout will fail.

I see, are you suggesting layout_of not cache niches and instead make that a separate query?

We should try messing with all of this just enough to do a crater run, like pretend SizeSkeleton's use of layout_of fails automatically if the type refers to generic parameters anywhere, forcing the use of the more limited fallback (supporting pointers and Option of non-nullable ones).

(Arguably generic parameters are fine if only used inside e.g. PhantomData, or behind raw pointers, but I am curious what the stricter/simpler check would break)


Btw you mentioned min_align_of and I'm confused. Wouldn't that alignment-only query be exact anyway? When do we need to compute the full layout to get the exact alignment? I can't think of a single type where the results would be different.

My intuition was that enum optimizations were tricky enough that alignment could be bumped in a weird way (e.g. niche vs tag discrepancy), but maybe I'm wrong.

Or at least, I might be wrong today because the only way I can think of to have "niche vs tag" bump the alignment in the tag case is manually specified discriminants that force a larger-than-byte tag, but interfering with discriminants at all disables niche-based optimizations today IIRC (mostly to keep codegen simple).

Either way, if we have rustc ICE after checking both queries agree, we're probably in the clear.

@scottmcm
Copy link
Member

scottmcm commented Jun 8, 2022

When do we need to compute the full layout to get the exact alignment?

I agree I can't think of one in current rust.

I've pondered adding something like an align(natural) that would pad the type out so that it has align ≡ size, which would definitely make this more complicated.

Whether that's something we should have I don't know, but I think it's a thing where the "we have an underestimate of alignment and size" would work for the purposes of picking how big a niche to offer, even if computing the exact alignment is hard.

- For fat pointers, the metadata is valid (as described in `mem::{size, align}_of_val_raw`).
- The pointer points to a region of memory that has suitable size and alignment (it must fit the layout returned by `Layout::from_value_raw`). There is no constraint placed upon the "liveness" of this memory; it may be deallocated while the `WellFormed<T>` exists, or already deallocated when the `WellFormed` is created.

### Thin DSTs
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Quickly noting: I believe we should allow references to extern types to be dangling, such that external libraries can freely tag pointers.

I haven't read the details of the RFC, but building on the idea that extern types are special, perhaps we could just disallow alignment niches for them entirely?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is what this section talks about, yes (even if it only mentions extern types in passing, I lump them together with thin DSTs because they are very similar).

In particular, I believe your proposition corresponds to the second "resolution":

  • Make WellFormed::{size, align}_of_val unsafe, and treat thin DSTs as having size: 0, align: 1 for the purposes of the WellFormed invariant.

bors added a commit to rust-lang-ci/rust that referenced this pull request Jul 21, 2023
Prototype: Add unstable `-Z reference-niches` option

MCP: rust-lang/compiler-team#641
Relevant RFC: rust-lang/rfcs#3204

This prototype adds a new `-Z reference-niches` option, controlling the range of valid bit-patterns for reference types (`&T` and `&mut T`), thereby enabling new enum niching opportunities. Like `-Z randomize-layout`, this setting is crate-local; as such, references to built-in types (primitives, tuples, ...) are not affected.

The possible settings are (here, `MAX` denotes the all-1 bit-pattern):
| `-Z reference-niches=` | Valid range |
|:---:|:---:|
| `null` (the default) | `1..=MAX` |
| `size` | `1..=(MAX- size)` |
| `align` | `align..=MAX.align_down_to(align)` |
| `size,align` | `align..=(MAX-size).align_down_to(align)` |

------

This is very WIP, and I'm not sure the approach I've taken here is the best one, but stage 1 tests pass locally; I believe this is in a good enough state to unleash this upon unsuspecting 3rd-party code, and see what breaks.
github-actions bot pushed a commit to rust-lang/miri that referenced this pull request Jul 22, 2023
Prototype: Add unstable `-Z reference-niches` option

MCP: rust-lang/compiler-team#641
Relevant RFC: rust-lang/rfcs#3204

This prototype adds a new `-Z reference-niches` option, controlling the range of valid bit-patterns for reference types (`&T` and `&mut T`), thereby enabling new enum niching opportunities. Like `-Z randomize-layout`, this setting is crate-local; as such, references to built-in types (primitives, tuples, ...) are not affected.

The possible settings are (here, `MAX` denotes the all-1 bit-pattern):
| `-Z reference-niches=` | Valid range |
|:---:|:---:|
| `null` (the default) | `1..=MAX` |
| `size` | `1..=(MAX- size)` |
| `align` | `align..=MAX.align_down_to(align)` |
| `size,align` | `align..=(MAX-size).align_down_to(align)` |

------

This is very WIP, and I'm not sure the approach I've taken here is the best one, but stage 1 tests pass locally; I believe this is in a good enough state to unleash this upon unsuspecting 3rd-party code, and see what breaks.
antoyo pushed a commit to rust-lang/rustc_codegen_gcc that referenced this pull request Aug 13, 2023
Prototype: Add unstable `-Z reference-niches` option

MCP: rust-lang/compiler-team#641
Relevant RFC: rust-lang/rfcs#3204

This prototype adds a new `-Z reference-niches` option, controlling the range of valid bit-patterns for reference types (`&T` and `&mut T`), thereby enabling new enum niching opportunities. Like `-Z randomize-layout`, this setting is crate-local; as such, references to built-in types (primitives, tuples, ...) are not affected.

The possible settings are (here, `MAX` denotes the all-1 bit-pattern):
| `-Z reference-niches=` | Valid range |
|:---:|:---:|
| `null` (the default) | `1..=MAX` |
| `size` | `1..=(MAX- size)` |
| `align` | `align..=MAX.align_down_to(align)` |
| `size,align` | `align..=(MAX-size).align_down_to(align)` |

------

This is very WIP, and I'm not sure the approach I've taken here is the best one, but stage 1 tests pass locally; I believe this is in a good enough state to unleash this upon unsuspecting 3rd-party code, and see what breaks.
@lolbinarycat
Copy link
Contributor

what's the rationale for introducing a new std-internal type instead of just adding an additional validity constraint to the existing internal Unique pointer type?

@moulins
Copy link
Author

moulins commented Sep 29, 2024

The optimization should be applicable to Arc/Rc too, which can't use Unique internally.

@lolbinarycat
Copy link
Contributor

another alternative to the Aligned type is a rustc perma-unstable attribute, similar to the ones that power the NonZero and NonNull types. this way no library code needs adjusting, just add the attribute to every pointer type that must be aligned.

@Diggsey
Copy link
Contributor

Diggsey commented Sep 29, 2024

Regarding the Aligned alternative disadvantages:

Loss of "size niches" (e.g. -2 for u16) and future "restricted ranges niches".

This example is not strictly true - even only knowing that a type T has alignment 2, but without knowing its size, you can still know that -2 is an invalid address for T, since the size of a type must be at least as large as its alignment.

I like the "opaqueness" of WellFormed (ie. that it makes no guarantees about what niches specifically are available) but I'm not so much a fan of the min_layout addition. It seems like you could get almost all of the benefit by only exploiting the alignment niches (at least to begin with) and avoid needing this min-layout step.

Another concern I had is the proliferation of pointer-like types - NonNull, *mut, *const, WellFormed, Unique. Specifically that they all replace pointers rather than wrapping them, ie. it's NonNull<T> rather than NonNull<*mut T>. This means that they can't be composed with each other, and so we're building a system of types that only works if there is a strict ordering among them. For now there does seem to be an ordering (ignoring *const vs *mut differences) but are we sure that will always be the case when future pointer types are considered?

@zachs18
Copy link
Contributor

zachs18 commented Sep 29, 2024

Loss of "size niches" (e.g. -2 for u16) and future "restricted ranges niches".

This example is not strictly true - even only knowing that a type T has alignment 2, but without knowing its size, you can still know that -2 is an invalid address for T, since the size of a type must be at least as large as its alignment.

A type can have a size smaller than its alignment (only) if it is a zero-sized type. E.g. [u16; 0] can live at address -2, so if the niche of BikeshedValidAddressPtr<T> do not depend on the size of T (only the alignment), then -align_of::<T>() must still be bit-valid.

@Diggsey
Copy link
Contributor

Diggsey commented Sep 30, 2024

A type can have a size smaller than its alignment (only) if it is a zero-sized type.

You're right, I missed that case - still "is a ZST" is easily known along with alignment using an algorithm that does not require computing the layout of any other types.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
T-lang Relevant to the language team, which will review and decide on the RFC.
Projects
None yet
Development

Successfully merging this pull request may close these issues.