Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add empty dropped variant to ManuallyDrop #49056

Closed
wants to merge 1 commit into from

Conversation

cramertj
Copy link
Member

Change ManuallyDrop's internal (unstable) representation to match the design of MaybeUninit in rust-lang/rfcs#1892. This indicates that the union has more than one possible representation variant. After the inner type has been dropped, it is no longer a valid T, and so the union should be considered to contain only ().

cc @canndrew @RalfJung

@rust-highfive
Copy link
Collaborator

r? @sfackler

(rust_highfive has picked a reviewer for you, use r? to override)

@rust-highfive rust-highfive added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Mar 15, 2018
@sfackler
Copy link
Member

Do we have to "assign" to .dropped in the drop method? Not sure what the semantics of unions are exactly.

@cramertj
Copy link
Member Author

@sfackler I suppose you could mem::replace with a ManuallyDrop { dropped: () }.

@kennytm
Copy link
Member

kennytm commented Mar 15, 2018

cc #47650 @mikeyhew @nikomatsakis.

This PR is going to invalidate the entire presumption of #47650, that ManuallyDrop is a single-field union so that #47034 can be resolved just by inserting a ?Sized later.

@cramertj
Copy link
Member Author

cramertj commented Mar 15, 2018

An alternate interpretation for the unsafe behavior of unions would be that "this union contains none of the variants, and its contents are undefined" is always a valid option. Under this interpretation, union X {} would be inhabited, since "no contents" is a valid inhabitant. That would make this change unnecessary, but might have significant effects in other places.

@sfackler
Copy link
Member

But how could you construct an X?

@cramertj
Copy link
Member Author

cramertj commented Mar 15, 2018

@sfackler If we went that route, we'd make it legal to construct any union by specifying either 0 or 1 field (rather than exactly one, as it is today).

@sfackler
Copy link
Member

Ah sure

@hanna-kruppe
Copy link
Contributor

@eddyb drew a distinction between union { T } and union { T, () } in #47650 (comment)

@cramertj
Copy link
Member Author

@rkruppe I'm having a hard time understanding that distinction, since ManuallyDrop<T> no longer contains a valid T after drop has been called on the innards.

@hanna-kruppe
Copy link
Contributor

@cramertj See the discussion preceding the comment I linked. Kind of hard to tell where it starts but this #47650 (comment) might be a reasonably close starting point. But basically IIUC the argument goes that "initializing and then dropping" leaves some values intact and is therefore different from never having initialized at all.

@cramertj
Copy link
Member Author

cramertj commented Mar 16, 2018

Yeah, I read that, and I'm unsure whether I agree or not :)

While it is definitely the case that some types have a valid repr even after being dropped (like primitives or struct MySillyDrop; impl Drop for MySillyDrop { ... println!("woot!"); ... }, types like Box<T> fundamentally change after being dropped. IMO there's even an argument that fn drop shouldn't take &mut self at all, but instead *mut Self since self is no longer valid at the end of the drop call.

As I type this, I'm coming around to the idea more and more, however I'm having trouble formalizing it: the property is something like "a union contains data that is a valid bit-based representation of one of its variants, but does not necessarily enforce the type's usual invariants". For example, union Foo { x: Box<T> } still has to always be NonNull, but the inner value doesn't have to point to valid memory, so uses of references to the inner Box<T> are not necessarily valid.

Because the same rules apply to &mut self in Drop impls (unless Drop impls are special-cased in the memory model), this would mean that &mut T points to memory that follows T's representation requirements, but not its usage invariants. It follows that while &T is safe for an optimizer to dereference at any time, the same is not true of &&T. This seems pretty weird to me, and I have a hard time imagining any way to formalize such a model.

But I also might be massively overcomplicating this-- perhaps this rule can be phrased in a way that has a much smaller scope than I'm imagining.

@cramertj
Copy link
Member Author

cramertj commented Mar 16, 2018

After thinking about this more, I've convinced myself that any memory model we adopt would require that a value of type T which has had Drop::drop called on it must still be a value of type T-- just one in a separate "typestate" in which invoking any methods is undefined behavior. With that in mind, I don't think my proposed change here is correct, so I'm closing this. Feel free to comment or leave more thoughts if you have them!

@hanna-kruppe
Copy link
Contributor

I've convinced myself that any memory model we adopt would require that a value of type T which has had Drop::drop called on it must still be a value of type T

Is this because Drop::drop has a &mut self? I'm not quite sure how that requires such a rule. All the memory model proposals I've seen have been a bit more leeway than usual while you're operating locally on an object, compared to the usual rules that must hold when you share it with the world (this is pretty much required to allow all sorts of well-encapsulated unsafe shenanigans). Combined with the fact that Drop::drop is called by drop glue, not by ordinary code, that seems to leave a way out for leaving something invalid behind the reference: the only people seeing that reference are Drop::drop -- which is responsible for invalidating -- and the drop glue -- which is special anyway.

I'm not really convinced by this line of argument, but it also doesn't seem obvious that the "usual rules" must apply to the &mut self in Drop::drop.

And besides, your formulation doesn't really say anything about the validity of the value after drop glue has run on it.

@canndrew
Copy link
Contributor

A bit off-topic, but one of the main reasons I want &in/&out/&uninit references is so that drop can take &out self.

@cramertj
Copy link
Member Author

@rkruppe yeah, that's one of the main reasons. The other is that it allows me to make sense of the way unions are being used in @eddyb's preferred model-- that there's a differentiation between uninit and dropped T, whereas I had previously been considering both as just "not a T". Maybe we can come up with something better?

@cramertj
Copy link
Member Author

cramertj commented Mar 16, 2018

I'm going to reopen this because there's a serious question here that has been left unanswered, and I think we should make some sort of a decision here before closing this (at least without a followup RFC or tracking issue).

There are a few options:

  • union Foo { x: T } is never uninitialized, and can only be initialized with a T
    • variant 1: when T has drop glue run on it, it is still a T, but one whose "typestate" has switched to "dropped", and for which it is UB to call any methods
    • variant 2: when T has drop glue run on it, it is no longer a T, but a value which preserves any "compiler-recognized representation invariants" that T has (e.g. NonZero, enum variant reprs, etc.). This means that Foo can't be uninitialized memory, but it could be memory that is initialized to "look like a T" without being a T.
    • variant 3: the x value inside the union is always a "live" (non-dropped) T. This option is the easiest for me personally to understand.
  • union Foo { x: T } could be uninitialized (implicit () variant). AFAIK this would prohibit ever having NonZero or enum repr optimizations applied to unions.

@cramertj cramertj reopened this Mar 16, 2018
@petrochenkov
Copy link
Contributor

With "enum interpretation" of unions union Foo { x: T } with one field is equivalent to struct Foo { x: T } with one fields, so whatever rules exist for single-field struct they can be extrapolated to single-field unions.

@mikeyhew
Copy link
Contributor

@petrochenkov except that dropping a single-field union does not drop the field it contains, whereas dropping a single-field struct does. I think that's the only difference though, right?

@mikeyhew
Copy link
Contributor

@cramertj thanks for posting this, I think it's important that we get a formal definition here before stabilizing unions with drop fields.

  • union Foo { x: T } is never uninitialized, and can only be initialized with a T

    • variant 1: when T has drop glue run on it, it is still a T, but one whose "typestate" has switched to "dropped", and for which it is UB to call any methods
    • variant 2: when T has drop glue run on it, it is no longer a T, but a value which preserves any "compiler-recognized representation invariants" that T has (e.g. NonZero, enum variant reprs, etc.). This means that Foo can't be uninitialized memory, but it could be memory that is initialized to "look like a T" without being a T.

My understanding when opening #47650 was that one of these two statements was true. Is there a practical difference between them?

union Foo { x: T } could be initialized

I think this is a typo, do you mean "uninitialized"?

@cramertj
Copy link
Member Author

cramertj commented Mar 19, 2018

@mikeyhew

My understanding when opening #47650 was that one of these two statements was true. Is there a practical difference between them?

It's primarily an issue of how to define things, specifically what it means for a value to have type T. You could imagine that the compiler would allow certain kinds of optimistic accesses on a value of type T which would be invalid if the T already had drop glue run on it. Under variant one, these optimizations would be invalid, and we'd need to formalize the exact circumstances under which a dropped T would be valid where a T is expected. Under variant two, we could perhaps offer a separate type, ReprLike<T> which is guaranteed to hold memory initialized to be valid for a T (NonZero and such), but not necessarily containing a T.

I think this is a typo, do you mean "uninitialized"?

Yup, fixed.

@RalfJung
Copy link
Member

RalfJung commented Mar 19, 2018

My personal expectation so far was that adding an empty variant to a union is pretty much a NOP. I imagined unions would be "barriers" to type validity; the compiler would never assume anything "for free" about data below a union (just like it never assumes anything "for free" for data below an UnsafeCell If we already are below a shared reference -- i.e., the "shared" typestate of UnsafeCell is also a "barrier").

In particular, I expected that a union with a NonZero field would allowed to be zero.

The main reason for this is simplicity and consistency. If we decide that the contents of unions can matter, there are just so many questions to answer. Are single-field unions special? Or is a union with three fields, all of which are NonZero, also required to not be zero? What about other properties like enum values -- is a union of bool and some other type with valid discriminants 3 and 4 defined to be exactly one of the bit patterns for 0, 1, 3, 4? This would mean that actually checking bit pattern validity at run-time would require trial-and-error. I find that fairly unsatisfying.

I have to admit I don't fully understand @eddyb's reasoning in the other thread, in particular the part at #47650 (comment) is entirely lost on me. If someone could explain why making unions "ignore" the valid bit-patterns of their inner types is a bad idea, that'd be appreciated. :) Is it "just" the layout-based optimizations?


variant 1: when T has drop glue run on it, it is still a T, but one whose "typestate" has switched to "dropped", and for which it is UB to call any methods

New typestates seem to crop up everywhere recently. ;)

You could imagine that the compiler would allow certain kinds of optimistic accesses on a value of type T which would be invalid if the T already had drop glue run on it. Under variant one, these optimizations would be invalid

They would not necessarily invalid, it all depends on which requirements we make for that "typestate". I see variant 2 as a special case of variant 1. Also, just reading uninitialized data is fine, so I can't actually think of any reasonable optimization that could be invalidated. Could you give an example?

At some point on IRC I've been talking with @eddyb about the contract for drop, and the rough consensus was that it has to leave behind the type in a "structurally valid" state. All the extra private invariants of this type may be violated, but e.g. if there is a Box<T> in there it must still be valid because drop glue will drop that one next. The details of that seem to be somewhat related, though it seems like (with this proposal) there are actually two different "dropped" typestates: One where the drop of this type has been called, but the fields have not been dropped yet, and then one where the fields are also dropped.

Either way, I imagine even an unsafe type has very little freedom in picking its "dropped" invariant. In fact, I think it is always fully determined structurally; is there any utility in types being able to still have private invariants in this state?

Also, may I add "variant 4": The value inside the union can be anything. That's what I described above. The "completely dropped" typestate would then always have the trivial invariant that does not assume anything.

@cramertj
Copy link
Member Author

cramertj commented Mar 19, 2018

@RalfJung

Also, may I add "variant 4": The value inside the union can be anything. That's what I described above.

To clarify, I think your interpretation is this option:

union Foo { x: T } could be uninitialized (implicit () variant). AFAIK this would prohibit ever having NonZero or enum repr optimizations applied to unions.

in my comment above, right?

@RalfJung
Copy link
Member

Ah, yes. I read that but then forgot about it when replying... sorry. Yes, that is "variant 4".

@pietroalbini
Copy link
Member

Ping from triage! What's the status of this PR?

@RalfJung
Copy link
Member

@pietroalbini pietroalbini added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. labels Apr 23, 2018
@pietroalbini
Copy link
Member

Marking this as blocked on some consensus from the unsafe guidelines WG.

@kennytm kennytm added the I-needs-decision Issue: In need of a decision. label Apr 24, 2018
@shepmaster shepmaster added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels May 6, 2018
@pietroalbini pietroalbini added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels May 14, 2018
@TimNN TimNN added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels May 22, 2018
@pietroalbini pietroalbini added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels Jun 4, 2018
@TimNN TimNN added A-allocators Area: Custom and system allocators and removed A-allocators Area: Custom and system allocators labels Jun 12, 2018
@TimNN TimNN added A-allocators Area: Custom and system allocators S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed A-allocators Area: Custom and system allocators S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels Jun 26, 2018
@pietroalbini
Copy link
Member

Closing until some consensus is reached by the unsafe guidelines WG.

@pietroalbini pietroalbini added S-blocked-closed and removed S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. labels Jul 14, 2018
@RalfJung
Copy link
Member

For future reference, I think I now actually understood #49056 (comment): (Quoting from #51361 (comment))

[We would like] to make ManuallyDrop::new(unsafe { mem::uninitialized() }) illegal: We want to eventually allow ManuallyDrop for unsized types, including thin DSTs. So Box<Thin<T>> would be a thin pointer and determining the size of the allocation would actually load the pointer and rely on there being a vtable field.
But then, to support Box<ManuallyDrop<Thin<T>>> (Box must still be able to determine the size), we cannot allow the ManuallyDrop<Thin<T>> to be arbitrary garbage; the vtable must be correct.

As @eddyb remarked, Thin is not a good name (because it's not a thin pointer, it makes pointers to this thin despite being a DST). InlineMetadata might be better.

@RalfJung
Copy link
Member

I think we should not do this, see https://internals.rust-lang.org/t/pre-rfc-unions-drop-types-and-manuallydrop/8025 for further details. This actually side-steps having unsafe code guidelines involved by making ManuallyDrop non a union any more.

@cramertj cramertj deleted the sound-manually-drop branch July 23, 2018 22:58
@RalfJung
Copy link
Member

RalfJung commented Aug 3, 2018

ManuallyDrop is now a struct, so this issue is done for good.

@jyn514 jyn514 added S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. and removed S-blocked-closed labels Mar 10, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
I-needs-decision Issue: In need of a decision. S-blocked Status: Marked as blocked ❌ on something else such as an RFC or other implementation work. T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.