Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement unsized unions #47650

Closed
wants to merge 13 commits into from
Closed

Conversation

mikeyhew
Copy link
Contributor

r? @eddyb

This allows unions to have at most one unsized field/variant, in order to allow ManuallyDrop<T> where T: ?Sized (see #47034). For now, I made it so that the unsized field has to be the last field, which makes the implementation simpler because there's a lot of similarities to structs.

cc #32836

Questions for reviewers:

  • Should I also relax the Sized bound on ManuallyDrop in this PR? That would be insta-stable, so would require an FCP.
  • Did I handle the layout for unsized unions properly? See src/librustc/ty/layout.rs
  • What tests specific to unsized unions should I add? Right now I have one test that tests unsizing and checks that things are dropped/not dropped as expected.

@@ -141,7 +141,7 @@ impl<'a, 'gcx> CheckTypeWellFormedVisitor<'a, 'gcx> {
self.check_variances_for_type_defn(item, ast_generics);
}
hir::ItemUnion(ref struct_def, ref ast_generics) => {
self.check_type_defn(item, true, |fcx| {
self.check_type_defn(item, false, |fcx| {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's this argument and can it be removed from check_type_defn's definition (i.e. are there any non-false calls)?

Copy link
Contributor Author

@mikeyhew mikeyhew Jan 22, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The argument is called all_sized and is currently true for enums

|
= help: the trait `std::marker::Sized` is not implemented for `T`
= help: consider adding a `where T: std::marker::Sized` bound
= note: no field of a union may have a dynamically sized type
= note: only the last field of a union may have a dynamically sized type
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't seem right. union doesn't have ordered fields. Maybe require that at most one field is unsized?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not really right, I just did it this way for now to avoid having to refactor things. I can make it more general if I must :)

@petrochenkov petrochenkov self-assigned this Jan 22, 2018
@petrochenkov petrochenkov added the S-waiting-on-review Status: Awaiting review from the assignee but also interested parties. label Jan 22, 2018
@petrochenkov petrochenkov removed their assignment Jan 24, 2018
@main--
Copy link
Contributor

main-- commented Jan 26, 2018

Is there a specific reason why this should only be allowed for a single field?

Just today I wanted to create a union with two unsized fields. I'm working with postgresql where this "varlena" format is how they represent variable-length data. This code is an exact translation of the C code I'm interacting with (of course the extern type is a 0-sized array there). I want to build &'a text and &'a bytea references to SQL values, bound to the lifetime of some allocator. The types are quite obviously extern type DSTs: I have a pointer to a buffer that stores its own size. What makes this tricky is that the size can either be 7 bits or 30 bits long which is best represented by a union. It's a shame I have to hack around this with pointer arithmetic right now.

@eddyb
Copy link
Member

eddyb commented Jan 26, 2018

Wait, is size_of_val well-defined for an unsized union? What happens to the fat pointer metadata when the union is used with a different field than the unsized one?

@mikeyhew
Copy link
Contributor Author

@eddyb the metadata is created during unsizing, and the actual value of the field being unsized is not needed – only the type. Once unsized, the metadata cannot change, and the unsized union field cannot be written to. There is no way of knowing which field type is active, so the return value of size_of_val would have to be the maximum size possible – calculate size_of() of each sized field type, and size_of_val() using the pointer metadata, and take the maximum size. Note that this only works for dynamically-sized types whose size can be determined using only the pointer metadata.

There are currently no types in Rust for which this isn't true, but there may be in the future, and there would have to be some way of disallowing calls to size_of_val and align_of_val for unions that include those types.

Single-field unions like ManuallyDrop should be fine though, right? I propose we just allow them for now.

@kennytm kennytm added the T-lang Relevant to the language team, which will review and decide on the PR/issue. label Jan 27, 2018
@kennytm
Copy link
Member

kennytm commented Jan 27, 2018

Interesting question @eddyb. As @main-- asked, what happens if we support unions with ≥2 DST fields?

union U {
    slice: [u8],
    trobj: dyn Debug,
}

The problem is how do we represent the metadata:

  • struct U::Meta { slice: usize, trobj: &'static TraitMeta<Debug_Vtable> }

    • Pro: We can calculate the size_of_val
    • Con: The metadata size becomes N words long
  • union U::Meta { slice: usize, trobj: &'static TraitMeta<Debug_Vtable> }

    • Pro: The metadata size is always 1 word long
    • Con: The type becomes !DynSized when there are ≥2 DST fields
  • enum U::Meta { Slice(usize), Trobj(&'static TraitMeta<Debug_Vtable>) }

    • Pro: We can calculate the size_of_val and the metadata size is always 2 words long
    • Con: This encodes the active field into the metadata 🤔

I think supporting unions with ≥2 DST fields require an RFC to define the proper semantic, and thus best avoided in this PR.

@eddyb
Copy link
Member

eddyb commented Jan 27, 2018

@mikeyhew Okay so what if I have this:

union OneOrManyBytes {
    one: u8,
    many: [u8]
}
size_of_val(&OneOrManyBytes { one: 0 })

Can I Just never create this? Does that mean a !Sized union can only have its !Sized field active? Then I propose we only allow single-field unions to be !Sized.

@main--
Copy link
Contributor

main-- commented Jan 27, 2018

Given that rust-lang/rfcs#1897 wants to prohibit unsized unions altogether that does seem a little strange.

At least to me it was simple and obvious/intuitive that an unsized enum would represent its metadata as an union of the metadata of its members. Clearly, the unsized metadata of a sized type is empty (union). I understand that this probably warrants an RFC though.

I don‘t think a union can ever logically be DynSized, given that you can never know in advance which variant is active. Even if only one field is unsized, the size metadata is garbage if a sized variant is active - I feel like the concept is just ill-defined like that.

@eddyb
Copy link
Member

eddyb commented Jan 27, 2018

I don‘t think a union can ever logically be DynSized, given that you can never know in advance which variant is active. Even if only one field is unsized, the size metadata is garbage if a sized variant is active - I feel like the concept is just ill-defined like that.

So the only way to get something useful is to borrow one of the fields, which would extract the metadata appropriate for that field, right? This definitely needs DynSized to be a trait.

cc @nikomatsakis I like the idea of having an union of metadata but it should fit with custom DSTs.

@kennytm
Copy link
Member

kennytm commented Jan 27, 2018

@main--

I don‘t think a union can ever logically be DynSized, given that you can never know in advance which variant is active.

It could be DynSized if the metadata is a struct, i.e. you provide the metadata for all fields no matter they are active or not, e.g.

union U {
    a: [u8],
    b: [u16],
}
=>
struct U::Meta {
    a: usize,
    b: usize,
}

then size_of_val::<U>(&val) is { let m = meta(val); max(m.a * 1, m.b * 2) }.

@eddyb
Copy link
Member

eddyb commented Jan 27, 2018

@kennytm How do you initialize those metadata fields? Especially when trait objects are involved.

@kennytm
Copy link
Member

kennytm commented Jan 27, 2018

@eddyb I think it can only be initialized through unsize coercion; or require that there is a type (! is a good candidate) that implements all traits and Default to that.

@eddyb
Copy link
Member

eddyb commented Jan 27, 2018

@kennytm Oh so the unsize coercion would have to do multiple fields in parallel or not be allowed at all, that makes a bit more sense.

@petrochenkov
Copy link
Contributor

@main--

Given that rust-lang/rfcs#1897 wants to prohibit unsized unions altogether

FWIW, rust-lang/rfcs#1897 doesn't try to prohibit unsized unions (they are already "prohibited" aka not supported), it just doesn't try to introduce them (they are listed in future directions).

@mikeyhew
Copy link
Contributor Author

@eddyb

Okay so what if I have this:

union OneOrManyBytes {
   one: u8,
   many: [u8]
}

Can I Just never create this? Does that mean a !Sized union can only have its !Sized field active? Then I propose we only allow single-field unions to be !Sized.

I agree, single-field unions are all we need for ManuallyDrop to work, so that's all this PR should add. However, the above code currently doesn't work, because OneOrManyBytes is an unsized type, and can't be created on the stack. I don't think I added a test case for that, but I did try it out locally

@mikeyhew
Copy link
Contributor Author

Regarding unions with metadata for each unsized field:

@kennytm

@eddyb I think it can only be initialized through unsize coercion; or require that there is a type (! is a good candidate) that implements all traits and Default to that.

@eddyb

@kennytm Oh so the unsize coercion would have to do multiple fields in parallel or not be allowed at all, that makes a bit more sense.

Theoretically you could unsize one field, and then unsize the other – after the first coercion, you'd have pointer metadata for one field, and after the second coercion, metadata for both.

@eddyb
Copy link
Member

eddyb commented Jan 27, 2018

@mikeyhew Right, by "in parallel" I meant independently, which means you could allow multiple at once or one at a time, but you'd still be able to unsize more than one field (kind of neat, huh?).

@mikeyhew
Copy link
Contributor Author

@eddyb oh 😄, I thought you meant "at the same time"

@eddyb
Copy link
Member

eddyb commented Jan 27, 2018

@mikeyhew I did mean that, originally, but the important bit is that they're independent.

@nikomatsakis
Copy link
Contributor

Huh. Re-reading this thread, I'm feeling a bit lost. I certainly agree that the concept of an "unsized union" with multiple fields has a lot of question marks.

Right, by "in parallel" I meant independently, which means you could allow multiple at once or one at a time, but you'd still be able to unsize more than one field (kind of neat, huh?).

I don't get it. =) How would you unsize multiple fields independently? Would they have to have compatible metadata?

@eddyb
Copy link
Member

eddyb commented Jan 29, 2018

@nikomatsakis The context is choosing struct for union metadata, from #47650 (comment).
That way, &Union<A, B> would always behave like (&A, &B) but the data pointer is shared.
So you can choose which "fields" to unsize, and they don't interact with eachother, assuming nothing needs to read the actual data during the unsizing or to compute size/alignment.
Given the latest proposal, such unions would require AlignFromMeta + SizeFromMeta types.

@kennytm
Copy link
Member

kennytm commented Jan 29, 2018

@nikomatsakis Just clarifying #47650 (comment) since the thread seems to start from my comment. Unsizing will never modify data. It just generates the necessary metadata from the original sized type, and reinterpret-cast the data into the DST. So unsizing a Union<[u16; 3], [u32; 4], [u64; 5]> into Union<[u16], [u32], [u64]> would:

  1. Reinterpret-cast the pointer part
  2. Set (3, 4, 5) as the metadata (3 words)

The actual representation of &Union<[u16], [u32], [u64]> in memory would be the 4-tuple (&«original_memory», 3, 4, 5).

(I don't think we should support this in this PR.)

@nikomatsakis
Copy link
Contributor

Hmm, maybe I was thinking about it wrong. I guess that unsizing doesn't require that the actual data be valid, at least not currently. That is, size_of_val only (again, currently) inspects the meta-data, and that is generated just from the types involved, so in theory we could compute the type of a union no problem.

But this may not be true once we hit custom dst, right? I guess it depends on just how the trait looks.

@eddyb
Copy link
Member

eddyb commented Jan 29, 2018

Wait. With the custom DST proposal, size_of_val can't be used on an union unless it either guarantees SizeFromMeta, or getting a reference to the field is safe (is this ever the case?).
So without that proposal, it's impossible to restrict DST unions for size_of_val to be safe.

@nikomatsakis
Copy link
Contributor

nikomatsakis commented Jan 29, 2018

@eddyb

unless it either guarantees SizeFromMeta

I think by this you mean: unless it can compute the size without a reference to the actual data, i.e., purely from the metadata...right?

If so, that is precisely what I was trying to get at, yeah. =)

@RalfJung
Copy link
Member

@mikeyhew
Copy link
Contributor Author

@pietroalbini thanks for the ping, and sorry for letting this slide. I'm still interested in finishing this, if it is desired

@RalfJung

We talked about the issues around unsized unions and thin pointers and valid bit patterns for unions at the unsafe code guidelines meetings on the all-hands, but did not even reach consensus even amongst those present in the discussion. TBH I think this requires an RFC; there is clearly non-trivial design work to be done here, some of which interacts tightly with unsafe code guidelines.

To avoid drawing ourselves into a corner that we cannot go out of without breaking backwards-compatibility, I'd prefer if we made ManuallyDrop a special magic type that can be unsized, but do not otherwise support unsized unions. (I think this is actually all we should ever to, but it is future-compatible with eventually allowing unions to be unsized and defining ManuallyDrop as a non-magic union.)

I wasn't at the meeting, so I'm not sure what specific issues were discussed around unsized unions or how unions are dropped. But would it be OK to implement unsized unions anyway, with the intention of keeping them unstable at least until the details have been figured out? ManuallyDrop would be stable, but the fact that it uses a union is an implementation detail and could be changed later if need be.

@RalfJung
Copy link
Member

The issue is that we may not want to make any restrictions for the bit patterns that are valid for a union. If we adapt that policy, then unsized unions with thin pointers do not work as they have to always be able to determine the size by dereferencing the union. This is not related to dropping.

But would it be OK to implement unsized unions anyway, with the intention of keeping them unstable at least until the details have been figured out?

I have no idea who's even in charge of such decisions. ;) They certainly have to be unstable at first. Is there an RFC for this feature? The unions RFC does not mention "unsized" at all.

@mikeyhew
Copy link
Contributor Author

mikeyhew commented Apr 17, 2018

@RalfJung

The issue is that we may not want to make any restrictions for the bit patterns that are valid for a union. If we adapt that policy, then unsized unions with thin pointers do not work as they have to always be able to determine the size by dereferencing the union. This is not related to dropping.

Can you explain what "that policy" is? EDIT: I misread "adapt" as "adopt"

I don't think that this PR puts us into any corners that we can't get out of. My guess is that unions with thin-pointer DST fields would be completely unsized by default, since as you said you have to dereference the union to read the metadata and get the size/alignment of the union field. This PR does not support thin-pointer DSTs, but it doesn't rule them out either, it just supports the dynamically-sized types that currently exist in Rust, where the size and alignment are determined from the pointer metadata.

@RalfJung
Copy link
Member

My guess is that unions with thin-pointer DST fields would be completely unsized by default, since as you said you have to dereference the union to read the metadata and get the size/alignment of the union field.

Interesting. I don't think this alternative came up in the discussions so far. But wouldn't that make them fairly useless?

@mikeyhew
Copy link
Contributor Author

@RalfJung by themselves, yeah. But AFAIK there is no safe way to get the size of such a union, so that's what it would have to be. What were the other options that came up?

@RalfJung
Copy link
Member

The idea so far was that if DSTs work in unions, then they'd fully work. So Unnion<Thin> would actually guarantee that the pointer in that union is always "valid enough" to obtain a vtable.

This is in conflict with the idea that a union is just a bag of bytes and makes no assumptions about its contents being valid.

@mikeyhew
Copy link
Contributor Author

And keep in mind, with custom DST you could still implement DynSized manually on the union or the type that contains it, if you have some way of knowing which union field is active.

@mikeyhew
Copy link
Contributor Author

mikeyhew commented Apr 18, 2018

The idea so far was that if DSTs work in unions, then they'd fully work. So Unnion would actually guarantee that the pointer in that union is always "valid enough" to obtain a vtable.

This is in conflict with the idea that a union is just a bag of bytes and makes no assumptions about its contents being valid.

OK, I see. It looks like the conflict lies in one statement saying that the vtable will always be valid, and the other saying that no fields are guaranteed to be valid. That works for normal trait objects, because the vtable is stored in the pointer metadata, but not for thin trait objects, where the vtable is stored with the actual data.

I always thought that at least one union field had to be active (and therefore valid). Is there an advantage to that not being the case?

@RalfJung
Copy link
Member

I always thought that at least one union field had to be active (and therefore valid). Is there an advantage to that not being the case?

This is a discussion we are currently having in another thread.

@shepmaster shepmaster added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-waiting-on-author Status: This is awaiting some action (such as code changes or more information) from the author. S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels Apr 30, 2018
@pietroalbini pietroalbini added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels May 14, 2018
@kennytm kennytm added the S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). label May 15, 2018
@pietroalbini pietroalbini added S-blocked Status: Blocked on something else such as an RFC or other implementation work. and removed S-blocked Status: Blocked on something else such as an RFC or other implementation work. labels May 28, 2018
@TimNN TimNN added A-allocators Area: Custom and system allocators and removed A-allocators Area: Custom and system allocators labels Jun 5, 2018
@pietroalbini
Copy link
Member

Ping from triage @RalfJung! What's the outcome of that discussion?

@RalfJung
Copy link
Member

I don't think there is an outcome yet...

Someone will have to make an RFC for settling this, I think.

@pietroalbini
Copy link
Member

Should this PR be closed in the meantime then?

@RalfJung
Copy link
Member

I think that's for the lang team to decide.

@stokhos
Copy link

stokhos commented Jun 29, 2018

Ping from triage @rust-lang/lang will someone have time to review this PR ?

@mikeyhew
Copy link
Contributor Author

I'm going to close this for now. I don't think unions are on anyone's mind right now, and I personally am busy with #50173. Maybe once that's finished and merged, I'll start the discussion back up if someone else hasn't already, probably in one of the threads that @RalfJung linked.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
S-blocked Status: Blocked on something else such as an RFC or other implementation work. S-waiting-on-team Status: Awaiting decision from the relevant subteam (see the T-<team> label). T-lang Relevant to the language team, which will review and decide on the PR/issue.
Projects
None yet
Development

Successfully merging this pull request may close these issues.