Document the enum changes in RFC 2195 #879

Merged
merged 10 commits into from
Sep 4, 2020
176 changes: 155 additions & 21 deletions src/type-layout.md
@@ -174,7 +174,8 @@ layout such as reinterpreting values as a different type.
Because of this dual purpose, it is possible to create types that are not useful
for interfacing with the C programming language.

This representation can be applied to structs, unions, and enums.
This representation can be applied to structs, unions, and enums. The exception
is [zero-variant enums] for which the `C` representation is an error.

#### \#[repr(C)] Structs

@@ -273,9 +274,9 @@ assert_eq!(std::mem::size_of::<SizeRoundedUp>(), 8); // Size of 6 from b,
assert_eq!(std::mem::align_of::<SizeRoundedUp>(), 4); // From a
```

#### \#[repr(C)] Enums
#### \#[repr(C)] Field-less Enums

For [C-like enumerations], the `C` representation has the size and alignment of
For [field-less enums], the `C` representation has the size and alignment of
the default `enum` size and alignment for the target platform's C ABI.

> Note: The enum representation in C is implementation defined, so this is
@@ -285,40 +286,171 @@ the default `enum` size and alignment for the target platform's C ABI.
<div class="warning">

Warning: There are crucial differences between an `enum` in the C language and
Rust's C-like enumerations with this representation. An `enum` in C is
Rust's [field-less enums] with this representation. An `enum` in C is
mostly a `typedef` plus some named constants; in other words, an object of an
`enum` type can hold any integer value. For example, this is often used for
bitflags in `C`. In contrast, Rust’s C-like enumerations can only legally hold
the discriminant values, everything else is undefined behaviour. Therefore,
using a C-like enumeration in FFI to model a C `enum` is often wrong.
bitflags in `C`. In contrast, Rust’s [field-less enums] can only legally hold
the discriminant values; everything else is [undefined behavior]. Therefore,
using a field-less enum in FFI to model a C `enum` is often wrong.

</div>
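
As a minimal sketch of one way to avoid the problem described in the warning, assume a hypothetical C API whose `enum` is used for bitflags. The flags are carried in a `#[repr(transparent)]` newtype, so every integer arriving from C is a valid Rust value, and a field-less enum is only produced through a checked conversion. The names `CFlags`, `READ`, `WRITE`, `Mode`, and `mode_from_raw` are illustrative, and the use of `c_uint` as the underlying type is an assumption about the C side.

```rust
use std::os::raw::c_uint;

/// Hypothetical wrapper for a C `enum` used as bitflags. Every `c_uint`
/// bit pattern is a valid value, unlike a field-less Rust enum.
#[repr(transparent)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub struct CFlags(pub c_uint);

impl CFlags {
    pub const READ: CFlags = CFlags(0b01);
    pub const WRITE: CFlags = CFlags(0b10);

    /// Returns true if every bit set in `other` is also set in `self`.
    pub fn contains(self, other: CFlags) -> bool {
        self.0 & other.0 == other.0
    }
}

/// A field-less enum is still usable on the Rust side, as long as raw
/// integers coming from C are validated before they become enum values.
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
pub enum Mode {
    Off = 0,
    On = 1,
}

/// Checked conversion: unknown values become `None` instead of producing
/// an invalid enum value.
pub fn mode_from_raw(raw: c_uint) -> Option<Mode> {
    match raw {
        0 => Some(Mode::Off),
        1 => Some(Mode::On),
        _ => None,
    }
}
```

The newtype costs a little ergonomics compared with an enum, but it never requires Rust to trust that C sent one of the declared constants.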

It is an error for [zero-variant enumerations] to have the `C` representation.
#### \#[repr(C)] Enums With Fields

For all other enumerations, the layout is unspecified.
The representation of a `repr(C)` enum with fields is a `repr(C)` struct with
two fields, also called a "tagged union" in C:

Likewise, combining the `C` representation with a primitive representation, the
layout is unspecified.
- a `repr(C)` version of the enum with all fields removed ("the tag")
- a `repr(C)` union of `repr(C)` structs for the fields of each variant that had
them ("the payload")

> Note: Due to the representation of `repr(C)` structs and unions, if a variant
> has a single field there is no difference between putting that field directly
> in the union or wrapping it in a struct; any system which wishes to manipulate
> such an `enum`'s representation may therefore use whichever form is more
> convenient or consistent for them.

```rust
// This enum has the same layout as ...
#[repr(C)]
enum MyEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

// ... this struct.
#[repr(C)]
struct MyEnumRepr {
    tag: MyEnumDiscriminant,
    payload: MyEnumFields,
}

// This is the discriminant enum.
#[repr(C)]
enum MyEnumDiscriminant { A, B, C, D }

// This is the variant union.
#[repr(C)]
union MyEnumFields {
    A: MyAFields,
    B: MyBFields,
    C: MyCFields,
    D: MyDFields,
}

#[repr(C)]
#[derive(Copy, Clone)]
struct MyAFields(u32);

#[repr(C)]
#[derive(Copy, Clone)]
struct MyBFields(f32, u64);

#[repr(C)]
#[derive(Copy, Clone)]
struct MyCFields { x: u32, y: u8 }

// This struct could be omitted (it is a zero-sized type), and it must be in
// C/C++ headers.
#[repr(C)]
#[derive(Copy, Clone)]
struct MyDFields;
```
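
The equivalence above can be exercised from Rust. The following is a sketch rather than a recommended pattern: it defines a hypothetical `Shape` enum, writes out its tag-plus-payload form by hand, and reinterprets a value through that form. All names (`Shape`, `ShapeTag`, `ShapePayload`, `ShapeRepr`, `measure`, `layouts_match`) are invented for illustration, and the zero-sized struct for the field-less variant is left out of the union.

```rust
use std::mem::{align_of, size_of, transmute};

#[repr(C)]
#[derive(Copy, Clone)]
enum Shape {
    Circle(f32),
    Rect { w: f32, h: f32 },
    Empty,
}

// Hand-written tag: the enum with all fields removed.
#[repr(C)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum ShapeTag { Circle, Rect, Empty }

#[repr(C)]
#[derive(Copy, Clone)]
struct CircleFields(f32);

#[repr(C)]
#[derive(Copy, Clone)]
struct RectFields { w: f32, h: f32 }

// Hand-written payload: one struct per variant that has fields.
#[repr(C)]
#[derive(Copy, Clone)]
union ShapePayload {
    circle: CircleFields,
    rect: RectFields,
}

#[repr(C)]
#[derive(Copy, Clone)]
struct ShapeRepr {
    tag: ShapeTag,
    payload: ShapePayload,
}

/// Compares size and alignment of the enum and its hand-written form.
fn layouts_match() -> bool {
    size_of::<Shape>() == size_of::<ShapeRepr>()
        && align_of::<Shape>() == align_of::<ShapeRepr>()
}

/// Reads a `Shape` through its tag-plus-payload form.
fn measure(shape: Shape) -> f32 {
    // Relies on the `repr(C)` layout equivalence described in this section.
    let repr: ShapeRepr = unsafe { transmute(shape) };
    match repr.tag {
        // Only the union member selected by the tag is read.
        ShapeTag::Circle => unsafe { repr.payload.circle.0 },
        ShapeTag::Rect => unsafe { repr.payload.rect.w * repr.payload.rect.h },
        ShapeTag::Empty => 0.0,
    }
}
```

In ordinary Rust code a `match` on the enum is the right tool; the hand-written form mainly matters when the same bytes are read by C or C++.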

> Note: `union`s with non-`Copy` fields are unstable, see [55149].

<span id="c-primitive-representation">Combining `repr(C)` with a primitive
representation is only defined for enums with fields. The primitive
representation modifies `repr(C)` by giving the discriminant enum the
representation of the chosen primitive type. For example, with the `u8`
representation, the discriminant enum has a size and alignment of 1 byte.</span>
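
A small sketch of the combined form, assuming a hypothetical `Msg` enum declared with `#[repr(C, u8)]`. The hand-written equivalent is the same tag-plus-payload struct as for plain `repr(C)`, except that the tag enum is `repr(u8)`. The names `Msg`, `MsgTag`, `MsgPayload`, `MsgRepr`, and the two helper functions are illustrative only.

```rust
use std::mem::{align_of, size_of};

#[repr(C, u8)]
enum Msg {
    Ping(u32),
    Data(u64),
}

// The discriminant enum takes the primitive representation.
#[repr(u8)]
#[derive(Copy, Clone)]
enum MsgTag { Ping, Data }

#[repr(C)]
#[derive(Copy, Clone)]
struct PingFields(u32);

#[repr(C)]
#[derive(Copy, Clone)]
struct DataFields(u64);

#[repr(C)]
#[derive(Copy, Clone)]
union MsgPayload {
    ping: PingFields,
    data: DataFields,
}

// Still a `repr(C)` struct of tag followed by payload.
#[repr(C)]
struct MsgRepr {
    tag: MsgTag,
    payload: MsgPayload,
}

/// The `u8` tag occupies a single byte with byte alignment.
fn tag_is_one_byte() -> bool {
    size_of::<MsgTag>() == 1 && align_of::<MsgTag>() == 1
}

/// Compares size and alignment of the enum and its hand-written form.
fn layouts_match() -> bool {
    size_of::<Msg>() == size_of::<MsgRepr>()
        && align_of::<Msg>() == align_of::<MsgRepr>()
}
```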

> Note: This representation is primarily intended for Rust code that wants to
> interoperate with the idioms of preexisting C/C++ codebases. If you have
> control over both the Rust and C code, such as using C as FFI glue between
> Rust and some third language, then you should use a
> [primitive representation](#primitive-representation-of-enums-with-fields)
> instead.

### Primitive representations

The *primitive representations* are the representations with the same names as
the primitive integer types. That is: `u8`, `u16`, `u32`, `u64`, `u128`,
`usize`, `i8`, `i16`, `i32`, `i64`, `i128`, and `isize`.

Primitive representations can only be applied to enumerations.
Primitive representations can only be applied to enumerations, and they behave
differently depending on whether the enum has fields. It is an error
for [zero-variant enumerations] to have a primitive representation.

Combining two primitive representations together is unspecified.

Combining the `C` representation and a primitive representation is described
[above](#c-primitive-representation).

#### Primitive Representation of Field-less Enums

For [field-less enums], primitive representations set the size and alignment to
be the same as the primitive type of the same name. For example, a field-less
enum with a `u8` representation can only have discriminants between 0 and 255
inclusive.
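
For illustration, here is a sketch with a hypothetical `Level` enum using the `u8` representation. It shows the resulting size and alignment, a cast to the underlying integer, and a checked conversion back. The type and function names are made up for this example.

```rust
use std::mem::{align_of, size_of};

#[repr(u8)]
#[derive(Copy, Clone, Debug, PartialEq, Eq)]
enum Level {
    Low = 0,
    Mid = 128,
    High = 255, // the largest discriminant a `u8` representation allows
}

/// Size and alignment of the enum, in bytes.
fn level_layout() -> (usize, usize) {
    (size_of::<Level>(), align_of::<Level>())
}

/// Casting a field-less enum to its primitive type yields the discriminant.
fn level_to_byte(level: Level) -> u8 {
    level as u8
}

/// Checked conversion from a raw byte; values that are not discriminants
/// are rejected rather than reinterpreted.
fn level_from_byte(byte: u8) -> Option<Level> {
    match byte {
        0 => Some(Level::Low),
        128 => Some(Level::Mid),
        255 => Some(Level::High),
        _ => None,
    }
}
```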

#### Primitive Representation of Enums With Fields

The representation of a `repr(int)` enum is a `repr(C)` union of `repr(C)`
structs for each variant with a field. The first field of each struct in the
union is a `repr(int)` version of the enum with all fields removed ("the tag")
and the remaining fields are the fields of that variant.

> Note: this representation is unchanged if the tag is given its own member in
> the union, should that make manipulation more clear for you (although in C++,
> to follow The Exact Word Of The Standard the tag member should be wrapped in
> a `struct`).

Because unions with non-`Copy` fields aren't allowed, this representation can
only be expressed in Rust if every field is also [`Copy`].

For [C-like enumerations], they set the size and alignment to be the same as the
primitive type of the same name. For example, a C-like enumeration with a `u8`
representation can only have discriminants between 0 and 255 inclusive.

It is an error for [zero-variant enumerations] to have a primitive
representation.

For all other enumerations, the layout is unspecified.

Likewise, combining two primitive representations together is unspecified.

```rust
// This enum has the same layout as ...
#[repr(u8)]
enum MyEnum {
    A(u32),
    B(f32, u64),
    C { x: u32, y: u8 },
    D,
}

// ... this union.
#[repr(C)]
union MyEnumRepr {
    A: MyVariantA,
    B: MyVariantB,
    C: MyVariantC,
    D: MyVariantD,
}

// This is the discriminant enum.
#[repr(u8)]
#[derive(Copy, Clone)]
enum MyEnumDiscriminant { A, B, C, D }

#[repr(C)]
#[derive(Clone, Copy)]
struct MyVariantA(MyEnumDiscriminant, u32);

#[repr(C)]
#[derive(Clone, Copy)]
struct MyVariantB(MyEnumDiscriminant, f32, u64);

#[repr(C)]
#[derive(Clone, Copy)]
struct MyVariantC { tag: MyEnumDiscriminant, x: u32, y: u8 }

#[repr(C)]
#[derive(Clone, Copy)]
struct MyVariantD(MyEnumDiscriminant);
```
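
Because every struct in the union starts with the tag, the tag of a `repr(u8)` enum with fields can be read through a pointer to its first byte. The sketch below assumes a hypothetical `Event` enum with default discriminants (0, 1, 2); the `tag` method is illustrative and not an API provided by the language.

```rust
#[repr(u8)]
enum Event {
    Key(u32),
    Click { x: i16, y: i16 },
    Quit,
}

impl Event {
    /// Returns the `u8` tag of this value.
    fn tag(&self) -> u8 {
        // SAFETY: `Event` is `repr(u8)`, so it is laid out as a `repr(C)`
        // union of `repr(C)` structs whose first field is the `u8` tag.
        // Reading the first byte therefore reads the tag.
        unsafe { *(self as *const Event as *const u8) }
    }
}
```

This only works with an explicit primitive representation; with the default representation the location of the discriminant is not specified.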

### The alignment modifiers

@@ -379,12 +511,14 @@ used with any other representation.
[`align_of`]: ../std/mem/fn.align_of.html
[`size_of`]: ../std/mem/fn.size_of.html
[`Sized`]: ../std/marker/trait.Sized.html
[`Copy`]: ../std/marker/trait.Copy.html
[dynamically sized types]: dynamically-sized-types.md
[C-like enumerations]: items/enumerations.md#custom-discriminant-values-for-fieldless-enumerations
[field-less enums]: items/enumerations.md#custom-discriminant-values-for-fieldless-enumerations
[enumerations]: items/enumerations.md
[zero-variant enumerations]: items/enumerations.md#zero-variant-enums
[zero-variant enums]: items/enumerations.md#zero-variant-enums
[undefined behavior]: behavior-considered-undefined.md
[27060]: https://github.com/rust-lang/rust/issues/27060
[55149]: https://github.com/rust-lang/rust/issues/55149
[`PhantomData<T>`]: special-types-and-traits.md#phantomdatat
[Default]: #the-default-representation
[`C`]: #the-c-representation