Tracking issue for RFC 2645, "Transparent Unions" (formerly: and Enums) #60405
Comments
I plan on attempting to implement the feature by adapting my previous implementation. I'd appreciate it if someone could take a look at that commit and let me know if it's horribly wrong :) |
@pnkfelix Up for reviewing ^--- ? |
Tracking issue: rust-lang#60405
Implement RFC 2645 (transparent enums and unions) Tracking issue: #60405
Should we apply |
I'll submit a PR to update |
…=cramertj Make MaybeUninit #[repr(transparent)] Tracking issue: rust-lang#60405
I ran into this today:
#[repr(C)]
struct S(u8, u32);
#[repr(transparent)]
union U { _x: S, y: () }
unsafe extern "C" fn foo(u: U) {
    let pad = (&u as *const _ as *const u8).add(1).read();
    assert_eq!(pad, 13); // Should fail when called below
}
pub unsafe fn bar() {
    let mut u: U = U { y: () };
    (&mut u as *mut _ as *mut u8).add(1).write(42);
    foo(u);
}
Because of However, due to For example, the ABI could state that S.0 is passed in a 1-byte register (e.g.
core::ptr::write:
;; writes 42 to the address at ecx
mov byte ptr [ecx], 42
ret
core::ptr::<impl *mut T>::add:
;; increments the address at ecx by 1
lea eax, [ecx + 1]
ret
example::bar:
sub esp, 24
;; our union is at esp + 8, create ptr:
lea ecx, [esp + 8]
;; increment its address by 1
call core::ptr::<impl *mut T>::add
mov ecx, eax
;; writes 42 to the address (esp + 9)
call core::ptr::<impl *mut T>::write
;; now passing the union to foo begins
;; S.0 is put in `al` - padding be damned
mov al, byte ptr [esp + 8]
;; S.1 is put in `ecx`
mov ecx, dword ptr [esp + 12]
;; then these are moved back to the stack (esp + 16)
mov byte ptr [esp + 16], al
;; the padding at esp +17/18/19 is uninitialized
mov dword ptr [esp + 20], ecx
;; and then moved back and forth a couple of times
movsd xmm0, qword ptr [esp + 16]
movsd qword ptr [esp], xmm0
;; and then we call foo, which never sees 42
call example::foo
add esp, 24
ret
If this happens, then the bytes at offset 1, 2, and 3 of the
example::bar:
ret
If the AFAICT, these two constraints are incompatible. cc @eddyb @rust-lang/wg-unsafe-code-guidelines |
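For anyone following along, the padding in question can be checked directly. This is only a sketch: it assumes a mainstream target where `u32` has 4-byte alignment, and the helper `offset_of_second_field` is a name made up for illustration. It shows that `S.1` starts at offset 4, so the bytes at offsets 1, 2, and 3 belong to no field.

```rust
use std::mem::{align_of, size_of};

// Same definition as the `S` in the example above.
#[repr(C)]
struct S(u8, u32);

/// Hypothetical helper: byte offset of `S.1`, computed from addresses.
fn offset_of_second_field() -> usize {
    let s = S(0, 0);
    let base = &s as *const S as usize;
    let field = &s.1 as *const u32 as usize;
    field - base
}

fn main() {
    // Assumption: `u32` is 4-byte aligned on this target.
    println!("size of S     = {}", size_of::<S>());
    println!("align of S    = {}", align_of::<S>());
    println!("offset of S.1 = {}", offset_of_second_field());
}
```

Nothing here touches the padding bytes themselves; it only reports where they sit.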
Ouch. Nice catch! The reason we implemented a "transparent" repr for unions (re-using the ABI of the only non-ZST field if possible) was because of code that used So, a way to keep that while also achieving that "unions preserve all bytes when copied around" would be to only use the ABI of the only non-ZST field if that ABI is an "Integer" or "vector" one, where we know all bytes are preserved. That would not really be |
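As a concrete illustration of a union that reuses the layout of its only non-ZST field, `MaybeUninit<T>` (made `#[repr(transparent)]` in the commits referenced earlier in this thread) is documented to have the same size and alignment as `T`. The sketch below only checks layout, not calling convention, and `same_layout` is a hypothetical helper name.

```rust
use std::mem::{align_of, size_of, MaybeUninit};

/// Hypothetical helper: does `MaybeUninit<T>` have the same size and
/// alignment as `T`? (This checks layout only, not the call ABI.)
fn same_layout<T>() -> bool {
    size_of::<MaybeUninit<T>>() == size_of::<T>()
        && align_of::<MaybeUninit<T>>() == align_of::<T>()
}

fn main() {
    println!("u8:       {}", same_layout::<u8>());
    println!("u32:      {}", same_layout::<u32>());
    println!("[u16; 3]: {}", same_layout::<[u16; 3]>());

    // Ordinary use: initialize the slot before reading it back.
    let mut slot = MaybeUninit::<u32>::uninit();
    slot.write(7);
    // SAFETY: `slot` was fully initialized by the `write` above.
    let value = unsafe { slot.assume_init() };
    println!("value = {}", value);
}
```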
@gnzlbg I think that's UB, with or without I'm not aware of any guarantees or requirements that a wrapper type (whether it's a |
make transparent enums more ordinary By recognizing that structs & unions have one variant, we can make the treatment of transparent enums less ad-hoc. cc rust-lang#60405 r? @davidtwco
We discussed this briefly on the language team meeting today. Attendance was low. However, those present agreed that it would be OK to move towards stabilizing |
Stabilization report & PR for |
…enkov Stabilize `#[repr(transparent)]` on `enum`s in Rust 1.42.0

# Stabilization report

The following is the stabilization report for `#![feature(transparent_enums)]`.

Tracking issue: #60405
[Version target](https://forge.rust-lang.org/#current-release-versions): 1.42 (2020-01-30 => beta, 2020-03-12 => stable).

## User guide

A `struct` with only a single non-ZST field (let's call it `foo`) can be marked as `#[repr(transparent)]`. Such a `struct` has the same layout and ABI as `foo`. Here, we also extend this ability to `enum`s with only one variant, subject to the same restrictions as for the equivalent `struct`. That is, you can now write:

```rust
#[repr(transparent)]
enum Foo { Bar(u8) }
```

which, in terms of layout and ABI, is equivalent to:

```rust
#[repr(transparent)]
struct Foo(u8);
```

## Motivation

This is not a major feature that will unlock new and important use-cases. The utility of `repr(transparent)` `enum`s is indeed limited. However, there is still some value in it:

1. It provides conceptual simplification of the language in terms of treating univariant `enum`s and `struct`s the same, as both are product types. Indeed, languages like Haskell only have `data` as the only way to construct user-defined ADTs in the language.

2. On rare occasions, it might be that the user started out with a univariant `enum` for whatever reason (e.g. they thought they might extend it later). Now they want to make this `enum` `transparent` without breaking users by turning it into a `struct`. By lifting the restriction here, now they can.

## Technical specification

The reference specifies [`repr(transparent)` on a `struct`](https://doc.rust-lang.org/nightly/reference/type-layout.html#the-transparent-representation) as:

> ### The transparent Representation
>
> The `transparent` representation can only be used on `struct`s that have:
> - a single field with non-zero size, and
> - any number of fields with size 0 and alignment 1 (e.g. `PhantomData<T>`).
>
> Structs with this representation have the same layout and ABI as the single non-zero sized field.
>
> This is different than the `C` representation because a struct with the `C` representation will always have the ABI of a `C` `struct` while, for example, a struct with the `transparent` representation with a primitive field will have the ABI of the primitive field.
>
> Because this representation delegates type layout to another type, it cannot be used with any other representation.

Here, we amend this to include univariant `enum`s as well with the same static restrictions and the same effects on dynamic semantics.

## Tests

All the relevant tests are adjusted in the PR diff but are recounted here:

- `src/test/ui/repr/repr-transparent.rs` checks that an `enum` marked `repr(transparent)` must be univariant, rather than having zero or more than one variant. Restrictions on the fields inside the only variant, like those on `struct`s, are also checked here.

- A number of codegen tests are provided as well:
  - `src/test/codegen/repr-transparent.rs` (the canonical test)
  - `src/test/codegen/repr-transparent-aggregates-1.rs`
  - `src/test/codegen/repr-transparent-aggregates-2.rs`
  - `src/test/codegen/repr-transparent-aggregates-3.rs`

- `src/test/ui/lint/lint-ctypes-enum.rs` tests the interactions with the `improper_ctypes` lint.

## History

- 2019-04-30, RFC rust-lang/rfcs#2645
  Author: @mjbshaw
  Reviewers: The Language Team

  This is the RFC that proposes allowing `#[repr(transparent)]` on `enum`s and `union`s.

- 2019-06-11, PR #60463
  Author: @mjbshaw
  Reviewers: @varkor and @rkruppe

  The PR implements the aforementioned RFC in full.

- 2019, PR #67323
  Author: @Centril
  Reviewers: @davidtwco

  The PR reorganizes the static checks taking advantage of the fact that `struct`s and `union`s are internally represented as ADTs with a single variant.

- This PR stabilizes `transparent_enums`.

## Related / possible future work

The remaining work here is to figure out the semantics of `#[repr(transparent)]` on `union`s and stabilize those. This work continues to be tracked in #60405.
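A small sketch of what the stabilized part of the feature gives you, assuming Rust 1.42 or later. `FooStruct` and `unwrap_foo` are names invented for this illustration and do not appear in the report.

```rust
use std::mem::{align_of, size_of};

// Univariant enum from the report's user guide.
#[repr(transparent)]
enum Foo {
    Bar(u8),
}

// The equivalent struct, renamed here to avoid a name clash.
#[repr(transparent)]
struct FooStruct(u8);

/// Hypothetical helper: extract the payload. The pattern is irrefutable
/// because `Foo` has exactly one variant.
fn unwrap_foo(foo: Foo) -> u8 {
    let Foo::Bar(x) = foo;
    x
}

fn main() {
    // Both wrappers share the size and alignment of `u8`.
    println!("Foo:       size {}, align {}", size_of::<Foo>(), align_of::<Foo>());
    println!(
        "FooStruct: size {}, align {}",
        size_of::<FooStruct>(),
        align_of::<FooStruct>()
    );
    println!("payload = {}", unwrap_foo(Foo::Bar(42)));
    println!("struct payload = {}", FooStruct(7).0);
}
```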
This is now just tracking transparent unions.
I think I just realized there is a problem with transparent unions if we want to provide "bag of bytes" semantics for unions:
#[repr(transparent)]
union U { f: u32 }
As a union, However, to my knowledge, LLVM actually makes an |
I have two questions:
I'm still not sold on the "bag of bits" idea. I've tried to put an alternative forward, but instead of receiving a good rebuttal for why my alternative is inferior, it got sidetracked by "let's decide what padding bits are" when I think we're pretty much already in agreement on what "padding bits" means. Is this something that a chat on Zulip or something could solve? I think some real-time vocal communication could resolve this in minutes, whereas asynchronous text comms will take days (or weeks), and GitHub is a suboptimal forum because it hides replies and whatnot. |
Since repr(Rust) unions have no ABI commitments, we can just represent them as a literal byte array. That allows each byte to be poison-or-not independently. |
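To make the byte-array view a bit more tangible, here is a user-level analogue. It is not how rustc lowers unions, only an illustration with made-up names (`Bag`, `round_trip`): a union with a `[MaybeUninit<u8>; 4]` field lets each byte be looked at on its own after a copy. All four bytes are initialized in this sketch, so it stays clear of the uninit/poison questions discussed in this thread.

```rust
use std::mem::MaybeUninit;

// Hypothetical union for illustration: the same four bytes viewed either
// as a `u32` or as four independently-tracked bytes.
#[derive(Clone, Copy)]
union Bag {
    value: u32,
    bytes: [MaybeUninit<u8>; 4],
}

/// Copy the union, then rebuild the integer from its individual bytes.
fn round_trip(value: u32) -> u32 {
    let original = Bag { value };
    let copy = original; // copy of the whole union
    // SAFETY: all four bytes were initialized by writing `value`.
    let bytes: [u8; 4] = unsafe { copy.bytes.map(|b| b.assume_init()) };
    u32::from_ne_bytes(bytes)
}

fn main() {
    let input = 0x1122_3344_u32;
    println!("{:#x} -> {:#x}", input, round_trip(input));
}
```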
@hanna-kruppe That's true, but I'm not sure it's useful to allow such granularity for poison because you still can't read |
That goes more into your second question (why/how this observation matters), which I deliberately didn't go into because it's a more complex topic and my time is currently very limited. |
#[repr(transparent)]
union U { f: u32 }
let mut u = U { f: 0 };
(&mut u as *mut _ as *mut MaybeUninit<u8>).add(1).write(MaybeUninit::uninit());
let u2 = u;
let v = (&u2 as *const _ as *const u8).read();
println!("{}", v);
The Rust semantics as they exist in my head and as it is drafted here would guarantee that this program prints 0. This is what "unions are just bags of bytes" means. But it turns out LLVM actually says "nope this is UB as all of
Of course we could try to adjust our semantics, but (a) that will make the semantics of unions significantly more complicated, and (b) it seems like a shame that LLVM would force us to cripple our semantics like that, for no good reason. If LLVM's type system was not quite so restrictive, we could just tell LLVM to load 4 bytes at once and preserve which byte is |
Thanks for the demo, @RalfJung. I personally think this should have the same behavior as |
@mjbshaw do you still think that if we replace |
Doesn't writing Of course, if LLVM does change uninitialized memory to poison, then we would run into this problem, without bitwise semantics. |
Indeed, this was written assuming LLVM would switch to Cc #94428 |
Is the concern in this comment about poison spreading to entire union fields not equally valid for |
Yeah sure, that's the same situation. Judging from this discussion, it seems like LLVM will get a "freezing load" operation. Whenever we have a load that would allow partially uninit data, we could use the freezing load and be sure that the data is preserved correctly. This does lose some information, but at least it would resolve the concern about spreading poison to neighboring bytes. |
After poking around the various open issues and PRs, this looks like the best place to mention this. I mentioned in #101179 that I think that allowing DSTs in Was there ever an explicit reason to disallow this, or was it just not implemented/considered since no one had discussed it much? |
The problem is custom DST. If |
Ah, right -- we haven't fully eliminated the possibility of thin DSTs. I guess that we're still unsure what a proper custom DST RFC would look like, although my guess is that My gut feeling is to say that any RFC which would permit thin DSTs (which, as demonstrated by |
This is a tracking issue for the RFC "Transparent Unions and Enums" (rust-lang/rfcs#2645).
Steps:
Unresolved questions:
Also it is not clear if transparent unions can even be implemented on LLVM without seriously restricting our semantics for unions overall.