Proposal: initial `@bitCast` semantics (packed + vector + array) #19755

jacobly0 · 2024-04-24T01:45:49Z

This is a partial resurrection of #10547 with an initially reduced scope, taking into account the packed struct changes since then.

The status quo implementations of @bitSizeOf and @bitCast are inconsistent across different types and fairly unimplementable across the various backends. According to the original rejected proposal, @bitCast now has the semantics of loading from a @ptrCasted pointer; however, this is plainly not true for status quo, because #17802 changed a @ptrCast to a @bitCast with the explicit goal of fixing undefined behavior (related to load/store sizes). I'm also not convinced that this would be a usable definition anyway, since even a simple value like var x: u20 = 0xABCDE; may be represented in memory in many different ways, depending on the target and backend:

DE BC XA (current behavior of little-endian targets with the llvm backend)
EX CD AB
DE BC XA XX (current behavior of little-endian targets with the c backend, and on the x86_64 backend)
EX CD AB XX
XX DE BC XA
XX EX CD AB
AB CD EX
XA BC DE (current behavior of big-endian targets with the llvm backend)
AB CD EX XX
XA BC DE XX
XX AB CD EX
XX XA BC DE (current behavior of big-endian targets with the c backend)
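To check which of these layouts a particular target and backend produces, the bytes backing a u20 can be dumped directly. The following is a minimal sketch, assuming a recent Zig toolchain that provides std.mem.asBytes. The output depends on the target and backend, and the padding bits (the X nibbles above) may hold anything, so no expected output is given.

```zig
const std = @import("std");

pub fn main() void {
    // `var` plus `&x` keeps this a runtime value that actually lives in memory.
    var x: u20 = 0xABCDE;

    // asBytes exposes @sizeOf(u20) bytes; a backend may write fewer than
    // that on a store, in which case the remaining bytes are unspecified.
    const bytes = std.mem.asBytes(&x);

    std.debug.print("@sizeOf(u20) = {d}, @bitSizeOf(u20) = {d}\n", .{ @sizeOf(u20), @bitSizeOf(u20) });

    // Print the in-memory bytes in address order.
    for (bytes) |b| std.debug.print("{X:0>2} ", .{b});
    std.debug.print("\n", .{});
}
```

Building the same file for a big-endian target, or with a different backend, is one way to observe several of the rows listed above.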

However, I think an intrinsic like @bitCast should be defined in a way that does not invoke this complexity, whereas it seems perfectly reasonable and necessary to define pointer casting in terms of the target- and backend-specific memory layout. Additionally, since pointer casting is already legal, an intrinsic defined precisely in terms of it would add no functionality to the language. It could be argued that two things having the same semantics also violates "Only one obvious way to do things".

This means that @bitCast actually needs a specific definition (such as in a language spec 🙄), but since it currently doesn't have one, it has different semantics for different types and is implemented inconsistently across the compiler. By defining @bitCast in a target- and backend-agnostic way, this operation becomes "safer" in some sense than @ptrCast, since you don't have to worry about it behaving differently on a big-endian target, for example. I believe this leads to a clear delineation of use cases that makes @bitCast worth having in the language as a separate concept.

The main motivation for resurrecting this proposal, and an argument that was not explored in the original proposal, is the effect of @bitCast on vectors. With vectors rightly not having a well-defined memory layout (given the wide variety of vector semantics across architectures), we lose the ability to convert between differently packed vectors, or even just between @Vector(8, bool), @Vector(8, u1), @Vector(8, i1), u8, and i8. While @bitCast could be defined elementwise on vectors, and it's possible to convert from bool with @select and to bool with comparisons, that doesn't solve the use case of converting a vector to an integer.

I am going to start off with the reasonable assumptions that @bitSizeOf should work for all types that are allowed for @bitCast, and that @as(To, @bitCast(@as(From, from))) requires that @bitSizeOf(To) == @bitSizeOf(From) and performs a copy of that number of bits. The open question is which types should be allowed and how the order of these bits is defined for each of those types. I propose starting off with a limited, fairly uncontroversial set and leaving more complicated cases for a future proposal, in order to unblock progress on the backends more quickly.
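The two assumptions can be illustrated with scalar types whose @bitCast behavior is already uncontroversial. This is a small sketch, assuming a Zig version where @bitCast takes its destination type from the result location (as in the expression above); the test names and values are illustrative only.

```zig
const std = @import("std");

test "bitCast requires equal bit sizes and copies exactly those bits" {
    // Both types are 32 bits wide, so the cast is permitted.
    comptime std.debug.assert(@bitSizeOf(u32) == @bitSizeOf(f32));

    const from: f32 = 1.0;
    const to: u32 = @bitCast(from);
    // IEEE 754 single-precision encoding of 1.0.
    try std.testing.expectEqual(@as(u32, 0x3F800000), to);

    // Casting back reproduces the original value bit for bit.
    const back: f32 = @bitCast(to);
    try std.testing.expectEqual(from, back);
}

test "signedness does not change the copied bits" {
    const all_ones: u8 = 0xFF;
    const signed: i8 = @bitCast(all_ones);
    try std.testing.expectEqual(@as(i8, -1), signed);
}
```

A cast between types of different bit sizes, such as u16 to u32, is expected to be rejected at compile time rather than zero-extended.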

The proposed types to be allowed initially, along with the value that @bitSizeOf would return:

- packable types (allowed as the type of a packed struct field)
  - void: 0 bits
  - bool: 1 bit
  - uN: N bits
  - iN: N bits
  - fN: N bits
  - *T, ?*T, [*]T, ?[*]T, [*c]T, usize, isize, for runtime-allowed T: @bitSizeOf(usize) bits (note that this is not allowed as the type of a @bitCast in favor of @ptrFromInt, @intFromPtr, and @ptrCast)
  - enum (T): @bitSizeOf(T) bits (note that this is not allowed as the type of a @bitCast in favor of @enumFromInt and @intFromEnum)
  - packed struct (T): @bitSizeOf(T) bits
  - packed union: `comptime size: { var size = 0; for (@typeInfo(U).Union.fields) |field| size = @max(size, @bitSizeOf(field.type)); break :size size; }` (note that "Proposal: don't allow unused bits in packed unions" #19754 (comment) will vastly simplify this to just @bitSizeOf(T), as in the previous case)
- [N]T, for runtime-allowed T: N * @bitSizeOf(T) bits
- @Vector(N, T), for runtime-allowed T: N * @bitSizeOf(T) bits (note that this is currently a packable type, but I don't think it should be, given that arrays aren't allowed)
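Several of the packable-type sizes in this list can be checked at compile time. The sketch below covers only the cases that are not in question; it deliberately leaves out [N]T, @Vector(N, T), and packed unions, since their sizes are what this proposal sets out to define. Color and Flags are hypothetical types invented for the illustration.

```zig
const std = @import("std");

// Hypothetical example types, not taken from the proposal.
const Color = enum(u3) { red, green, blue };
const Flags = packed struct(u16) {
    ready: bool, // 1 bit
    level: u7, // 7 bits
    tag: Color, // 3 bits
    rest: u5, // 5 bits
};

comptime {
    std.debug.assert(@bitSizeOf(void) == 0);
    std.debug.assert(@bitSizeOf(bool) == 1);
    std.debug.assert(@bitSizeOf(u20) == 20);
    std.debug.assert(@bitSizeOf(i7) == 7);
    std.debug.assert(@bitSizeOf(f32) == 32);
    // Pointer-like types share the bit size of usize.
    std.debug.assert(@bitSizeOf(*u8) == @bitSizeOf(usize));
    std.debug.assert(@bitSizeOf(?*u8) == @bitSizeOf(usize));
    std.debug.assert(@bitSizeOf([*]u8) == @bitSizeOf(usize));
    // Enums and packed structs take the size of their backing integer.
    std.debug.assert(@bitSizeOf(Color) == 3);
    std.debug.assert(@bitSizeOf(Flags) == 16);
}
```

If any assertion were false, compilation would fail, so the block doubles as a compile-time check of the sizes it lists.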

Number the bits of each type from lsb to msb, starting at the first field of a packed struct or the first element of an array or vector. @bitCast would then copy each numbered bit of the source type to the same-numbered bit of the destination type. This matches the way packed struct orders bits and is meant to be consistent with that.
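The numbering rule can be sketched in two parts. The packed struct case uses @bitCast directly, since that ordering already holds today. The array case is modeled with explicit shifts, because status quo @bitCast on arrays may not follow the proposed rule. Nibbles and packElems are hypothetical names introduced for this illustration, and packElems assumes std.meta.Int is available in the toolchain.

```zig
const std = @import("std");

// Hypothetical packed struct: the first field occupies the lowest bits.
const Nibbles = packed struct(u8) {
    lo: u4, // bits 0..3
    hi: u4, // bits 4..7
};

/// Hypothetical model of the proposed array numbering: element 0 supplies
/// the lowest @bitSizeOf(T) bits, element 1 the next ones, and so on.
fn packElems(comptime T: type, comptime N: usize, elems: [N]T) std.meta.Int(.unsigned, N * @bitSizeOf(T)) {
    const Elem = std.meta.Int(.unsigned, @bitSizeOf(T));
    const Result = std.meta.Int(.unsigned, N * @bitSizeOf(T));
    var result: Result = 0;
    inline for (elems, 0..) |elem, i| {
        const raw: Elem = @bitCast(elem);
        result |= @as(Result, raw) << @intCast(i * @bitSizeOf(T));
    }
    return result;
}

test "array elements are numbered like packed struct fields" {
    const from_struct: u8 = @bitCast(Nibbles{ .lo = 0x1, .hi = 0x2 });
    const from_array = packElems(u4, 2, .{ 0x1, 0x2 });
    try std.testing.expectEqual(@as(u8, 0x21), from_struct);
    try std.testing.expectEqual(from_struct, from_array);
}
```

Under the proposal, @bitCast from [2]u4 to u8 would be expected to agree with packElems on every target, regardless of endianness.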

Types to consider for future proposals:

Error sets with the same semantics as the "error int type".
Error unions with a defined order between the error and the payload.
Non-pointer optionals with a defined position and meaning of the extra bit.
All structs with valid field types, with bits accumulated in field declaration order, unrelated to memory layout and ignoring padding.
Unions, but it is an open question how to define this.

* Upgrade from u8 to usize element types.
  - WebAssembly assumes u64. It should probably try to be target-aware instead.
* Move the covered PC bits to after the header so it goes on the same page with the other rapidly changing memory (the header stats).

depends on the semantics of accepted proposal #19755

closes #20994


jacobly0 added this to the 0.13.0 milestone Apr 24, 2024

mlugg mentioned this issue Jun 10, 2024

Proposal: Anonymous function literals #20242

Closed

andrewrk added the accepted label (This proposal is planned.) Aug 9, 2024

andrewrk mentioned this issue Aug 9, 2024

fuzzing: more optimized and correct management of 8-bit PC counters #21006

Merged

pgy mentioned this issue Sep 29, 2024

doc,langref: mention diffs of Zig and C packed structs #21413

Closed

190n mentioned this issue Nov 19, 2024

chore: delete some unused functions in meta.zig oven-sh/bun#15252

Open
