Proposal: initial @bitCast semantics (packed + vector + array) #19755

Open
jacobly0 opened this issue Apr 24, 2024 · 0 comments
Labels
accepted This proposal is planned. backend-c The C backend (CBE) outputs C source code. backend-self-hosted breaking Implementing this issue could cause existing code to no longer compile or have different behavior. frontend Tokenization, parsing, AstGen, Sema, and Liveness. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned.
@jacobly0
Member

This is a partial resurrection of #10547 with an initially reduced scope and taking into account the packed struct changes since then.

The status quo implementations of @bitSizeOf and @bitCast are inconsistent across different types and fairly unimplementable across the various backends. According to the original rejected proposal, @bitCast now has the semantics of loading from a @ptrCasted pointer; however, this is plainly not true for status quo, since #17802 changed a @ptrCast to a @bitCast with the explicit goal of fixing undefined behavior (related to load/store sizes). I'm also not convinced that this would be a usable definition anyway, since even a simple value like var x: u20 = 0xABCDE; may be represented in memory in many different ways, depending on the target and backend:

  • DE BC XA (current behavior of little-endian targets with the llvm backend)
  • EX CD AB
  • DE BC XA XX (current behavior of little-endian targets with the c backend, and on the x86_64 backend)
  • EX CD AB XX
  • XX DE BC XA
  • XX EX CD AB
  • AB CD EX
  • XA BC DE (current behavior of big-endian targets with the llvm backend)
  • AB CD EX XX
  • XA BC DE XX
  • XX AB CD EX
  • XX XA BC DE (current behavior of big-endian targets with the c backend)

However, I think an intrinsic like @bitCast should be defined in a way that does not invoke this complexity, whereas it seems perfectly reasonable and necessary to define pointer casting in terms of the target- and backend-specific memory layout. Additionally, because pointer casting is already legal, an intrinsic defined precisely in terms of it would add no functionality to the language. It could be argued that two things having the same semantics also violates "Only one obvious way to do things".

This means that @bitCast actually needs a specific definition (such as in a language spec 🙄), but since it currently doesn't have one, it has different semantics for different types and is implemented inconsistently across the compiler. By defining @bitCast in a target- and backend-agnostic way, this operation becomes "safer" in some sense than @ptrCast, since you don't have to worry about it behaving differently on a big-endian target, for example. I believe this leads to a clear delineation of use cases that makes @bitCast worth having in the language as a separate concept.
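As a minimal sketch of what target-agnostic means here, the u20 value 0xABCDE from above can be split into nibbles through a packed struct. The Nibbles type and its field names are made up for illustration; the point is that the field values follow from the bit numbering alone, not from how a given backend happens to store a u20 in memory.

```zig
const std = @import("std");

// Hypothetical type for illustration: five 4-bit fields, first field at the lsb.
const Nibbles = packed struct {
    e: u4,
    d: u4,
    c: u4,
    b: u4,
    a: u4,
};

test "bitcast u20 to nibbles" {
    const x: u20 = 0xABCDE;
    const n: Nibbles = @bitCast(x);
    // The first field receives the least significant bits, regardless of endianness.
    try std.testing.expectEqual(@as(u4, 0xE), n.e);
    try std.testing.expectEqual(@as(u4, 0xA), n.a);
}
```

Reading the same value back through a @ptrCasted byte pointer would instead expose one of the target- and backend-specific layouts listed earlier.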

The main motivation for resurrecting this proposal, and an argument that was not explored in the original proposal, is the effect of @bitCast on vectors. With vectors rightly not having a well-defined memory layout (given the wide variety of vector semantics across architectures), we lose the ability to convert between differently packed vectors, or even just between @Vector(8, bool), @Vector(8, u1), @Vector(8, i1), u8, and i8. While @bitCast could be defined elementwise on vectors, and it's possible to convert from bool with @select and to bool with comparisons, that doesn't solve the use case of converting a vector to an integer.
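The vector-to-integer use case might look like the sketch below. The helper name maskFromBools is hypothetical, and the expected value assumes the numbering this proposal describes (element 0 lands in the lsb); a status quo compiler is not guaranteed to agree on every target and backend.

```zig
const std = @import("std");

// Hypothetical helper: reinterpret eight bools as the eight bits of a u8.
fn maskFromBools(v: @Vector(8, bool)) u8 {
    return @bitCast(v);
}

test "vector of bool to integer" {
    const v: @Vector(8, bool) = .{ true, false, true, true, false, false, false, false };
    // Elements 0, 2 and 3 are set, so bits 0, 2 and 3 are expected
    // under the proposed numbering (assumption: element 0 is the lsb).
    try std.testing.expectEqual(@as(u8, 0b0000_1101), maskFromBools(v));
}
```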

I am going to start off with the reasonable assumptions that @bitSizeOf should work for all types that are allowed for @bitCast, and that @as(To, @bitCast(@as(From, from))) requires that @bitSizeOf(To) == @bitSizeOf(From) and performs a copy of that number of bits. The open question is what types should be allowed and how the order of these bits is defined for each of those types. I propose starting off with a limited, fairly uncontroversial set and leaving more complicated cases for a future proposal, in order to unblock progress on the backends more quickly.
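The size requirement can be spelled out with a small wrapper. This is only an illustrative sketch: checkedBitCast is a hypothetical name, and @bitCast itself already rejects mismatched sizes, so the wrapper merely makes the rule visible.

```zig
const std = @import("std");

// Hypothetical wrapper that states the size requirement before casting.
fn checkedBitCast(comptime To: type, from: anytype) To {
    const From = @TypeOf(from);
    if (@bitSizeOf(To) != @bitSizeOf(From)) {
        @compileError("bit size mismatch between " ++ @typeName(From) ++ " and " ++ @typeName(To));
    }
    // Same number of bits on both sides, so this is a plain copy of those bits.
    return @bitCast(from);
}

test "same bit size is required" {
    const one: f32 = 1.0;
    const bits = checkedBitCast(u32, one);
    // IEEE 754 single-precision encoding of 1.0.
    try std.testing.expectEqual(@as(u32, 0x3F800000), bits);
}
```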

The proposed types to be allowed initially, along with the value that @bitSizeOf would return:

  • packable types (allowed as the type of a packed struct field)
    • void: 0 bits
    • bool: 1 bit
    • uN: N bits
    • iN: N bits
    • fN: N bits
    • *T, ?*T, [*]T, ?[*]T, [*c]T, usize, isize, for runtime-allowed T: @bitSizeOf(usize) bits (note that this is not allowed as the type of a @bitCast in favor of @ptrFromInt, @intFromPtr, and @ptrCast)
    • enum (T): @bitSizeOf(T) bits (note that this is not allowed as the type of a @bitCast in favor of @enumFromInt and @intFromEnum)
    • packed struct (T): @bitSizeOf(T) bits
    • packed union: comptime size: { var size = 0; for (@typeInfo(U).Union.fields) |field| size = @max(size, @bitSizeOf(field.type)); break :size size; } (note that #19754 (comment), "Proposal: don't allow unused bits in packed unions", will vastly simplify this to just @bitSizeOf(T) as in the previous case)
  • [N]T, for runtime-allowed T: N * @bitSizeOf(T) bits
  • @Vector(N, T), for runtime-allowed T: N * @bitSizeOf(T) bits (note that this is currently a packable type, but I don't think it should be, given that arrays aren't allowed)
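A few entries from the list above, written as compile-time checks. The Flags type is hypothetical, and only cases where I would expect status quo and the proposal to agree are included; arrays of non-byte-sized elements are deliberately left out.

```zig
const std = @import("std");

// Hypothetical packed struct: 3 + 1 + 4 = 8 bits.
const Flags = packed struct {
    mode: u3,
    enabled: bool,
    level: u4,
};

comptime {
    std.debug.assert(@bitSizeOf(void) == 0);
    std.debug.assert(@bitSizeOf(bool) == 1);
    std.debug.assert(@bitSizeOf(u7) == 7);
    std.debug.assert(@bitSizeOf(i13) == 13);
    std.debug.assert(@bitSizeOf(f16) == 16);
    // Pointers are sized like usize.
    std.debug.assert(@bitSizeOf(*u8) == @bitSizeOf(usize));
    // A packed struct is the sum of its fields.
    std.debug.assert(@bitSizeOf(Flags) == 8);
    // N * @bitSizeOf(T) for vectors.
    std.debug.assert(@bitSizeOf(@Vector(8, u1)) == 8 * @bitSizeOf(u1));
}
```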

If, for each of two types, you number the bits from lsb to msb, starting at the first field of a packed struct or the first element of an array or vector, then @bitCast copies each numbered bit of one type to the same-numbered bit of the other type. This matches the way packed struct orders bits and is meant to be consistent with that.
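To make the numbering concrete, here is a hand-written model of what the proposal would mean for a [4]u4, compared against a packed struct with the same four values. Both packNibbles and Quad are hypothetical names; the function models the proposed rule with shifts rather than calling @bitCast on the array, since status quo array behavior may differ.

```zig
const std = @import("std");

// Hypothetical model of the proposed numbering for a [4]u4:
// element i occupies bits 4*i through 4*i + 3 of the result.
fn packNibbles(elems: [4]u4) u16 {
    var result: u16 = 0;
    for (elems, 0..) |elem, i| {
        const shift: u4 = @intCast(i * 4);
        result |= @as(u16, elem) << shift;
    }
    return result;
}

// The same four values as packed struct fields, in declaration order.
const Quad = packed struct { a: u4, b: u4, c: u4, d: u4 };

test "array numbering matches packed struct numbering" {
    const from_array = packNibbles(.{ 0xE, 0xD, 0xC, 0xB });
    const from_struct: u16 = @bitCast(Quad{ .a = 0xE, .b = 0xD, .c = 0xC, .d = 0xB });
    try std.testing.expectEqual(@as(u16, 0xBCDE), from_array);
    try std.testing.expectEqual(from_array, from_struct);
}
```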

Types to consider for future proposals:

  • Error sets with the same semantics as the "error int type".
  • Error unions with a defined order between the error and the payload.
  • Non-pointer optionals with a defined position and meaning of the extra bit.
  • All structs with valid field types, with bits accumulated in field declaration order, unrelated to memory layout and ignoring padding.
  • Unions, but it is an open question how to define this.

Related:

@jacobly0 jacobly0 added breaking Implementing this issue could cause existing code to no longer compile or have different behavior. proposal This issue suggests modifications. If it also has the "accepted" label then it is planned. frontend Tokenization, parsing, AstGen, Sema, and Liveness. backend-self-hosted backend-c The C backend (CBE) outputs C source code. labels Apr 24, 2024
@jacobly0 jacobly0 added this to the 0.13.0 milestone Apr 24, 2024
@andrewrk andrewrk added the accepted This proposal is planned. label Aug 9, 2024
andrewrk added a commit that referenced this issue Aug 9, 2024
* Upgrade from u8 to usize element types.
  - WebAssembly assumes u64. It should probably try to be target-aware
    instead.
* Move the covered PC bits to after the header so it goes on the same
  page with the other rapidly changing memory (the header stats).

depends on the semantics of accepted proposal #19755

closes #20994