Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

What should SIMD bitmasks look like? #126217

Open
RalfJung opened this issue Jun 10, 2024 · 6 comments
Open

What should SIMD bitmasks look like? #126217

RalfJung opened this issue Jun 10, 2024 · 6 comments
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) C-discussion Category: Discussion or questions that doesn't represent real issues. PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team

Comments

@RalfJung
Copy link
Member

RalfJung commented Jun 10, 2024

The portable-simd intrinsics simd_bitmask and simd_bitmask_select work with a bit-efficient representation of a SIMD vector of bool. Specifically they actually support two representations: as an integer of sufficient size, and as an array of u8.
However, the exact format of the array-based bitmask is endianess-dependent in subtle ways, so much so that portable-simd stopped using that format. The integer-based format is pretty simple (albeit still being endianess-dependent) but does not scale to vectors of arbitrary size.

IMO we should only have one format supported by the low-level intrinsics, and leave it to higher-level code like portable-simd to convert that to other formats if needed. That one format probably has to be the array-based one, since that is the only way to actually support vectors of arbitrary size. But what should that format look like? Currently it is endianess-dependent, and then portable-simd has to do some work to convert that into an endianess-independent format and back. Can we make the intrinsic directly use an endianess-independent format? (It is extremely rare for our intrinsics to behave in an endianess-dependent way.)

What are the key constraints the format has to satisfy? Without knowing those, here's a completely naive proposal:
the array must be big enough to contain at least as many bits as the vector has elements (but that's just a lower bound, arbitrarily bigger arrays are allowed), and then vector elements are mapped to bits in the array as follows: the vector element i is represented in array element i / 8, in the bit i % 8, where bits are indexed from most significant to least significant. So for instance the vector [-1, -1, -1, 0, 0, 0, 0, 0, 0, -1] becomes the bitmask [0b11100000, 0b01_000000], simply filling in the vector elements left-to right bit-for-bit and then padding with 0 until the array is full. I don't know if that format is any good -- probably it is not -- but it is certainly easy to explain. :)

Cc @calebzulawski @workingjubilee @programmerjake @rust-lang/opsem

@rustbot rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 10, 2024
@programmerjake
Copy link
Member

programmerjake commented Jun 10, 2024

if we get generic-sized integers uint<N> (like C's unsigned _BitInt(N)) somewhat soon, I would just use those.

Without knowing those, here's a completely naive proposal: the array must be big enough to contain at least as many bits as the vector has elements (but that's just a lower bound, arbitrarily bigger arrays are allowed), and then vector elements are mapped to bits in the array as follows: the vector element i is represented in array element i / 8, in the bit i % 8, where bits are indexed from most significant to least significant.

The standard format that LLVM uses on little-endian (and x86 and a few other arches too) is that bits are counted from the LSB end to the MSB end, not MSB to LSB. The idea is if you have some integer type uint<N> then:
(uint::<N>::from_le_bytes(the_bytes) >> i) & 1 != 0 is true iif element i of the corresponding mask is true.

I strongly think that we should just use that format everywhere if we don't want an endian-dependent format and generic-sized integers aren't ready yet.

@RalfJung
Copy link
Member Author

RalfJung commented Jun 10, 2024 via email

@jieyouxu jieyouxu added T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. A-SIMD Area: SIMD (Single Instruction Multiple Data) C-discussion Category: Discussion or questions that doesn't represent real issues. PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) T-opsem Relevant to the opsem team and removed needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. labels Jun 10, 2024
@programmerjake
Copy link
Member

programmerjake commented Jun 10, 2024

if we get generic-sized integers uint (like C's unsigned _BitInt(N)) somewhat soon, I would just use those.
I don't know of any initiative working on them. What is the current status?

There's a postponed RFC that people recently have been asking if it's been long enough to restart it: rust-lang/rfcs#2581 iirc 3-4 different people have been talking about it in the last month or two (mostly on Zulip or other random corners of the Rust project).

@RalfJung
Copy link
Member Author

RalfJung commented Jun 10, 2024

I strongly think that we should just use that format everywhere if we don't want an endian-dependent format and generic-sized integers aren't ready yet.

I don't have an opinion either way -- this sounds perfectly reasonable to me. We'd then say:
the vector element i is represented in array element i / 8, in the bit i % 8, where bits are indexed from least significant to most significant.

I think the LLVM IR for big-endian would then be something like

  • do the bitcast from <N x i1> to iN
  • reverse the bits
  • zero-extend to match the size of the array
  • reverse the bytes (strangely LLVM bswap only works for types with an even number of bytes, so if the array has e.g. length 3 we'd have to use some other encoding...)
  • transmute to array

Is there some particular instruction sequence we want to generate here or would something like that work?

@RalfJung
Copy link
Member Author

RalfJung commented Jun 10, 2024

FWIW I am also entirely open to the idea that the current behavior is already what we want (including on big-endian). But the fact that portable-simd stopped using the array-based variant entirely is an indication that something is not optimal. I have no idea what, as I don't really know the design space here. I see my role as that of an advisor with a t-opsem view point.

The reason the current semantics seem odd is that Miri currently has exactly 4 places where endianess matters:

  • loading integers
  • storing integers
  • simd_bitmask
  • simd_select_bitmask

So, the intrinsics are certainly somewhat striking. But maybe that's expected for converting between arrays of bits and a more compact representation; I don't have any intuition for what to expect here.

@programmerjake
Copy link
Member

FWIW I am also entirely open to the idea that the current behavior is already what we want (including on big-endian). But the fact that portable-simd stopped using the array-based variant entirely is an indication that something is not optimal.

the reasons we stopped are that because generic const exprs aren't working that well, we have to have the output byte array have the same length as the input mask, despite being 8x overkill -- this would be solved by uint<N> since that N specifies bit count rather than byte count. also because those intrinsics are currently plain broken for non-power-of-two lengths on at least aarch64, probably due to a combination of rustc and llvm bugs.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
A-SIMD Area: SIMD (Single Instruction Multiple Data) C-discussion Category: Discussion or questions that doesn't represent real issues. PG-portable-simd Project group: Portable SIMD (https://github.com/rust-lang/project-portable-simd) T-compiler Relevant to the compiler team, which will review and decide on the PR/issue. T-opsem Relevant to the opsem team
Projects
None yet
Development

No branches or pull requests

4 participants