What should SIMD bitmasks look like? #126217

RalfJung · 2024-06-10T05:59:31Z

The portable-simd intrinsics simd_bitmask and simd_bitmask_select work with a bit-efficient representation of a SIMD vector of bool. Specifically they actually support two representations: as an integer of sufficient size, and as an array of u8.
However, the exact format of the array-based bitmask is endianess-dependent in subtle ways, so much so that portable-simd stopped using that format. The integer-based format is pretty simple (albeit still being endianess-dependent) but does not scale to vectors of arbitrary size.

IMO we should only have one format supported by the low-level intrinsics, and leave it to higher-level code like portable-simd to convert that to other formats if needed. That one format probably has to be the array-based one, since that is the only way to actually support vectors of arbitrary size. But what should that format look like? Currently it is endianess-dependent, and then portable-simd has to do some work to convert that into an endianess-independent format and back. Can we make the intrinsic directly use an endianess-independent format? (It is extremely rare for our intrinsics to behave in an endianess-dependent way.)

What are the key constraints the format has to satisfy? Without knowing those, here's a completely naive proposal:
the array must be big enough to contain at least as many bits as the vector has elements (but that's just a lower bound, arbitrarily bigger arrays are allowed), and then vector elements are mapped to bits in the array as follows: the vector element i is represented in array element i / 8, in the bit i % 8, where bits are indexed from most significant to least significant. So for instance the vector [-1, -1, -1, 0, 0, 0, 0, 0, 0, -1] becomes the bitmask [0b11100000, 0b01_000000], simply filling in the vector elements left-to right bit-for-bit and then padding with 0 until the array is full. I don't know if that format is any good -- probably it is not -- but it is certainly easy to explain. :)

Cc @calebzulawski @workingjubilee @programmerjake @rust-lang/opsem

The text was updated successfully, but these errors were encountered:

programmerjake · 2024-06-10T06:20:09Z

if we get generic-sized integers uint<N> (like C's unsigned _BitInt(N)) somewhat soon, I would just use those.

Without knowing those, here's a completely naive proposal: the array must be big enough to contain at least as many bits as the vector has elements (but that's just a lower bound, arbitrarily bigger arrays are allowed), and then vector elements are mapped to bits in the array as follows: the vector element i is represented in array element i / 8, in the bit i % 8, where bits are indexed from most significant to least significant.

The standard format that LLVM uses on little-endian (and x86 and a few other arches too) is that bits are counted from the LSB end to the MSB end, not MSB to LSB. The idea is if you have some integer type uint<N> then:
(uint::<N>::from_le_bytes(the_bytes) >> i) & 1 != 0 is true iif element i of the corresponding mask is true.

I strongly think that we should just use that format everywhere if we don't want an endian-dependent format and generic-sized integers aren't ready yet.

RalfJung · 2024-06-10T07:11:12Z

if we get generic-sized integers uint<N> (like C's unsigned _BitInt(N)) somewhat soon, I would just use those.

I don't know of any initiative working on them. What is the current status?

programmerjake · 2024-06-10T08:17:29Z

if we get generic-sized integers uint (like C's unsigned _BitInt(N)) somewhat soon, I would just use those.
I don't know of any initiative working on them. What is the current status?

There's a postponed RFC that people recently have been asking if it's been long enough to restart it: rust-lang/rfcs#2581 iirc 3-4 different people have been talking about it in the last month or two (mostly on Zulip or other random corners of the Rust project).

RalfJung · 2024-06-10T11:54:02Z

I strongly think that we should just use that format everywhere if we don't want an endian-dependent format and generic-sized integers aren't ready yet.

I don't have an opinion either way -- this sounds perfectly reasonable to me. We'd then say:
the vector element i is represented in array element i / 8, in the bit i % 8, where bits are indexed from least significant to most significant.

I think the LLVM IR for big-endian would then be something like

do the bitcast from <N x i1> to iN
reverse the bits
zero-extend to match the size of the array
reverse the bytes (strangely LLVM bswap only works for types with an even number of bytes, so if the array has e.g. length 3 we'd have to use some other encoding...)
transmute to array

Is there some particular instruction sequence we want to generate here or would something like that work?

RalfJung · 2024-06-10T13:22:47Z

FWIW I am also entirely open to the idea that the current behavior is already what we want (including on big-endian). But the fact that portable-simd stopped using the array-based variant entirely is an indication that something is not optimal. I have no idea what, as I don't really know the design space here. I see my role as that of an advisor with a t-opsem view point.

The reason the current semantics seem odd is that Miri currently has exactly 4 places where endianess matters:

loading integers
storing integers
simd_bitmask
simd_select_bitmask

So, the intrinsics are certainly somewhat striking. But maybe that's expected for converting between arrays of bits and a more compact representation; I don't have any intuition for what to expect here.

programmerjake · 2024-06-10T19:46:24Z

FWIW I am also entirely open to the idea that the current behavior is already what we want (including on big-endian). But the fact that portable-simd stopped using the array-based variant entirely is an indication that something is not optimal.

the reasons we stopped are that because generic const exprs aren't working that well, we have to have the output byte array have the same length as the input mask, despite being 8x overkill -- this would be solved by uint<N> since that N specifies bit count rather than byte count. also because those intrinsics are currently plain broken for non-power-of-two lengths on at least aarch64, probably due to a combination of rustc and llvm bugs.

rustbot added the needs-triage This issue may need triage. Remove it if it has been sufficiently triaged. label Jun 10, 2024

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

What should SIMD bitmasks look like? #126217

What should SIMD bitmasks look like? #126217

RalfJung commented Jun 10, 2024 •

edited

Loading

programmerjake commented Jun 10, 2024 •

edited

Loading

RalfJung commented Jun 10, 2024 via email

programmerjake commented Jun 10, 2024 •

edited

Loading

RalfJung commented Jun 10, 2024 •

edited

Loading

RalfJung commented Jun 10, 2024 •

edited

Loading

programmerjake commented Jun 10, 2024

What should SIMD bitmasks look like? #126217

What should SIMD bitmasks look like? #126217

Comments

RalfJung commented Jun 10, 2024 • edited Loading

programmerjake commented Jun 10, 2024 • edited Loading

RalfJung commented Jun 10, 2024 via email

programmerjake commented Jun 10, 2024 • edited Loading

RalfJung commented Jun 10, 2024 • edited Loading

RalfJung commented Jun 10, 2024 • edited Loading

programmerjake commented Jun 10, 2024

RalfJung commented Jun 10, 2024 •

edited

Loading

programmerjake commented Jun 10, 2024 •

edited

Loading

programmerjake commented Jun 10, 2024 •

edited

Loading

RalfJung commented Jun 10, 2024 •

edited

Loading

RalfJung commented Jun 10, 2024 •

edited

Loading