fix big-endian bitmasks smaller than a byte #267

RalfJung · 2022-03-17T00:29:58Z

I don't actually know if 0b1000 is the expected value here on big-endian systems, but that seems more consistent with little-endian, so I guess it is? Anyway, the bitmask has to be padded to 8 bits, which leaves some extra wiggle room, so this seems like a case worth testing.

RalfJung · 2022-03-17T01:01:44Z

Hihi, the assertion actually fails. :D Fun, fun. Is this a bug in LLVM or in portable-simd?

workingjubilee · 2022-03-17T01:04:01Z

...Ferris have mercy.

workingjubilee · 2022-03-17T01:09:48Z

Oh right, I think this follows different rules based on endianness.

RalfJung · 2022-03-17T01:17:37Z

That would be a quirk only on masks less than 8 bits wide, though. For larger masks, the output is the same regardless of endianness.

Also, https://doc.rust-lang.org/nightly/core/simd/trait.ToBitMask.html#tymethod.to_bitmask does not have a ton of detail, but it says: "Each bit of the bitmask corresponds to a mask lane, starting with the LSB."
I would argue the most reasonable interpretation says my test should succeed (the LSB in a u8 is on the same side no matter the endianness: it is 0b0000_0001; and anyway, the docs do not mention that the output is endianness-dependent in any way). Certainly the LSB is not 0b0001_0000, and yet that is where a 4-bit mask currently starts on big-endian systems.
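To make the LSB-first reading concrete, here is a minimal scalar sketch in plain Rust rather than core::simd. The helper lsb_first_bitmask is hypothetical and not part of portable-simd; it maps lane i to bit i, so a 4-lane mask always occupies the low nibble of the u8, whatever the target's endianness.

```rust
/// Packs boolean mask lanes into a u8, lane 0 in the least significant bit.
/// This follows the "starting with the LSB" wording of the to_bitmask docs;
/// plain integer shifts like these behave the same on any endianness.
fn lsb_first_bitmask(lanes: &[bool]) -> u8 {
    assert!(lanes.len() <= 8, "at most 8 lanes fit in a u8");
    lanes
        .iter()
        .enumerate()
        .fold(0u8, |acc, (i, &set)| acc | ((set as u8) << i))
}

fn main() {
    // Only lane 0 set: the result is the LSB, 0b0000_0001.
    println!("{:#010b}", lsb_first_bitmask(&[true, false, false, false]));
    // Only lane 3 of a 4-lane mask set: 0b0000_1000.
    println!("{:#010b}", lsb_first_bitmask(&[false, false, false, true]));
}
```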

RalfJung · 2022-03-17T01:20:35Z

So I think this here (portable-simd/crates/core_simd/src/masks/full_masks.rs, line 140 in 5f49d4c):

        bitmask.reverse_bits()

should probably additionally rotate_right(8 - LANES) if LANES < 8.
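The effect of the proposed fix can be modeled with scalar code. This sketch assumes the big-endian path described later in this thread (LLVM puts lane 0 in the most significant bit of the iN, and rustc zero-extends it to a u8); all function names are hypothetical. It reproduces the 0b0001_0000 result for a 4-lane mask and shows the extra shift moving it back to 0b0000_0001.

```rust
/// Models the raw value on a big-endian target before portable-simd touches it:
/// LLVM packs <N x i1> into iN with lane 0 in the most significant bit (bit N-1),
/// and rustc zero-extends that iN to a u8.
fn be_bitcast_zext(lanes: &[bool]) -> u8 {
    let n = lanes.len();
    assert!((1..=8).contains(&n), "1 to 8 lanes");
    lanes
        .iter()
        .enumerate()
        .fold(0u8, |acc, (i, &set)| acc | ((set as u8) << (n - 1 - i)))
}

/// Behavior before the fix: reverse all 8 bits of the zero-extended byte.
fn to_bitmask_before_fix(raw: u8) -> u8 {
    raw.reverse_bits()
}

/// Suggested fix: after reversing, move the N meaningful bits back down to the
/// low end. The padding bits are zero, so `>>` and `rotate_right` agree here.
fn to_bitmask_after_fix(raw: u8, lanes: u32) -> u8 {
    assert!((1..=8).contains(&lanes), "1 to 8 lanes");
    raw.reverse_bits() >> (8 - lanes)
}

fn main() {
    let mask = [true, false, false, false]; // only lane 0 set
    let raw = be_bitcast_zext(&mask); // lane 0 in bit 3: 0b0000_1000
    println!("before fix: {:#010b}", to_bitmask_before_fix(raw)); // 0b00010000
    println!("after fix:  {:#010b}", to_bitmask_after_fix(raw, 4)); // 0b00000001
}
```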

calebzulawski · 2022-03-17T02:21:32Z

Yeah, a rotate or left shift would be appropriate (I wonder which would be faster or optimize better?). I never got around to it, but I did intend on providing a target-specific bitmask function as well, which has an unspecified (but consistent) ordering.

programmerjake · 2022-03-17T02:59:01Z

iirc llvm doesn't actually specify bitmask layout if the length times the element size in bits isn't a multiple of 8:
https://llvm.org/docs/LangRef.html#vector-type

> When <N*M> isn’t evenly divisible by the byte size the exact memory layout is unspecified (just like it is for an integral type of the same size). This is because different targets could put the padding at different positions when the type size is smaller than the type’s store size.

so, imho we should use a swizzle to convert the vector to have a multiple of 8 lanes, then bitcast to bytes, rather than bitcasting then trying to fix up after the fact by shifting/rotating.
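A rough scalar model of the widening idea follows. It does not call simd_swizzle! or LLVM; it assumes the swizzle places the real lanes in positions 0..N and fills the extra lanes with false, and be_to_bitmask_via_widening is a hypothetical name. Because the widened mask fills a whole byte, a plain full-byte reversal already gives the LSB-first layout on big-endian.

```rust
/// Scalar model of the "swizzle to 8 lanes, then bitcast" idea on a big-endian
/// target. The real lanes are copied into lanes 0..N of an 8-lane mask and the
/// extra lanes are false (assumed padding placement), so the bitcast fills a
/// whole byte and no padding position is left up to the target.
fn be_to_bitmask_via_widening(lanes: &[bool]) -> u8 {
    assert!(lanes.len() <= 8, "at most 8 lanes");
    let mut wide = [false; 8];
    wide[..lanes.len()].copy_from_slice(lanes);
    // Big-endian bitcast of <8 x i1> to i8: lane 0 goes to bit 7.
    let raw = wide
        .iter()
        .enumerate()
        .fold(0u8, |acc, (i, &set)| acc | ((set as u8) << (7 - i)));
    // A full-byte reversal now puts lane 0 in bit 0; no extra shift needed.
    raw.reverse_bits()
}

fn main() {
    println!("{:#010b}", be_to_bitmask_via_widening(&[true, false, false, false])); // 0b00000001
    println!("{:#010b}", be_to_bitmask_via_widening(&[false, false, true, true])); // 0b00001100
}
```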

programmerjake · 2022-03-17T03:23:25Z

Swizzle, then bitcast demo:
https://rust.godbolt.org/z/6d3PYW46M

Output assembly for Mask<i8, 4> with swizzle then bitcast:

example::to_bitmask:
        movd    xmm0, dword ptr [rdi]
        pmovmskb        eax, xmm0
        ret

calebzulawski · 2022-03-17T03:28:09Z

> iirc llvm doesn't actually specify bitmask layout if the length times the element size in bits isn't a multiple of 8: https://llvm.org/docs/LangRef.html#vector-type
>
> When <N*M> isn’t evenly divisible by the byte size the exact memory layout is unspecified (just like it is for an integral type of the same size). This is because different targets could put the padding at different positions when the type size is smaller than the type’s store size.
>
> so, imho we should use a swizzle to convert the vector to have a multiple of 8 lanes, then bitcast to bytes, rather than bitcasting then trying to fix up after the fact by shifting/rotating.

rustc zero-extends the integer before storing it: https://github.com/rust-lang/rust/blob/461e8078010433ff7de2db2aaae8a3cfb0847215/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1109

programmerjake · 2022-03-17T03:34:51Z

> rustc zero-extends the integer before storing it: https://github.com/rust-lang/rust/blob/461e8078010433ff7de2db2aaae8a3cfb0847215/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1109

ah, so that means the problem is that we're bit-reversing an i8 rather than the iN for a length-N vector; it doesn't have anything to do with the bitcast.
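The difference between reversing the i8 and reversing the iN can be checked directly. In this sketch, reverse_low_bits is a hypothetical helper that reverses only the low n bits; the exhaustive loop confirms that, for zero-extended values, reversing the whole byte and then shifting right by 8 - n gives the same answer.

```rust
/// Reverses only the low `n` bits of `x`, i.e. reverses an iN stored in a u8.
/// Bits above `n` are ignored.
fn reverse_low_bits(x: u8, n: u32) -> u8 {
    assert!((1..=8).contains(&n), "n must be 1..=8");
    let mut out = 0u8;
    for i in 0..n {
        if (x >> i) & 1 == 1 {
            out |= 1u8 << (n - 1 - i);
        }
    }
    out
}

fn main() {
    let x = 0b0000_0011u8; // an i4 with its two low bits set
    println!("i8 reverse: {:#010b}", x.reverse_bits()); // 0b11000000
    println!("i4 reverse: {:#010b}", reverse_low_bits(x, 4)); // 0b00001100

    // For zero-extended values, reversing the i8 and shifting right by 8 - n
    // gives the same answer as reversing the iN directly.
    for n in 1..=8u32 {
        for v in 0..(1u16 << n) {
            let v = v as u8;
            assert_eq!(v.reverse_bits() >> (8 - n), reverse_low_bits(v, n));
        }
    }
    println!("equivalence holds for n = 1..=8");
}
```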

calebzulawski · 2022-03-17T03:45:50Z

Yep, more or less.

jhorstmann · 2022-03-17T09:20:35Z

Is the LLVM behavior with the reversed bitmask on big-endian documented somewhere? I couldn't find anything in the llvm docs.

RalfJung · 2022-03-17T14:09:42Z

It'd almost be better if the bits were first rotated and then zero-extended... but that might not be feasible.

programmerjake · 2022-03-17T16:24:31Z

> Is the LLVM behavior with the reversed bitmask on big-endian documented somewhere? I couldn't find anything in the llvm docs.

I originally found it by reading through the source for const vector bitcast, but later discovered it's in the LLVM IR language reference:
https://llvm.org/docs/LangRef.html#vector-type

> One way to describe the layout is by describing what happens when a vector such as <N x iM> is bitcasted to an integer type with N*M bits, and then following the rules for storing such an integer to memory.

> A bitcast from a vector type to a scalar integer type will see the elements being packed together (without padding). The order in which elements are inserted in the integer depends on endianess. For little endian element zero is put in the least significant bits of the integer, and for big endian element zero is put in the most significant bits.
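For the <N x i1> masks discussed here, that LangRef rule can be modeled in a few lines. This is a sketch of the documented ordering only, not a call into LLVM; bitcast_mask_to_int is a hypothetical name, and the result is returned zero-extended in a u8 to match what rustc does.

```rust
/// Models the LangRef rule for bitcasting an <N x i1> mask to iN (N <= 8),
/// returned zero-extended in a u8: little endian puts element 0 in the least
/// significant bit, big endian puts element 0 in the most significant bit.
fn bitcast_mask_to_int(lanes: &[bool], big_endian: bool) -> u8 {
    let n = lanes.len();
    assert!((1..=8).contains(&n), "1 to 8 lanes");
    let mut out = 0u8;
    for (i, &set) in lanes.iter().enumerate() {
        let bit = if big_endian { n - 1 - i } else { i };
        out |= (set as u8) << bit;
    }
    out
}

fn main() {
    let lanes = [true, true, false, false]; // elements 0 and 1 set
    println!("little endian: {:#06b}", bitcast_mask_to_int(&lanes, false)); // 0b0011
    println!("big endian:    {:#06b}", bitcast_mask_to_int(&lanes, true)); // 0b1100
}
```

Note that on big-endian the four meaningful bits sit in the low nibble of the zero-extended byte, so reversing all 8 bits pushes them into the high nibble.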

RalfJung · 2022-03-17T17:16:05Z

> Yeah, a rotate or left shift would be appropriate (I wonder which would be faster or optimize better?). I never got around to it, but I did intend on providing a target-specific bitmask function as well, which has an unspecified (but consistent) ordering.

Indeed a shift does it; I pushed that to this PR.
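The PR started out as a bitmask round-trip test for vector lengths below 8. A rough scalar version of that check, assuming a right shift by 8 - N after the reversal as discussed in this thread, exercises both directions exhaustively. The names to_bitmask_be and from_bitmask_be are hypothetical and not portable-simd functions; shifting left before reversing in the from_bitmask direction is just one equivalent way to write it.

```rust
/// Big-endian to_bitmask with the shift fix: raw zero-extended iN
/// (lane 0 in bit n-1) to an LSB-first bitmask.
fn to_bitmask_be(raw: u8, n: u32) -> u8 {
    assert!((1..=8).contains(&n), "1 to 8 lanes");
    raw.reverse_bits() >> (8 - n)
}

/// Big-endian from_bitmask direction: LSB-first bitmask back to the raw iN
/// layout (lane 0 in bit n-1). Shifting left first, then reversing.
fn from_bitmask_be(bitmask: u8, n: u32) -> u8 {
    assert!((1..=8).contains(&n), "1 to 8 lanes");
    (bitmask << (8 - n)).reverse_bits()
}

fn main() {
    // Round-trip every mask value for every lane count below 8.
    for n in 1..8u32 {
        for bits in 0..(1u8 << n) {
            let raw = from_bitmask_be(bits, n);
            assert!(u32::from(raw) < (1u32 << n), "raw value fits in n bits");
            assert_eq!(to_bitmask_be(raw, n), bits);
        }
    }
    println!("round trip ok for 1 to 7 lanes");
}
```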

workingjubilee · 2022-03-21T07:05:25Z

Thank you! Everything looks in order here, I think?

portable-simd: test bitmasks smaller than a byte: blocked on rust-lang/portable-simd#267 propagating to the rustc repo (https://github.com/rust-lang/rust/tree/master/library/portable-simd)

Commit 50fbfa4: add bitmask roundtrip test for vector length below 8

RalfJung mentioned this pull request on Mar 17, 2022 in rust-lang/miri#2029, "implement simd bitmask intrinsics" (merged)

RalfJung mentioned this pull request on Mar 17, 2022 in rust-lang/miri#1912, "Portable SIMD support" (closed)

Commit 60555b5: fix big-endian bitmasks smaller than a byte

RalfJung changed the title from "add bitmask roundtrip test for vector length below 8" to "fix big-endian bitmasks smaller than a byte" on Mar 17, 2022

calebzulawski approved these changes Mar 17, 2022


workingjubilee merged commit 0711e11 into rust-lang:master Mar 21, 2022

RalfJung mentioned this pull request on Mar 21, 2022 in rust-lang/miri#2035, "portable-simd: test bitmasks smaller than a byte" (merged)

RalfJung deleted the bitmask-roundtrip branch April 9, 2022 18:04

This was referenced on Dec 3, 2023 by:

rust-lang/miri#3205, "also test simd_select_bitmask on arrays for less than 8 elements" (merged)
#378, "fix simd_bitmask docs" (merged)
#379, "from_bitmask_vector on big-endian calls simd_select_bitmask with the mask at the wrong end of the byte" (closed)
