Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix big-endian bitmasks smaller than a byte #267

Merged
merged 2 commits into from
Mar 21, 2022

Conversation

RalfJung
Copy link
Member

I don't actually know if 0b1000 is the expected value here on big-endian systems, but that seems more consistent with little-endian so I guess it is? Anyway, the fact that there is extra wiggle room here since the bitmask has to be padded to 8 bits means this seems like a case worth testing.

@RalfJung
Copy link
Member Author

Hihi, the assertion actually fails. :D Fun, fun. Is this a bug in LLVM or in portable-simd?

@workingjubilee
Copy link
Member

...Ferris have mercy.

@workingjubilee
Copy link
Member

workingjubilee commented Mar 17, 2022

Oh right, I think this follows different rules based on endianness.

@RalfJung
Copy link
Member Author

That would be a quirk only on masks less than 8 bits wide though. Output is the same across endianess for larger masks.

Also, https://doc.rust-lang.org/nightly/core/simd/trait.ToBitMask.html#tymethod.to_bitmask does not have a ton of detail, but it says: "Each bit of the bitmask corresponds to a mask lane, starting with the LSB."
I would argue the most reasonable interpretation says my test should succeed (the LSB in a u8 is on the same side no matter the endianess, it is 0b0000_0001, and anyway the docs do not mention that the output is endianess-dependent in any way). Certainly the LSB is not 0b0001_0000, and yet that is where a 4-bit mask starts on big-endian systems currently.

@RalfJung
Copy link
Member Author

So I think this here:

should probably additionally rotate_right(8-LANES) if LANES<8.

@calebzulawski
Copy link
Member

Yeah, a rotate or left shift would be appropriate (I wonder which would be faster or optimize better?). I never got around to it, but I did intend on providing a target-specific bitmask function as well, which has an unspecified (but consistent) ordering.

@programmerjake
Copy link
Member

iirc llvm doesn't actually specify bitmask layout if the length times the element size in bits isn't a multiple of 8:
https://llvm.org/docs/LangRef.html#vector-type

When <N*M> isn’t evenly divisible by the byte size the exact memory layout is unspecified (just like it is for an integral type of the same size). This is because different targets could put the padding at different positions when the type size is smaller than the type’s store size.

so, imho we should use a swizzle to convert the vector to have a multiple of 8 lanes, then bitcast to bytes, rather than bitcasting then trying to fix up after the fact by shifting/rotating.

@programmerjake
Copy link
Member

Swizzle, then bitcast demo:
https://rust.godbolt.org/z/6d3PYW46M

Output assembly for Mask<i8, 4> with swizzle then bitcast:

example::to_bitmask:
        movd    xmm0, dword ptr [rdi]
        pmovmskb        eax, xmm0
        ret

@calebzulawski
Copy link
Member

iirc llvm doesn't actually specify bitmask layout if the length times the element size in bits isn't a multiple of 8: https://llvm.org/docs/LangRef.html#vector-type

When <N*M> isn’t evenly divisible by the byte size the exact memory layout is unspecified (just like it is for an integral type of the same size). This is because different targets could put the padding at different positions when the type size is smaller than the type’s store size.

so, imho we should use a swizzle to convert the vector to have a multiple of 8 lanes, then bitcast to bytes, rather than bitcasting then trying to fix up after the fact by shifting/rotating.

rustc zero-extends the integer before storing it: https://github.com/rust-lang/rust/blob/461e8078010433ff7de2db2aaae8a3cfb0847215/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1109

@programmerjake
Copy link
Member

rustc zero-extends the integer before storing it: https://github.com/rust-lang/rust/blob/461e8078010433ff7de2db2aaae8a3cfb0847215/compiler/rustc_codegen_llvm/src/intrinsic.rs#L1109

ah, so that means the problem is that we're bit reversing an i8 rather than the iN for length N vectors, rather than having anything to do with the bitcast.

@calebzulawski
Copy link
Member

Yep, more or less.

@jhorstmann
Copy link

Is the LLVM behavior with the reversed bitmask on big-endian documented somewhere? I couldn't find anything in the llvm docs.

@RalfJung
Copy link
Member Author

It'd almost be better if the bits would be first rotated and then zero-extended... but that might not be feasible.

@RalfJung RalfJung changed the title add bitmask roundtrip test for vector length below 8 fix big-endian bitmasks smaller than a byte Mar 17, 2022
@programmerjake
Copy link
Member

programmerjake commented Mar 17, 2022

Is the LLVM behavior with the reversed bitmask on big-endian documented somewhere? I couldn't find anything in the llvm docs.

I originally found it by reading through the source for const vector bitcast, but later discovered it's in the llvm ir language reference:
https://llvm.org/docs/LangRef.html#vector-type

One way to describe the layout is by describing what happens when a vector such as <N x iM> is bitcasted to an integer type with N*M bits, and then following the rules for storing such an integer to memory.

A bitcast from a vector type to a scalar integer type will see the elements being packed together (without padding). The order in which elements are inserted in the integer depends on endianess. For little endian element zero is put in the least significant bits of the integer, and for big endian element zero is put in the most significant bits.

@RalfJung
Copy link
Member Author

Yeah, a rotate or left shift would be appropriate (I wonder which would be faster or optimize better?). I never got around to it, but I did intend on providing a target-specific bitmask function as well, which has an unspecified (but consistent) ordering.

Indeed a shift does it; I pushed that to this PR.

@workingjubilee
Copy link
Member

Thank you! Everything looks in order here, I think?

@workingjubilee workingjubilee merged commit 0711e11 into rust-lang:master Mar 21, 2022
@RalfJung RalfJung deleted the bitmask-roundtrip branch April 9, 2022 18:04
bors added a commit to rust-lang/miri that referenced this pull request Jul 22, 2022
bors added a commit to rust-lang/miri that referenced this pull request Jul 22, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants