Add mask method to extract the mask as an integer #166

Closed
gnzlbg opened this issue Sep 12, 2018 · 3 comments
Labels
Enhancement New feature or request


@gnzlbg
Contributor

gnzlbg commented Sep 12, 2018

A common operation on masks is converting the vector into an integer where each bit denotes a lane.
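As a rough illustration of the requested operation, here is a scalar sketch in plain Rust. The function name `mask_to_bits` and the array-based mask representation are assumptions made for this example, not part of packed_simd; it assumes every lane is either 0 (false) or -1 (true) and maps lane `i` to bit `i`.

```rust
/// Packs a 4-lane mask into the low bits of a `u8`.
/// Assumption: each lane is either 0 (false) or -1 (all bits set, true).
/// Lane `i` ends up in bit `i` of the result.
fn mask_to_bits(mask: [i32; 4]) -> u8 {
    let mut bits = 0u8;
    for (i, lane) in mask.iter().enumerate() {
        if *lane != 0 {
            bits |= 1 << i;
        }
    }
    bits
}
```

For example, `mask_to_bits([-1, 0, -1, -1])` yields `0b1101`: lanes 0, 2 and 3 are set.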

@gnzlbg gnzlbg added the Enhancement New feature or request label Sep 12, 2018
gnzlbg referenced this issue in hsivonen/encoding_rs Sep 12, 2018
@GabrielMajeri
Contributor

@gnzlbg From what I can understand, I think this might help with #150, specifically:
we can use a mask to track which lanes have diverged, and then convert this mask to an integer which we then directly output as a bitmap.
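A minimal scalar sketch of that idea, without assuming anything about how #150 is actually implemented: the helper `diverged_bitmap`, the `f32` lanes and the `limit` threshold are all hypothetical and only illustrate "track diverged lanes in a mask, then emit the mask as an integer bitmap".

```rust
/// Hypothetical helper: a lane counts as "diverged" once its value exceeds `limit`.
/// The per-lane boolean mask is then packed so that lane `i` becomes bit `i`.
fn diverged_bitmap(values: [f32; 4], limit: f32) -> u8 {
    // Step 1: build the lane mask.
    let mut mask = [false; 4];
    for (m, v) in mask.iter_mut().zip(values.iter()) {
        *m = *v > limit;
    }
    // Step 2: convert the mask to an integer, one bit per lane.
    mask.iter()
        .enumerate()
        .fold(0u8, |bits, (i, &m)| bits | ((m as u8) << i))
}
```

With `values = [0.5, 3.0, 1.0, 9.0]` and `limit = 2.0`, lanes 1 and 3 have diverged, so the bitmap is `0b1010`.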

@gnzlbg
Contributor Author

gnzlbg commented Sep 12, 2018

@GabrielMajeri I think that to implement this portably we might want to generate LLVM IR that looks like this (cc @rkruppe - https://gcc.godbolt.org/z/VNbIFQ):

define i8 @m32x4_to_i8(<4 x i32>) {
    %a = trunc <4 x i32> %0 to <4 x i1>
    %b = bitcast <4 x i1> %a to i4
    %c = zext i4 %b to i8
    ret i8 %c
}

Since we can't directly use the <N x i1> types from Rust, we will probably need to add a rustc intrinsic for this, but that shouldn't be hard.

@gnzlbg
Contributor Author

gnzlbg commented Sep 12, 2018

After optimizations, that IR becomes

define i8 @m32x4_to_i8_opt(<4 x i32>) {
  %2 = and <4 x i32> %0, <i32 1, i32 1, i32 1, i32 1>
  %a = icmp ne <4 x i32> %2, zeroinitializer
  %b = bitcast <4 x i1> %a to i4
  %c = zext i4 %b to i8
  ret i8 %c
}

We could also generate IR like this:

define i8 @m32x4_to_i8_2(<4 x i32>) {
  %a = icmp ne <4 x i32> %0, zeroinitializer
  %b = bitcast <4 x i1> %a to i4
  %c = zext i4 %b to i8
  ret i8 %c
}

but for some reason the quality of the generated assembly differs significantly between the three versions, and the unoptimized IR seems to produce the best code (https://gcc.godbolt.org/z/I-Sco7):

m32x4_to_i8: #unoptimized
 vpslld $0x1f,%xmm0,%xmm0
 vpsrad $0x1f,%xmm0,%xmm0
 vmovmskps %xmm0,%eax
 retq   
 nop
m32x4_to_i8_opt:
 vpbroadcastd 0x0(%rip),%xmm1        # 19 <m32x4_to_i8_opt+0x9>
 vpand  %xmm1,%xmm0,%xmm0
 vpcmpeqd %xmm1,%xmm0,%xmm0
 vmovmskps %xmm0,%eax
 retq   
 nopw   %cs:0x0(%rax,%rax,1)
m32x4_to_i8_2:
 vpxor  %xmm1,%xmm1,%xmm1
 vpcmpeqd %xmm1,%xmm0,%xmm0
 vpcmpeqd %xmm1,%xmm1,%xmm1
 vpxor  %xmm1,%xmm0,%xmm0
 vmovmskps %xmm0,%eax
 retq   
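One thing worth keeping in mind when comparing these variants: the `trunc`-based IR (and its optimized `and`/`icmp` form) looks only at the lowest bit of each lane, while the `icmp ne ... zeroinitializer` variant treats any non-zero lane as true. The scalar models below are a sketch of that difference; the names `low_bit_mask` and `nonzero_mask` are made up for this example. The two agree when every lane is 0 or -1, but can differ on other inputs.

```rust
/// Models `trunc <4 x i32> to <4 x i1>` followed by `bitcast` and `zext`:
/// only the lowest bit of each lane is kept.
fn low_bit_mask(v: [i32; 4]) -> u8 {
    v.iter()
        .enumerate()
        .fold(0u8, |bits, (i, &lane)| bits | (((lane & 1) as u8) << i))
}

/// Models `icmp ne <4 x i32> %0, zeroinitializer` followed by `bitcast` and `zext`:
/// any non-zero lane counts as true.
fn nonzero_mask(v: [i32; 4]) -> u8 {
    v.iter()
        .enumerate()
        .fold(0u8, |bits, (i, &lane)| bits | (((lane != 0) as u8) << i))
}
```

For a well-formed mask such as `[-1, 0, -1, 0]` both return `0b0101`. For an input such as `[2, 0, 0, 0]`, `low_bit_mask` returns `0` while `nonzero_mask` returns `1`, so the two IR shapes are only interchangeable if the lanes are guaranteed to be all-zeros or all-ones.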

bors added a commit to rust-lang/rust that referenced this issue Jan 24, 2019
Add intrinsic to create an integer bitmask from a vector mask

This PR adds a new simd intrinsic: `simd_bitmask(vector) -> unsigned integer` that creates an integer bitmask from a vector mask by extracting one bit of each vector lane.

This is required to implement: rust-lang/packed_simd#166.

EDIT: the reason we need an intrinsic for this is that we have to truncate the vector lanes to an `<N x i1>` vector, and then bitcast that to an `iN` integer (while making sure that we only materialize `i8`, ..., `i64`; that is, no `i1`, `i2`, or `i4` types), and we can't do any of that in a Rust library.
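To make the width constraint concrete, here is a scalar model in plain Rust. It is only a sketch of the behaviour described above, not the intrinsic itself: `bitmask_width` and `bitmask_model` are hypothetical names, the mask is represented as an `[i32; N]` array, and the model assumes the result is zero-extended to the smallest of 8, 16, 32 or 64 bits that holds one bit per lane.

```rust
/// Hypothetical helper: the smallest integer width among 8, 16, 32 and 64 bits
/// that can hold one bit per lane, or `None` if there are more than 64 lanes.
fn bitmask_width(lanes: u32) -> Option<u32> {
    [8u32, 16, 32, 64].iter().copied().find(|&w| w >= lanes)
}

/// Scalar model of "truncate each lane to one bit, then pack":
/// the low bit of lane `i` becomes bit `i`; all higher bits stay zero.
/// Returned as `u64` for simplicity; only the low `N` bits can be set.
fn bitmask_model<const N: usize>(mask: [i32; N]) -> u64 {
    assert!(N <= 64, "this model supports at most 64 lanes");
    mask.iter()
        .enumerate()
        .fold(0u64, |bits, (i, &lane)| bits | (((lane & 1) as u64) << i))
}
```

Under these assumptions a 4-lane mask needs an 8-bit result (`bitmask_width(4)` is `Some(8)`), and `bitmask_model([-1, 0, 0, -1])` is `0b1001`.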

r? @rkruppe