Portable vector shuffles. #387

gnzlbg · 2018-03-20T14:00:37Z

This PR implements an API for portable vector shuffles.

I've opened #388 to discuss this API.

alexcrichton · 2018-03-20T15:01:33Z

Nice!

I'll admit though that I'm pretty wary about landing this, so much so that I think we'll want to keep this out of stdsimd for now if we can. I feel like the API for shuffles here is pretty up in the air (especially wrt language support), and I'm also not certain of the impact of this change once we include it in the standard library itself.

The stability of exported macros in libstd is historically a tricky topic (and even the exported traits here) and since this module will be directly included into libstd I'm hesitant to include this. I feel like in the long run we'll either want this to be a procedural macro in rustc and also have more MIR/typeck support for preventing errors at compile time.

How critical are shuffles though to the first pass of a portable API?

gnzlbg · 2018-03-20T15:14:37Z

I feel like in the long run we'll either want this to be a procedural macro in rustc and also have more MIR/typeck support for preventing errors at compile time.

I thought about this. Do you have a pointer to some macro that is implemented like this in rustc? I might give this a shot. About the type checking, the only thing that isn't type checked here is that the indices access the vectors in bounds (it is checked in trans). As mentioned in the comments, this should be possible in MIR/typeck, but not in librust_typeck/check/intrinsics.rs.

Also, I forgot to mention that the intrinsics should probably be annotated with the attribute that checks that [T; N] is a compile-time constant in typeck, instead of doing that in trans as well.

How critical are shuffles though to the first pass of a portable API?

They aren't in the first pass, so they aren't critical at all. I just wanted to open an issue about a possible design, and thought it would be better if it came with an implementation. I could add a #[cfg(feature = "stdbuild")] to the files and tests here so that these are not included in libstd builds, but... I am just going to close this for now.

alexcrichton · 2018-03-20T15:26:40Z

Hm, so thinking more implementation-wise, this would probably not actually be much of a procedural macro but rather almost entirely a typeck thing. In typeck we can do things like const eval and other type checking, so the only reason we'd want to use a procedural macro would be to perhaps use a special AST node that can't be syntactically constructed (like asm!).

In that sense it may make sense to just leave these as intrinsics and just have the intrinsic be super specially typechecked?

Before implementing that, though, this is probably something we'd want agreement on via an RFC.

gnzlbg · 2018-03-20T15:46:02Z

In that sense it may make sense to just leave these as intrinsics and just have the intrinsic be super specially typechecked?

cc @eddyb because we were talking about this a couple of hours ago. Basically, if we are going to go through all that trouble, we must type check that the indices in the array of constants are in bounds. In particular, for the one-vector case they must be in the range [0, T::lanes()), and for the two-vector case in the range [0, 2*T::lanes()).
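To make the two bounds concrete, here is a minimal sketch of the check written as a `const fn`, assuming current stable Rust (const panics need 1.57+). `indices_in_bounds` is a hypothetical helper, not part of stdsimd or rustc; it only illustrates the [0, lanes) versus [0, 2*lanes) rule.

```rust
/// Hypothetical helper: true if every shuffle index addresses a lane of the
/// input(s). One input vector: valid range is [0, lanes).
/// Two input vectors: valid range is [0, 2 * lanes).
const fn indices_in_bounds(indices: &[usize], lanes: usize, two_inputs: bool) -> bool {
    let limit = if two_inputs { 2 * lanes } else { lanes };
    let mut i = 0;
    while i < indices.len() {
        if indices[i] >= limit {
            return false;
        }
        i += 1;
    }
    true
}

// Evaluated at compile time: an out-of-bounds index here fails the build.
const _: () = assert!(indices_in_bounds(&[3, 15], 8, true));
const _: () = assert!(indices_in_bounds(&[7, 6, 5, 4], 8, false));

fn main() {
    // The same predicate can also be evaluated at run time.
    assert!(!indices_in_bounds(&[16], 8, true));
    assert!(!indices_in_bounds(&[8], 8, false));
}
```

A compiler-side check would perform the same comparison, only on the constant index array handed to the intrinsic.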

An alternative would be for the macro in this PR to have zero monomorphization-time errors. The naive way to do that would be not to error on an out-of-bounds index in trans, and instead insert a panic. But honestly I prefer the monomorphization-time error to that solution.
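For contrast, a minimal sketch of the "insert a panic" alternative, using plain arrays as stand-ins for SIMD vectors (assumes Rust 1.55+ for `array::map`). `shuffle2_or_panic` is a hypothetical name; a real implementation would lower to a hardware shuffle rather than per-lane indexing.

```rust
/// Two-input shuffle over arrays: indices in [0, N) select from `a`,
/// indices in [N, 2N) select from `b`. Anything else panics at run time.
fn shuffle2_or_panic<const N: usize, const M: usize>(
    a: [i32; N],
    b: [i32; N],
    indices: [usize; M],
) -> [i32; M] {
    indices.map(|i| {
        if i < N {
            a[i]
        } else if i < 2 * N {
            b[i - N]
        } else {
            // Instead of a compile-time error, a bad index only surfaces
            // when this code actually runs.
            panic!("shuffle index {} out of bounds for 2 x {} lanes", i, N)
        }
    })
}

fn main() {
    let r = shuffle2_or_panic([0, 1, 2, 3], [4, 5, 6, 7], [3, 7, 0]);
    assert_eq!(r, [3, 7, 0]);
}
```

The drawback is visible in the `else` branch: the mistake is only reported if that path executes, which is why a monomorphization-time error is preferable.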

danielrh · 2018-04-24T18:53:52Z

Another way of approaching this is to have the function take in a trait with associated consts. I haven't found a less clunky way of doing it yet, but it could be something like this, where indices are checked at compile time:

struct SIMD {
    pub data: [i16;8],
}
macro_rules! check_indices {
    () => {fn check_indices() {
        let _test0: [u8;7 - Self::INDEX[0]] = [0;7 - Self::INDEX[0]];
        let _test1: [u8;7 - Self::INDEX[1]] = [0;7 - Self::INDEX[1]];
        let _test2: [u8;7 - Self::INDEX[2]] = [0;7 - Self::INDEX[2]];
        let _test3: [u8;7 - Self::INDEX[3]] = [0;7 - Self::INDEX[3]];
        let _test4: [u8;7 - Self::INDEX[4]] = [0;7 - Self::INDEX[4]];
        let _test5: [u8;7 - Self::INDEX[5]] = [0;7 - Self::INDEX[5]];
        let _test6: [u8;7 - Self::INDEX[6]] = [0;7 - Self::INDEX[6]];
        let _test7: [u8;7 - Self::INDEX[7]] = [0;7 - Self::INDEX[7]];
    }}
}
trait ConstIndices {
    const INDEX: [usize; 8];
    fn check_indices();
}

struct Backwards {}
impl ConstIndices for Backwards {
    const INDEX: [usize;8] = [7,6,5,4,3,2,1,0];
    check_indices!();
}

fn shuffle<Indices:ConstIndices>(vv: SIMD, _ind:Indices) -> SIMD{
    let v = vv.data;
    SIMD{
        data:[v[Indices::INDEX[0]],
             v[Indices::INDEX[1]],
             v[Indices::INDEX[2]],
             v[Indices::INDEX[3]],
             v[Indices::INDEX[4]],
             v[Indices::INDEX[5]],
             v[Indices::INDEX[6]],
             v[Indices::INDEX[7]]]
    }
}

fn main() {
    let _result = shuffle(SIMD{data:[2;8]}, Backwards{});
}
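For comparison, a sketch of the same associated-const idea on current stable Rust (assumes Rust 1.63+ for `core::array::from_fn`; const panics need 1.57+). The `CHECK` constant and this version of `shuffle` are hypothetical; the check becomes an error when `shuffle` is instantiated for a concrete index type, i.e. at monomorphization time.

```rust
trait ConstIndices {
    const INDEX: [usize; 8];
    // Default associated const that fails to evaluate if any index is >= 8.
    const CHECK: () = {
        let mut i = 0;
        while i < 8 {
            assert!(Self::INDEX[i] < 8, "shuffle index out of bounds");
            i += 1;
        }
    };
}

struct Backwards;

impl ConstIndices for Backwards {
    const INDEX: [usize; 8] = [7, 6, 5, 4, 3, 2, 1, 0];
}

fn shuffle<I: ConstIndices>(v: [i16; 8]) -> [i16; 8] {
    // Using the associated const forces its evaluation for this `I`.
    let () = I::CHECK;
    core::array::from_fn(|lane| v[I::INDEX[lane]])
}

fn main() {
    let r = shuffle::<Backwards>([0, 1, 2, 3, 4, 5, 6, 7]);
    assert_eq!(r, [7, 6, 5, 4, 3, 2, 1, 0]);
}
```

This replaces the `7 - INDEX[k]` array-length trick with an explicit assertion, but the index array length is still fixed at 8.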

gnzlbg · 2018-04-27T13:09:51Z

Another way of approaching this is to have the function take in a trait with associated consts.

Note that the number of indices is variable: you can use shuffles to create smaller or larger vectors than the input ones:

// Given:
let a: i32x8;
let b: i32x8;

// All of these work:
let c: i32x2 = shuffle!(a, b, [3, 15]);
let d: i32x4 = shuffle!(a, b, [1, 15, 3, 12]);
let e: i32x16 = shuffle!(a, b, [0, 1, ..., 15]);

IIUC the associated const approach is going to need a little bit more work, since without const generics, the length of the associated const array cannot be generic either.
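A sketch of how const generics (stable since Rust 1.51; `array::from_fn` needs 1.63) would lift that restriction, with arrays standing in for vectors. `ShuffleIndices`, `Pick2` and `Pick4` are hypothetical names; the index array length `M` is a generic parameter, so one trait covers both narrowing and widening shuffles. Bounds are not checked at compile time here: an index >= 2N would panic on `b[i - N]`.

```rust
trait ShuffleIndices<const M: usize> {
    const INDEX: [usize; M];
}

/// Output length `M` comes from the index type, not from the inputs.
fn shuffle<I: ShuffleIndices<M>, const N: usize, const M: usize>(
    a: [i32; N],
    b: [i32; N],
    _ind: I,
) -> [i32; M] {
    core::array::from_fn(|lane| {
        let i = I::INDEX[lane];
        if i < N { a[i] } else { b[i - N] }
    })
}

struct Pick2;
impl ShuffleIndices<2> for Pick2 {
    const INDEX: [usize; 2] = [3, 15];
}

struct Pick4;
impl ShuffleIndices<4> for Pick4 {
    const INDEX: [usize; 4] = [1, 15, 3, 12];
}

fn main() {
    let a = [0, 1, 2, 3, 4, 5, 6, 7];
    let b = [8, 9, 10, 11, 12, 13, 14, 15];
    let c: [i32; 2] = shuffle(a, b, Pick2);
    let d: [i32; 4] = shuffle(a, b, Pick4);
    assert_eq!(c, [3, 15]);
    assert_eq!(d, [1, 15, 3, 12]);
}
```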

danielrh · 2018-04-27T22:25:01Z

Not sure this is still a good idea, but just to throw this out there using existing mechanisms: what if you had a separate function for narrowing or widening a vector that didn't take arguments (either zero-padding it or repeating it, whichever was easiest/fastest)? E.g.:

// Given:
let a: i32x8;
let b: i32x8;

let c: i32x2 = i32x2::prefix_trunc(shuffle!(a, b, [3, 15, 0, 0]));
let d: i32x4 = i32x4::prefix_trunc(shuffle!(a, b, [1, 15, 3, 12, 0,  0, 0 ,0]));
let e: i32x16 = shuffle!(i32x16::concat(a, b), [0, 1, ..., 15]);

and then teach the optimizer to fuse the two shuffles you do internally. That way the type system stays simple, but it is a little more verbose than it could be.
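A sketch of the two helpers from this suggestion, with arrays standing in for vectors (assumes Rust 1.63+). `prefix_trunc` and `concat` are the hypothetical names from the comment, not an existing API. Since `[i32; 2 * N]` cannot be written on stable Rust, `concat` takes the output width `W` as a separate parameter and checks it at run time.

```rust
/// Keep the first M lanes of an N-lane vector (M must not exceed N).
fn prefix_trunc<const N: usize, const M: usize>(v: [i32; N]) -> [i32; M] {
    assert!(M <= N, "cannot truncate to a wider vector");
    core::array::from_fn(|i| v[i])
}

/// Concatenate two N-lane vectors into one W-lane vector (W must be 2N).
fn concat<const N: usize, const W: usize>(a: [i32; N], b: [i32; N]) -> [i32; W] {
    assert!(W == 2 * N, "result must have exactly 2 * N lanes");
    core::array::from_fn(|i| if i < N { a[i] } else { b[i - N] })
}

fn main() {
    // Full-width shuffle result [3, 15, 0, 0] truncated to two lanes.
    let c: [i32; 2] = prefix_trunc([3, 15, 0, 0]);
    assert_eq!(c, [3, 15]);
    let e: [i32; 8] = concat([0, 1, 2, 3], [4, 5, 6, 7]);
    assert_eq!(e, [0, 1, 2, 3, 4, 5, 6, 7]);
}
```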

gnzlbg · 2018-04-28T11:33:49Z

Ideally shuffle would be just a method on vectors with the following signature:

fn shuffle<const N: usize, R>(self, other: Self, const indices: [usize; N]) -> R 
    where R: SimdVector<Item=Self::Item, Length=N> { ... }

There are multiple problems that we currently have to face:

1. lack of const generics
2. lack of const function arguments

We can work around the lack of const generics by using a trait on arrays, so we can specify:

fn shuffle<I: Indices, R>(self, other: Self, const indices: I) -> R 
    where R: SimdVector<Item=Self::Item, Length=I::Length> { ... }
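A sketch of how such a trait on index arrays could look without const generics: each supported index-array length gets its own impl, generated by a macro, and an associated type ties the output length to the number of indices. `Indices` and `shuffle_i32x8` are hypothetical names, arrays stand in for `i32x8`, and bounds are only checked at run time.

```rust
trait Indices {
    type Output;
    fn shuffle_i32x8(self, a: [i32; 8], b: [i32; 8]) -> Self::Output;
}

macro_rules! impl_indices {
    ($($n:literal),*) => {$(
        impl Indices for [usize; $n] {
            type Output = [i32; $n];
            fn shuffle_i32x8(self, a: [i32; 8], b: [i32; 8]) -> [i32; $n] {
                let mut out = [0; $n];
                for (lane, &i) in self.iter().enumerate() {
                    // Indices 0..8 pick from `a`, 8..16 from `b`;
                    // anything larger panics on the `b` index.
                    out[lane] = if i < 8 { a[i] } else { b[i - 8] };
                }
                out
            }
        }
    )*};
}

// One impl per output width.
impl_indices!(2, 4, 8, 16);

fn main() {
    let a = [0, 1, 2, 3, 4, 5, 6, 7];
    let b = [8, 9, 10, 11, 12, 13, 14, 15];
    let c: [i32; 2] = [3usize, 15].shuffle_i32x8(a, b);
    assert_eq!(c, [3, 15]);
}
```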

We can work around the lack of const function arguments by making it a shuffle! macro instead, which means that we lose method position, but otherwise that's not too bad.

We could make it a very special free function (instead of a macro), by implementing it in MIR typeck as @alexcrichton suggested. There we can require that indices is an array of const items, inspect the array values to error if the indices are out-of-bounds at compile-time, etc.

So we would get a magic "function" with this signature instead:

fn shuffle<T: SimdVector, R, /*N is magic*/>(a: T, b: T, /*magically const*/ indices: [usize; N]) -> R 
    where R: SimdVector<Item=T::Item, Length=N> { ... }

This PR implements it as a library macro because that's basically the only way we currently have to do this with the available compiler magic, but I agree with @alexcrichton that doing this in MIR typeck is the best path forward. Maybe as the language gets const generics and const function arguments, the shuffle "function" signature can become less and less magical.

FWIW, once you have shuffle, you can implement a.concat(b) on top of it without any magic:

trait Concat: SimdVector {
    type Result: SimdVector<Item=Self::Item>;
    fn concat(self, other: Self) -> Self::Result;
}

impl Concat for u32x4 {
    type Result = u32x8;
    fn concat(self, other: u32x4) -> u32x8 {
        shuffle!(self, other, [0, 1, 2, 3, 4, 5, 6, 7])
    }
}

let a: u32x4;
let b: u32x4;
let c: u32x8 = a.concat(b);

I think that adding concat to std::simd is something worth doing, but I prefer to nail down shuffle first.

gnzlbg · 2018-06-04T07:46:32Z

@alexcrichton shall I reopen and merge this? In a nutshell, I agree that it would be better to move this macro to rustc, but I don't have the time to do it, and its API is something worth getting experience with in the meantime.

danielrh · 2018-06-04T10:41:19Z

I agree it’s worth merging: shuffle is very important for serious SIMD work.


gnzlbg · 2018-06-13T12:47:06Z

@alexcrichton maybe we could ask the libs team for feedback on this?

Amanieu · 2018-06-15T19:49:02Z

coresimd/ppsv/api/shuffles.rs

+        }
+    };
+    ($vec:expr, [$($l:expr),*]) => {
+        shuffle!($vec, $vec, [$($l),*])


This will evaluate $vec twice.

@Amanieu

What's the best way to fix this? Just:

{ let v = $vec; shuffle!(v, v, ...) }

?

match $vec { v => shuffle!(v, v, ...) } is preferred because of let ... in ... semantics, which regular let doesn't have (less relevant here, but all the temporaries in $vec stay alive for the duration of the match).
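A minimal sketch of the `match` fix, runnable on stable Rust. `shuffle2!` and `pick` are hypothetical stand-ins for the PR's `shuffle!` (arrays instead of SIMD vectors, per-lane indexing instead of the shuffle intrinsic); the relevant part is the one-vector arm, which binds `$vec` once.

```rust
/// Index into the concatenation of `a` and `b`.
fn pick<const N: usize>(a: &[i32; N], b: &[i32; N], i: usize) -> i32 {
    if i < N { a[i] } else { b[i - N] }
}

macro_rules! shuffle2 {
    ($a:expr, $b:expr, [$($l:expr),*]) => {
        match ($a, $b) {
            (a, b) => [$(pick(&a, &b, $l)),*],
        }
    };
    ($vec:expr, [$($l:expr),*]) => {
        // `$vec` is evaluated exactly once; its temporaries live until
        // the end of the `match`.
        match $vec {
            v => shuffle2!(v, v, [$($l),*]),
        }
    };
}

fn main() {
    let mut calls = 0;
    let mut make = || {
        calls += 1;
        [10, 20, 30, 40]
    };
    let r = shuffle2!(make(), [3, 0, 5]);
    assert_eq!(r, [40, 10, 20]);
    assert_eq!(calls, 1); // the argument expression ran once, not twice
}
```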

Amanieu · 2018-06-15T20:13:56Z

Are there any plans to support single-element vectors (e.g. u64x1)? NEON has such types, and LLVM does not use the same codegen for them as for scalar types (for integer types, values are kept in SIMD registers rather than first being moved to a general-purpose register).

Amanieu · 2018-06-15T20:15:09Z

This is somewhat relevant to this issue since we will need to add a simd_shuffle1 intrinsic to support this, and I was wondering if it was worth extending this to the generic API as well.

gnzlbg · 2018-06-15T20:27:05Z

@Amanieu I’ll answer in more depth tomorrow, but x86 (__m64) and ppc (64x1, 128x1, ...) have them as well. So the answer is yes, we have to support these, but doing so requires a bit of support in rustc: check out the “x86_mmx” codegen in rustc if you are interested; we might need to extend that to work for arm and ppc as well. I’ll check the issue with the duplicated vectors tomorrow, but thanks for the feedback, the code shouldn’t do that.

gnzlbg · 2018-06-21T08:58:28Z

I am holding this PR until we have an idea about how to resolve: rust-lang/rfcs#2366 (comment)

gnzlbg · 2018-07-16T15:00:39Z

Superseded by https://github.com/gnzlbg/ppv

gnzlbg mentioned this pull request Mar 20, 2018

Portable vector shuffles #388

Closed

gnzlbg closed this Mar 20, 2018

gnzlbg reopened this Jun 12, 2018

gnzlbg added 2 commits June 12, 2018 10:40

portable vector shuffles

159cbb1

re-export macro from std; add test

9965adb

gnzlbg force-pushed the shuffles branch from 3c386f3 to 9965adb June 12, 2018 09:28

supress warning when shuffle! is called inside unsafe block

259d8c0

Amanieu reviewed Jun 15, 2018


avoid evaluating vec twice

f09075c

TheIronBorn mentioned this pull request Jun 29, 2018

add SIMD swap_bytes/to_le pitdicker/rand#1

Closed

gnzlbg closed this Jul 16, 2018


gnzlbg commented Mar 20, 2018

alexcrichton commented Mar 20, 2018

gnzlbg commented Mar 20, 2018

alexcrichton commented Mar 20, 2018

gnzlbg commented Mar 20, 2018

danielrh commented Apr 24, 2018

gnzlbg commented Apr 27, 2018

danielrh commented Apr 27, 2018

gnzlbg commented Apr 28, 2018

gnzlbg commented Jun 4, 2018

danielrh commented Jun 4, 2018 via email

gnzlbg commented Jun 13, 2018

Amanieu Jun 15, 2018

gnzlbg Jun 18, 2018

eddyb Jun 18, 2018

Amanieu commented Jun 15, 2018

Amanieu commented Jun 15, 2018

gnzlbg commented Jun 15, 2018 via email

gnzlbg commented Jun 21, 2018

gnzlbg commented Jul 16, 2018
