add rotate_left/right #496

TheIronBorn · 2018-06-24T05:21:06Z

(also removed some duplicate tests)

It would be good to test that on supported hardware this compiles to the correct instruction, but I don't know how to do that.

Also some scalar convenience impls would be good.

(rust-random/rand#377)

gnzlbg · 2018-06-24T20:22:37Z

It would be good to test that on supported hardware this compiles to the correct instruction, but I don't know how to do that.

It is not as easy as for the intrinsics but it is possible. You can create a test in the crates/coresimd/tests directory, and import the assert_instr crate. Then you can create an opaque function that calls the portable intrinsics, and use the assert_instr macro on that to check the generated code. You will need to gate it on the architecture and cpu features using cfg_attr.

For examples, all intrinsics in the x86 and armdirectories ofcoresimduseassert_instr`.

Let me know if you run into issues while giving it a try.

TheIronBorn · 2018-06-25T22:48:03Z

Any concerns about organization etc?

TheIronBorn · 2018-06-26T01:07:01Z

Huh, some of these are being compiled from rotate_right to vprolq. Not sure how to make that pass the tests.

Also not sure what's happening with the other failures

gnzlbg · 2018-06-26T07:40:37Z

coresimd/ppsv/api/rotates.rs

+#![allow(unused)]
+
+/// Trait used for overloading rotates
+pub trait Rotate<T> {


~~Given that you provide a specific implementation for each vector type below, this doesn't really need to be a trait.

In general we try to keep the traits we use to a minimum, and most of them are sealed (not exposed in the public API of the library).~~ EDIT: it's ok, see below.

gnzlbg · 2018-06-26T07:43:17Z

coresimd/ppsv/api/rotates.rs

+            }
+        }
+
+        impl super::api::Rotate<$elem_ty> for $id {


Aha, I see why you need the trait now.

gnzlbg · 2018-06-26T07:45:07Z

crates/coresimd/tests/rotate_tests.rs

+
+#[inline]
+#[target_feature(enable = "avx512f")]
+#[cfg_attr(test, assert_instr(vprorvq))]


The assert_instr and target_feature need to be guarded on x86 and x86_64. Same for all other functions below:

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = ...))]

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), assert_instr(....))]

should be enough.

gnzlbg

So this looks good.

The only thing that worries me is about adding a trait called Rotate to core, which is a very general name and also begs the question why don't the integer types implement it.

I think it would be less controversial to just implement this for vectors as methods, without the trait, and require people to use splat. You can then post in the RFC rust-lang/rfcs#2366 about adding a trait for this, and see what people's consensus about this is.

So... I am not fundamentally opposed to adding the trait. In fact I think using the trait makes rotate nicer to use, but there are organizational concerns about exposing traits in the std library that really must make them be worth the effort. For things like shuffles and select, there really aren't any good alternatives to using traits, but for rotates, this is a bit less the case. Also, for select and shuffle those traits are sealed and not part of the library API, but this one would be in the library API which kinds of puts it at the same level as FromBits/IntoBits which are not proposed in the RFC. Does that make sense?

TheIronBorn · 2018-06-27T19:47:01Z

Could there be some way to seal the Rotate trait?

Otherwise I agree. Rotates probably aren't used that much anyway.

TheIronBorn · 2018-06-27T21:05:13Z

Making progress. The only failures now are a couple shift implementations and left/right swapping.

gnzlbg · 2018-06-28T07:42:49Z

Could there be some way to seal the Rotate trait?

I am not sure. IIUC sealed traits that have to be imported by users don't work because they can't be imported. That is, if you have an implementation detail that's not exposed to users, you can make it a sealed trait. If you want to express a where bound to restrict an implementation, you can also make it a sealed trait. But if the user needs to call a method of the trait, then the user needs to import it, and the trait has to be public.

One could try to make the rotate_left/rotate_right inherent methods of vectors, but make them generic. We could then use a sealed trait to constrain it to vectors of the same type, or an scalar value. The sealed trait could then have a e.g. to_vector() method that for vectors does nothing and for scalars does an appropriate splat:

impl f32x4 {
    fn rotate_right<T: RotateArg<Self>>(self, x: T) -> Self {
        let x: f32x4 = x.to_vec(); 
        ... x and self are f32x4, so implement for vectors as usual ...
    }
}
trait RotateArg<T> {
    fn to_vec(self) -> T;
}
impl RotateArg<f32x4> for f32x4 {
    fn to_vec(self) -> f32x4 { self } 
}
impl RotateArg<f32x4> for f32 {
   fn to_vec(self) -> f32x4 { f32x4::splat(self) }
}

But note that we could extend a non-generic rotate_left/rotate_right to allow taking scalars using such a trait later on in a backwards compatible way, so I'd prefer to get the non-generic rotates merged first as inherent methods for vectors only, and figure out how to make these more ergonomic for scalars in a different PR / opening an issue to ask around for ideas etc.

TheIronBorn · 2018-06-29T17:09:51Z

Is there some way to do any(assert_instr(vprolq), assert_instr(vprorq))?

gnzlbg · 2018-06-29T18:33:05Z

@TheIronBorn

Is there some way to do any(assert_instr(vprolq), assert_instr(vprorq))?

Easily only using #[cfg_attr(...), assert_instr(...)] which means that you can only have them for different archs, target features, etc.

For what exactly do you need it? (IIRC the assert_instr finds an intrinsic that starts with a pattern, so vpro might do the trick of matching both vprolq and vprorq).

TheIronBorn · 2018-06-29T18:37:54Z

Some compilations turn a left rotate into a right. Not sure why

gnzlbg · 2018-06-29T18:43:11Z

That's weird. As long as the tests pass that might just be llvm optimizations at work. If you tell me which one exactly I can take a look.

TheIronBorn · 2018-06-29T19:46:37Z

Here's one https://travis-ci.org/rust-lang-nursery/stdsimd/jobs/398348597

This reverts commit be0b786.

TheIronBorn · 2018-06-30T00:57:40Z

Nearly there. Just a few still doing shift + shift + or.

TheIronBorn · 2018-07-06T23:17:10Z

I've been trying to get codegen correct and found some weird behavior where just by clearing the cache and recompiling I get different results. https://godbolt.org/g/oJM7Tw
I'll report this to godbolt.

~~At the very least this makes progress on this difficult, and possibly even makes it impossible to guarantee codegen correctness (if not a godbolt error).~~

Nvm I was misusing godbolt

TheIronBorn · 2018-07-06T23:43:58Z

I think I've reached the end of my understanding of why these are still producing bad code

gnzlbg · 2018-07-07T12:48:09Z

I see you are only enabling the avx512f target feature, could you try enabling avx512vl as well?

https://www.felixcloutier.com/x86/VPROLD:VPROLVD:VPROLQ:VPROLVQ.html mentions both feature flags for the instructions that do not take immediate mode arguments, and the run-time ones seem to be the only ones failing.

The intel intrinsics guide agrees that both features are required: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=vprolvd&expand=4395,4405

TheIronBorn · 2018-07-07T23:18:08Z

That didn't seem to have any change (I may have implemented it incorrectly though)

gnzlbg · 2018-07-09T08:48:53Z

crates/coresimd/tests/rotate_tests.rs

+#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "avx512f"))]
+#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "avx512vl"))]
+#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), assert_instr(vpro))]
+unsafe fn rotate_right_variable(x: u64x8) -> u64x8 {


Could you add tests here for 128-bit wide and 256-bit wide vectors as well?

gnzlbg · 2018-07-09T09:00:39Z

Looking at the implementation of _mm_rolv_epi32 in clang, the only target feature attribute it has is avx512vl. It is implemented in term of __builtin_ia32_prolvd128; the similar builtins are all next to it there and all require avx512vl.

The LLVM-IR intrinsic that these should generate is llvm.x86.avx512.prolv, there are a couple of tests in the LLVM sources about it:

So what you can do is emit the LLVM-IR that is being generated for the variable length rotates (e.g. using RUSTFLAGS=--emit=llvm-ir, cargo llvm-ir ..., etc.) and see how it differs from the IR in those tests.

One weird thing is that in clang these require avx512vl, but with llc trunk it seems that they only require the avx512f attribute. I would still stick with the Intel docs and require both attributes though.

Note also that if instead of llc trunk you use llc 5.0 in that example, the machine code emitted is pretty bad, so it might well be that this only works on LLVM7 (the way to test this would be to use the llc shipped with Rust; sadly godbolt doesn't ship llc 6.0).

While those examples use i32s instead of i64, they produce good machine code too for i64 when the appropriate (q) intrinsic is used: https://godbolt.org/g/bF3DBt

So it seems that somewhere down the chain we are making a mistake that prevents LLVM from using rotates here, we just need to find out exactly where :D

TheIronBorn · 2018-07-09T23:17:05Z

I think the Intel docs say avx512vl is only required for vectors < 512-bits wide

gnzlbg · 2018-07-16T15:02:13Z

@TheIronBorn sorry about the organisational issues, but would you mind sending this PR to https://github.com/gnzlbg/ppv ? The ppsv module will be removed from core in the near future, and that library will probably be the replacement.

I'll leave this open here until we remove the ppsv module, and before removal will try to port this PR myself if you don't beat me to it.

gnzlbg · 2018-07-18T23:54:02Z

The ppsv module has been removed and this has already been merged upstream.

@TheIronBorn I'll try to propagate the assert_instr tests upstream tomorrow.

add rotate_left/right

1271a3f

gnzlbg self-requested a review June 25, 2018 09:29

TheIronBorn force-pushed the master branch from f02e949 to 160ecb4 Compare June 25, 2018 22:54

TheIronBorn mentioned this pull request Jun 25, 2018

Are rotate_left/right/swap_bytes methods possible? #419

Closed

TheIronBorn force-pushed the master branch 3 times, most recently from 26bf899 to e14710b Compare June 26, 2018 00:38

gnzlbg reviewed Jun 26, 2018

View reviewed changes

coresimd/ppsv/api/rotates.rs Outdated

}

}

impl super::api::Rotate<$elem_ty> for $id {

Copy link

Contributor

gnzlbg Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I see why you need the trait now.

gnzlbg reviewed Jun 26, 2018

View reviewed changes

TheIronBorn force-pushed the master branch from e14710b to dfdb356 Compare June 27, 2018 19:51

TheIronBorn force-pushed the master branch from dfdb356 to d252959 Compare June 27, 2018 21:06

rotate instr tests

a71eb28

TheIronBorn force-pushed the master branch from d252959 to a71eb28 Compare June 27, 2018 21:08

better match intel pseudocode

be0b786

TheIronBorn added 2 commits June 29, 2018 16:09

Revert "better match intel pseudocode"

32021c8

This reverts commit be0b786.

try instr prefix

86b1f48

always inline rotates

31823d7

TheIronBorn force-pushed the master branch from 9759c78 to 31823d7 Compare July 7, 2018 01:31

try avx512vl

cc6c887

TheIronBorn force-pushed the master branch from c46d848 to cc6c887 Compare July 7, 2018 18:47

gnzlbg reviewed Jul 9, 2018

View reviewed changes

gnzlbg closed this Jul 18, 2018

gnzlbg mentioned this pull request Aug 10, 2018

Verify code generation of the AVX-512 rotates rust-lang/packed_simd#74

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

add rotate_left/right #496

add rotate_left/right #496

TheIronBorn commented Jun 24, 2018 •

edited

Loading

gnzlbg commented Jun 24, 2018

TheIronBorn commented Jun 25, 2018

TheIronBorn commented Jun 26, 2018 •

edited

Loading

gnzlbg Jun 26, 2018 •

edited

Loading

gnzlbg Jun 26, 2018

gnzlbg Jun 26, 2018 •

edited

Loading

gnzlbg left a comment •

edited

Loading

TheIronBorn commented Jun 27, 2018

TheIronBorn commented Jun 27, 2018

gnzlbg commented Jun 28, 2018 •

edited

Loading

TheIronBorn commented Jun 29, 2018

gnzlbg commented Jun 29, 2018 •

edited

Loading

TheIronBorn commented Jun 29, 2018

gnzlbg commented Jun 29, 2018

TheIronBorn commented Jun 29, 2018

TheIronBorn commented Jun 30, 2018

TheIronBorn commented Jul 6, 2018 •

edited

Loading

TheIronBorn commented Jul 6, 2018

gnzlbg commented Jul 7, 2018 •

edited

Loading

TheIronBorn commented Jul 7, 2018 •

edited

Loading

gnzlbg Jul 9, 2018

gnzlbg commented Jul 9, 2018 •

edited

Loading

TheIronBorn commented Jul 9, 2018

gnzlbg commented Jul 16, 2018

gnzlbg commented Jul 18, 2018

add rotate_left/right #496

add rotate_left/right #496

Conversation

TheIronBorn commented Jun 24, 2018 • edited Loading

gnzlbg commented Jun 24, 2018

TheIronBorn commented Jun 25, 2018

TheIronBorn commented Jun 26, 2018 • edited Loading

gnzlbg Jun 26, 2018 • edited Loading

Choose a reason for hiding this comment

gnzlbg Jun 26, 2018

Choose a reason for hiding this comment

gnzlbg Jun 26, 2018 • edited Loading

Choose a reason for hiding this comment

gnzlbg left a comment • edited Loading

Choose a reason for hiding this comment

TheIronBorn commented Jun 27, 2018

TheIronBorn commented Jun 27, 2018

gnzlbg commented Jun 28, 2018 • edited Loading

TheIronBorn commented Jun 29, 2018

gnzlbg commented Jun 29, 2018 • edited Loading

TheIronBorn commented Jun 29, 2018

gnzlbg commented Jun 29, 2018

TheIronBorn commented Jun 29, 2018

TheIronBorn commented Jun 30, 2018

TheIronBorn commented Jul 6, 2018 • edited Loading

TheIronBorn commented Jul 6, 2018

gnzlbg commented Jul 7, 2018 • edited Loading

TheIronBorn commented Jul 7, 2018 • edited Loading

gnzlbg Jul 9, 2018

Choose a reason for hiding this comment

gnzlbg commented Jul 9, 2018 • edited Loading

TheIronBorn commented Jul 9, 2018

gnzlbg commented Jul 16, 2018

gnzlbg commented Jul 18, 2018

TheIronBorn commented Jun 24, 2018 •

edited

Loading

TheIronBorn commented Jun 26, 2018 •

edited

Loading

gnzlbg Jun 26, 2018 •

edited

Loading

gnzlbg Jun 26, 2018 •

edited

Loading

gnzlbg left a comment •

edited

Loading

gnzlbg commented Jun 28, 2018 •

edited

Loading

gnzlbg commented Jun 29, 2018 •

edited

Loading

TheIronBorn commented Jul 6, 2018 •

edited

Loading

gnzlbg commented Jul 7, 2018 •

edited

Loading

TheIronBorn commented Jul 7, 2018 •

edited

Loading

gnzlbg commented Jul 9, 2018 •

edited

Loading