Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

add rotate_left/right #496

Closed
wants to merge 7 commits into from
Closed

Conversation

TheIronBorn
Copy link
Contributor

@TheIronBorn TheIronBorn commented Jun 24, 2018

(also removed some duplicate tests)

It would be good to test that on supported hardware this compiles to the correct instruction, but I don't know how to do that.

Also some scalar convenience impls would be good.

(rust-random/rand#377)

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 24, 2018

It would be good to test that on supported hardware this compiles to the correct instruction, but I don't know how to do that.

It is not as easy as for the intrinsics but it is possible. You can create a test in the crates/coresimd/tests directory, and import the assert_instr crate. Then you can create an opaque function that calls the portable intrinsics, and use the assert_instr macro on that to check the generated code. You will need to gate it on the architecture and cpu features using cfg_attr.

For examples, all intrinsics in the x86 and armdirectories ofcoresimduseassert_instr`.

Let me know if you run into issues while giving it a try.

@gnzlbg gnzlbg self-requested a review June 25, 2018 09:29
@TheIronBorn
Copy link
Contributor Author

Any concerns about organization etc?

@TheIronBorn
Copy link
Contributor Author

TheIronBorn commented Jun 26, 2018

Huh, some of these are being compiled from rotate_right to vprolq. Not sure how to make that pass the tests.

Also not sure what's happening with the other failures

#![allow(unused)]

/// Trait used for overloading rotates
pub trait Rotate<T> {
Copy link
Contributor

@gnzlbg gnzlbg Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

~~Given that you provide a specific implementation for each vector type below, this doesn't really need to be a trait.

In general we try to keep the traits we use to a minimum, and most of them are sealed (not exposed in the public API of the library).~~ EDIT: it's ok, see below.

}
}

impl super::api::Rotate<$elem_ty> for $id {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aha, I see why you need the trait now.


#[inline]
#[target_feature(enable = "avx512f")]
#[cfg_attr(test, assert_instr(vprorvq))]
Copy link
Contributor

@gnzlbg gnzlbg Jun 26, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The assert_instr and target_feature need to be guarded on x86 and x86_64. Same for all other functions below:

  • #[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = ...))]
  • #[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), assert_instr(....))]

should be enough.

Copy link
Contributor

@gnzlbg gnzlbg left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So this looks good.

The only thing that worries me is about adding a trait called Rotate to core, which is a very general name and also begs the question why don't the integer types implement it.

I think it would be less controversial to just implement this for vectors as methods, without the trait, and require people to use splat. You can then post in the RFC rust-lang/rfcs#2366 about adding a trait for this, and see what people's consensus about this is.

So... I am not fundamentally opposed to adding the trait. In fact I think using the trait makes rotate nicer to use, but there are organizational concerns about exposing traits in the std library that really must make them be worth the effort. For things like shuffles and select, there really aren't any good alternatives to using traits, but for rotates, this is a bit less the case. Also, for select and shuffle those traits are sealed and not part of the library API, but this one would be in the library API which kinds of puts it at the same level as FromBits/IntoBits which are not proposed in the RFC. Does that make sense?

@TheIronBorn
Copy link
Contributor Author

Could there be some way to seal the Rotate trait?

Otherwise I agree. Rotates probably aren't used that much anyway.

@TheIronBorn
Copy link
Contributor Author

Making progress. The only failures now are a couple shift implementations and left/right swapping.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 28, 2018

Could there be some way to seal the Rotate trait?

I am not sure. IIUC sealed traits that have to be imported by users don't work because they can't be imported. That is, if you have an implementation detail that's not exposed to users, you can make it a sealed trait. If you want to express a where bound to restrict an implementation, you can also make it a sealed trait. But if the user needs to call a method of the trait, then the user needs to import it, and the trait has to be public.

One could try to make the rotate_left/rotate_right inherent methods of vectors, but make them generic. We could then use a sealed trait to constrain it to vectors of the same type, or an scalar value. The sealed trait could then have a e.g. to_vector() method that for vectors does nothing and for scalars does an appropriate splat:

impl f32x4 {
    fn rotate_right<T: RotateArg<Self>>(self, x: T) -> Self {
        let x: f32x4 = x.to_vec(); 
        ... x and self are f32x4, so implement for vectors as usual ...
    }
}
trait RotateArg<T> {
    fn to_vec(self) -> T;
}
impl RotateArg<f32x4> for f32x4 {
    fn to_vec(self) -> f32x4 { self } 
}
impl RotateArg<f32x4> for f32 {
   fn to_vec(self) -> f32x4 { f32x4::splat(self) }
}

But note that we could extend a non-generic rotate_left/rotate_right to allow taking scalars using such a trait later on in a backwards compatible way, so I'd prefer to get the non-generic rotates merged first as inherent methods for vectors only, and figure out how to make these more ergonomic for scalars in a different PR / opening an issue to ask around for ideas etc.

@TheIronBorn
Copy link
Contributor Author

Is there some way to do any(assert_instr(vprolq), assert_instr(vprorq))?

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 29, 2018

@TheIronBorn

Is there some way to do any(assert_instr(vprolq), assert_instr(vprorq))?

Easily only using #[cfg_attr(...), assert_instr(...)] which means that you can only have them for different archs, target features, etc.

For what exactly do you need it? (IIRC the assert_instr finds an intrinsic that starts with a pattern, so vpro might do the trick of matching both vprolq and vprorq).

@TheIronBorn
Copy link
Contributor Author

Some compilations turn a left rotate into a right. Not sure why

@gnzlbg
Copy link
Contributor

gnzlbg commented Jun 29, 2018

That's weird. As long as the tests pass that might just be llvm optimizations at work. If you tell me which one exactly I can take a look.

@TheIronBorn
Copy link
Contributor Author

@TheIronBorn
Copy link
Contributor Author

Nearly there. Just a few still doing shift + shift + or.

@TheIronBorn
Copy link
Contributor Author

TheIronBorn commented Jul 6, 2018

I've been trying to get codegen correct and found some weird behavior where just by clearing the cache and recompiling I get different results. https://godbolt.org/g/oJM7Tw
I'll report this to godbolt.

At the very least this makes progress on this difficult, and possibly even makes it impossible to guarantee codegen correctness (if not a godbolt error).

Nvm I was misusing godbolt

@TheIronBorn
Copy link
Contributor Author

I think I've reached the end of my understanding of why these are still producing bad code

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 7, 2018

I see you are only enabling the avx512f target feature, could you try enabling avx512vl as well?

https://www.felixcloutier.com/x86/VPROLD:VPROLVD:VPROLQ:VPROLVQ.html mentions both feature flags for the instructions that do not take immediate mode arguments, and the run-time ones seem to be the only ones failing.

The intel intrinsics guide agrees that both features are required: https://software.intel.com/sites/landingpage/IntrinsicsGuide/#text=vprolvd&expand=4395,4405

@TheIronBorn
Copy link
Contributor Author

TheIronBorn commented Jul 7, 2018

That didn't seem to have any change (I may have implemented it incorrectly though)

#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "avx512f"))]
#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), target_feature(enable = "avx512vl"))]
#[cfg_attr(any(target_arch = "x86", target_arch = "x86_64"), assert_instr(vpro))]
unsafe fn rotate_right_variable(x: u64x8) -> u64x8 {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you add tests here for 128-bit wide and 256-bit wide vectors as well?

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 9, 2018

Looking at the implementation of _mm_rolv_epi32 in clang, the only target feature attribute it has is avx512vl. It is implemented in term of __builtin_ia32_prolvd128; the similar builtins are all next to it there and all require avx512vl.

The LLVM-IR intrinsic that these should generate is llvm.x86.avx512.prolv, there are a couple of tests in the LLVM sources about it:

So what you can do is emit the LLVM-IR that is being generated for the variable length rotates (e.g. using RUSTFLAGS=--emit=llvm-ir, cargo llvm-ir ..., etc.) and see how it differs from the IR in those tests.

One weird thing is that in clang these require avx512vl, but with llc trunk it seems that they only require the avx512f attribute. I would still stick with the Intel docs and require both attributes though.

Note also that if instead of llc trunk you use llc 5.0 in that example, the machine code emitted is pretty bad, so it might well be that this only works on LLVM7 (the way to test this would be to use the llc shipped with Rust; sadly godbolt doesn't ship llc 6.0).

While those examples use i32s instead of i64, they produce good machine code too for i64 when the appropriate (q) intrinsic is used: https://godbolt.org/g/bF3DBt

So it seems that somewhere down the chain we are making a mistake that prevents LLVM from using rotates here, we just need to find out exactly where :D

@TheIronBorn
Copy link
Contributor Author

I think the Intel docs say avx512vl is only required for vectors < 512-bits wide

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 16, 2018

@TheIronBorn sorry about the organisational issues, but would you mind sending this PR to https://github.com/gnzlbg/ppv ? The ppsv module will be removed from core in the near future, and that library will probably be the replacement.

I'll leave this open here until we remove the ppsv module, and before removal will try to port this PR myself if you don't beat me to it.

@gnzlbg
Copy link
Contributor

gnzlbg commented Jul 18, 2018

The ppsv module has been removed and this has already been merged upstream.

@TheIronBorn I'll try to propagate the assert_instr tests upstream tomorrow.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants