
Depend on packed_simd #565

Closed · wants to merge 1 commit

Conversation

pitdicker
Contributor

@pitdicker pitdicker commented Jul 22, 2018

The CI is broken again... rust-lang/rust#52535 landed, which removes std::simd in favor of packed_simd. It doesn't have a release on crates.io yet, but that probably won't take long.

I ran into three issues:

  • The shift operations don't take arbitrary integers, only u32 as an argument — easy clean-up on our side.
  • There are no m1x* types anymore, but I suppose that was a typo in our code.
  • There is no From implementation to cast integers to floats of the same size. I hope https://github.com/gnzlbg/packed_simd/pull/31 is acceptable, or that there will be some other solution.

So this PR doesn't build yet, but I'll make it anyway to show what's going on.

@pitdicker
Contributor Author

or that there will be some other solution.

The other solution is to use .trunc() 😄.

@pitdicker
Contributor Author

AppVeyor fails because of a network error.

@TheIronBorn
Collaborator

TheIronBorn commented Jul 22, 2018

m1x was a weird naming quirk for 512-bit vectors.

@sicking
Contributor

sicking commented Jul 22, 2018

Yeah, that was the type returned from .le() and friends. I was curious about it, but it worked, so I didn't worry too much about it. Glad the names have been fixed though.

@gnzlbg

gnzlbg commented Jul 22, 2018

The shift operations don't take arbitrary integers, only u32 as an argument — easy clean-up on our side.

This was per the RFC. If you need shift operations with other types, let me know why, and I'll try to push for them. I considered them a convenience, but I couldn't come up with any reason beyond that for them. If more people find them convenient, maybe we can get them back.

There are no m1x* types anymore, but I suppose that was a typo in our code.

So, the m1x* types might come back, or not; we don't really know yet. The reason they have been renamed is that the m1x* types were "improperly implemented": m1x64 should be 64 bits wide, but it was 8 * 64 = 512 bits wide, because we still need to add better support for these types inside rustc. So at least the new names reflect the state of things.

However, 512-bit wide vector types are not in the RFC, and you should treat them as "super unstable". I wanted to put them behind a feature flag in packed_simd, so that one has to opt in to them, but we ran out of time...

My recommendation with respect to the m1x types would be to use a type alias, to minimize breakage if we change their names in the future. If I get to put things behind a feature flag, I'll submit a PR here before publishing so that you don't have to go through this breakage again. I am sorry that it happened.
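
For illustration, such an alias might look like the sketch below (the name m8x64 is assumed here to be the current 512-bit mask type; if it gets renamed, only this one line needs to change):

    // Hypothetical alias, following the suggestion above: give the 512-bit
    // mask one local name so a future rename in packed_simd touches a single
    // line of rand's code.
    type Mask512 = packed_simd::m8x64;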

Also, I expect to publish packed_simd on crates.io as soon as we move it into the nursery (probably tomorrow), so that you can publish rand versions that depend on it afterwards. There are some issues that we have to work around a bit in packed_simd, since right now it needs to recompile some parts of std for some targets, depending on target features, and the way it is currently done is a bit hacky :/

Also, if there are any features you need in packed_simd that aren't implemented yet, please open issues and I'll look into them.

@sicking
Contributor

sicking commented Jul 22, 2018

Regarding the m1x types, and the mask types in general, the only "unusual" requirement that we have, I think, is this code. That code does a bitwise cast of an f32x*/f64x* to a u32x*/u64x* and then uses a mask to subtract 1 from the lanes where the mask is true. Finally it converts the result back to f32x*/f64x*.
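
A minimal sketch of that trick, assuming packed_simd's FromBits conversions are implemented for the mask type (4-lane f32 shown for concreteness):

    // Reinterpret the float lanes as integers, add the mask bits (a true lane
    // is all ones, i.e. a wrapping -1; a false lane is 0), then reinterpret
    // back. Selected lanes end up one ulp smaller (for positive finite floats).
    use packed_simd::{f32x4, m32x4, u32x4, FromBits};

    fn decrease_masked(v: f32x4, mask: m32x4) -> f32x4 {
        let bits = u32x4::from_bits(v) + u32x4::from_bits(mask);
        f32x4::from_bits(bits)
    }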

We'll also use things like mask.select(...), mask.any/all/none().

These are at least the requirements that I'm aware of.

@pitdicker
Contributor Author

If you need shift operations with other types, let me know why, and I'll try to push for them.

Just u32 seems fine to me.

So, the m1x* types might come back, or not; we don't really know yet. The reason they have been renamed is that the m1x* types were "improperly implemented": m1x64 should be 64 bits wide, but it was 8 * 64 = 512 bits wide, because we still need to add better support for these types inside rustc. So at least the new names reflect the state of things.

I have to admit I know next to nothing about simd, so this explains my confusion 😄. Good change.

However, 512-bit wide vector types are not in the RFC, and you should treat them as "super unstable".

Would you recommend we remove our 'support' for them for now (it is not much more work than changing a couple of macro calls)?

Also, I expect to publish packed_simd on crates.io as soon as we move it into the nursery (probably tomorrow), so that you can publish rand versions that depend on it afterwards.

Thank you! Don't let this issue rush you; I just happened to have some time today to investigate.

@gnzlbg

gnzlbg commented Jul 22, 2018

@sicking

Regarding the m1x types, and the mask types in general, the only "unusual" requirement that we have, I think, is this code. That code does a bitwise cast of an f32x*/f64x* to a u32x*/u64x* and then uses a mask to subtract 1 from the lanes where the mask is true. Finally it converts the result back to f32x*/f64x*.

Interesting. May I ask why a mask is being used here? Or, put differently, if the code needs to subtract 1 from some lanes, why isn't it using u32xN/u64xN vectors instead, with some lanes set to 1 and the rest set to 0? That way it wouldn't need a bitwise cast. Tangentially related: the behavior of bitwise casts is endian-dependent, so depending on where the mask is coming from, I don't know whether this will behave as intended on all platforms, or whether it might do something weird on big-endian platforms (e.g. subtracting 1 from the wrong lanes).

We'll also use things like mask.select(...), mask.any/all/none().

So all the vertical vector comparisons return masks of an appropriate size, and these can all be used with select, have reductions, etc. Independently of which names and sizes we end up giving to the 512-bit wide masks, all of these things will still need to work properly. So I'd say you can count on this not changing, modulo the fact that we don't know when, if ever, we are going to stabilize the 512-bit wide vector types.

@pitdicker

Would you recommend we remove our 'support' for them for now (it is not much more work than changing a couple of macro calls)?

It appears to me that rand has some real and cool use cases for 512-bit wide vectors and operations on them, like the AVX-512 rotates, so I think you should try to use them. Some people are using the 512-bit wide vectors and are really happy with the state of things, so YMMV depending on what exactly you are doing. Personally, I consider them "alpha" quality, and if you decide to use them, I recommend that at some point you check that the assembly you are getting is what you expect.

If it isn't, you can always use the non-portable AVX-512 intrinsics in {core,std}::arch to work around it temporarily, and if you file a bug, we can fix it.

@sicking
Contributor

sicking commented Jul 23, 2018

Interesting. May I ask why a mask is being used here? Or, put differently, if the code needs to subtract 1 from some lanes, why isn't it using u32xN/u64xN vectors instead, with some lanes set to 1 and the rest set to 0? That way it wouldn't need a bitwise cast.

Here's the code that calls the decrease_masked function:

    loop {
        let mask = (scale * max_rand + low).ge_mask(high);
        if mask.none() {
            break;
        }
        scale = scale.decrease_masked(mask);
    }

(Actual code here)

where decrease_masked is the function which decreases the masked lanes by 1.

I.e. the set of lanes that we want to decrease is not constant, but rather calculated at runtime.

I guess we could do something like let offset = mask.select(u32x4::splat(1), u32x4::splat(0)) and then subtract offset, but the casting seemed faster.
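
Roughly, assuming the same 4-lane packed_simd types as above, that select version would look like the sketch below; it still needs the float-to-integer bit cast for the subtraction, which is presumably why it wasn't any cheaper:

    // Select-based alternative: build a 0/1 offset vector from the mask,
    // then subtract it from the float's bit pattern.
    let offset = mask.select(u32x4::splat(1), u32x4::splat(0));
    scale = f32x4::from_bits(u32x4::from_bits(scale) - offset);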

Tangentially related: the behavior of bitwise casts is endian-dependent, so depending on where the mask is coming from, I don't know whether this will behave as intended on all platforms, or whether it might do something weird on big-endian platforms (e.g. subtracting 1 from the wrong lanes).

Casting a mask seems to produce a value where, in each lane, either all bits are set or all bits are clear, so I don't think endianness would be a problem. Of course, if casting behavior could vary between platforms, that would indeed be a problem.

@gnzlbg

gnzlbg commented Jul 23, 2018

Gotcha, that makes sense. So casting is probably faster than the select here.

Of course, if casting behavior could vary between platforms, that would indeed be a problem.

This is the case in general (e.g. see the tests in https://github.com/gnzlbg/packed_simd/blob/master/tests/endianness.rs#L133), but for masks this should not matter, because all bytes within a lane are either all set or all cleared, so their value won't change with the byte order.
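
A small demonstration of both points, assuming packed_simd's u8x16/u32x4 types and FromBits (the byte values are arbitrary):

    use packed_simd::{u8x16, u32x4, FromBits};

    let bytes = u8x16::new(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    let words = u32x4::from_bits(bytes);
    // Little-endian: words.extract(0) == 0x0403_0201
    // Big-endian:    words.extract(0) == 0x0102_0304
    // A mask lane's bytes, by contrast, are all 0x00 or all 0xFF, so a mask
    // reads the same under either byte order.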

@dhardy
Member

dhardy commented Jul 25, 2018

@alexcrichton can you re-start the AppVeyor builds please? The service seems to be user-centric rather than project-centric, so I don't have authorisation to do so.

@alexcrichton
Contributor

Ah, sorry, now it says "OK Pull request #565 is non-mergeable."

I've been meaning to switch this over to rust-lang-libs, though; I can try to do that soon!

@alexcrichton
Contributor

(or you can set it up under your own user if you'd like)

@dhardy
Member

dhardy commented Jul 26, 2018

@pitdicker rebase please

@dhardy dhardy closed this Jul 27, 2018
@pitdicker pitdicker deleted the packed_simd branch July 27, 2018 17:15