
Depend on packed_simd #565

Closed · wants to merge 1 commit

Conversation

pitdicker
Contributor

@pitdicker pitdicker commented Jul 22, 2018

The CI is broken again... rust-lang/rust#52535 landed, which removes std::simd in favor of packed_simd. It doesn't have a release on crates.io yet, but that probably won't take long.

I ran into three issues:

  • The shift operations don't take arbitrary integers, only u32 as an argument — easy clean-up on our side.
  • There are no m1x* types anymore, but I suppose that was a typo in our code.
  • There is no From implementation to cast integers to floats of the same size. I hope https://github.com/gnzlbg/packed_simd/pull/31 is acceptable, or that there will be some other solution.

So this PR doesn't build yet, but I'll make it anyway to show what's going on.

@pitdicker
Contributor Author

or that there will be some other solution.

The other solution is to use .trunc() 😄.

@pitdicker
Contributor Author

AppVeyor fails because of a network error.

@TheIronBorn
Collaborator

TheIronBorn commented Jul 22, 2018

m1x was a weird naming quirk for 512-bit vectors.

@sicking
Contributor

sicking commented Jul 22, 2018

Yeah, that was the type returned from .le() and friends. I was curious about it, but it worked, so I didn't worry too much about it. Glad the names have been fixed though.

@gnzlbg

gnzlbg commented Jul 22, 2018

The shift operations don't take arbitrary integers, only u32 as an argument — easy clean-up on our side.

This was per the RFC. If you need shift operations with other types, let me know why, and I'll try to push for them. I considered them a convenience, but I couldn't come up with any reason beyond that for them. If more people find them convenient, maybe we can get them back.

There are no m1x* types anymore, but I suppose that was a typo in our code.

So, the m1x* types might come back, or not; we don't really know yet. The reason they have been renamed is that the m1x* types were "improperly implemented": m1x64 should be 64 bits wide, but it was 8 * 64 = 512 bits wide, because we still need to add better support for these types inside rustc. So at least the new names reflect the state of things.

However, 512-bit wide vector types are not in the RFC, and you should treat them as "super unstable". I wanted to put them behind a feature flag in packed_simd, so that one has to opt in to them, but we ran out of time...

My recommendation with respect to the m1x types would be to use a type alias, to minimize breakage if we change their names in the future. If I get to put things behind a feature flag, I'll submit a PR here before publishing so that you don't have to go through this breakage again. I am sorry that it happened.
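
For illustration, such an alias might look like the sketch below (the name m8x64 is assumed here to be the current 512-bit mask type; if it gets renamed, only this one line needs to change):

    // Hypothetical alias, following the suggestion above: give the 512-bit
    // mask one local name so a future rename in packed_simd touches a single
    // line of rand's code.
    type Mask512 = packed_simd::m8x64;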

Also, I expect to publish packed_simd on crates.io as soon as we move it into the nursery (probably tomorrow), so that you can publish rand versions that depend on it afterwards. There are some issues that we have to work around a bit in packed_simd, since right now it needs to recompile some parts of std for some targets, depending on target features, and the way it is currently done is a bit hacky :/

Also, if there are any features you need in packed_simd that aren't implemented yet, please open issues and I'll look into them.

@sicking
Contributor

sicking commented Jul 22, 2018

Regarding the m1x types, and the mask types in general, the only "unusual" requirement that we have, I think, is this code. That code does a bitwise cast of an f32x*/f64x* to a u32x*/u64x* and then uses a mask to subtract 1 from the lanes where the mask is true. Finally it converts the result back to f32x*/f64x*.
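
A minimal sketch of that trick, assuming packed_simd's FromBits conversions are implemented for the mask type (4-lane f32 shown for concreteness):

    // Reinterpret the float lanes as integers, add the mask bits (a true lane
    // is all ones, i.e. a wrapping -1; a false lane is 0), then reinterpret
    // back. Selected lanes end up one ulp smaller (for positive finite floats).
    use packed_simd::{f32x4, m32x4, u32x4, FromBits};

    fn decrease_masked(v: f32x4, mask: m32x4) -> f32x4 {
        let bits = u32x4::from_bits(v) + u32x4::from_bits(mask);
        f32x4::from_bits(bits)
    }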

We'll also use things like mask.select(...), mask.any/all/none().

These are at least the requirements that I'm aware of.

@pitdicker
Contributor Author

If you need shift operations with other types, let me know why, and I'll try to push for them.

Just u32 seems fine to me.

So, the m1x* types might come back, or not; we don't really know yet. The reason they have been renamed is that the m1x* types were "improperly implemented": m1x64 should be 64 bits wide, but it was 8 * 64 = 512 bits wide, because we still need to add better support for these types inside rustc. So at least the new names reflect the state of things.

I have to admit I know next to nothing about simd, so this explains my confusion 😄. Good change.

However, 512-bit wide vector types are not in the RFC, and you should treat them as "super unstable".

Would you recommend we remove our 'support' for them for now (it is not much more work than changing a couple of macro calls)?

Also, I expect to publish packed_simd on crates.io as soon as we move it into the nursery (probably tomorrow), so that you can publish rand versions that depend on it afterwards.

Thank you! Don't let this issue rush you; I just happened to have some time today to investigate.

@gnzlbg

gnzlbg commented Jul 22, 2018

@sicking

Regarding the m1x types, and the mask types in general, the only "unusual" requirement that we have, I think, is this code. That code does a bitwise cast of an f32x*/f64x* to a u32x*/u64x* and then uses a mask to subtract 1 from the lanes where the mask is true. Finally it converts the result back to f32x*/f64x*.

Interesting. May I ask why a mask is being used here? Or, put differently, if the code needs to subtract 1 from some lanes, why isn't it using u32xN/u64xN vectors instead, with some lanes set to 1 and the rest set to 0? That way it wouldn't need a bitwise cast. Tangentially related: the behavior of bitwise casts is endian-dependent, so depending on where the mask is coming from, I don't know whether this will behave as intended on all platforms, or whether it might do something weird on big-endian platforms (e.g. subtracting 1 from the wrong lanes).

We'll also use things like mask.select(...), mask.any/all/none().

So all the vertical vector comparisons return masks of an appropriate size, and these can all be used with select, have reductions, etc. Independently of which names and sizes we end up giving to the 512-bit wide masks, all of these things will still need to work properly. So I'd say you can count on this not changing, modulo the fact that we don't know when, if ever, we are going to stabilize the 512-bit wide vector types.

@pitdicker

Would you recommend we remove our 'support' for them for now (it is not much more work than changing a couple of macro calls)?

It appears to me that rand has some real and cool use cases for 512-bit wide vectors and operations on them, like the AVX-512 rotates, so I think you should try to use them. Some people are using the 512-bit wide vectors and are really happy with the state of things, so YMMV depending on what exactly you are doing. Personally, I consider them "alpha" quality, and if you decide to use them, I recommend that at some point you check that the assembly you are getting is what you expect.

If it isn't, you can always use the non-portable AVX-512 intrinsics in {core,std}::arch to work around it temporarily, and if you file a bug, we can fix it.

@sicking
Contributor

sicking commented Jul 23, 2018

Interesting. May I ask why a mask is being used here? Or, put differently, if the code needs to subtract 1 from some lanes, why isn't it using u32xN/u64xN vectors instead, with some lanes set to 1 and the rest set to 0? That way it wouldn't need a bitwise cast.

Here's the code that calls the decrease_masked function:

    loop {
        let mask = (scale * max_rand + low).ge_mask(high);
        if mask.none() {
            break;
        }
        scale = scale.decrease_masked(mask);
    }

(Actual code here)

where decrease_masked is the function which decreases the masked lanes by 1.

I.e. the set of lanes that we want to decrease is not constant, but rather calculated at runtime.

I guess we could do something like let offset = mask.select(u32x4::splat(1), u32x4::splat(0)) and then subtract offset, but the casting seemed faster.
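
Roughly, assuming the same 4-lane packed_simd types as above, that select version would look like the sketch below; it still needs the float-to-integer bit cast for the subtraction, which is presumably why it wasn't any cheaper:

    // Select-based alternative: build a 0/1 offset vector from the mask,
    // then subtract it from the float's bit pattern.
    let offset = mask.select(u32x4::splat(1), u32x4::splat(0));
    scale = f32x4::from_bits(u32x4::from_bits(scale) - offset);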

Tangentially related: the behavior of bitwise casts is endian-dependent, so depending on where the mask is coming from, I don't know whether this will behave as intended on all platforms, or whether it might do something weird on big-endian platforms (e.g. subtracting 1 from the wrong lanes).

Casting a mask seems to produce a value where, in each lane, either all bits are set or all bits are clear, so I don't think endianness would be a problem. Of course, if casting behavior could vary between platforms, that would indeed be a problem.

@gnzlbg

gnzlbg commented Jul 23, 2018

Gotcha, that makes sense. So casting is probably faster than the select here.

Of course, if casting behavior could vary between platforms, that would indeed be a problem.

This is the case in general (e.g. see the tests in https://github.com/gnzlbg/packed_simd/blob/master/tests/endianness.rs#L133), but for masks this should not matter, because all bytes within a lane are either all set or all cleared, so their value won't change with the byte order.
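
A small demonstration of both points, assuming packed_simd's u8x16/u32x4 types and FromBits (the byte values are arbitrary):

    use packed_simd::{u8x16, u32x4, FromBits};

    let bytes = u8x16::new(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16);
    let words = u32x4::from_bits(bytes);
    // Little-endian: words.extract(0) == 0x0403_0201
    // Big-endian:    words.extract(0) == 0x0102_0304
    // A mask lane's bytes, by contrast, are all 0x00 or all 0xFF, so a mask
    // reads the same under either byte order.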

@dhardy
Member

dhardy commented Jul 25, 2018

@alexcrichton can you re-start the AppVeyor builds please? The service seems to be user-centric rather than project-centric, so I don't have authorisation to do so.

@alexcrichton
Contributor

Ah, sorry, now it says "OK Pull request #565 is non-mergeable."

I've been meaning to switch this over to rust-lang-libs, though; I can try to do that soon!

@alexcrichton
Contributor

(or you can set it up under your own user if you'd like)

@dhardy
Member

dhardy commented Jul 26, 2018

@pitdicker rebase please

@dhardy dhardy closed this Jul 27, 2018
@pitdicker pitdicker deleted the packed_simd branch July 27, 2018 17:15