switch to std::simd, expand SIMD & docs #1239
Conversation
Since std::simd uses the same LLVM API as packed_simd, I doubt this will change benchmarks at all, but I'll try to run some later.
Turns out we don't have any SIMD benchmarks.
Good riddance, fragile packed_simd! Let's hope std SIMD is better.
This isn't a full review. Thanks @TheIronBorn.
I agree that introducing seemingly unpredictable type inference is bad. An extra trait or something would also make documentation easier. The Bernoulli stuff should perhaps wait for #1227.
Force-pushed from acd5020 to d41a948:
- move __m128i to stable, expand documentation, add SIMD to Bernoulli, add maskNxM, add __m512i
- genericize simd uniform int
- remove some debug stuff
- remove bernoulli
- foo
- foo
Removed the Bernoulli stuff and squashed some commits. Ran some benchmarks for 128/256 bit vectors and they're mostly the same, except for small optimizations packed_simd had but std::simd isn't prioritizing. Maybe LLVM will optimize it better in the future.
src/distributions/other.rs (outdated)
fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Mask<T, LANES> {
    rng.gen().lanes_lt(Simd::default())
}
Doesn't this compute to false in all lanes?
And, regarding the preceding doc... doesn't this make the Mask type inappropriate if normally only one bit from each lane is used? Ugh, guess we can't fix that.
BTW one can assume that a random boolean is true with 50% probability, but should one actually assume anything about what a random Mask is?
Maybe instead we should provide a higher-level API around select etc... though given that SIMD is a little niche, maybe just a few recipes in the book would be enough.
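For what it's worth, a "book recipe" along those lines might look roughly like this. This is only a sketch, not code from the PR: it assumes nightly Rust with portable SIMD, rand built with the simd_support feature, and the Distribution<Mask<..>> impl added here; the portable-SIMD method names have churned on nightly.

```rust
#![feature(portable_simd)]
use core::simd::{f32x4, mask32x4};
use rand::Rng;

/// Blend two vectors lane-wise at random: each output lane comes from
/// `a` or `b` with 50% probability, like a per-lane `rng.gen::<bool>()`.
fn random_blend<R: Rng + ?Sized>(rng: &mut R, a: f32x4, b: f32x4) -> f32x4 {
    // relies on the Distribution<Mask<..>> impl discussed in this PR
    let m: mask32x4 = rng.gen();
    m.select(a, b)
}

fn main() {
    let mut rng = rand::thread_rng();
    let a = f32x4::splat(1.0);
    let b = f32x4::splat(-1.0);
    println!("{:?}", random_blend(&mut rng, a, b));
}
```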
MaskElement must be a signed integer, so we have some form of iNxM.lt(0).
There are other SIMD operations which use all the bits of a mask, like an SSE2 select (x & m) | (y & !m), and std::simd doesn't specify layout or value representation, so we have to do this.
Correct, we can't assume anything about Mask value representation, but I don't know what users might need that for. Is there a specific use-case example you're thinking of?
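For readers following along, the point about signed elements and full-width masks can be illustrated with a scalar analogue. This is explanatory code, not code from the PR:

```rust
// Each lane of a uniformly random signed integer has a fair-coin sign bit,
// so comparing against zero yields a 50/50 result per lane.
fn scalar_mask(random_lane: i32) -> i32 {
    // all-ones (-1) if the sign bit is set, all-zeros otherwise:
    // the full-width lane value a bitwise select needs
    if random_lane < 0 { -1 } else { 0 }
}

// SSE2-style select: only correct when every bit of `m` within a lane agrees,
// which is why the value representation of a Mask matters here.
fn bitwise_select(x: i32, y: i32, m: i32) -> i32 {
    (x & m) | (y & !m)
}

fn main() {
    assert_eq!(bitwise_select(7, 9, scalar_mask(-5)), 7);
    assert_eq!(bitwise_select(7, 9, scalar_mask(5)), 9);
}
```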
No specific use-case; I was just going by what you mentioned (select). Probably this is unnecessary to get into, however.
Realized that the …
Any flags depending on …
Anything left to review here?
Oh there's a weird new failure. I don't understand why it seems to be enabling …
src/distributions/integer.rs (outdated)
#[cfg(feature = "simd_support")]
macro_rules! simd_impl {
    ($(($intrinsic:ident, $vec:ty),)+) => {$(
macro_rules! intrinsic_impl {
Since this is an x86 only impl, maybe the name should reflect that?
Agreed
src/distributions/other.rs (outdated)
/// Note that on some hardware like x86/64 mask operations like [`_mm_blendv_epi8`]
/// only care about a single bit. This means that you could use uniform random bits
/// directly:
///
/// ```ignore
/// // this may be faster...
/// let x = unsafe { _mm_blendv_epi8(a.into(), b.into(), rng.gen::<__m128i>()) };
///
/// // ...than this
/// let x = rng.gen::<mask8x16>().select(b, a);
/// ```
///
/// Since most bits are unused you could also generate only as many bits as you need.
///
/// [`_mm_blendv_epi8`]: https://www.intel.com/content/www/us/en/docs/intrinsics-guide/index.html#text=_mm_blendv_epi8&ig_expand=514/
/// [`simd_support`]: https://github.com/rust-random/rand#crate-features
#[cfg(feature = "simd_support")]
impl<T, const LANES: usize> Distribution<Mask<T, LANES>> for Standard
where
    T: MaskElement + PartialOrd + SimdElement<Mask = T> + Default,
    LaneCount<LANES>: SupportedLaneCount,
    Standard: Distribution<Simd<T, LANES>>,
{
    #[inline]
    fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Mask<T, LANES> {
        // `MaskElement` must be a signed integer, so this is equivalent
        // to the scalar `i32 < 0` method
        rng.gen().lanes_lt(Simd::default())
    }
}
From what I understand this is correct, but not necessarily the most efficient: only the high bit of each lane generated by rng.gen() is used. I'm not sure if this is actually worth worrying about, however.
I'm also just a little surprised it type-checks: it relies on only one type supporting lanes_lt(..) -> Mask<T, LANES>. Better to clarify with rng.gen::<T>().lanes_lt(T::default())?
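Presumably the intent is something like the following, with the intermediate vector type spelled out. A sketch only, using the same generic bounds and the lanes_lt method from the impl above, not a tested change:

```rust
#[inline]
fn sample<R: Rng + ?Sized>(&self, rng: &mut R) -> Mask<T, LANES> {
    // name the intermediate type instead of leaning on inference
    let v: Simd<T, LANES> = rng.gen();
    v.lanes_lt(Simd::default())
}
```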
The inefficiency can actually be worse in scalar code: mask8x16 wastes 7 bits per lane while bool wastes 31, and mask64x2 wastes 63. I'll look into more efficient methods but might just point readers to it in the docs.
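One possible "more efficient method" in that direction, purely as a sketch: draw 16 random bits and spread them across the 16 lanes, instead of drawing 128 bits and keeping only a sign bit per lane. The lanes_ne name follows the nightly API used in this PR, and whether this actually beats the simple approach would need benchmarking:

```rust
#![feature(portable_simd)]
use core::simd::u8x16;
use rand::Rng;

/// Build a mask8x16 from just 16 random bits instead of 128:
/// lane i of the result is true iff bit i of `bits` is set.
fn mask_from_u16<R: Rng + ?Sized>(rng: &mut R) -> core::simd::mask8x16 {
    let bits: u16 = rng.gen();
    // broadcast the low byte into lanes 0..8 and the high byte into lanes 8..16
    let mut bytes = [(bits & 0xff) as u8; 16];
    bytes[8..].fill((bits >> 8) as u8);
    let v = u8x16::from_array(bytes);
    // pick a different bit in each lane and test it
    let bit = u8x16::from_array([1, 2, 4, 8, 16, 32, 64, 128,
                                 1, 2, 4, 8, 16, 32, 64, 128]);
    (v & bit).lanes_ne(u8x16::splat(0))
}
```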
But the above comments aren't blockers, so I'll also approve this PR.
Ah, the tests are failing because the stdsimd API changed in a recent nightly. Shame, I hoped stdsimd would be more stable, though I suppose it's a different kind of instability. We could add some compilation conditions or wait until the changes reach stable.
I'm sure you know more than I do about the state of …
We could simply merge the packed_simd fix while we wait.
Can do. The only thing against it is that your migrations here will likely end up stale. Any idea on the time frame for stability or how much churn is likely to …
I don't know what I was thinking earlier, but since std::simd is only available on nightly we can just fix and merge right now.
Unfortunately there's no roadmap for stdsimd, but there is likely to be at least a little churn. Though we aren't using too many features, so I think we won't be affected much.
The only remaining failure is crossbeam.
Closes #1232. Should also fix #1162.
The new API is pretty easy and the new trait system made for some convenient generic implementations.
Changes:
- Moved __m128i & __m256i away from the simd_support feature
- Added __m512i and some internal AVX512 optimizations
- Removed the u8x2/i8x2 types due to lack of support in std::simd
- Added missing types (e.g. NonZero) to the list in the Standard type documentation
- Distribution<maskNxM> for Standard, behaves like rng.gen::<bool>
- Implemented Distribution<mask64xM> for Bernoulli, each lane uses the same prob

The trait system allows impls to adapt to future SIMD types and doesn't clutter the rendered documentation. Using it in more places would mean some code duplication though.
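To illustrate the kind of generic shape that last point refers to, here is a simplified sketch, not the PR's actual code (the real implementations can fill lanes more efficiently, and inside the crate this would be a trait impl rather than a free function):

```rust
#![feature(portable_simd)]
use core::simd::{LaneCount, Simd, SimdElement, SupportedLaneCount};
use rand::{distributions::{Distribution, Standard}, Rng};

// One generic function covers every element type and lane count that
// std::simd supports, so future vector widths need no new code and no
// extra documentation entries.
fn gen_simd<T, const LANES: usize, R>(rng: &mut R) -> Simd<T, LANES>
where
    T: SimdElement,
    LaneCount<LANES>: SupportedLaneCount,
    Standard: Distribution<T>,
    R: Rng + ?Sized,
{
    // fill each lane independently with a Standard sample
    Simd::from_array(core::array::from_fn(|_| rng.gen()))
}
```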
__m512i requires nightly Rust's stdsimd feature at the moment, so I put it under the simd_support feature, but nightly might make more sense.