Remove zerocopy from rand #1579

dhardy · 2025-02-06T12:20:22Z

Added a CHANGELOG.md entry

Summary

Replace zerocopy dependency with unsafe code (up from 12 to 17 instances).

Add benchmarks for some SIMD / wide types.

Remove two #[inline(never)] attributes which were apparently motivated by earlier benchmark results, but which do more harm than good in the new benchmarks.

Motivation

Make the dependency on zerocopy optional #1574: zerocopy is "a big crate with a huge amount of unsafe code"
I've also seen some chatter about increased compile times in rand v0.9, which now depends on two versions of zerocopy.

I'm not a big fan of this, but together with #1575 it removes the dependency on zerocopy v0.8, so it is probably an improvement.

Project Safe Transmute

If this project lands safe transmute support into the standard library, we would of course want to use that.

Details

Replacing zerocopy::transmute! with core::mem::transmute is easy and results in identical code generation (tested with StdRng and SmallRng); this reverts a change in #1349.
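As a rough illustration of this first replacement (a sketch, not the actual rand code), the block below reinterprets four 32-bit words as one u128 with core::mem::transmute. The function name `u128_from_words` is hypothetical. The soundness argument that zerocopy::transmute! would check at compile time becomes a SAFETY comment that has to be maintained by hand.

```rust
use core::mem::transmute;

/// Hypothetical helper (not part of rand): reinterpret four 32-bit words
/// as a single u128, preserving the in-memory byte layout.
pub fn u128_from_words(words: [u32; 4]) -> u128 {
    // SAFETY: [u32; 4] and u128 are both 16 bytes, and every bit pattern
    // is a valid u128. `transmute` rejects size mismatches at compile
    // time, but it does not check validity of the target type.
    unsafe { transmute::<[u32; 4], u128>(words) }
}
```

The result depends on the target's endianness, exactly as it would with the zerocopy macro, since both are plain reinterpretations of the same bytes.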

Replacing the fill impls is more complex but I believe acceptable; this reverts a change in #1502.

In both cases, a direct replacement would have put unsafe inside a macro whose safety depends on a type passed by the macro caller. In the first case I decided to inline the three macro usages, while in the second I prefixed the macro name with unsafe_.
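The second replacement can be sketched roughly as follows. This is an assumption-laden illustration rather than the code in this PR: the macro name `unsafe_impl_fill_bytes`, the generated functions `fill_u32` / `fill_u64`, and the closure standing in for an RNG's byte-filling method are all hypothetical. It shows an integer slice viewed as bytes via slice::from_raw_parts_mut, with the `unsafe_` prefix flagging that soundness depends on the type the macro caller passes.

```rust
use core::mem::size_of_val;
use core::slice;

/// Hypothetical macro. The `unsafe_` prefix signals that the caller must
/// only pass plain integer types: no padding, no invalid bit patterns.
macro_rules! unsafe_impl_fill_bytes {
    ($name:ident, $t:ty) => {
        /// Fill `dest` with bytes produced by `source`
        /// (a stand-in for an RNG's byte-filling method).
        pub fn $name(dest: &mut [$t], mut source: impl FnMut(&mut [u8])) {
            let len = size_of_val(&*dest);
            // SAFETY: relies on the macro caller's promise about $t.
            // The pointer and length cover exactly the memory of `dest`,
            // u8 has alignment 1, and any byte pattern is a valid $t.
            let bytes = unsafe {
                slice::from_raw_parts_mut(dest.as_mut_ptr().cast::<u8>(), len)
            };
            source(bytes);
        }
    };
}

unsafe_impl_fill_bytes!(fill_u32, u32);
unsafe_impl_fill_bytes!(fill_u64, u64);
```

A call such as `fill_u32(&mut buf, |bytes| bytes.fill(0xFF))` would overwrite every element of `buf`. A portable implementation would probably also normalise byte order after filling so that results match across targets; that step is omitted here.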

Benchmark results

$ cargo bench --bench simd --features simd_support -- --baseline master 
   Compiling rand v0.9.0 (/home/dhardy/projects/rand/rand)
   Compiling rand_distr v0.5.0 (/home/dhardy/projects/rand/rand/rand_distr)
   Compiling benches v0.1.0 (/home/dhardy/projects/rand/rand/benches)
    Finished `bench` profile [optimized] target(s) in 1.38s
     Running benches/simd.rs (target/release/deps/simd-2905efe84e67fa8e)
random_simd/u128        time:   [1.8751 ns 1.8831 ns 1.8948 ns]
                        change: [-0.1321% +0.6261% +1.4131%] (p = 0.12 > 0.05)
                        No change in performance detected.
Found 15 outliers among 100 measurements (15.00%)
  5 (5.00%) high mild
  10 (10.00%) high severe
random_simd/m128i       time:   [1.9753 ns 1.9790 ns 1.9833 ns]
                        change: [+5.4631% +5.6551% +5.8561%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 14 outliers among 100 measurements (14.00%)
  1 (1.00%) high mild
  13 (13.00%) high severe
random_simd/m256i       time:   [3.7588 ns 3.7755 ns 3.7931 ns]
                        change: [-0.0698% +0.3828% +0.7685%] (p = 0.07 > 0.05)
                        No change in performance detected.
Found 17 outliers among 100 measurements (17.00%)
  4 (4.00%) high mild
  13 (13.00%) high severe
random_simd/m512i       time:   [6.8739 ns 6.8901 ns 6.9097 ns]
                        change: [+0.1511% +0.3741% +0.6309%] (p = 0.00 < 0.05)
                        Change within noise threshold.
Found 11 outliers among 100 measurements (11.00%)
  2 (2.00%) high mild
  9 (9.00%) high severe
random_simd/u64x2       time:   [1.9767 ns 1.9817 ns 1.9875 ns]
                        change: [-72.129% -72.012% -71.890%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 13 outliers among 100 measurements (13.00%)
  3 (3.00%) high mild
  10 (10.00%) high severe
random_simd/u32x4       time:   [3.9506 ns 3.9572 ns 3.9651 ns]
                        change: [-50.352% -50.035% -49.827%] (p = 0.00 < 0.05)
                        Performance has improved.
Found 11 outliers among 100 measurements (11.00%)
  1 (1.00%) low severe
  1 (1.00%) high mild
  9 (9.00%) high severe
random_simd/u32x8       time:   [3.7498 ns 3.7598 ns 3.7717 ns]
                        change: [-0.0915% +0.3002% +0.8262%] (p = 0.20 > 0.05)
                        No change in performance detected.
Found 16 outliers among 100 measurements (16.00%)
  5 (5.00%) high mild
  11 (11.00%) high severe
random_simd/u16x8       time:   [3.7647 ns 3.7792 ns 3.7953 ns]
                        change: [-0.0710% +0.6785% +1.3454%] (p = 0.06 > 0.05)
                        No change in performance detected.
Found 10 outliers among 100 measurements (10.00%)
  5 (5.00%) high mild
  5 (5.00%) high severe
random_simd/u8x16       time:   [3.7806 ns 3.7950 ns 3.8118 ns]
                        change: [+1.1070% +1.5527% +2.1092%] (p = 0.00 < 0.05)
                        Performance has regressed.
Found 9 outliers among 100 measurements (9.00%)
  1 (1.00%) high mild
  8 (8.00%) high severe

Unfinished business?

Generation of the Simd types and of m128i etc. should be equivalent, but the code paths differ: the Simd impls currently use fill to avoid more unsafe code here.

Notice from the above that u32x4, u16x8 and u8x16 are the same size as u128 and m128i but cost about twice as much to generate here. This indicates the fill code may be sub-optimal.

Additionally, the m128i impl performed even worse when transmuting a u128 value (~4.3ns or +130%) which, as far as I can tell, is purely because the u128 value is returned via rax, rdx while the __m128i value is returned via rdx, r10 (with rax equal to the struct address). I don't understand this.


mitsuhiko · 2025-02-06T22:40:07Z

> Replacing zerocopy::transmute! with core::mem::transmute is easy and results in identical code generation (tested with StdRng and SmallRng); this reverts a change in #1349.

For those cases where you just call zerocopy::transmute! you could still use zerocopy in CI. You could declare an optional dependency on zerocopy and have a macro that switches between the zerocopy transmute for CI and tests and the stdlib one. That way you still get the verification that zerocopy provides, but only in CI.
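One way this switch might look is sketched below. The feature name `zerocopy_checked`, the macro name `checked_transmute` and the helper `words_to_u64` are all hypothetical, and the sketch assumes an optional zerocopy dependency gated by that feature in Cargo.toml.

```rust
// Hypothetical feature name: CI would build and test with
// `--features zerocopy_checked`, while normal builds leave it off.
#[cfg(feature = "zerocopy_checked")]
macro_rules! checked_transmute {
    ($e:expr) => {
        // zerocopy checks at compile time that the conversion is sound.
        zerocopy::transmute!($e)
    };
}

#[cfg(not(feature = "zerocopy_checked"))]
macro_rules! checked_transmute {
    ($e:expr) => {
        // SAFETY: the same call sites are compiled through
        // zerocopy::transmute! whenever the feature is enabled in CI.
        unsafe { core::mem::transmute($e) }
    };
}

/// Hypothetical call site: reinterpret two 32-bit words as a u64.
pub fn words_to_u64(words: [u32; 2]) -> u64 {
    checked_transmute!(words)
}
```

Note that this puts unsafe back inside a macro whose soundness depends on the caller's types, which the PR description says it avoided by inlining; the trade-off is that CI re-checks every call site.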

I have been proposing this for ahash: tkaitchuck/aHash#253

I'm not sure if this is a great idea, but I think it's a compromise that has some value.

joshlf · 2025-02-12T00:54:32Z

> If this project lands safe transmute support into the standard library, we would of course want to use that.

I should clarify that Project Safe Transmute will likely never replace zerocopy/bytemuck, but just replace their derives (zerocopy-derive and bytemuck-derive). Some very limited functionality may exist directly in the standard library, but we think of Safe Transmute as mostly being a building block that makes it easier to write sound unsafe code, not a building block that permits you to avoid writing unsafe code entirely. I suspect this doesn't change the calculus here, but I figured it was worth mentioning.

dhardy added 6 commits February 6, 2025 10:17

Add simd benchmark

8fab522

Results show that some Simd types are 2-4 times as expensive as expected

Remove #[inline(never)] statements on Fill::fill

4ccd0c0

Results in a few minor regressions and two large improvements in benchmarks: -72% time for u64x2, -50% for u32x4.

Replace zerocopy::transmute! with unsafe transmute

8ec4cf4

Code gen is identical and benchmarks unaffected.

Replace zerocopy::IntoBytes::as_mut_bytes with unsafe slice::from_raw_parts_mut

0d27d3f

Mostly code gen appears equivalent, though it affects inlining of u32x4 gen with SmallRng. Benchmarks are not significantly affected.

Remove zerocopy dependency

80b8d95

CHANGELOG

b81c644

dhardy requested a review from josephlr February 6, 2025 12:20
