Would this repo be open to a SIMD implementation of ChaCha20? #667

oconnor663 · 2018-12-16T18:53:40Z

There's probably a factor of 5-6x to be had here. I could potentially port in some experimental code I have in https://github.com/oconnor663/chacha20_simd. What do people think?

burdges · 2018-12-16T19:09:47Z

Just curious: Are the optimizations in https://github.com/cesarb/chacha20-poly1305-aead/ any different?

oconnor663 · 2018-12-16T22:54:26Z

I think there's a substantial difference. I'm not familiar with that code, but by the looks of it it's operating on u32x4 vectors, which would correspond to 128-bit SSE instructions. The 256-bit AVX instruction set would provide about double the throughput, give or take.

TheIronBorn · 2018-12-16T23:03:17Z

Looks like chacha20-poly1305-aead expects at least portions of the code to compile to AVX2 instructions: https://github.com/cesarb/chacha20-poly1305-aead/blob/c10fa44e38c54006df7e1fc376158fdfd5befe03/src/simd_opt/u32x4.rs#L38

TheIronBorn · 2018-12-16T23:05:44Z

They also apply pshufb optimizations which likely speed up the rotates quite a bit.

sneves · 2018-12-17T11:15:43Z

When I added the ChaCha generator a lifetime ago, having compatible SIMD implementations was precisely the goal. At the time, however, there were no intrinsics or any good way to write it in Rust, so it seems to have stayed with the reference "one-star performance" implementation ever since.

In my experience writing C++ <random>-compatible engines, vectorized ChaCha8 with a modest ~256- (SSE2+) or ~512-byte (AVX2) state is as fast as PCG, and faster in some applications where you're not just generating random numbers for benchmarking. If the code in question uses integer multiplications, PCG cannibalizes the single multiplication port to a degree that makes ChaCha8, XorShift, etc significantly faster in those use cases.

You can also add forward security (which for some reason is listed as a feature on the documentation) to ChaCha8 very easily, at virtually no cost in performance, but this loses compatibility and jump-ahead capability.

dhardy · 2018-12-17T13:03:12Z

Yes, @oconnor663, very much. Replacing Hc128 in StdRng with a fast, forward secure variant of ChaCha makes a lot of sense (ChaCha12 possibly; I'm not sure which variant is the most appropriate choice but ChaCha8 is very close to the minimum complexity not proven insecure).

PCG is another thing altogether; it really depends on whether you want a secure batched RNG or a minimal-memory-usage RNG. As such we shouldn't talk about replacing PCG with ChaCha in SmallRng.

vks · 2018-12-17T14:07:40Z

As such we shouldn't talk about replacing PCG with ChaCha in SmallRng.

We could consider removing SmallRng. ChaCha is only 8.5 times as large. Using smaller RNGs might be a very niche case.

dhardy · 2018-12-18T11:08:54Z

I guess we could, but that's off-topic here and a quick search shows that SmallRng is already being used.

tarcieri · 2019-01-04T14:52:04Z

FYI, I've been meaning to push a core::simd impl of ChaCha20 as part of https://github.com/RustCrypto/stream-ciphers but have not yet had the time to complete it

vks · 2019-02-04T12:23:54Z

There is another SIMD implementation: https://github.com/cryptocorrosion/cryptocorrosion/tree/master/stream-ciphers/chacha

dhardy · 2019-04-19T11:30:26Z

@kazcw would you be open to adding rand_core support into your ChaCha implementation? Use of the existing block-rng code should in theory be enough to get us good performance here, though of course it needs testing (also #779 will result in a few changes).

Alternatively we could update the rand_chacha crate with the subset of the code relevant to us.

kazcw · 2019-04-19T18:30:07Z

@dhardy Sure, I'll try to get to it this weekend

tarcieri · 2019-04-27T18:57:47Z

There's an open PR to add ChaCha20 support to RustCrypto, which could be published as the chacha20 crate.

Personally I think it'd be swell if people could combine efforts on a single, high-quality, well-maintained ChaCha20 implementation all Rust users could leverage.

Performance: gen_bytes_chacha20: 254 MB/s -> 603 MB/s [on a Xeon L5630 (SSE4.1)] Minor version bump: the only breaking change is that no-std builds now require default-features=false (std is required by default for runtime cpu detection; no_std builds will use the best implementation supported by the target-features/target-cpu enabled at compile time) New functionality: ChaChaXRng is parameterized by round count at compile time. Convenient aliases for the typical 20/12/8 round implementations exposed. ChaChaRng is aliased to ChaCha20Rng for backward compatibility. Closes rust-random#667

dhardy mentioned this issue Jan 28, 2019

Tracker: Rand 0.7 #715

Closed

22 tasks

kazcw mentioned this issue Feb 3, 2019

ChaCha cryptocorrosion/cryptocorrosion#1

Merged

vks mentioned this issue Apr 8, 2019

Future of SmallRng #767

Closed

kazcw mentioned this issue Apr 19, 2019

ChaCha: rand_core support cryptocorrosion/cryptocorrosion#14

Closed

kazcw mentioned this issue Apr 30, 2019

use c2-chacha for stable, runtime-dispatched SIMD #789

Merged

dhardy closed this as completed in #789 May 7, 2019

dhardy mentioned this issue Aug 28, 2019

rand_chacha and reducing dependencies #872

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Would this repo be open to a SIMD implementation of ChaCha20? #667

Would this repo be open to a SIMD implementation of ChaCha20? #667

oconnor663 commented Dec 16, 2018

burdges commented Dec 16, 2018

oconnor663 commented Dec 16, 2018 •

edited

Loading

TheIronBorn commented Dec 16, 2018

TheIronBorn commented Dec 16, 2018

sneves commented Dec 17, 2018 •

edited

Loading

dhardy commented Dec 17, 2018 •

edited

Loading

vks commented Dec 17, 2018

dhardy commented Dec 18, 2018

tarcieri commented Jan 4, 2019

vks commented Feb 4, 2019

dhardy commented Apr 19, 2019

kazcw commented Apr 19, 2019

tarcieri commented Apr 27, 2019

Would this repo be open to a SIMD implementation of ChaCha20? #667

Would this repo be open to a SIMD implementation of ChaCha20? #667

Comments

oconnor663 commented Dec 16, 2018

burdges commented Dec 16, 2018

oconnor663 commented Dec 16, 2018 • edited Loading

TheIronBorn commented Dec 16, 2018

TheIronBorn commented Dec 16, 2018

sneves commented Dec 17, 2018 • edited Loading

dhardy commented Dec 17, 2018 • edited Loading

vks commented Dec 17, 2018

dhardy commented Dec 18, 2018

tarcieri commented Jan 4, 2019

vks commented Feb 4, 2019

dhardy commented Apr 19, 2019

kazcw commented Apr 19, 2019

tarcieri commented Apr 27, 2019

oconnor663 commented Dec 16, 2018 •

edited

Loading

sneves commented Dec 17, 2018 •

edited

Loading

dhardy commented Dec 17, 2018 •

edited

Loading