-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Replacing rand_chacha with chacha20 #934
Comments
You'll need |
Regarding Probably the most questionable use of unsafe is here: https://github.com/RustCrypto/stream-ciphers/blob/master/chacha20/src/rng.rs#L74 It'd be good to evaluate various safe ways of replacing that. Really it's just a same-size type conversion which ideally could be done with a type cast or using |
I get the same results with |
I just benchmarked the These are on a 2.3 GHz Intel Core i9, benchmarked with
|
I've made some significant improvements to |
This issue is now 32 months old! Is it worth revisiting?
If we don't do this, possibly we should instead support |
@dhardy the v0.9 release made I'd be happy to add the RNG types back and make the |
The reason that never happened so far is because (1) there is not (to me) a clear motivation for switching and (2) this crate hasn't seen a lot of maintenance activity recently. I'm not aware of any problem with A motivation against is the larger number of crates in the dependency tree; something many would not consider an issue but a few people have brought it up in the past.
9 for |
Yeah, we could make the |
The main motivation is to unify code bases, so improvements in |
We should ensure @kazcw has the chance to comment here too.
Since the libraries don't have a huge scope and are basically done already, is this actually motivation?
Is this common? My POV is still: there is little real motivation either way. But if enough other people feel strongly about this, it will likely persuade me. |
@dhardy I wouldn’t consider any of these libraries “done” per se. AFAIK none of them have e.g. ARM NEON support and only provide SIMD backends on x86. Once |
@nstilt1 I'm not sure that's relevant to this issue, which is about retiring |
I saw this issue when I was looking for issues regarding the lack of support for
I was going to attempt the refactor process with help from AutoGPT, but then I realized that |
@nstilt1 thanks for providing a rationale for this change. Would someone like to open a PR? |
I am working on it, but I've found a slight difference in implementation within chacha20. To maintain full backwards compatibility of functions/function signatures of ChaChaXRng, the new implementation of chacha20 will need a 64-bit counter instead of 32 bits. It seems like the SIMD backend would need some relatively minor adjustments to make that possible. For most use cases of an RNG, people probably won't need to generate 1 ZiB of data from a seed + stream_id in ChaChaXRng. The output limit with a 32-bit counter is about 256 GiB. My current use case for ChaChaXRng requires significantly less than this limit. Is it okay to keep it at a 32-bit counter, or do we want to make it 64 bits? The stream_id would then need to be 96 bits if the counter was kept at 32 bits. Optionally, there could be a feature that uses a 32-bit counter instead of 64, but... the backend would still need to be adapted to use a 64 bit counter. |
The 32-bit vs 64-bit counters are one of the main differences between the IETF flavor of ChaCha20 (which the We definitely want the IETF variant to use a 32-bit counter. If someone is really interested in changing I would personally suggest always using the IETF construction everywhere possible so as to phase out the legacy construction (and really you should probably rekey before you hit 256 GiB of keystream anyway) |
In my opinion, minor API breaking changes are fine. I agree that most people won't need to generate more than 256 GiB. Feature-flags are probably not a good solution here. |
I went ahead and benched the chacha20 implementation and c2-chacha. Here are the results.
With neon on an M1:
I got these comparisons by running a bench with rand_chacha first, then chacha20 in the next run... not sure if there's a more automated way to compare them like this without manually running it twice. The source is available in https://github.com/nstilt1/stream-ciphers-rng/tree/master/benches |
Do we want to add Rngs for XChaCha and ChaCha Legacy as well? |
I don't think so. The objective should be to provide strong RNGs which remain value-compatible with the previously-supported versions. |
Yeah, the other variants are mostly-the-same enough it doesn't provide much value adding them as alternatives. Nice performance numbers there! |
How unsafe is this: |
Very nice performance numbers indeed, this gets us close to the throughput of the reference implementation, which is about 0.7 cycles / byte with AVX2. It's also nice to see that chacha20 is faster than c2-chacha, this was not always the case. |
Do we need those
The word you want is contiguous. But do we need |
The |
What are you thinking of? |
For internal use you can use a private method rather than a public trait |
I'm not quite sure what you are referring to. I saw that I used I also saw that I went ahead and added a Seed wrapper that impls |
I saw that issue, and I agree that it could be beneficial to have I've got a branch of
If If everybody is okay with that functionality and it were to make it to release, then it could be detrimental to call
With a My reasoning behind the alternative inputs is for a specific use-case. Many cryptographic operations output bytes... particularly Key Derivation Functions (KDFs) and HKDFs. Using alternative byte inputs removes some conversions when trying to set some values arbitrarily. |
Should |
This is the type of use-case |
I think I have done most of what I can with the There are a few things that might be unnecessary, such as the There's also the |
Allows the crate's AVX2 / NEON implementations to be used as `rand_core`-compatible RNGs. See also: rust-random/rand#934
I support switching the underlying implementation from
|
The current implementation using |
@tarcieri has recently implemented ChaCha*Rng via the
chacha20
crate. docs are here. Breaking out into a new issue this isn't really the topic of #872...This brings the question: should we prefer this over the current (@kazcw's) implementation?
First things first, the
rand_chacha
crate has 44 reverse dependencies, is clearly a rand family crate and is recommended in our book. There is no plan to retire this crate.Second, lets look at a few stats.
Unsafe
Here are the results of running
cargo geiger
on the currentrand_core
(after #931):And on
chacha20
:Something's wrong here:
stream-chiper
is only a dev-dependency andrand_core
is depended on in exactly the same way as inrand_chacha
. So only the first line is relevant.Lines of code
Tokei output for
rand_chacha
:(83% of this is from
ppv-lite86
).For
chacha20
:There are no dependencies we need, so that's it. A nice improvement.
Benchmarks
(64-bit Haswell)
Here's
rand_chacha
:and here's
chacha20
:Hmm, looks like
chacha20
needs some help:Closer, but still behind
rand_chacha
(which gets negligible boost fromtarget-cpu=native
thanks to auto-detection).Running
chacha20
's built-in benchmarks, I get around 6-6.2 cycles/byte withouttarget-cpu=native
, and 2.5-2.7 with; this is significantly short of the 1.4 cycles/byte @tarcieri claims so something gives here (perhaps just CPU-specific optimisations).Of course, there's more to this than a few stats, and number-of-
unsafe
-usages is not a particularly useful comparison (since it says nothing about the size of theunsafe
blocks). This is all I have time for right now. Thanks to all authors (also significantly @newpavlov).The text was updated successfully, but these errors were encountered: