
Key erasure AES variant #299

Open
Tracked by #1432
vks opened this issue Mar 13, 2018 · 12 comments
Labels
F-new-ext Functionality: new, outside of Rand
P-low Priority: Low

Comments

@vks
Collaborator

vks commented Mar 13, 2018

I ported an implementation of a fast-key-erasure RNG suggested by djb to Rust: https://github.com/vks/aesrng

It is about twice as fast as non-SIMD xoroshiro when generating 100 MiB of random data. Only fill_bytes is implemented for now.

Once stdsimd is stabilized, this might be of interest for Rand.
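For readers unfamiliar with the construction: the following is a minimal sketch of the fast-key-erasure idea djb describes, with a trivial (non-cryptographic!) mixing function standing in for hardware AES. All names and sizes here are illustrative assumptions, not the aesrng API.

```rust
const BLOCK: usize = 16;
const BLOCKS_PER_REFILL: usize = 8;

struct KeyErasureRng {
    key: [u8; 32],
    buf: Vec<u8>,
    pos: usize,
}

// Stand-in for AES-CTR: a trivial keyed mixing function. NOT cryptographic.
fn toy_block(key: &[u8; 32], counter: u64) -> [u8; BLOCK] {
    let mut out = [0u8; BLOCK];
    for (i, b) in out.iter_mut().enumerate() {
        *b = key[i % 32]
            .wrapping_mul(0x9d)
            .wrapping_add((counter as u8).wrapping_mul(i as u8 + 1));
    }
    out
}

impl KeyErasureRng {
    fn new(seed: [u8; 32]) -> Self {
        KeyErasureRng { key: seed, buf: Vec::new(), pos: 0 }
    }

    // Refill: derive the next key from the first two blocks, then fill
    // the output buffer from the remaining blocks. The old key is
    // overwritten, so earlier output cannot be recomputed from state.
    fn refill(&mut self) {
        let mut next_key = [0u8; 32];
        next_key[..16].copy_from_slice(&toy_block(&self.key, 0));
        next_key[16..].copy_from_slice(&toy_block(&self.key, 1));
        self.buf.clear();
        for ctr in 2..(2 + BLOCKS_PER_REFILL as u64) {
            self.buf.extend_from_slice(&toy_block(&self.key, ctr));
        }
        self.key = next_key; // erases the old key
        self.pos = 0;
    }

    fn fill_bytes(&mut self, dest: &mut [u8]) {
        for byte in dest.iter_mut() {
            if self.pos >= self.buf.len() {
                self.refill();
            }
            *byte = self.buf[self.pos];
            self.pos += 1;
        }
    }
}

fn main() {
    let mut rng = KeyErasureRng::new([7u8; 32]);
    let mut a = [0u8; 64];
    rng.fill_bytes(&mut a);
    let mut b = [0u8; 64];
    rng.fill_bytes(&mut b);
    // Distinct counters (and periodic rekeying) yield distinct output.
    assert_ne!(a, b);
    println!("ok");
}
```

The real implementation gets its speed from running several AES rounds in parallel via AES-NI; only the rekey-and-erase structure is what this sketch illustrates.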

@dhardy
Member

dhardy commented Mar 14, 2018

Your point is that it's possible to produce very fast cryptographic generators for bulk output by using CPU-specific features? Cool. But I don't much like comparing apples to pineapples.

I suggest this would make a good external crate, but not such a good inclusion in Rand itself, since:

  • The speed comes from using SIMD instructions to produce large blocks of data. Many users of RNGs only want small amounts of random data at a time, so this is not a good general purpose RNG.
  • Fast key erasure is likely mainly of interest to specific users with very high security requirements? It appears a "sensible precaution" but one with high cost and little expected benefit; for most users isn't use of ReseedingRng going to be sufficient?
  • CPU-specific instructions mean your code only works on modern x86_64 CPUs; Rand is designed to be portable.
  • Low-level instructions are hard to review, so it would be better in its own crate so that individual projects can choose whether or not to use it IMO.

To be fair, we should probably consider SIMD-optimised generators at some point, though I don't know whether it would be worth making them the default generators (for many applications where randomness is consumed in small quantities the performance enhancements may not be significant).

@dhardy added the F-new-int Functionality: new, within Rand, P-low Priority: Low, and F-new-ext Functionality: new, outside of Rand labels, and removed the F-new-int Functionality: new, within Rand label on Mar 14, 2018
@nagisa
Contributor

nagisa commented Mar 14, 2018

> To be fair, we should probably consider SIMD-optimised generators at some point, though I don't know whether it would be worth making them the default generators (for many applications where randomness is consumed in small quantities the performance enhancements may not be significant).

Could always pick a SIMD specialisation when the user calls fill_bytes with a larger buffer, and otherwise use the non-SIMD implementation, which has less throughput but possibly better latency characteristics.
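The dispatch nagisa suggests could look roughly like this sketch, where both paths are placeholders and the threshold is an assumed value one would tune by benchmarking:

```rust
// Assumed cutoff: below this, latency matters more than throughput.
const BULK_THRESHOLD: usize = 64;

// Low-latency scalar path: fills one byte at a time.
fn fill_scalar(dest: &mut [u8]) {
    for (i, b) in dest.iter_mut().enumerate() {
        *b = (i as u8).wrapping_mul(31).wrapping_add(1);
    }
}

// High-throughput path: a real implementation would generate wide
// SIMD blocks here; this sketch just delegates so both paths agree.
fn fill_bulk(dest: &mut [u8]) {
    fill_scalar(dest);
}

// Route by request size: big requests take the bulk path.
fn fill_bytes(dest: &mut [u8]) {
    if dest.len() >= BULK_THRESHOLD {
        fill_bulk(dest)
    } else {
        fill_scalar(dest)
    }
}

fn main() {
    let mut small = [0u8; 8];
    fill_bytes(&mut small);
    let mut big = [0u8; 128];
    fill_bytes(&mut big);
    // In this sketch both paths produce the same stream.
    assert_eq!(small[..], big[..8]);
    println!("ok");
}
```

One caveat with such dispatch in a real RNG: the two paths must either share state or be documented as producing different streams, otherwise reproducibility suffers.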

@dhardy
Member

dhardy commented Mar 14, 2018

Yes, but SIMD probably makes more sense with buffer-backed RNGs like the current cryptographic ones (generate a block at once).

@vks
Collaborator Author

vks commented Mar 19, 2018

@dhardy

> The speed comes from using SIMD instructions to produce large blocks of data. Many users of RNGs only want small amounts of random data at a time, so this is not a good general purpose RNG.

Why? Our current StdRng is buffered as well, via BlockRng. If I use a similar buffer (128 bytes), I get slightly worse performance than StdRng on master (5.1 ns instead of 3.55 ns per u64). Initialization is a lot cheaper (31 ns instead of 7.6 µs), and the key is erased every 128 bytes.

> Fast key erasure is likely mainly of interest to specific users with very high security requirements? It appears a "sensible precaution" but one with high cost and little expected benefit;

It gives forward secrecy. This is a nice property to have, because it means that, e.g., compromising a server does not compromise previously generated secrets (as long as they are not stored in some other form). In that regard it is similar to ReseedingRng, but more reliable. I'm not sure what you mean by high cost? The total cost is comparable to StdRng. Removing key erasure improves next_u64 performance by about 40% (3.08 ns, making it faster than StdRng).

> CPU-specific instructions mean your code only works on modern x86_64 CPUs; Rand is designed to be portable.

Rand can still be portable by falling back to something else on other platforms.

> Low-level instructions are hard to review, so it would be better in its own crate so that individual projects can choose whether or not to use it IMO.

That is a fair point; it contains a bit of unsafe code. I hope this can be improved by using a safe abstraction like faster and maybe aesni.

@pitdicker
Contributor

@vks Nice project!

I tried to read up a bit on the design of this RNG, but I am not there yet 😄. Is my understanding right?

  • The basic idea is that common RNGs for cryptography keep some of their generated values in memory, and this memory can be leaked (swapping to disk, inspecting process memory, etc.).
  • This RNG by design has to overwrite a generated value when making the next one, so it is secure by default.
  • It is very fast because it uses a hardware implementation of AES, and because it does not follow the clunky implementation requirements of NIST.

I sure think it is neat. It may be good to wait a while before really using it to give it time to prove itself. And also good to take lessons from. One thing I had planned for the BlockRng wrapper was to erase used values.
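The "erase used values" idea for a block buffer could be sketched like this: each word is zeroed the moment it is handed out, so inspecting the buffer later reveals only unconsumed values. The names here are illustrative, not the actual rand BlockRng API.

```rust
struct ErasingBuffer {
    buf: [u64; 8],
    pos: usize,
}

impl ErasingBuffer {
    fn new(buf: [u64; 8]) -> Self {
        ErasingBuffer { buf, pos: 0 }
    }

    fn next_u64(&mut self) -> Option<u64> {
        if self.pos >= self.buf.len() {
            return None; // a real impl would refill from the core RNG here
        }
        let v = self.buf[self.pos];
        self.buf[self.pos] = 0; // erase the value immediately after use
        self.pos += 1;
        Some(v)
    }
}

fn main() {
    let mut rng = ErasingBuffer::new([11, 22, 33, 44, 55, 66, 77, 88]);
    assert_eq!(rng.next_u64(), Some(11));
    assert_eq!(rng.next_u64(), Some(22));
    // The consumed slots are wiped; only future output remains in memory.
    assert_eq!(&rng.buf[..2], &[0, 0]);
    println!("ok");
}
```

In a real implementation one would also need to ensure the compiler cannot optimise the zeroing away (e.g. via a volatile write), which is exactly the kind of subtlety that makes this worth prototyping outside Rand first.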

But I see no reason why this RNG has to be in Rand, at least not right now. We are moving towards no, or as few as possible, RNG implementations, not many.

@vks
Collaborator Author

vks commented Mar 21, 2018

This is certainly not something for the near future (the algorithm, the implementation and SIMD in Rust are quite immature). I was just wondering whether it might be a better default in the long run, and the point of this issue is to discuss the pros and cons.

> Is my understanding right?

Yes, I think it is.

@pitdicker changed the title from "Faster RNG for filling buffers" to "Key erasure AES variant" on May 8, 2018
@pitdicker
Contributor

Renamed the issue from "Faster RNG for filling buffers" to "Key erasure AES variant", as this is more about a new CSPRNG than about filling buffers.

@pitdicker
Contributor

This article offers an improvement over using AES in counter mode: it provides backtracking resistance at lower cost, or with a better API.

If at some point the internal state of the common variant is exposed, all previously generated values can be recovered. The NIST document requires that the common variant overwrite its state when the user is 'done with it', but that is not easy to express in an API.

This variant will, after generating a buffer with a couple of blocks of values, also use one block to generate a new internal state (or seed). This overwrites the old key, making it impossible to recover previous rounds.
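The backtracking difference can be made concrete with a toy demonstration, using a trivial keyed function in place of AES (everything here is illustrative). In plain counter mode, the current state (key plus counter) lets an attacker recompute every earlier output; with the rekey step, the old key is gone and earlier outputs cannot be reproduced from the new state.

```rust
// Stand-in for an AES block encryption. NOT cryptographic.
fn toy_block(key: u64, counter: u64) -> u64 {
    key.wrapping_mul(0x2545F4914F6CDD1D).wrapping_add(counter)
}

fn main() {
    // Plain counter mode: the key never changes.
    let key = 42u64;
    let out1 = toy_block(key, 0);
    // An attacker capturing state (key, counter = 1) can backtrack:
    assert_eq!(toy_block(key, 0), out1); // past output recovered

    // Key-erasure variant: one block of output is reserved to replace
    // the key, and the old key is overwritten.
    let mut k = 42u64;
    let out = toy_block(k, 1); // output handed to the user
    k = toy_block(k, 0);       // rekey: the old key is erased
    // From the new state `k` alone, recomputing `out` would require
    // inverting toy_block to recover the old key.
    assert_ne!(toy_block(k, 1), out);
    println!("ok");
}
```

With real AES the inversion in the last step is computationally infeasible, which is what gives the construction its backtracking resistance.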

@dhardy
Member

dhardy commented May 15, 2018

Another fast backtracking-resistant RNG: https://github.com/google/randen

@vks
Collaborator Author

vks commented Jun 25, 2018

Note that Randen provides a Rust implementation in the official repository.

@dhardy
Member

dhardy commented Oct 5, 2018

The Randen paper has been published: https://arxiv.org/abs/1810.02227

@dhardy
Member

dhardy commented Sep 13, 2019

We shouldn't forget about Randen as a possible alternative to ChaCha; it is fast, fairly simple code and has backtracking resistance. Right now I see two issues:

  1. Limited portability: it relies on a CPU-native AES instruction for performance, thus we should keep an alternative available for some platforms (like we do with HC-128 now).
  2. Trust: while the code looks reasonably straightforward and is over a year old, the algorithm is still fairly new in academic terms, apparently with little review.

This would provide an alternative to #872, for some platforms.
