-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ChaCha: support reading counter and setting nonce #374
Conversation
Having both a What should happen with Removing the methods from I am however completely against any messing with a nonce. |
I'm not really concerned about Also, if users are using a nonce, they may want to extract just the counter with |
Meaning you don't think we should add these set-nonce functions? Why? I can't see ChaCha changing in incompatible ways (stability and equivalence across implementations is one of the big advantages of ChaCha). |
I think we should make up our mind on what our implementation of ChaCha is. Is it an RNG or a stream cipher? Or should it be both? Do we want to have anything to do with RNGs in Rand being used as stream ciphers? If we decide it is an RNG (what I assumed until now, and how it is implemented until now) there is no use for a nonce. If we really want to allow using it with a nonce, we should change the implementation to also threat it as such, and not as part of the counter. Even if a high counter is unlikely unless someone starts doing things with |
To nuance my position a bit: I don't care strongly whether we support setting a nonce or not. The thing is I believe it is not correct to have both a 128-bit counter and a nonce. The original implementation of |
Well, okay... I agree, it should primarily be an RNG. But I don't think honesty has much to do with it — effectively the only deviation from the standard cipher is the ability to keep generating results after the first 256 GiB. This just means that if somebody was going to use this to encrypt/decrypt then they should ensure the length is no more than 256 GiB or split the input. There is a small reason to include the nonce code: extra flexibility in reproducing results from other implementations. If you really want we can remove them, but I don't see much point. |
We have the choice between three variants here:
How many bytes can be generated with a 64-bit counter before wrapping? counter * state size = 2^64 * 2^6 = 2^70 bytes. 2^10=kilo, 2^20=mega, 2^30=giga, 2^40=tera, 2^50=peta, 2^60=exa, 2^70=yotta. Seems like a reasonable max. A 32-bit counter on the other hand can be much too small, it would wrap around within 8 minutes on my PC. That equals 2^(32+6)=2^38=250gb of generated values. For a nonce 64 bit may sometimes be a bit too small, because a nonce is usually chosen at random and gets in trouble because of the birthday problem. With ~10 million messages, there is a chance of 1 in a million of having a duplicated nonce. So the IETF variant with a 96-bit seems like a practical choice for encrypting messages, that all should have a unique nonce but are expected to all be smaller than 250gb. We clearly can't make use of the IETF variant because the counter is too small for use as an RNG. A 128-bit counter is useless. But so is a nonce for RNGs. @dhardy You make a good point that choosing a variant with a counter would give us reproducibility with other implementations. Also we currently have the problem that a 128-bit counter is not the easiest to set, because Would it make sense to switch our ChaCha implementation to the original variant by Bernstein? To end having this discussion coming up every other month, have reproducibility with other implementations, and a simple API (for |
I still don't see what's wrong with what is implemented here in this PR, except for the redundancy between 64-bit and 128-bit counter methods. We should have at least 64-bit counter, but as you say this doesn't match up properly when using 96-bit nonce. I don't see how it's a major issue that this impl would simply wrap to the next nonce should more than 256 GiB be generated. |
If you start incrementing the nonce, it is no longer a user-supplied constant but simply part of the counter. The way the ChaCha implementation by Peter Reid does it is by offering both variants of nonce and counter size, and storing the choice. It will only have to be checked very rarely anyway. But whatever we do, it should simply be correct. How do we expect users to have any trust in our ChaCha implementation if we start including hacks like this? Edit: didn't like the sound of 'hacks' in the post, apologies. And if the problems only start after generating 250 GB it might have been somewhat excusable, if we didn't also have the possibility of setting the counter. |
@dhardy What are your thoughts at the moment? I am kind of warming up to the idea of using just Bernsteins original version with a 64-bit counter and a 64-bit nonce. |
I still think we're sacrificing functionality without a good reason. E.g. with 64-bit counter if somebody does |
I can get behind adding more functionality, but not at the cost of correctness. What do you think of the enum approach, where we store the choice between the 3 variants? It would not add any real performance overhead, as it only needs to be checked when incrementing the counter past 32 bits. |
Aside from the nonce stuff, are we agreed on the changes to get/setting the counter? I'm not quite sure myself; since 64-bits is enough for 1 ZiB maybe we shouldn't even bother supporting larger counters. We could instead call the high 64-bits the "stream" for example (which is roughly what the old Or, as you say, we could support multiple counter sizes, although I'm a little against adding complexity which almost never gets used. BTW what should happen when the counter does wrap? The only options (I think) are:
I still prefer the last option to be honest. |
With this I can fully agree (proposed before).
If we go and support multiple incompatible schemes with exactly the same code, we should document clearly how this is possible. Maybe by explaining what happens when the counter 'runs out'. Something like this for
But I still fear that doing something that is theoretically incorrect is going to bite us in the end, so I'd prefer the first option. |
It would be good to get some more feedback on whether this is even useful; also the variable rounds stuff. I put everything except the basic get/set counter functionality behind an "experimental" feature gate so we can ship the feature but make it clear that it might go away or change later. It's not the nicest approach, but does let us make a release without worrying about this too much! |
I would really like to see a release, and yes, this ChaCha stuff is holding things up. But it seems like we are so close to at least deciding on the nonce and counter part, can't we finish it? Do you have a problem with a 64-bit nonce and 64-bit counter? |
I'm slightly disconnected from this conversation now, but the important point is that the default is (a) secure and (b) meshes well with standards. I'll point out that XChaCha increased both the counter and nonce by incurring some startup cost, which suggests that a 64 bit counter can create security issues. Is there a standardization issue for ChaChaRng with a 128 bit counter but no nonce? I think ChaCha counters increment the nonce when they roll over anyways, right? If so, then Apologies for not paying closer attention lately. Anyways an important question is how the standard ChaChas role over their counter. |
What is the use case for this? People using ChaCha for encryption should probably rather use a different crate with a higher-level interface that supports authenticated encryption. I don't think we should encourage non-RNG use cases. I would prefer not to provide these features which are very easy to misuse. Another option alternative to the |
There is a whole world of cryptography outside simple authenticated encryption, including much that benefits from being close to standards, but for which nobody can or will take the time to develop standards. As an example, the old rand crate made ZCash's Power of Tau trusted setup program harder to reimplement in Go. |
Why should such functionality be provided by rand? You could just provide
your own crate for your niche use. If you really want the low-level
algorithm, it could be factored out to its own crate (isn't that what
rust-crypto already does?), without having to expose it to every rand user.
How is some Go implementation related to rand?
…On Wed, Apr 18, 2018, 19:45 Jeff Burdges ***@***.***> wrote:
There is a whole world of cryptography outside simple authenticated
encryption, including much that benefits from being close to standards, but
for which nobody can or will take the time to develop standards. As an
example, the old rand crate made ZCash's Power of Tau trusted setup program
harder to reimplement in Go.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub
<#374 (comment)>,
or mute the thread
<https://github.com/notifications/unsubscribe-auth/AACCtJlp0Kl1C7caz5KN1tk8fZnNxQ0Gks5tp3uxgaJpZM4TIS8d>
.
|
A CSPRNG is always a low level algorithm, but especially a deterministic one like ChChaRng. rand is a low level algorithms crates the moment it decides to address cryptography. There are no scenarios under which ChChaRng will be used for encryption. We do use it as a CSPRNG however, or more formally as a PRF. If you pick a PRF that is too non-standard, then your existing protocol cannot easily become more standard, including by being implemented elsewhere, so you tends to box yourself into doing necessary work. Anyways.. IETF ChaCha has only a 32 bit block counter. All other ChaCha variants have a 64 bit block counter. Afaik, there are no cryptography uses for variable round ChaCha, but the next_u32 vs next_u64 business could mess people up, so we reiterating any issues around skipped output and endianess there that sounds wise. |
Okay, then I agree that the best solution for now is just to drop the nonce functions (unless there are related standards for use of ChaCha as an RNG). Quite honestly unless there any standards to guide this, perhaps we should go back to the old |
Sorry I wrote the above ↑ while on a plane and didn't read your last messages before posting. I agree, standards are important. Are there any relevant here? Using a 64-bit counter plus 64-bit "stream" or nonce may be the best option; in this case is it still reasonable to wrap the counter to 0 rather than panic? I'd really prefer to avoid panics. 1ZiB is ridiculously huge so there's no worry about repeats unless the counter gets set while generating. Regarding |
I would say wrapping the counter is by far the most reasonable thing to do, without giving any notification at all. The trouble does not lie in the counter, but in repeating the same stream of random numbers. As calculated, this is an incredibly large amount of data, taking many decennia/centuries to generate, all available for an attack. Instead we can assume that when the counter wraps around, |
I think the old In We could even provide |
Well, I've been trying to adapt ChaCha to support variable size nonce, defined at construction time. That works, but it's a bit complex (interacting with nonce-defining code and offset/counter adjusting code). So a fixed 64-bit counter and 64-bit nonce is much easier; and also probably a better way to go if we want to make our implementation a "standard". I've also been trying to add |
I pushed some code implementing the above (with fixed 64-bit counter); I'm not really happy with it but it's sufficient to get some feedback. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
A lot of new functionality here!
rand_core/src/impls.rs
Outdated
@@ -219,20 +219,38 @@ impl<R: BlockRngCore> BlockRng<R> { | |||
} | |||
|
|||
/// Return a reference the wrapped `BlockRngCore`. | |||
#[inline(always)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just curious: did inlining turn out to be necessary, or did you just inline all methods?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I haven't benchmarked at all, still meaning to do that.
rand_core/src/impls.rs
Outdated
pub fn reset(&mut self) { | ||
self.index = self.results.as_ref().len(); | ||
} | ||
|
||
/// Generate a new set of results immediately, setting the index to the | ||
/// given value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see the use in combining both pieces of functionality, but doing it with a method named generate
is a bit strange. I think it is better to just have get_index
, set_index
and generate
separately.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I did do that, but this provides an extra level of protection to avoid re-using the same results. It also seems to fit with the existing code extremely well.
src/prng/chacha.rs
Outdated
self.0.inner_mut().set_counter(counter_low, counter_high); | ||
self.0.reset(); // force recomputation on next use | ||
/// This returns an error only if the result is not representable in the | ||
/// 128-bit return value. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is always representable?
Does the returned position represent the index of the last returned value, or that of the next random number? The choice seems arbitrary, but good to document. get_counter
pointed at the next block of results.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, left-over from when I was trying to support 128-bit counters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Practically, u64
would be large enough most of the time, but probably better to use u128
to avoid extra error handling.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, I think that is a good choice.
src/prng/chacha.rs
Outdated
/// This returns an error only if the result is not representable in the | ||
/// 128-bit return value. | ||
#[cfg(feature = "i128_support")] | ||
pub fn get_offset(&self) -> u128 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not convinced on the name offset
, as that sounds more like the distance between two points. What do you think of position
? Or other alternatives?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not convinced either. It's also quite a clunky function compared to get_counter
, but I suppose some users may actually need this functionality! [Seek
] uses seek(pos)
but doesn't have a "get" equivalent.
src/prng/chacha.rs
Outdated
/// Set the offset from the start of the stream, in bytes. | ||
#[cfg(feature = "i128_support")] | ||
pub fn set_offset(&mut self, offset: u128) -> Result<(), ()> { | ||
if (offset & 0x3) != 0 || (128 - offset.leading_zeros() - 6 > 64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(128 - offset.leading_zeros() - 6 > 64)
I think it is ok to just wrap around if the offset is beyond the period. I.e. just mask out the first couple of bits?
Can we define the offset/position to be in multiples of the word size? Users have to care about it already, so we may as well make the function infallible. I think there is just about no RNG that can seek to an exact byte, always to something word-size.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True, I suppose we wrap anyway during usage!
Seeking to word size also makes a lot of sense given that we're trying to "standardise" this such that fill_bytes
consumes a whole number of words.
src/prng/chacha.rs
Outdated
core: ChaChaCore, | ||
} | ||
|
||
impl ChaChaBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like the builder approach here. Does it have any real advantages?
The three provided methods here also make sense an the RNG itself, not only on the builder.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, there's a little problem with setting the nonce: one may need to regenerate the current buffer, and it doesn't exactly make sense to do that mid-buffer — surely you'd only set the nonce if resetting the counter too? When it comes to selecting a stream I'm not sure.
The set_rounds
thing I don't know; I'm still not quite convinced what we should do with this especially with regards to CryptoRng
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am (already was) ok with hiding/removing set_rounds
for now. Actually I glanced over the function too fast, I though the three builder methods for for nonce, stream and counter. What do you think of just three 'setting' methods on ChaChaRng
, like so?
set_position(&mut self, position: u128)
. High-level, has to regenerate the buffered results if the position does not lie on a block boundary. Otherwise waits with generating until on next use. Position is the position of the next value to return.set_stream(&mut self, stream: u64)
. High-level, schedules generating a new block of results.set_counter_and_nonce(&mut self, counter: u64, nonce: [u8; 8])
. Low-level, also schedules generating a new block of results.
And I suppose three corresponding getters.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That seems a reasonable choice. But if we had separate set_counter
and set_nonce
which each reset the block, the compiler would optimise out the extra reset if both were called anyway, right? So maybe four methods are better.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reset does nothing more than reset the index to results.len()
. If this is optimized out or not is immeasurable.
I thought combining them made sense because as you said, if you change the nonce you probably also want to change the counter. And if you want to change only the counter, either do it in combination with the nonce or use set_position
(when i128
gets more common).
src/prng/chacha.rs
Outdated
/// Set the nonce from a stream number. | ||
/// | ||
/// This simply reinterprets the nonce region as a stream code. | ||
pub fn set_stream(mut self, stream: u64) -> Self { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was slightly confused about the use of this, but 👍.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would it be useful with parallel iterators perhaps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, but I suppose there are more situations.
B.t.w. I don't envy you making the controversial PRs. Keep them coming 😄. |
@dhardy To be honest I expected rand 0.5 to be released about three weeks ago. Is there something I can do to help? |
Find a mutually-acceptable solution to #416? |
I've been thinking about generalising this by adding some things to trait SeedableRng {
...
type Nonce: Sized + Default + AsMut<[u8]>; // nonce / seed type
type Options: Default; // type-specific creation options
fn from_seed_and_nonce(seed: Seed, nonce: Nonce, opts: Options);
fn with_seed(seed: Seed) -> SeedableRngBuilder<Self>;
}
struct<T: SeedableRng> SeedableRngBuilder<T> {
seed: <T as SeedableRng>::Seed,
nonce: <T as SeedableRng>::Nonce,
pub opts: <T as SeedableRng>::Opts,
}
impl<T: SeedableRng> SeedableRngBuilder<T> {
fn with_nonce(mut self, nonce) -> Self { .. }
fn with_stream(mut self, stream: u64) -> Self { .. } // set nonce from stream
fn build(self) -> T {..}
}
// used like:
let mut rng = ChaChaRng::with_seed(my_seed).with_nonce(my_nonce).build(); It would a be fairly simple addition and wouldn't need to impair on the existing ways to use Note that the more obvious model, an extra trait with It seems an interesting idea finding a general method to support nonces in RNGs, but of course the simpler general method is simply to use part of the seed as the nonce, where required ( |
Are you sure you are not over-thinking things? Because I understand it, a nonce has nothing to do with RNGs. It is already a concession to expose it for ChaCha. |
Allowing configuration of the number of rounds via state adjustment has disadvantages: undesirable manipulation of the generator and interaction with CryptoRng.
Also restricts the counter to 64-bits (still allows 1 ZiB output) The primary rationale is simpler specification of this RNG.
Rewritten and massively simplified:
This is partly with @burdges comments in mind about keeping the specification simple and easy for other implementations to match. Possibly we should add a test-vector involving streams and counters, though we sort-of have that already. |
} | ||
} | ||
} | ||
|
||
impl CryptoRng for ChaChaCore {} | ||
|
||
impl From<ChaChaCore> for ChaChaRng { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice! Worth adding to HC-128 and ISAAC too (eventually)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Glad to say this is looking good!
I only have a few small comments.
/// nonce. A 64-bit counter over 64-byte (16 word) blocks allows 1 ZiB of output | ||
/// before cycling, and the stream identifier allows 2<sup>64</sup> unique | ||
/// streams of output per seed. Both counter and stream are initialized to zero | ||
/// but may be set via [`set_word_pos`] and [`set_stream`]. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document that set_word_pos
is only available with the nightly
or i128_support
features?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess, but i128
support will be standard at some point in the near-ish future anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since a u64
can represent 64 Exabytes, it feels a bit stupid using a u128
here — should we use a u64
instead and panic if the upper four bits of the counter are non-zero? This will suit the vast majority of users just fine, with only half the output size.
Note that in this case there is some rationale for allowing set/get counter directly too (arguably jumping to an arbitrary block address is a separate use case).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If u128
was stabilised a couple of months earlier, would you also worry about it?
It seems like a good property to me that all positions can be indicated and set.
/// Since the generated blocks are 16 words (2<sup>4</sup>) long and the | ||
/// counter is 64-bits, the offset is a 68-bit number. Sub-word offsets are | ||
/// not supported, hence the result can simply be multiplied by 4 to get a | ||
/// byte-offset. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you document that the returned position is that of the previous value? And I have a small preference for returning the next value instead. Otherwise you get u64::MAX
right after initialization.
Maybe also add a test that you can generate a couple of values twice using get_word_pos
and set_word_pos
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you get u64::MAX right after initialization
Sounds like I should fix that!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually, this is incorrect — set_word_pos(0)
will set both parts of the counter to zero, then call reset (effectively triggering generation of values on next use). get_word_pos()
will return zero when the counter is zero and index is high — this relies on the index never being zero, which seems to be an informal contract of the BlockRng
code. Perhaps we should make that explicit, or test for zero in get_word_pos
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I read your code wrong. Yes, it is good 😄.
this relies on the index never being zero, which seems to be an informal contract of the
BlockRng
code.
The old ISAAC code, where this comes from, did use a zero index right after seeding. Because a new batch of results was already generated after seeding, and 0 values of those where used.
Your new generate_and_set
can provide the same, and I see no harm in it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
True. Annoying to have to check another thing, but best to be correct. 👍
Edit: I see my code does not rely on index not being zero, so no problem there.
src/prng/chacha.rs
Outdated
self.0.reset(); // force recomputation on next use | ||
/// This is initialized to zero; 2<sup>64</sup> unique streams of output | ||
/// are available per seed/key. | ||
pub fn set_stream(&mut self, stream: u64) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You decided to drop set_nonce
? Or only to not add it for now?
Would it be enough to just explain here that converting a [u8, 8]
nonce to a u64
little-endian stream
does the trick?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes; RNG streams we potentially have use-cases for; nonces as you mentioned aren't usually used with RNGs. I guess I could mention this, but we don't support the IETF variant properly anyway due to counter increment, and probably that is the more widely used variant with nonces?
|
||
/// Generate a new set of results immediately, setting the index to the | ||
/// given value. | ||
pub fn generate_and_set(&mut self, index: usize) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good, you agree. Given that I wanted to avoid/minimise breakage with the published API, this seemed like the best option.
I addressed some of @pitdicker's review concerns and found a bug in |
🎉 |
Part of #349 @burdges @pitdicker
set_counter
to take only a single 64-bit counterget_counter
This doesn't resolve the issue about setting rounds yet.
Also this removes functionality from
ChaChaCore
but I assume that will not be used directly (except withinReseedingRng
where these functions aren't useful anyway).