-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add BlockRng abstraction #281
Conversation
11af2b0
to
d45cf03
Compare
Added a commit that modifies some inline hints, which takes care of the lower performance in |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am wondering if we should try to support #[derive(RngCore, SeedableRng)]
somehow, though that can be done later of course.
Mostly looks good, but I'm a little concerned with the error handling of generate()
. I learned before that the let _ = ...
pattern can hide problems too easily.
I still want to go through the reseeding stuff more closely, though not concerned.
src/impls.rs
Outdated
/// | ||
/// `next_u32` simply indexes the array. `next_u64` tries to read two `u32` | ||
/// values at a time if possible, and handles edge cases like when only one | ||
/// value is left. `try_fill_bytes` is optimized to even attempt to use the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Documentation indicates this is specific to u32
internal values, so why not rename it to Block32Rng
or something?
Third sentence would be better put something like:
try_fill_bytes
is optimized to use theBlockRngCore
implementation directly when generating fresh values.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I still hope there is some way to use specialization in the future to use BlockRng
also with u64
values. No problem to rename it, do you think I should?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't get why you would want to do that though — the next_u32
and next_u64
impls would have to be rewritten, so it's only the try_fill_bytes
bit that gets reused — and some abstraction could allow that. So you might as well just have Block32Rng
and Block64Rng
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The idea was that you can then use both 64- en 32-bit RNGs with wrappers like ReseedingRng
(just like now). But as I haven't got it working, I am not sure. Maybe best to keep this one with the name it has, and if specialisation doesn't work out name only the other Block64Rng
. A 32-bit variant is by far the most common anyway.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you still could with separate Block64Rng
, though there might be a lot of code redundancy. But your plan sounds good 👍
src/impls.rs
Outdated
self.index += 2; | ||
// Read an u64 from the current index | ||
if cfg!(any(target_arch = "x86", target_arch = "x86_64")) { | ||
unsafe { *(&self.results.as_ref()[index] as *const u32 as *const u64) } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe add a comment that this assumes both LE and support for unaligned reads
src/impls.rs
Outdated
let _ = self.core.generate(&mut self.results); | ||
self.index = 2; | ||
let x = self.results.as_ref()[0] as u64; | ||
let y = self.results.as_ref()[1] as u64; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You didn't want to use the x86 optimisation here? You could probably put that inside a local read_u64
function/closure to save repetition.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't even think about doing that. Do you think it is worth the extra complexity? Not that things are easy now...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Didn't seem like it would add much complexity to me.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oops, I didn't look close enough. We already do the same a few lines up 😄 (now near a computer again).
src/impls.rs
Outdated
|
||
let len_remainder = | ||
(dest.len() - filled) % (self.results.as_ref().len() * 4); | ||
let len_direct = dest.len() - len_remainder; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rename len_direct
→ end_direct
? It's not the length (because it includes filled_u8
).
src/impls.rs
Outdated
|
||
// As an optimization we try to write directly into the output buffer. | ||
// This is only enabled for platforms where unaligned writes are known to | ||
// be safe and fast. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
on LE platforms where — as above, there are two requirements I believe
src/impls.rs
Outdated
(y << 32) | x | ||
} else { | ||
let x = self.results.as_ref()[len-1] as u64; | ||
let _ = self.core.generate(&mut self.results); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unwrap instead of let _ =
please
src/impls.rs
Outdated
} | ||
|
||
fn fill_bytes(&mut self, dest: &mut [u8]) { | ||
let _ = self.try_fill_bytes(dest); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
unwrap instead of let _ =
please
src/prng/hc128.rs
Outdated
x.rotate_right(17) ^ x.rotate_right(19) ^ (x >> 10) | ||
// Cannot be derived because [u32; 1024] does not implement Clone in | ||
// Rust < 1.21.0 (since https://github.com/rust-lang/rust/pull/43690) | ||
impl Clone for Hc128Core { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We now require Rust ≥ 1.22.0 anyway
@@ -39,73 +40,125 @@ use {RngCore, SeedableRng, Error, ErrorKind}; | |||
/// `ReseedingRng` with the ISAAC RNG. That algorithm, although apparently | |||
/// strong and with no known attack, does not come with any proof of security | |||
/// and does not meet the current standards for a cryptographically secure | |||
/// PRNG. By reseeding it frequently (every 32 MiB) it seems safe to assume | |||
/// PRNG. By reseeding it frequently (every 32 kiB) it seems safe to assume |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, given that this is discussing the old StdRng
/// It is usually best to use the infallible methods `next_u32`, `next_u64` and | ||
/// `fill_bytes` because they can make use of this error handling strategy. | ||
/// Use `try_fill_bytes` and possibly `try_reseed` if you want to handle | ||
/// reseeding errors explicitly. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why delete this doc?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because you changed try_reseed
to have the same error handling, so no reason to prefer them.
The idea with |
src/reseeding.rs
Outdated
// we should return the error from reseeding. | ||
// The only way to get this behaviour without destroying performance | ||
// was to split part of the function out into a | ||
// `reseed_and_generate` method. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was about branch prediction then, presumably? I can't find anything Rust-specific, and the answers here contradict each other... so maybe this is the best option (and more portable than relying on which branch is most likely to be predicted).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think it had to do with branch prediction, but with the size of the function and LLVM trying to combine both generate()
functions and both returns, moving things around and adding checks outside the single branch we have now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry, I'm still not happy with the error handling (see suggestions).
src/reseeding.rs
Outdated
/// through some combination of retrying and delaying reseeding until later. | ||
/// It will also report the error with `ErrorKind::Transient` with the | ||
/// original error as cause. | ||
fn auto_reseed(&mut self) -> Result<(), Error> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't really like the changes to the name and doc. The name auto_reseed
doesn't really imply anything more than reseed
(since obviously it's talking about itself).
src/reseeding.rs
Outdated
// Behaviour is identical to `try_reseed`; we just squelch the error. | ||
let _res = self.try_reseed(); | ||
fn reseed(&mut self) -> Result<(), Error> { | ||
R::from_rng(&mut self.reseeder).map(|result| self.core = result) | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
... this is called by ReseedingRng::reseed
(though it probably shouldn't be) and by auto_reseed
. Just inline it; we don't need the extra function. Then you can rename auto_reseed
→ reseed
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
O, I forgot about this function in the first comment, and also with the recent work around ReseedingRng
.
Before, ReseedingRng
had a method users could manually call to force the RNG to reseed. Now the reseed
method has gotten much smarter (thanks to you). Delaying, and wrapping the error. That is very useful for the situations where we reseed 'because it is time'. But I think it useful to also expose the basic functionality again.
That is why the extra function in ReseedingRng::reseed
(and Reseeder::reseed)...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
All the "smart" logic does is potentially delay automatic reseeding and change the error kind. I don't see how that is a problem here. It still "forces" reseeding just as much as the old logic; it just acts a bit differently if the attempt fails.
|
||
/// Reseed the internal PRNG. | ||
pub fn reseed(&mut self) -> Result<(), Error> { | ||
self.0.core.reseed() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This doesn't adjust bytes_until_reseed
; it should probably call auto_reseed
instead...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is done on purpose, see the previous comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
But if reseeding is successful, then it makes sense to reset bytes_until_reseed
IMO.
src/reseeding.rs
Outdated
BlockRng { | ||
core: Reseeder { | ||
core: rng, | ||
reseeder: reseeder, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These names are confusing. "core" is used both for the core of the BlockRng
and as the inner RNG in Reseeder
. Reseeder
is an internal type used to implement ReseedingRng
; reseeder
is the RNG used for reseeding.
core
1: no changecore
2: rename toinner
orrng
orinner_rng
?Reseeder
: maybeReseederBlockRng
? Or leave as isreseeder
: possiblyreseeding_rng
, or leave as is but only if above name is changed
src/reseeding.rs
Outdated
let res1 = self.auto_reseed(); | ||
self.bytes_until_reseed -= results.as_ref().len() as i64 * 4; | ||
let res2 = self.core.generate(results); | ||
if res2.is_err() { res2 } else { res1 } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Since reseed_and_generate
is only used within generate
, I suggest ignoring any error from auto_reseed
here, but passing on errors from generate
— because if reseeding fails we can still output fresh random numbers, but if the latter fails (if that's even possible) then we have a much more serious problem.
This implies that if generate
doesn't return an Result
then this shouldn't either; I'm not sure which way it should go, but IMO generate
should never return errors from reseeding because there's nothing useful to do with them and it just ends up with bad handling like I already commented on above.
I will try to write a little better about my reasoning, but otherwise time for a discussion 😄. In all but this last variant of the Then, it seemed cleaner to reuse the With your PR #255 things became easier, with both The same thing was possible here, but then the implementations in Also continuing to request bytes even after a PRNG returns an error seems reasonable to me, in I prefer giving an error over giving no indication of a failed reseed ever. Also because than all the effort we put into that 😄. But I agree that if we keep things like this, the documentation of |
But if the error originated in the PRNG's |
Very helpful 😄. I have made most of the changes you recommended. But to be honest I still think the error handling story is not that bad as it is implemented now... Still it is all very theoretical, whith PRNGs that should never fail, and reseeding which not only uses
You mean |
Still sounds to me like your error handling is doing theoretically the wrong thing with the excuse it doesn't matter because the PRNG will never fail. But we don't know what PRNG people might use, and in any case this is a poor excuse. Are there any drawbacks to the error-handling I implemented? |
We are looking different at the error handling here, and I am not sure why. I don't see how relying on a PRNG to keep working is "theoretically the wrong thing". Is it because you have been bitten in the past by
I think that the fact that PRNGs can't fail is one of the foundations of APIs in Or maybe you expect the
It removes the main reason I added error handling to the |
I know I'm being difficult with the error handling. But this PR does not change or regress any scenario we have now. Only the I would like to move this PR forward. Basically there is one part left where we differ in opinion, and I would argue the most minor part of it. If we agree the abstraction is worth it, and there are no other concerns, can this be merged? |
Sorry @pitdicker I missed your comment from 4 days ago. Yes, I've been bitten by We've constructed a boundary between things that can be expected to fail and things that can't, and the way this is handled is that things not expected to fail (like calling I'll have another look over this later and see if I have any other ideas if you like. I'd also be happy removing the |
It is assumed that the
Sorry, but details are important. |
BTW do you want to tack this on to this PR? |
That was actually part of the PR, but I couldn't easily get it to compile. I will do a second try. |
Yes they are. I was only thinking at the time and (as it seems to me) endless rebases this took, and now I have to rebase again 🙁. Edit: no problem though. I am slowly leaning towards removing the error story from |
You can just merge if you prefer... hopefully the new rand-core crate doesn't cause too many problems. Yes, it's a pain dealing with many branches (I'm currently rebasing It's true, we lose all reseeding errors that way. I'm not too fussed really; if I think the "correct" way to do error handling like you've been trying to do would be to check the error kind, drop the error if it's |
f99f9cb
to
f684ed6
Compare
I have removed all traces of error handling from |
f684ed6
to
08924a3
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
BTW I'm merging this with master for you, plus making a few tweaks. I'll let you see before committing to master.
value | ||
} | ||
|
||
#[inline(always)] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Are you sure these functions should always be inlined? They're relatively large.
I've several questions:
or maybe
And non-generic solutions work right now, of course. |
Thanks for reviewing @burdges
|
|
|
|
5.a) I explained why crypto-rngs will never use the results buffer to generate the next output while non-crypto RNGs don't usually use a results buffer, so I don't think there's any need to support internal results buffers. |
We should imho file a rustc bug report if In the worst case, there are still many tricks one can play even if this incurs a copy, like maybe |
In fact, I suppose
where I'd say do this your way for now but plan to change |
The idea here is that many cryptographic RNGs generate values in blocks of 16 or more values. Examples that are currently in
rand
are ChaCha, HC-128 and ISAAC. Performance of these RNGs can be improved relatively much by using optimized methods to read the values from the results buffer, and these functions are not trivial. HC-128 currently contains the fastest variants, but it would be nice to make them easily available for all RNGs that generate values in blocks.Another interesting case is
ReseedingRng
. With every call tonext_u32
,next_u64
etc it has to check whether it is time to reseed. But this does not make sense if the values are already generated and in the results buffer of such a 'block' RNG. Only checking whether to reseed when it is time to generate a new block, can improve performance by ~40%.So this PR adds a trait
BlockRngCore
, that contains the core algorithm of such a block RNG. And a wrapperBlockRng
that implements theRngCore
methods optimally. I have implemented the trait for ChaCha, HC-128 andReseedingRng
. It is not hard to add ISAAC, but the PR is already challenge enough to review as it is I think 😄.A little fine-tuning still has to happen. Some benchmarks alternate between fast and slow (
reseeding_hc128_u32
for example between 1150 MB/s and 1380 MB/s). Some probably just need some smarter inlining, likeStdRng
. And I have seenreseeding_hc128_u64
to reach 1800 MB/s, although it at the moment stays at ~1535 MB/s. But I mostly wanted to get the basis out the door.And there is the open question of how to make the
BlockRng
wrapper not only support 32-bit generators but also 64-bit using specialisation (when available). Although ISAAC-64 is the only 64-bit algorithm I know of.Benchmarks (using
cargo benchcmp
):For a bit of history: this is my fifth attempt... The first attempt was 17 december (dhardy#76 (comment)). All kinds of cleanup had to happen before this PR was possible, and the master branch kept changing so much that this had to start from scratch 4 times.
I have tried to make as few as possible functional changes.
RngCore
gets it methods fromHc128Rng
.ChaChaRng
andHc128Rng
should be just a refactor, with no real changes.ReseedingRng
did need significant changes to it's logic, but all the pieces of the puzzle combine to give the same logic as #255. Only #255 (comment) had a performance impact of 15%, and I did not re-implement it in the end.