-
Notifications
You must be signed in to change notification settings - Fork 432
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce JitterRng size #251
Conversation
368648a
to
3964474
Compare
3964474
to
36093a7
Compare
I updated the first commit and the last two to make the CI pass (glad there exists something like CI to catch my mistakes) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm uneasy about this. For a start, it's supposed to be a crypto-strength entropy generator, but we're straying further from the reference implementation, so I'm less happy implementing CryptoRng
— but this is required for e.g. ChaChaRng::new()
to be secure.
I think I'd prefer to just put everything inside a static mutex, similar to os::ReadRng
. This might require making JitterRng
a shim around inner data and make it difficult for JitterRng::new
and JitterRng::new_with_timer
to return the same type. Let me know if you want me to try this, but I'd rather hear your opinion before trying anything.
// Memory for the Memory Access noise source FIXME | ||
mem_prev_index: u16, | ||
// Make `next_u32` not waste 32 bits | ||
data_half_used: bool, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure whether Rustc does in fact re-order fields for better packing yet. If that's the reason for using small integers, it's probably worth re-ordering them here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I believe it reorders for a couple of versions now, but I measured that the size actually decreased with size_of::JitterRng<Timer>
.
last_delta: i64, | ||
last_delta2: i64, | ||
last_delta: i32, | ||
last_delta2: i32, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why make these smaller if they're not being saved anyway?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. I made this change while exploring various idea's, and saw no harm in keeping them as i32
.
state.set_rounds(rounds); | ||
|
||
// Fill `data` with a non-zero value. | ||
state.gen_entropy(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why bother? Doesn't look like it gets used.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It gets used, see #251 (comment)
@@ -48,25 +48,53 @@ const MEMORY_SIZE: usize = MEMORY_BLOCKS * MEMORY_BLOCKSIZE; | |||
// Note: the C implementation relies on being compiled without optimizations. | |||
// This implementation goes through lengths to make the compiler not optimise | |||
// out what is technically dead code, but that does influence timing jitter. | |||
pub struct JitterRng { | |||
pub struct JitterRng<T: JitterTimer> { | |||
data: u64, // Actual random number |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, outside of gen_entropy
calls, this is either unused or the more significant half is used. So would it be better to move data
to EcState
and store an Option<u32>
here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, this is what is called the 'entropy pool' in previous versions of jitter_entropy. I wondered if it was necessary to preserve across runs. Suppose we initialize it with some constant with each round of gen_entropy
. In theory, if the timer does not collect the expected number of bits of entropy per round once in a while, the result will be a little closer to that constant. Preserving data
across runs will instead make it just a little closer to the previously generated value.
// Timer used by `measure_jitter` | ||
timer: T, | ||
// Memory for the Memory Access noise source FIXME | ||
mem_prev_index: u16, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So, is it necessary to use a different index every time when the whole block gets reallocated? I'm not really sure about the memory access bit — e.g. if the block is freshly allocated each time, what's to stop the compiler just working out the result in advance?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using a different index each time will keep the code working as before, I did not want to change the algorithm. Preserving the index across runs is the reason the compiler can not work out the result. But I should have updated the comment (I use a lot of FIXME
s while making changes 😄)
prev_time: 0, | ||
last_delta: 0, | ||
last_delta2: 0, | ||
timer: timer.clone(), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
shouldn't be necessary to clone
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
👍, leftover from a previous version.
prev_time: self.timer.get_nstime(), | ||
last_delta: 0, | ||
last_delta2: 0, | ||
mem: [0; MEMORY_SIZE], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is statically initialised to zero each time, what's to stop the compiler optimising the memory access out? Even if it doesn't, this will be in the lowest-level cache it fits in already.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If this is statically initialised to zero each time, what's to stop the compiler optimising the memory access out?
The black_box(ec.mem[0]);
sprinkled around placed strategically.
Even if it doesn't, this will be in the lowest-level cache it fits in already.
Good point, I will do some measurements.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I collected measurements with timer_stats
for 100_000 rounds, with the CPU timer source in timer_stats
disabled. So I only measured the mem_access
noise source.
The result of the NIST Entropy Estimation test on the current version is: 0.449745 (on my PC).
And with new version it is: 0.465139. So no difference.
let mut ec = EcState { | ||
prev_time: self.timer.get_nstime(), | ||
last_delta: 0, | ||
last_delta2: 0, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not really sure about these deltas — you're sure they don't need to be preserved?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They are only necessary for the stuck test. Now we set prev_time
when initializing EcState
, and set it again together with last_delta
in the first dummy round of measure_jitter
. That dummy round was already part of the algorithm.
We are only missing the previous value of last_delta2
, making the stuck test on the first real round of measure_jitter
slightly less reliable. But using a last_delta2
that could have been set seconds or minutes earlier was not all that reliable already anyway.
So I see no loss here, but it takes some thinking through...
Yes, good to be careful! What has changed is that the The second change is the location of the memory area. In
This was one of the variants I made, but though this was the nicer option. |
Thanks of the detailed review! |
Well, I guess your replies are enough to convince me this is probably the way to go. I'll leave this open to further comments for a few days before merging. |
@@ -48,12 +48,12 @@ const MEMORY_SIZE: usize = MEMORY_BLOCKS * MEMORY_BLOCKSIZE; | |||
// Note: the C implementation relies on being compiled without optimizations. | |||
// This implementation goes through lengths to make the compiler not optimise | |||
// out what is technically dead code, but that does influence timing jitter. | |||
pub struct JitterRng { | |||
pub struct JitterRng<T: JitterTimer> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Turns out, this is a problem: JitterRng
is no longer a concrete type, so we can't use it in the EntropyRng
enum. I'm going to revert this commit: 3aa580a
This PR contains various changes to
JitterRng
to get it's size down from 2112 bytes to 12 bytes (rounded up to 16).The largest change is splitting the state in a part that is preserved across runs, and a part that is reinitialized with every call to
gen_entropy
. The memory for thememaccess
noise source does not need to be preserved. Alsoprev_time
was already initialized again for every round ofgen_entropy
. And the two delta's for the stuck test are questionable to preserve across runs.Various fields could be reduced in size,
index
is less than64*32=2048
so it fits in anu16
.rounds
as at most128
,JitterRng
does work if the entropy we can collect per round is less than half a bit. And we can expect the delta's to be of sub-second size, so they always fit in ani32
.If
JitterRng
was used withnext_u32
, we would preserve the unused 32 bits in anOption
. This duplicates the bits indata
. Now I just used abool
to indicate half of it is still available. This makes the case where someone alternately usesnext_u32
andnext_u64
withJitterRng
slightly less efficient, but I don't imagine that is a realistic scenario.Finally I have replaced the pointer to a timer function with a zero-sized type with a trait that implements the timer. It feels a little bit more like the Rust way of doing things, but it doesn't look great to me, because
JitterRng
gets an extra parameterJitterTimer
.Back when
JitterRng
was first merged there were some problems with how I tried to avoid rounding errors in the estimation of the number of rounds intest_timer
. An alternative I did not use at the time (going for simplicity) was to use a lookup table for small values. The last commit implements that.