-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement Rng::fill
#35
Conversation
src/lib.rs
Outdated
#[inline] | ||
pub fn fill(&self, slice: &mut [u8]) { | ||
for item in slice { | ||
*item = self.u8(..); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This implementation could be more efficient. There's probably a way to generate a u64
at a time and serialize that into the byte slice instead of repeatedly generating u8
's.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, was thinking about that. Let me look into it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ok, check 5e87fd0
Much faster. I don't know if there's a faster way without using unsafe
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Benchmarks:
test fill ... bench: 29 ns/iter (+/- 5)
test fill_naive ... bench: 343 ns/iter (+/- 33)
src/lib.rs
Outdated
|
||
/// Return a byte array of the given size. | ||
#[inline] | ||
pub fn bytes<const N: usize>(&self) -> [u8; N] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't this be an MSRV bump?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hmm, if it is, we'll have to decide whether it's worth it or not I guess.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this function for now, can open another PR fot it.
Fills a byte slice with random data. Benchmarks: test fill ... bench: 29 ns/iter (+/- 5) test fill_naive ... bench: 343 ns/iter (+/- 33)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like you probably don't even need the copy_from_slice
here. You could probably use try_into
to cast the slices to &mut [u8; 8]
and then just set that to the result of to_ne_bytes
. You may also want to align the chunks to the 8-byte boundary to maybe take advantage of aligned writes.
src/lib.rs
Outdated
// Filling the buffer in chunks of 8 is much faster. | ||
let mut chunks = slice.chunks_exact_mut(8); | ||
for items in chunks.by_ref() { | ||
let r = self.u64(..); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't we use gen_u64()
here instead of going through the u64(..)
adaptor?
src/lib.rs
Outdated
let mut chunks = slice.chunks_exact_mut(8); | ||
for items in chunks.by_ref() { | ||
let r = self.u64(..); | ||
items.copy_from_slice(&r.to_le_bytes()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we should use to_ne_bytes()
instead in order to avoid a performance hit on big endian systems.
Ah, yes indeed possible, however after trying it out, the benchmark doesn't read any performance difference, so I think the generated code is the same: /// Fill a byte slice with random data.
#[inline]
pub fn fill(&self, slice: &mut [u8]) {
// Filling the buffer in chunks of 8 is much faster.
let mut chunks = slice.chunks_exact_mut(8);
for items in chunks.by_ref() {
let r = self.gen_u64();
let sl: &mut [u8; 8] = items.try_into().unwrap();
*sl = r.to_ne_bytes();
}
for item in chunks.into_remainder() {
*item = self.u8(..);
}
} What do you think? The only nit with the above code is the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks good to me. That being said, it might be above my pay grade to merge this.
@smol-rs/admins Any thoughts on this?
I'm not sure if the |
Yeah I agree, I had it there originally when the function was small and unoptimized, but I would remove it now. For the record, the benchmark isn't affected when |
Don't have time to look into the LLVM stuff, sorry! |
Fills a byte slice with random data.
It's too bad there's no generic gen function, but I think in most cases we want to fill byte slices.