
Implement Rng::fill #35

Merged
merged 2 commits into smol-rs:master on Sep 13, 2022

Conversation

cloudhead
Contributor

Fills a byte slice with random data.

It's too bad there's no generic gen function, but I think in most cases we want to fill byte slices.
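
For reference, a minimal usage sketch of the new method (this assumes the crate is fastrand, as the u8(..)/u64(..) calls in the diff suggest, and that Rng methods take &self as in this PR; the names below are illustrative, not part of the PR):

    fn main() {
        // Hypothetical usage; assumes a fastrand-style Rng whose methods take &self.
        let rng = fastrand::Rng::new();
        let mut buf = [0u8; 32];
        rng.fill(&mut buf); // buf now holds 32 random bytes
        println!("{:02x?}", &buf[..8]);
    }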

src/lib.rs Outdated
#[inline]
pub fn fill(&self, slice: &mut [u8]) {
    for item in slice {
        *item = self.u8(..);
Member

This implementation could be more efficient. There's probably a way to generate a u64 at a time and serialize it into the byte slice instead of generating one u8 per byte.

Contributor Author

Yes, was thinking about that. Let me look into it.

Contributor Author

@cloudhead cloudhead Sep 10, 2022

Ok, check 5e87fd0

Much faster. I don't know if there's a faster way without using unsafe.

Contributor Author

Benchmarks:

  test fill             ... bench:          29 ns/iter (+/- 5)
  test fill_naive       ... bench:         343 ns/iter (+/- 33)

src/lib.rs Outdated

/// Return a byte array of the given size.
#[inline]
pub fn bytes<const N: usize>(&self) -> [u8; N] {
Member

Won't this be an MSRV bump?

Contributor Author

Hmm, if it is, we'll have to decide whether it's worth it or not I guess.

Contributor Author

I've removed this function for now; I can open another PR for it.
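
(Const generics were stabilized in Rust 1.51, which is presumably the MSRV concern above.) For context, a minimal sketch of what such a helper could look like once fill exists, as it might appear inside the same impl block in src/lib.rs; this is hypothetical and not necessarily what the removed code did:

    /// Return a byte array of the given size.
    pub fn bytes<const N: usize>(&self) -> [u8; N] {
        // Hypothetical sketch: reuse fill to populate a stack-allocated array.
        let mut out = [0u8; N];
        self.fill(&mut out);
        out
    }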

Member

@notgull notgull left a comment

I feel like you probably don't even need the copy_from_slice here. You could probably use try_into to cast the slices to &mut [u8; 8] and then just set that to the result of to_ne_bytes. You may also want to align the chunks to the 8-byte boundary to maybe take advantage of aligned writes.
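
The try_into/to_ne_bytes part is what the snippet further down ends up doing. The alignment part could, roughly, look like the following hypothetical sketch (it assumes a fastrand-style Rng with &self methods, and it needs unsafe for align_to_mut, which the slice-based version in this PR avoids):

    // Hypothetical sketch of the aligned-write idea; not what the PR merged.
    fn fill_aligned(rng: &fastrand::Rng, slice: &mut [u8]) {
        // Split into an unaligned prefix, 8-byte-aligned u64 words, and a suffix.
        // Safety: every bit pattern is a valid u64, so viewing the middle of a
        // byte slice as u64s is sound.
        let (prefix, words, suffix) = unsafe { slice.align_to_mut::<u64>() };
        for byte in prefix.iter_mut().chain(suffix.iter_mut()) {
            *byte = rng.u8(..);
        }
        for word in words {
            // Native-endian is fine here: the bytes are random either way.
            *word = rng.u64(..);
        }
    }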

src/lib.rs Outdated
// Filling the buffer in chunks of 8 is much faster.
let mut chunks = slice.chunks_exact_mut(8);
for items in chunks.by_ref() {
    let r = self.u64(..);
Member

Can't we use gen_u64() here instead of going through the u64(..) adaptor?

src/lib.rs Outdated
let mut chunks = slice.chunks_exact_mut(8);
for items in chunks.by_ref() {
    let r = self.u64(..);
    items.copy_from_slice(&r.to_le_bytes());
Member

I think we should use to_ne_bytes() instead in order to avoid a performance hit on big endian systems.

@cloudhead
Contributor Author

Ah, yes, that's indeed possible. However, after trying it out, the benchmark doesn't show any performance difference, so I think the generated code is the same:

    /// Fill a byte slice with random data.
    #[inline]
    pub fn fill(&self, slice: &mut [u8]) {
        // Filling the buffer in chunks of 8 is much faster.
        let mut chunks = slice.chunks_exact_mut(8);
        for items in chunks.by_ref() {
            let r = self.gen_u64();
            let sl: &mut [u8; 8] = items.try_into().unwrap();

            *sl = r.to_ne_bytes();
        }

        for item in chunks.into_remainder() {
            *item = self.u8(..);
        }
    }

What do you think? The only nit with the above code is the unwrap() I guess.

Member

@notgull notgull left a comment

This looks good to me. That being said, it might be above my pay grade to merge this.

@smol-rs/admins Any thoughts on this?

@fogti
Member

fogti commented Sep 12, 2022

I'm not sure if the #[inline] tagging of the function is appropriate, given that it is relatively large. Can we get an LLVM IR of it?
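
(For reference, with a standard cargo setup, cargo rustc --release -- --emit=llvm-ir emits the crate's LLVM IR as a .ll file under target/release/deps/.)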

@cloudhead
Contributor Author

I'm not sure if the #[inline] tagging of the function is appropriate, given that it is relatively large. Can we get an LLVM IR of it?

Yeah I agree, I had it there originally when the function was small and unoptimized, but I would remove it now. For the record, the benchmark isn't affected when #[inline] is removed.

@cloudhead
Contributor Author

cloudhead commented Sep 13, 2022

#[inline] now removed.

Don't have time to look into the LLVM stuff, sorry!

@fogti fogti merged commit 6fe2c33 into smol-rs:master Sep 13, 2022
@notgull notgull mentioned this pull request Feb 12, 2023