Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement faster thread local rng for scheduler #55501

Merged
merged 9 commits into from
Sep 9, 2024
Merged

Conversation

gbaraldi
Copy link
Member

@gbaraldi gbaraldi commented Aug 15, 2024

Implement optimal uniform random number generator using the method proposed in swiftlang/swift#39143 based on OpenSSL's implementation of it in https://github.com/openssl/openssl/blob/1d2cbd9b5a126189d5e9bc78a3bdb9709427d02b/crypto/rand/rand_uniform.c#L13-L99

This PR also fixes some bugs found while developing it. This is a replacement for #50203 and fixes the issues found by @IanButterworth with both rngs

C rng
image

New scheduler rng
image

On my benchmarks the julia implementation seems to be almost 50% faster than the current implementation.
With oscars suggestion of removing the debiasing this is now almost 5x faster than the original implementation. And almost fully branchless

We might want to backport the two previous commits since they technically fix bugs.

base/partr.jl Outdated
# this process, so each each word beyond the first has a probability
# of 2^-32 of not terminating the process. That is, we're extremely
# likely to stop very rapidly.
for _ in 1:10
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like we should be able to delete this loop entirely. Without it, we have a bias of 2^-32 which seems like it should be plenty low for the purposes of guaranteeing uniform scheduling

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's a fair point
image
This seems uniform enough.

@nsajko nsajko added the domain:randomness Random number generation and the Random stdlib label Aug 15, 2024
@gbaraldi
Copy link
Member Author

gbaraldi commented Aug 15, 2024

Hmm, this seems to have made i686 very unhappy. Ok the issue is that pointer load. 32 bit has different alignment which is what is blowing this up

@giordano
Copy link
Contributor

We might want to backport the two previous commits since they technically fix bugs.

Sounds like that should be a separate PR

@IanButterworth
Copy link
Sponsor Member

Potentially naive question, but do we have a good benchmark to demonstrate that a faster scheduler rng actually speeds up scheduling?

I ask because in my testing of #50203 I saw the surprising behavior that the threaded fib got slower the faster the rng used (with no changes to allocations)

function fib(n::Int)
    n < 2 && return n
    t = Threads.@spawn fib(n - 2)
    return fib(n - 1) + fetch(t)
end

@gbaraldi
Copy link
Member Author

I don't think it will affect it too too much. Because if the rand call is super hot then things aren't going so well elsewhere :)

@giordano giordano changed the title Implement faster thread local rng for scheduler. Also related fixes Implement faster thread local rng for scheduler Aug 17, 2024
@gbaraldi gbaraldi added status:merge me PR is reviewed. Merge when all tests are passing and removed status:merge me PR is reviewed. Merge when all tests are passing labels Aug 26, 2024
base/partr.jl Outdated Show resolved Hide resolved
base/partr.jl Outdated Show resolved Hide resolved
base/partr.jl Outdated Show resolved Hide resolved
base/partr.jl Outdated Show resolved Hide resolved
@giordano
Copy link
Contributor

There are already merge conflicts

@vchuravy
Copy link
Member

@nanosoldier runtests(ALL, vs = ":master")

@nanosoldier
Copy link
Collaborator

The package evaluation job you requested has completed - possible new issues were detected.
The full report is available.

@giordano
Copy link
Contributor

giordano commented Sep 1, 2024

Do we need to run benchmarks, too?

@gbaraldi
Copy link
Member Author

gbaraldi commented Sep 3, 2024

I don't think any benchmark is gonna show this. At least no current one. Calling the function is 5x faster that's all I can say

@gbaraldi gbaraldi merged commit 169e9e8 into master Sep 9, 2024
5 of 7 checks passed
@gbaraldi gbaraldi deleted the gb/fast-tls-rng branch September 9, 2024 15:10
kshyatt pushed a commit that referenced this pull request Sep 12, 2024
Implement optimal uniform random number generator using the method
proposed in swiftlang/swift#39143 based on
OpenSSL's implementation of it in
https://github.com/openssl/openssl/blob/1d2cbd9b5a126189d5e9bc78a3bdb9709427d02b/crypto/rand/rand_uniform.c#L13-L99

This PR also fixes some bugs found while developing it. This is a
replacement for #50203 and fixes
the issues found by @IanButterworth with both rngs

C rng
<img width="1011" alt="image"
src="https://github.com/user-attachments/assets/0dd9d5f2-17ef-4a70-b275-1d12692be060">

New scheduler rng
<img width="985" alt="image"
src="https://github.com/user-attachments/assets/4abd0a57-a1d9-46ec-99a5-535f366ecafa">

~On my benchmarks the julia implementation seems to be almost 50% faster
than the current implementation.~
With oscars suggestion of removing the debiasing this is now almost 5x
faster than the original implementation. And almost fully branchless

We might want to backport the two previous commits since they
technically fix bugs.

---------

Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
domain:randomness Random number generation and the Random stdlib
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants