-
-
Notifications
You must be signed in to change notification settings - Fork 5.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement faster thread local rng for scheduler #55501
Conversation
base/partr.jl
Outdated
# this process, so each each word beyond the first has a probability | ||
# of 2^-32 of not terminating the process. That is, we're extremely | ||
# likely to stop very rapidly. | ||
for _ in 1:10 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like we should be able to delete this loop entirely. Without it, we have a bias of 2^-32 which seems like it should be plenty low for the purposes of guaranteeing uniform scheduling
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
7223ddf
to
261ec6e
Compare
Hmm, this seems to have made i686 very unhappy. Ok the issue is that pointer load. 32 bit has different alignment which is what is blowing this up |
Sounds like that should be a separate PR |
9fe0fb2
to
6aa9a35
Compare
Potentially naive question, but do we have a good benchmark to demonstrate that a faster scheduler rng actually speeds up scheduling? I ask because in my testing of #50203 I saw the surprising behavior that the threaded fib got slower the faster the rng used (with no changes to allocations)
|
I don't think it will affect it too too much. Because if the rand call is super hot then things aren't going so well elsewhere :) |
There are already merge conflicts |
@nanosoldier |
The package evaluation job you requested has completed - possible new issues were detected. |
Do we need to run benchmarks, too? |
I don't think any benchmark is gonna show this. At least no current one. Calling the function is 5x faster that's all I can say |
Implement optimal uniform random number generator using the method proposed in swiftlang/swift#39143 based on OpenSSL's implementation of it in https://github.com/openssl/openssl/blob/1d2cbd9b5a126189d5e9bc78a3bdb9709427d02b/crypto/rand/rand_uniform.c#L13-L99 This PR also fixes some bugs found while developing it. This is a replacement for #50203 and fixes the issues found by @IanButterworth with both rngs C rng <img width="1011" alt="image" src="https://github.com/user-attachments/assets/0dd9d5f2-17ef-4a70-b275-1d12692be060"> New scheduler rng <img width="985" alt="image" src="https://github.com/user-attachments/assets/4abd0a57-a1d9-46ec-99a5-535f366ecafa"> ~On my benchmarks the julia implementation seems to be almost 50% faster than the current implementation.~ With oscars suggestion of removing the debiasing this is now almost 5x faster than the original implementation. And almost fully branchless We might want to backport the two previous commits since they technically fix bugs. --------- Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>
Implement optimal uniform random number generator using the method proposed in swiftlang/swift#39143 based on OpenSSL's implementation of it in https://github.com/openssl/openssl/blob/1d2cbd9b5a126189d5e9bc78a3bdb9709427d02b/crypto/rand/rand_uniform.c#L13-L99
This PR also fixes some bugs found while developing it. This is a replacement for #50203 and fixes the issues found by @IanButterworth with both rngs
C rng
New scheduler rng
On my benchmarks the julia implementation seems to be almost 50% faster than the current implementation.With oscars suggestion of removing the debiasing this is now almost 5x faster than the original implementation. And almost fully branchless
We might want to backport the two previous commits since they technically fix bugs.