Implement cong in Julia #50203
base: master
Can we do something that doesn't need the PTLS?
I think the challenge is that we want randomness that is independent of the task state. Since we don't have "scheduler state" outside of the ptls, that's the only option. I am more annoyed by how slow this is due to the mod operation.
Can we guarantee that the sizes of the heaps are powers of 2? If so, you can just use an
It's the number of heaps, not their size, and that is equal to the number of threads. That is often a power of two, but this is not guaranteed.
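To illustrate the exchange above, here is a minimal sketch of why a bit mask can replace the mod only when the heap count is a power of two. The helper names `pick_mod` and `pick_mask` are hypothetical and do not appear in the PR.

```julia
# Hypothetical helpers for illustration; not part of base/partr.jl.
pick_mod(r::UInt64, n::UInt64) = r % n                  # works for any n > 0
pick_mask(r::UInt64, n::UInt64) = r & (n - UInt64(1))   # only valid if ispow2(n)

# For a power-of-two count the two agree on every input tried here.
agree_pow2 = all(pick_mod(r, UInt64(8)) == pick_mask(r, UInt64(8)) for r in UInt64(0):UInt64(1023))

# For a count of 6 the mask is 0b101, so indices 2 and 3 are never selected.
reachable_6 = sort(unique(pick_mask(r, UInt64(6)) for r in UInt64(0):UInt64(1023)))
```

With a thread count such as 6, the masked version would silently skip some heaps, which is why the power-of-two shortcut cannot be assumed here.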
What about using
Is this a valid benchmark?
Update: **not a fair test because the base commit is different**
this PR
master b88f64f (via juliaup)
Probably. I haven't checked the type stability of this code after the recent rewrite. Since allocations increased, I assume we have a type instability. @IanButterworth do you want to take this over?
base/partr.jl
Outdated
cong(max::UInt32) = iszero(max) ? UInt32(0) : jl_rand_ptls(max) + UInt32(1)

function jl_rand_ptls(max::UInt32)
    rngseed = Base.unsafe_load(Base.unsafe_convert(Ptr{UInt64}, Core.getptls()), 2) # TODO, less horrid
@vtjnash any ideas on how to do this nicer?
I made it so we only unsafe_convert the ptls once. Is that nice enough?
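For readers unfamiliar with the `unsafe_load(ptr, 2)` call in the reviewed line, the following sketch shows the indexing semantics on an ordinary array instead of the real ptls. It assumes, as the reviewed code implies, that the value of interest sits in the second 8-byte slot; the actual ptls layout is an internal detail and is not modelled here.

```julia
# Stand-in buffer; the real code reads from the pointer returned by Core.getptls().
buf = UInt64[0x11, 0x22, 0x33]

second, same = GC.@preserve buf begin
    p = pointer(buf)            # Ptr{UInt64}
    a = unsafe_load(p, 2)       # index is 1-based and counted in elements
    b = unsafe_load(p + 8)      # pointer arithmetic is in bytes, so +8 is the same slot
    (a, b)
end
```

Both loads read the same word, which is why a hard-coded index of 2 is fragile: it breaks silently if a field is added or reordered ahead of the seed.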
Doesn't appear to be an allocation within
master
However I just locally updated this branch from master and it regressed to
Co-authored-by: Ian Butterworth <i.r.butterworth@gmail.com>
b5d065a to f1ea54e
f1ea54e to 5176f22
So looking into this together with Ian, we discovered that our rng is just very bad. I also discovered that
function _cong(max::UInt64, seed::UInt64)
    # Only one possible result when max is 0 or 1.
    if max < 2
        return UInt64(0), seed
    end
    # mask covers exactly the `bits` low bits needed to represent max.
    zeros = leading_zeros(max)
    mask = typemax(UInt64) >> zeros
    bits = 8 * sizeof(UInt64) - zeros
    while true
        # Advance the LCG; UInt64 arithmetic wraps modulo 2^64.
        value = UInt64(69069) * seed + UInt64(362437)
        seed = value
        x = seed & mask
        if x < max
            return x, seed
        end
        # Rejected: retry with the remaining high bits before drawing again.
        bits_left = zeros
        while bits_left >= bits
            value >>= bits
            x = value & mask
            if x < max
                return x, seed
            end
            bits_left -= bits
        end
    end
end
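The comment above does not say what made the rng bad. One well-known weakness of linear congruential generators with a power-of-two modulus is that their low bits have very short periods, and the sketch below demonstrates that with the same constants. Whether this is the specific problem that was observed is an assumption; `lcg_step` and `low_bits` are hypothetical helper names.

```julia
# Same multiplier and increment as in _cong; UInt64 arithmetic wraps modulo 2^64.
lcg_step(seed::UInt64) = UInt64(69069) * seed + UInt64(362437)

# Collect the lowest `k` bits of `n` successive LCG states.
function low_bits(seed::UInt64, k::Int, n::Int)
    mask = (UInt64(1) << k) - UInt64(1)
    out = Vector{UInt64}(undef, n)
    for i in 1:n
        seed = lcg_step(seed)
        out[i] = seed & mask
    end
    return out
end

bits1 = low_bits(UInt64(12345), 1, 8)    # the lowest bit strictly alternates
bits3 = low_bits(UInt64(12345), 3, 16)   # the lowest 3 bits repeat every 8 steps
```

Because `_cong` tests the masked low bits first, small values of `max` draw mostly from the weakest part of the state.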
Implement an optimal uniform random number generator using the method proposed in swiftlang/swift#39143, based on OpenSSL's implementation of it in https://github.com/openssl/openssl/blob/1d2cbd9b5a126189d5e9bc78a3bdb9709427d02b/crypto/rand/rand_uniform.c#L13-L99
This PR also fixes some bugs found while developing it. It is a replacement for #50203 and fixes the issues found by @IanButterworth with both rngs.
C rng (image: https://github.com/user-attachments/assets/0dd9d5f2-17ef-4a70-b275-1d12692be060)
New scheduler rng (image: https://github.com/user-attachments/assets/4abd0a57-a1d9-46ec-99a5-535f366ecafa)
~On my benchmarks the Julia implementation seems to be almost 50% faster than the current implementation.~ With Oscar's suggestion of removing the debiasing, this is now almost 5x faster than the original implementation, and almost fully branchless.
We might want to backport the two previous commits since they technically fix bugs.
Co-authored-by: Valentin Churavy <vchuravy@users.noreply.github.com>
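As a rough illustration of what a branchless bounded draw without debiasing can look like, the sketch below maps a 64-bit random word into `0:(max - 1)` by keeping the high half of a 128-bit product. This assumes the debias-free variant reduces to a multiply-and-take-high-bits mapping; it is not the code merged in the replacement PR, and `mulhi_bounded` is a hypothetical name.

```julia
# Hypothetical helper; a sketch of the multiply-high mapping, not the merged code.
# Takes the high 64 bits of the 128-bit product r * max. Without a debiasing
# step, some outputs are very slightly more likely than others when max does
# not divide 2^64.
mulhi_bounded(r::UInt64, max::UInt64) = (widemul(r, max) >> 64) % UInt64

lo = mulhi_bounded(typemin(UInt64), UInt64(10))   # smallest input maps to 0
hi = mulhi_bounded(typemax(UInt64), UInt64(10))   # largest input maps to max - 1
```

Unlike the rejection loop in `_cong`, this uses no mod and no retry, and it draws on the high bits of the random word, which are the stronger bits of an LCG.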
Moving cong to Julia saves a tiny bit of overhead,
but accessing the PTLS rngseed for the scheduler is a bit... ugly.