proposal: non-locking FastRand #18514
Comments
Is "sometimes" enough to warrant it being in the standard library? See https://golang.org/doc/faq#x_in_std Could this live outside the standard library? |
If there were a fast way to obtain a thread-local value, then yes. A thread-local value can be safely accessed without a lock (provided the function does not call any other function). Currently there is no way to write to any shared value without a lock outside |
Most of the code could be implemented outside the standard library, but the need for a fast non-locking random generator does not go away. Probably, if |
Your RPC example isn't motivating me. |
I would much rather see #8281 solved and see something like this appear as a third-party package. |
As I understand it, the "Spliterator" lets you take one random source and
bifurcate it such that each half returns a unique sequence of numbers.
I don't see why this has to be wrapped up with notions of per thread or per
core; just split the random source, keep half for yourself for the next
split and give the other half away.
This would also play well with the race detector, as it will spot when a
random source is shared accidentally.
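The split described above can be sketched in Go roughly as follows. This is a minimal sketch modelled on the SplitMix64 construction behind Java's SplittableRandom; the names Source, New, Split and Uint64 are hypothetical and are not an existing Go API, and the sketch makes no claim about the statistical quality of the resulting streams.

```go
package main

import (
	"fmt"
	"math/bits"
)

// goldenGamma is the odd 64-bit increment SplitMix64 uses for a root generator.
const goldenGamma = 0x9e3779b97f4a7c15

// Source is a hypothetical splittable generator. It holds no lock and is
// meant to be owned by exactly one goroutine at a time.
type Source struct {
	state uint64
	gamma uint64 // per-generator increment, always odd
}

// New returns a root generator for the given seed.
func New(seed uint64) *Source { return &Source{state: seed, gamma: goldenGamma} }

// mix64 scrambles the raw counter value into an output value.
func mix64(z uint64) uint64 {
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9
	z = (z ^ (z >> 27)) * 0x94d049bb133111eb
	return z ^ (z >> 31)
}

// mixGamma derives an odd increment for a child generator.
func mixGamma(z uint64) uint64 {
	z = (z ^ (z >> 33)) * 0xff51afd7ed558ccd
	z = (z ^ (z >> 33)) * 0xc4ceb9fe1a85ec53
	z = (z ^ (z >> 33)) | 1
	if bits.OnesCount64(z^(z>>1)) < 24 {
		z ^= 0xaaaaaaaaaaaaaaaa // bit 0 is untouched, so z stays odd
	}
	return z
}

// next advances the counter by this generator's increment.
func (s *Source) next() uint64 {
	s.state += s.gamma
	return s.state
}

// Uint64 returns the next value of this generator.
func (s *Source) Uint64() uint64 { return mix64(s.next()) }

// Split consumes two steps of the parent and returns a child with its own
// state and increment. The parent keeps going with its own sequence.
func (s *Source) Split() *Source {
	state := mix64(s.next())
	gamma := mixGamma(s.next())
	return &Source{state: state, gamma: gamma}
}

func main() {
	root := New(42)
	child := root.Split() // give this half away, keep root for the next split
	fmt.Println(root.Uint64(), child.Uint64())
}
```

Because each Source is a plain value with a single owner, accidental sharing between goroutines shows up as an ordinary data race that the race detector can report.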
|
Is a Spliterator anything like a Deterministic Parallel RNG? I think a DPRNG might be what we want, not sure it needs to be in the library. |
Yeah, the correct solution to this problem is not per-cpu state, but
splittable RNG (essentially explicit per-goroutine RNG.)
Search, for example, SPLITMIX used by Java.
|
@minux In the example implementation I used a construction similar to SPLITMIX (I even added a link to Java's SplittableRandom). The problem with a "splittable random" without per-CPU state is how to get that "split" without lock contention. How does one safely get a "per-goroutine" RNG? A common technique to reduce lock contention is sharding, but there is no way to choose a "random" shard in Go without another lock or atomic operation. Probably this original proposal is overkill. What about a proposal along these lines:
Given these methods are exposed, it would be quite easy to perform lightweight lock sharding. It would certainly be enough for my use case. @dsnet perhaps it could reduce the need for the real CPU-local storage of #8281 by enabling lightweight lock sharding. PS. Why is runtime.fastrand() still in assembly? My version of FastRand is inlined, and I think fastrand would be inlined too if implemented in Go. The assembly contains a "conditional load" instruction, so perhaps the main barrier is the ability of the Go optimiser to generate this instruction, yes? |
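For readers unfamiliar with the lock sharding mentioned above, here is a minimal sketch. It assumes the caller can obtain some cheap shard hint; the hint parameter stands in for whatever the proposed methods would provide and is purely hypothetical, as are the names ShardedRand and NewShardedRand.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// numShards is a power of two so that a hint can be reduced with a mask.
const numShards = 16

// shard pairs one generator with the lock that protects it.
// A production version would likely pad shards to avoid false sharing.
type shard struct {
	mu  sync.Mutex
	rng *rand.Rand
}

// ShardedRand is a hypothetical wrapper that spreads callers over several
// independently locked generators instead of one global lock.
type ShardedRand struct {
	shards [numShards]shard
}

// NewShardedRand seeds every shard with a different seed.
func NewShardedRand(seed int64) *ShardedRand {
	s := &ShardedRand{}
	for i := range s.shards {
		s.shards[i].rng = rand.New(rand.NewSource(seed + int64(i)))
	}
	return s
}

// Int63 draws from the shard selected by hint. The hint is an assumption:
// it represents a cheap per-thread value that the caller gets from elsewhere.
func (s *ShardedRand) Int63(hint uint32) int64 {
	sh := &s.shards[hint&(numShards-1)]
	sh.mu.Lock()
	v := sh.rng.Int63()
	sh.mu.Unlock()
	return v
}

func main() {
	s := NewShardedRand(1)
	fmt.Println(s.Int63(0), s.Int63(1))
}
```

Contention drops only if concurrent callers tend to present different hints, which is exactly the missing piece the comment points out.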
Good question. Want to try rewriting it in Go and benchmark the results? Having it in pure Go would make this optimization less ugly as well. |
If there were a way to get the CPU number, as suggested in #8281, it would also help to reduce lock contention. |
I've implemented fastrand in Go, and it looks to be faster on i386 and x86_64. |
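To illustrate how a "conditional load" can be expressed in plain Go without a branch, here is a small sketch. The shift-and-conditional-XOR step and the constant 0x88888eef are assumptions made for illustration; the thread does not show the actual code, and this is not presented as the runtime's implementation.

```go
package main

import "fmt"

// step advances a 32-bit shift-register state without a branch.
// The arithmetic shift turns the top bit into an all-ones or all-zeros mask,
// which selects whether the feedback constant is XORed in.
// The constant is an assumption for illustration only.
func step(x uint32) uint32 {
	mask := uint32(int32(x) >> 31)
	return x<<1 ^ (mask & 0x88888eef)
}

// stepBranch is the same update written with an explicit condition,
// which is what a conditional move instruction would implement.
func stepBranch(x uint32) uint32 {
	y := x << 1
	if x&0x80000000 != 0 {
		y ^= 0x88888eef
	}
	return y
}

func main() {
	x := uint32(0x12345678)
	for i := 0; i < 4; i++ {
		x = step(x)
		fmt.Printf("%#08x\n", x)
	}
}
```

Both forms compute the same value; which one the compiler turns into better machine code is what a benchmark would have to show.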
A DPRNG ought to require no locks at all, nor thread-local storage if
you're willing to expose the state explicitly. At any place in your code
where you'd spawn a goroutine, you split the RNG and pass the child in as
an explicit parameter to the go'd func. Each DPRNG is thus private to a
single thread and requires no locking at all. Besides lock-freedom, a
DPRNG also gets you reproducible-by-start-seed runs of random numbers in a
multithreaded program.
Random number generation is not my poster child for thread local state.
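The pattern described above, deriving a child generator at each spawn point and passing it in explicitly, might look like the sketch below. It is not a true DPRNG split: as a stand-in it seeds a fresh math/rand generator from the parent, and the function name run is hypothetical.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// run spawns n workers, each with its own private generator, and returns
// the first value each worker drew. No generator is shared, so no lock
// is needed around random number generation.
func run(seed int64, n int) []int64 {
	parent := rand.New(rand.NewSource(seed))
	out := make([]int64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		// Derive the child in the spawning goroutine, before the go
		// statement, so the parent is only ever touched by one goroutine.
		// Seeding from the parent is a stand-in for a real split operation.
		child := rand.New(rand.NewSource(parent.Int63()))
		wg.Add(1)
		go func(i int, rng *rand.Rand) {
			defer wg.Done()
			out[i] = rng.Int63() // each worker writes only its own slot
		}(i, child)
	}
	wg.Wait()
	return out
}

func main() {
	// The same start seed gives the same per-worker values on every run,
	// regardless of how the goroutines are scheduled.
	fmt.Println(run(1, 4))
	fmt.Println(run(1, 4))
}
```

Since the children are derived in a fixed order by the spawning goroutine, results are reproducible from the start seed even though the workers run concurrently.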
|
Just to add context to David's call for DPRNG here is a paper discussing
the approach.
http://supertech.csail.mit.edu/papers/dprng.pdf
Charles E. Leiserson, Tao B. Schardl, and Jim Sukha. 2012. Deterministic
parallel random-number generation for dynamic-multithreading
platforms. In *Proceedings
of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel
Programming* (PPoPP '12). ACM, New York, NY, USA, 193-204. DOI=
http://dx.doi.org/10.1145/2145816.2145841
|
@robpike has a pending CL for a PCG-based generator (https://go-review.googlesource.com/#/c/10161/), which would be very lightweight and maybe cheap enough to duplicate and create a new one per-goroutine. @dr2chase's suggestion is also, as I understand it, lightweight and cheap to pass around. There is a separate question in this proposal about whether this is important enough to have a per-thread instance hiding in the runtime to avoid even needing to thread one through your code explicitly. That's more compelling with the giant rand state we have today, but if you're using PCG, it's much less necessary. In general it seems like we should avoid new magical per-thread state and prefer explicit notation in the code for what is going on. Like @randall77 said, it's hard to believe that threading one of these through the code (or a single atomic increment) is more expensive than the other work required in an RPC. It doesn't seem like we should do anything to support these cases in the runtime. |
@rsc @robpike If we are choosing a lightweight general-purpose PRNG, then I'd prefer xoroshiro: In this proposal I used a SPLITMIX variant because it can be easily tweaked to produce non-intersecting values from split states. Either way, I now think it is better to close this proposal and open a new one: for exporting |
Closing per OP's request. |
Sometimes there is a need to generate short-lived, non-overlapping IDs from several goroutines, e.g. a request_id for some kind of RPC protocol.
The "obvious" way is to use an atomic increment, but that forces write contention on a single cache line.
A better way could be to allow the use of a set of per-thread non-overlapping generators with a relatively long period.
An example implementation is here: https://go-review.googlesource.com/#/c/34781/ . It implements non-overlapping per-thread generators, each with a period of 1<<44 non-repeating values, so one may safely rely on generating 1<<44 non-repeating values.
The implementation puts the method into the runtime package, but it certainly should be in math/rand.
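The two approaches contrasted above can be sketched as follows. This is not the code from the linked CL: it assumes each worker is handed a distinct number up front and simply partitions the 64-bit ID space, with a plain counter in the low 44 bits rather than a randomised sequence. The names IDGen, NewIDGen and nextShared are hypothetical.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// counterBits is the size of each worker's private range (1<<44 values),
// matching the period mentioned in the proposal.
const counterBits = 44

// shared is the "obvious" approach: one counter that every goroutine
// increments atomically, so all writers contend on the same cache line.
var shared uint64

func nextShared() uint64 { return atomic.AddUint64(&shared, 1) }

// IDGen is a hypothetical per-worker generator owned by one goroutine.
// The worker number occupies the high bits, so ranges never overlap.
type IDGen struct {
	prefix uint64
	n      uint64
}

// NewIDGen assumes worker fits in the remaining 64-counterBits bits.
func NewIDGen(worker uint64) *IDGen {
	return &IDGen{prefix: worker << counterBits}
}

// Next returns an ID unique across workers; it repeats after 1<<44 calls.
func (g *IDGen) Next() uint64 {
	g.n = (g.n + 1) & (1<<counterBits - 1)
	return g.prefix | g.n
}

func main() {
	const workers, perWorker = 4, 1000
	ids := make([][]uint64, workers)
	var wg sync.WaitGroup
	for w := 0; w < workers; w++ {
		wg.Add(1)
		go func(w int) {
			defer wg.Done()
			g := NewIDGen(uint64(w)) // no shared state on the hot path
			for i := 0; i < perWorker; i++ {
				ids[w] = append(ids[w], g.Next())
			}
		}(w)
	}
	wg.Wait()
	seen := make(map[uint64]bool)
	for _, batch := range ids {
		for _, id := range batch {
			seen[id] = true
		}
	}
	fmt.Println("distinct ids:", len(seen), "shared counter:", nextShared())
}
```

The partitioned version avoids the shared cache line entirely, at the cost of having to assign worker numbers, which is the part the proposal wanted the runtime to handle per thread.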