proposal: non-locking FastRand #18514

funny-falcon · 2017-01-04T20:14:52Z

Sometimes there is a need to generate short-lived, non-overlapping IDs from several goroutines, e.g. a request_id for some kind of RPC protocol.

The "obvious" way is to use an atomic increment, but that forces write contention on a single cache line.

A better way could be to provide a set of per-thread, non-overlapping generators with a relatively long period.

An example implementation is here: https://go-review.googlesource.com/#/c/34781/ . It implements non-overlapping per-thread generators with a period of 1<<44 non-repeating values, so one may safely rely on generating 1<<44 non-repeating values.

The implementation puts the method into the runtime package, but it certainly should be in math/rand.
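To make the contrast concrete, here is a minimal sketch of the two approaches: one shared atomic counter versus single-owner generators whose ranges cannot overlap. It is not the code from the CL; the names `IDGen`, `NewIDGen`, `nextSharedID` and the "owner index in the high bits" layout are assumptions made for illustration, and only the 44-bit width is taken from the period quoted above.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// counterBits is the width of the per-generator counter. 44 matches the
// period quoted above; the rest of this layout is an illustrative assumption.
const counterBits = 44

// sharedID is the "obvious" approach: every caller bumps one shared word,
// so all cores contend on the same cache line.
var sharedID uint64

func nextSharedID() uint64 { return atomic.AddUint64(&sharedID, 1) }

// IDGen is a hypothetical single-owner generator. It must not be shared
// between goroutines, which is why it needs no atomics or locks.
type IDGen struct {
	prefix uint64 // owner index, stored in the bits above counterBits
	n      uint64 // next counter value
}

// NewIDGen returns a generator whose IDs carry owner in the high bits.
// Only 64-counterBits bits are available for owner.
func NewIDGen(owner uint64) *IDGen {
	return &IDGen{prefix: owner << counterBits}
}

// Next returns prefix|counter. The counter wraps after 1<<counterBits calls.
func (g *IDGen) Next() uint64 {
	id := g.prefix | (g.n & (1<<counterBits - 1))
	g.n++
	return id
}

func main() {
	a, b := NewIDGen(1), NewIDGen(2)
	fmt.Println(nextSharedID(), a.Next(), a.Next(), b.Next())
}
```

The hard part, which this sketch sidesteps, is handing each thread its own generator cheaply; that is what the rest of the thread discusses.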

BenchmarkFastRand-4             500000000                3.49 ns/op
BenchmarkFastRand-4             500000000                3.49 ns/op
BenchmarkMathRand-4             10000000               162 ns/op
BenchmarkMathRand-4             10000000               159 ns/op
BenchmarkRandSource-4           300000000                5.26 ns/op
BenchmarkRandSource-4           300000000                5.27 ns/op
BenchmarkRandAtomic-4           50000000                32.0 ns/op
BenchmarkRandAtomic-4           50000000                31.3 ns/op

bradfitz · 2017-01-04T20:17:57Z

Is "sometimes" enough to warrant it being in the standard library?

See https://golang.org/doc/faq#x_in_std

Could this live outside the standard library?

funny-falcon · 2017-01-04T20:27:23Z

If there is a fast way to obtain a thread-local value, then yes.

A thread-local value could be safely accessed without a lock (given that the function doesn't call any other function).

Currently there is no way to write-access any shared value without a lock outside the runtime package (but this access could be transferred to another package in the standard library, like math/rand).

funny-falcon · 2017-01-04T20:35:40Z

Most of the "code" could be implemented outside the standard library.

But the need for a fast non-locking random source cannot be dismissed.

Probably, exposing runtime.fastrand() as runtime.FastRand() would already be enough, even though it doesn't provide a "long non-overlapping period".

randall77 · 2017-01-04T20:50:46Z

Your RPC example isn't motivating me.
I find it hard to believe that doing a single atomic increment is going to be noticeable compared to the cost of an RPC.
I also don't think that random is really what you want here - RPC IDs typically need to be distinct, at least within a session or whatever. It sounds like you're relying on the non-repeating of this generator, but that's not "random". And if it isn't random, it's hard to fix a requirement for what the non-random requirements are if we are to bake it into the standard library.

dsnet · 2017-01-04T20:58:35Z

I would much rather see #8281 be solved and see something like this appear as a third-party package.

davecheney · 2017-01-04T21:36:35Z

As I understand it, the "Spliterator" lets you take one random source and bifurcate it such that each half returns a unique sequence of numbers. I don't see why this has to be wrapped up with notions of per thread or per core; just split the random source, keep half for yourself for the next split and give the other half away. This would also play well with the race detector as it will spot when a random source is shared accidentally.

dr2chase · 2017-01-04T21:43:54Z

Is a Spliterator anything like a Deterministic Parallel RNG? I think a DPRNG might be what we want, not sure it needs to be in the library.

minux · 2017-01-04T21:55:58Z

Yeah, the correct solution to this problem is not per-cpu state, but splittable RNG (essentially explicit per-goroutine RNG.) Search, for example, SPLITMIX used by Java.
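For readers unfamiliar with the idea, a splittable generator in the SplitMix style can be sketched in a few lines. The mixing function below is the usual SplitMix64 output function; the `Split` method is a simplification (Java's SplittableRandom derives the child's gamma with a separate mixer and an extra bit-count check), and the type and function names are hypothetical.

```go
package main

import "fmt"

// goldenGamma is the default odd increment used by SplitMix64.
const goldenGamma = 0x9e3779b97f4a7c15

// Splittable is a SplitMix-style generator: a counter advanced by an odd
// gamma and passed through a mixing function. Each value is meant to be
// owned by one goroutine; it is not safe for concurrent use.
type Splittable struct {
	state, gamma uint64
}

// mix64 is the SplitMix64 output function.
func mix64(z uint64) uint64 {
	z = (z ^ (z >> 30)) * 0xbf58476d1ce4e5b9
	z = (z ^ (z >> 27)) * 0x94d049bb133111eb
	return z ^ (z >> 31)
}

// New returns a generator seeded with seed and the default gamma.
func New(seed uint64) *Splittable {
	return &Splittable{state: seed, gamma: goldenGamma}
}

// Next advances the counter and returns the mixed value.
func (s *Splittable) Next() uint64 {
	s.state += s.gamma
	return mix64(s.state)
}

// Split consumes two outputs of s to build an independent child.
// Simplified: the child's gamma is just forced odd here.
func (s *Splittable) Split() *Splittable {
	seed := s.Next()
	gamma := s.Next() | 1
	return &Splittable{state: seed, gamma: gamma}
}

func main() {
	parent := New(42)
	child := parent.Split()
	fmt.Println(parent.Next(), child.Next())
}
```

Because `Split` only touches the parent's own state, no lock is involved as long as the parent itself is not shared.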

funny-falcon · 2017-01-05T05:09:44Z

@minux In the example implementation I used a construction similar to SPLITMIX (I even added a link to Java's SplittableRandom).

The problem with a "splittable random" without per-CPU state is how to get that "split" without lock contention. How does one safely get a "per-goroutine" RNG?

A common technique to reduce lock contention is sharding. But there is no way to choose a "random" shard in Golang without another lock or atomic operation.

Probably, this original proposal is overkill.

What about restating the proposal this way:

expose runtime.fastrand() as runtime.FastRand() with a comment about the "low quality" of this generator (short period, predictable dependency between values).
expose getg().m.id as runtime.MID() or runtime.ThreadID().

Given that these methods are exposed, it will be quite easy to perform light-weight lock-sharding. It will certainly be enough for my use case.
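As an illustration of what "light-weight lock-sharding" could look like, here is a small sharded counter. Neither runtime.MID() nor runtime.FastRand() exists in the public API, so the shard is picked from a caller-supplied `hint`, which stands in for whichever of those values might be exposed. The type names and the 64-byte padding are assumptions.

```go
package main

import (
	"fmt"
	"sync"
)

// shard is one lock-protected slot, padded so that neighbouring shards
// do not share a cache line (64 bytes is an assumed line size).
type shard struct {
	mu sync.Mutex
	n  uint64
	_  [48]byte
}

// ShardedCounter spreads updates over several independently locked shards.
type ShardedCounter struct {
	shards []shard
}

// NewShardedCounter allocates n shards; n must be positive.
func NewShardedCounter(n int) *ShardedCounter {
	return &ShardedCounter{shards: make([]shard, n)}
}

// Add updates the shard selected by hint. hint is a placeholder for a
// thread ID or a cheap random value.
func (c *ShardedCounter) Add(hint uint32, delta uint64) {
	s := &c.shards[hint%uint32(len(c.shards))]
	s.mu.Lock()
	s.n += delta
	s.mu.Unlock()
}

// Sum locks each shard in turn and returns the total.
func (c *ShardedCounter) Sum() uint64 {
	var total uint64
	for i := range c.shards {
		s := &c.shards[i]
		s.mu.Lock()
		total += s.n
		s.mu.Unlock()
	}
	return total
}

func main() {
	c := NewShardedCounter(8)
	var wg sync.WaitGroup
	for w := uint32(0); w < 4; w++ {
		wg.Add(1)
		go func(id uint32) {
			defer wg.Done()
			for i := 0; i < 1000; i++ {
				c.Add(id, 1) // id stands in for the thread ID
			}
		}(w)
	}
	wg.Wait()
	fmt.Println(c.Sum()) // prints 4000
}
```

Contention drops only if the hint tends to differ between threads that run at the same time, which is why a cheap thread ID or random value is being asked for.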

@dsnet perhaps light-weight lock-sharding could reduce the need for the real CPU-local storage of #8281.

PS. Why runtime.fastrand() is still in assembly? My version of FastRand is inlined, and I think fastrand would be inlined too if implemented in Go. The assembly contains a "conditional load" instruction, so perhaps the main barrier is the ability of the Go optimiser to generate this instruction, yes?

josharian · 2017-01-05T05:15:04Z

> Why runtime.fastrand() is still in assembly?

Good question. Want to try rewriting it in Go and benchmark the results? Having it in pure Go would make this optimization less ugly as well.

funny-falcon · 2017-01-05T05:42:30Z

If there were a way to get the CPU number, as suggested in #8281, it would also help to reduce lock contention.

funny-falcon · 2017-01-05T06:49:04Z

I've implemented fastrand in Go, and it looks to be faster on i386 and x86_64.
Changeset: https://go-review.googlesource.com/34782
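For context on the "conditional load" question, a shift-register step of this kind can be written in pure Go without a branch by turning the sign bit into a mask. The sketch below is not the code from the changeset; the feedback constant `taps` and the function names are illustrative assumptions.

```go
package main

import "fmt"

// taps is an illustrative feedback constant, not checked against the runtime.
const taps = 0x88888eef

// step advances a 32-bit shift-register state without a branch: the
// arithmetic shift turns the top bit into an all-ones or all-zeros mask.
func step(x uint32) uint32 {
	mask := uint32(int32(x)>>31) & taps
	return x<<1 ^ mask
}

// stepBranchy is the same update written with an explicit branch, which a
// compiler may or may not turn into a conditional move.
func stepBranchy(x uint32) uint32 {
	top := x&0x80000000 != 0
	x <<= 1
	if top {
		x ^= taps
	}
	return x
}

func main() {
	x := uint32(1)
	for i := 0; i < 5; i++ {
		x = step(x)
	}
	fmt.Println(x, step(0x80000000) == stepBranchy(0x80000000))
}
```

Both forms compute the same value; the mask form simply does not depend on the compiler emitting a conditional move.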

dr2chase · 2017-01-05T14:08:22Z

A DPRNG ought to require no locks at all, nor thread-local storage if you're willing to expose the state explicitly. At any place in your code where you'd spawn a goroutine, you split the RNG and pass the child in as an explicit parameter to the go'd func. Each DPRNG is thus private to a single thread and requires no locking at all. Besides lock-freedom, a DPRNG also gets you reproducible-by-start-seed runs of random numbers in a multithreaded program. Random number generation is not my poster child for thread local state.
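The pattern described here, splitting at the spawn site and passing the child explicitly, might look like the sketch below. It uses math/rand and seeds each child from the parent's output, which is a simplification and not the construction from the DPRNG literature; `split` and `run` are hypothetical helper names.

```go
package main

import (
	"fmt"
	"math/rand"
	"sync"
)

// split derives a child generator from parent. Seeding the child from the
// parent's output is a stand-in for a real DPRNG split.
func split(parent *rand.Rand) *rand.Rand {
	return rand.New(rand.NewSource(parent.Int63()))
}

// run spawns n workers, gives each its own generator, and returns the
// first value drawn by each worker.
func run(seed int64, n int) []int64 {
	root := rand.New(rand.NewSource(seed))
	out := make([]int64, n)
	var wg sync.WaitGroup
	for i := 0; i < n; i++ {
		child := split(root) // split in the parent, before the goroutine starts
		wg.Add(1)
		go func(i int, rng *rand.Rand) {
			defer wg.Done()
			out[i] = rng.Int63() // rng is private to this goroutine: no lock
		}(i, child)
	}
	wg.Wait()
	return out
}

func main() {
	fmt.Println(run(1, 4))
}
```

Since every split happens in the spawning goroutine, two runs with the same seed return the same values regardless of how the workers are scheduled.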

RLH · 2017-01-05T14:54:42Z

Just to add context to David's call for DPRNG here is a paper discussing the approach. http://supertech.csail.mit.edu/papers/dprng.pdf Charles E. Leiserson, Tao B. Schardl, and Jim Sukha. 2012. Deterministic parallel random-number generation for dynamic-multithreading platforms. In *Proceedings of the 17th ACM SIGPLAN symposium on Principles and Practice of Parallel Programming* (PPoPP '12). ACM, New York, NY, USA, 193-204. DOI= http://dx.doi.org/10.1145/2145816.2145841

rsc · 2017-01-09T21:43:59Z

@robpike has a pending CL for a PCG-based generator (https://go-review.googlesource.com/#/c/10161/), which would be very lightweight and maybe cheap enough to duplicate and create a new one per-goroutine. @dr2chase's suggestion is also, as I understand it, lightweight and cheap to pass around.
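To give a sense of how small such a generator's state is, here is a sketch of one common PCG variant (PCG32, XSH-RR output, 16 bytes of state). It is not necessarily the variant used in the pending CL, and the `PCG32` type and `NewPCG32` constructor are hypothetical names.

```go
package main

import (
	"fmt"
	"math/bits"
)

// pcgMul is the 64-bit LCG multiplier used by the reference PCG32.
const pcgMul = 6364136223846793005

// PCG32 holds the whole generator state: two words, cheap to copy and
// cheap to create one per goroutine. Not safe for concurrent use.
type PCG32 struct {
	state uint64
	inc   uint64 // stream selector, always odd
}

// NewPCG32 seeds a generator; different stream values select different
// sequences.
func NewPCG32(seed, stream uint64) *PCG32 {
	p := &PCG32{inc: stream<<1 | 1}
	p.Uint32()
	p.state += seed
	p.Uint32()
	return p
}

// Uint32 advances the LCG and returns a permuted 32-bit output.
func (p *PCG32) Uint32() uint32 {
	old := p.state
	p.state = old*pcgMul + p.inc
	xorshifted := uint32(((old >> 18) ^ old) >> 27)
	rot := int(old >> 59)
	return bits.RotateLeft32(xorshifted, -rot) // negative count rotates right
}

func main() {
	g := NewPCG32(42, 54)
	fmt.Printf("%#x %#x\n", g.Uint32(), g.Uint32())
}
```

With state this small, creating a generator per goroutine and passing it explicitly costs little more than passing two integers.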

There is a separate question in this proposal about whether this is important enough to have a per-thread instance hiding in the runtime to avoid even needing to thread one through your code explicitly. That's more compelling with the giant rand state we have today, but if you're using PCG, it's much less necessary. In general it seems like we should avoid new magical per-thread state and prefer explicit notation in the code for what is going on.

Like @randall77 said, it's hard to believe that threading one of these through the code (or a single atomic increment) is more expensive than the other work required in an RPC.

It doesn't seem like we should do anything to support these cases in the runtime.

funny-falcon · 2017-01-10T06:18:32Z

@rsc @robpike If we choose a lightweight general-purpose PRNG, then I'd prefer xoroshiro:
http://xoroshiro.di.unimi.it
http://xoroshiro.di.unimi.it/xoroshiro128plus.c
It is faster than PCG, has 128 bits of state, and has a jump function to split the state efficiently.

In this proposal I used a SPLITMIX variant because it could be easily tweaked to produce non-intersecting values from split states.

But either way, now I think it is better to close this proposal and open a new one: for exporting getg().m.id as runtime.MID(), and probably exposing runtime.fastrand() as runtime.Fastrand() (mixed with a secure key to hide its original value).

bradfitz · 2017-01-10T07:32:58Z

Closing per OP's request.

bradfitz added this to the Proposal milestone Jan 4, 2017

bradfitz added the Proposal label Jan 4, 2017

bradfitz closed this as completed Jan 10, 2017

funny-falcon mentioned this issue Jan 10, 2017

proposal: runtime: expose current thread id or processor id #18590

Closed

golang locked and limited conversation to collaborators Jan 10, 2018

gopherbot added the FrozenDueToAge label Jan 10, 2018
