proposal: runtime: expose current thread id or processor id #18590
Historically we have always rejected goroutine-local information. Your proposal is incomplete: you need to define exactly when the value is permitted to change, and you need to do so without overly constraining the runtime. Right now a g is permitted to move to a different m or p at any function call, which includes several implicit function calls inserted by the compiler for things like map operations. In order to address #10958 it is likely that in future releases a g will be permitted to move to a different m or p in any loop. The discussion in #17973 may be of interest.
If runtime.Mid changes (e.g., because the goroutine is now being run on a new OS thread), how do you make sure a goroutine continues to use the correct bucket? If the answer is that each goroutine only calls runtime.Mid once, then why not just use math/rand.Int?
It is just a hint to select a bucket. The per-shard mutex will still be acquired, so there is no "fail" if runtime.Mid changes after the call. But most of the time it will not have changed, and this will help to reduce mutex contention. The problem with math/rand.Int is the single global mutex around that call. See my previous closed proposal: #18514
Somewhat related note: one way to make a thread-safe and reasonably scalable random generator using functionality currently available in the standard library is by pooling rand.Rand instances in a sync.Pool:
https://github.com/kostya-sh/talk-golangsg-scalability/blob/master/benchmarks/rand_test.go
Benchmark results from #18514:

BenchmarkFastRand-4   500000000     3.49 ns/op
BenchmarkFastRand-4   500000000     3.49 ns/op
BenchmarkMathRand-4    10000000      162 ns/op
BenchmarkMathRand-4    10000000      159 ns/op
@kostya-sh Here is an example of its usage.
So why not expose it to regular Go users?
Because Go considers simplicity a virtue. When there are trade-offs between simplicity and performance, Go chooses simplicity. You seem to have a different value trade-off, and it's obvious you prefer performance above all else. That's probably why many of your proposals to Go end up getting rejected. Please keep that in mind for future proposals. We understand that things can always be faster, and we want things to be fast, but only if they are also simple and have clean APIs. We do not want to document and support and export all possible innards.
But such a thing as P.id is a very simple thing. Does anyone expect the core scheme of the Go runtime scheduler to change?
"just keep pushing" :-)
Pushing the same ideas over and over is actually more likely to annoy people. Please try to understand the space of trade-offs that Go strives for.
That's not the point. We don't expose every "simple thing" just because it's simple. We also have to then document it, support it, test it, and keep it available forever. And we have to consider whether it's redundant with other API, or will be misused, etc.
It has in the past.
@funny-falcon My main impression from this thread is that it's not obvious the design space has been exhausted. E.g., you could already locklessly assign a unique ID to each worker yourself. Why have you ruled that out?
@mdempsky The single lockless approach available in Go is atomic.AddUintxx, but it also doesn't scale with many CPU cores because of cache-line contention. Any "lock" or "lockless" technique will fail to scale without a hint about the current scheduling item. @jonhoo uses the CPUID instruction and parses /proc/cpuinfo, but that works only on linux/x86: https://github.com/jonhoo/drwmutex
The Go runtime is happy to use per-P or per-M data in support of specific, simple, well-defined functionality that can, if necessary, be provided in other ways. To me your proposal seems very loosely defined. It's nearly indistinguishable from "give me a random number in the range 0 to some bound".
I pushed different ideas. Yes, I am a bit annoying. But someone still should push strange ideas; at least there will be a documented rejection of each such idea.
That is why there is this proposal discussion: to decide whether it is worthwhile or not.
Given that the two most successful current green-thread runtimes (Go and Erlang) have a similar structure (at least, they both have per-core schedulers - P), a next big change in the core scheme seems unlikely. I could be mistaken.
It's been shown that you do not need this feature for your original use case of a random number generator.
Why do you need this feature? Please give specific examples.
@davecheney My original use-case is still RPC; I just realized that runtime.Pid() is simply better. The database for which I'm maintaining a connector may handle 1M rps easily, and at that rate every possibility for improvement is worthwhile. I saw "per-cpu local info" requests at least several times during discussion of this and the previous proposal (walking the links), so it doesn't look like it is just my crazy need. And sharding by runtime.Pid() looks like a very close approximation, even if a Mutex is still involved (i.e. no pinning as for sync.Pool).
The API for "Pid()" is: "give me a number that approximates scheduling internals (cpu number or scheduler P), such that two goroutines calling this method simultaneously (i.e. in parallel) will receive different numbers with very high probability; so if I use it for choosing a mutex or other synchronization primitive, that primitive will likely be uncontended. Preferably it should be bounded by an understandable small value (number of cpu cores, or GOMAXPROCS)."
It sounds like you're proposing a feature because it might be useful. IMO this is a short road to C++17. Can you give a specific example where you need this feature today? Maybe we can suggest a better way to achieve it in Go without adding a low-level runtime hook.
Currently I choose the shard to put a request into by atomic increment. But that means that even if a goroutine is not preempted, it will spread its requests across many shards. (A shard is a data structure that aggregates requests before a network writer writes them into a connection.) Another obvious use-case for Pid is a statistics collector: without such a hint even atomic operations will fail to scale on a multi-core system; with such a hint, statistics collection will scale smoothly.
Please, can we talk about your specific problem, not other problems that other people may be able to solve with this feature.
Let it be called not Pid but ProcHint():

// ProcHint gives a hint about the current scheduler item
// (for example, cpu number, or id of an internal scheduler).
// ProcHint is bounded by 0 <= ProcHint() < ProcHintMax().
// You may use it to reduce lock contention if you allocate
// ProcHintMax independent mutexes or channels and
// then choose among them using ProcHint.
func ProcHint() int

// ProcHintMax gives the upper bound of ProcHint.
// Note: it could change at runtime, for example,
// if you change GOMAXPROCS or hot-swap a cpu.
func ProcHintMax() int
This API can't be used safely.
You didn't define when each value can change.
Consider this:

a, b := runtime.ProcHint(), runtime.ProcHintMax()

According to the docs, there is no guarantee that a < b, because ProcHintMax can change at any time (before or after the other call).
You might get a better result if you make it a single function returning the two values simultaneously, or just return a floating point number in the range [0, 1).
But still, the proposal doesn't say the hint should be dense (and how dense?); that is, nothing prevents the runtime from returning 1<<31-1 as the max and only values < GOMAXPROCS as the hint. Yet such a value is almost useless for reducing lock contention.
I'm just giving one example of why such an API is easy to informally propose but hard to formally design and get accepted.
I understood; a fixed-allocator API could be simpler:

// NewFixSharded preallocates all values by calling the alloc function, and returns a new FixSharded.
// FixSharded never changes its size, i.e. it never allocates a new value after construction.
func NewFixSharded(alloc func() interface{}) *FixSharded

// NewFixShardedN preallocates exactly n values by calling the alloc function, and returns a new FixSharded.
func NewFixShardedN(n int, alloc func() interface{}) *FixSharded

func (a *FixSharded) Get() interface{}
I still don't know what that means. How about some pseudocode, or (better still) a link to some real code?
Don't those steps introduce cross-thread contention anyway? (How are you keeping those calls colocated on the same thread as the putFuture call?) At any rate: that's very complicated, heavily asynchronous code. I would argue that it's not idiomatic for a Go program, and we should be optimizing for making idiomatic code more efficient rather than speeding up complicated code. A much simpler example (e.g. a self-contained benchmark with a similar request flow) would be helpful.
"Simple should be simple, and complex should be possible." If you do not add possibilities for the "complex cases", they will still be written in C++.
Yes. But they are sharded by incoming response id, and that path is much shorter than sending a request, because it doesn't include deserialization of the response (while sending includes serialization of the request). Because of serialization into a reusable batch buffer, CPU alignment matters more when sending a request than when filling the request with response bytes.
A language trying too hard to be loved by everyone ends up loved by few.
@cznic Two simple low-level methods could not destroy love for Go; the impossibility of doing something efficiently can. The number of those who will directly write code using the proposed methods will be small, and those people actually know what they want; for them, low-level methods are usually better. But the number of indirect users is much larger, since far more users will use the libraries written by the former group.
I rest my case 😉
The example you've provided is so complex that it's very difficult to see whether your use-case is truly impossible to address efficiently (both with the standard library as it stands today and with the alternative proposals). Which is to say: one of the great strengths of Go is that it is reasonably efficient for programs implemented in a direct, simple style. In many cases, attempts to "optimize" Go programs using idioms from other languages (e.g. futures and callbacks) end up fighting against optimizations in the runtime (escape analysis, goroutine scheduling, the transaction-oriented collector, etc.). If you want to argue that something is necessary on efficiency grounds (and, let's be clear, the Go compiler and runtime have lots of room for improvement!), the first step is to describe the problem you're trying to solve as idiomatic code and show where the bottlenecks occur.
About idiomatic Go code:
People choose Go because it is simpler and faster. But it is faster not only because it is "compiled and statically typed", but because there is the possibility to write fast libraries. More possibilities to write faster libraries => more possibilities to write fast applications => more new users who want or need performance (hint: most Go users).
@bcmills Show me an RPC written in idiomatic Go capable of performing 1M requests per second. Then I will never write any new silly proposal; I will just use/copy the code that you show me.
@funny-falcon There's no need to get hostile. The point of asking for idiomatic code is not that it does, today, what you want it to do. The point is that it becomes much easier to see where and why it does not do what you want, and the solution that enables your use-case is more likely to benefit the Go community as a whole rather than just one niche application. And I disagree with what you wrote. The use of mostly synchronous functions is particularly important in that regard: synchronicity tends to produce good cache locality for server-style programs (with lots of hot "per-request" data, relatively little "background" data, and relatively few different kinds of tasks being performed), makes the behavior of the program clearer to the goroutine scheduler, and makes it a bit easier to thread parameters down through the call tree without introducing aliasing bugs (leading to, for example, the ability to allocate the "per thread" data at the top of a goroutine and pass it down through that goroutine rather than scattering map lookups throughout the code).
@bcmills I'm not even-tempered, but I'm not hostile.
I'm closing this now. Maybe I'm mistaken somewhere.
I'd just expose a single function:

// PID returns an id currently associated with the CPU core.
//
// The function may return distinct values for the same CPU core in the long run.
// There are no guarantees on the maximum and minimum id values.
func PID() uint32 {
    return uint32(getg().m.p.ptr().id)
}

This functionality is the core of the timers scalability CL - https://go-review.googlesource.com/#/c/34784/ .
@valyala, you did it! Thank you for timers!
That is what I named ProcHint above. I agree, a single function could already be enough.
@bradfitz @bcmills The technique used by @valyala to improve timers at https://go-review.googlesource.com/#/c/34784/ shows exactly why such a function is wanted. I agree with @valyala.
The valid uses of this function are far outweighed by the buggy invalid uses. There are of course uses of per-P data inside the runtime. That's not an argument for exposing P numbers outside the runtime. I would rather add a few higher-level APIs that are easier to use correctly, like the ShardedValue proposal, than add one low-level API that is very hard to use correctly.
@happilymarrieddad, no, it doesn't. There are only GOMAXPROCS possible running schedulers (unless one uses runtime.LockOSThread), and there can be thousands of native threads (if some syscall or C library blocks for a long time).
Often there is a need to reduce lock contention in algorithms.
The usual way to accomplish this is sharding, i.e. several separate locks protect non-intersecting parts of a greater state.
But then there is a need for a fast (and preferably non-locking) way to determine the shard id.
And it would be great if such a shard id would further reduce lock contention.
The CPUID instruction returns an APIC-ID, and APIC-IDs are not consecutive, so the runtime would probably have to analyze the CPU topology to convert an APIC-ID into a consecutive cpu number. On the other hand, given that some architectures allow hot cpu swapping, it is probably better to return the APIC-ID as-is.
So I propose to expose getg().m.id as runtime.Mid(). Because a program could contain a lot of threads (due to system calls or cgo calls), there is a need for an additional fast random value, so I propose to additionally expose runtime.fastrand() as runtime.Fastrand() (mixing it with a secure random key to hide the original fastrand value).
The "best way" could be implemented in some "golang/x/" package. The "simple way" needs at least a wrapper around getg().m.id implemented in "runtime", so that an external library can link against that wrapper (if there is a way to call a private "runtime" function from an external package).