proposal: sync: add GetLocal() method to sync.Pool in order to return only P-local objects #65104
Comments
To me it looks like a worse version of #18802 because it unnecessarily ties this logic to |
The current proposal is aimed at solving a single practical task - to have an ability to efficiently use The #18802 proposal, on the other hand, has a completely different purpose - to provide access to per-CPU objects. That proposal has no clear vision yet, let alone an implementation. For example, as I understand the #18802 proposal, it wants to create some magic sharded object, which will have exactly one shard per P. On the other hand, the |
@valyala thx for the explanation in what way it is different. About your proposal I don't think returning |
Note that if sync.Pool also supports stealing objects from other P caches, a goroutine-local or P-local store may be required to prevent one goroutine from getting objects from a P-local cache while another goroutine steals objects from that same cache. That way {Get,Put}Local would not require atomic synchronization, because the same block of memory would never see unlocked or non-atomic concurrent reads and writes. |
You've identified an interesting issue, but I don't think putting the problem on the user by adding an API is the right answer. I think it would be really unclear to users when to call Instead, I think we could improve |
The
So I don't think that adding

	var p sync.Pool

	func f() {
		v := p.Get()
		doSomeCPUBoundWork(v)
		p.Put(v)
	}

Then this case can be optimized to:

	var p sync.Pool

	func f() {
		v := p.GetLocal()
		doSomeCPUBoundWork(v)
		p.PutLocal(v)
	}

The code with |
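For reference, here is a minimal runnable sketch of the first pattern (Get and Put on the same goroutine) using only the existing sync.Pool API. The names bufPool and render are made up for illustration. The pool has no New function, so Get returns nil when nothing is cached and the caller allocates, which is the same nil-handling contract the proposed GetLocal() would require.

```go
package main

import (
	"bytes"
	"fmt"
	"sync"
)

// bufPool has no New func, so Get returns nil when no object is cached.
var bufPool sync.Pool

// render (hypothetical helper) gets a buffer, does some work with it,
// and puts it back on the same goroutine.
func render(name string) string {
	v := bufPool.Get()
	if v == nil {
		// Nothing cached: allocate a fresh object.
		v = new(bytes.Buffer)
	}
	buf := v.(*bytes.Buffer)
	buf.Reset()
	fmt.Fprintf(buf, "hello, %s", name)
	s := buf.String()
	bufPool.Put(buf)
	return s
}

func main() {
	fmt.Println(render("gopher"))
}
```

Switching this code to the proposed methods would only change the two pool calls; the nil check stays the same.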
Is there any situation where someone would mix |
In theory such situations can occur, although they are pretty rare. For example, if the pool is simultaneously used in CPU-bound and IO-bound cases:

	var p sync.Pool

	func f1() {
		v := p.GetLocal()
		doSomeCPUBoundWork(v)
		p.PutLocal(v)
	}

	func f2() {
		v := p.Get()
		doSomeIOBoundWork(v)
		p.Put(v)
	}

The

	func f() {
		v := p.Get()
		workerCh <- v
	}

	func worker() {
		for v := range workerCh {
			doSomeWork(v)
			p.Put(v)
		}
	}

	var workerCh = func() chan any {
		ch := make(chan any)
		go worker()
		return ch
	}() |
After a closer look at the current

	func f() {
		v := p.GetLocal()
		doSomeIOBoundWork(v)
		p.PutLocal(v)
	}

	var p sync.Pool

If this code is executed concurrently by all the P workers, then it is faster to get a P-local object instead of stealing an object from other P caches in the pool - there is a high chance that the P-local cache already contains an object put there via

So, probably, it is a good idea to disable stealing objects from other P caches at

	var p = &sync.Pool{
		// Explicitly allow stealing an object from other P caches
		AllowStealing: true,
	}

	func f() {
		v := p.Get()
		workCh <- v
	}

	var workCh = func() chan any {
		ch := make(chan any)
		go func() {
			for v := range ch {
				doSomeWork(v)
				p.Put(v)
			}
		}()
		return ch
	}()

If the |
It looks like the following idea may automatically cover the
Then
This algorithm prefers returning P-local objects and returning The question is which
|
Good idea. We have run into a similar problem. When the put rate is slower than the get rate, sync.Pool holds a lot of memory until the next GC. This method |
Proposal Details
Currently sync.Pool.Get() works in the following way:

This logic works great on systems with a small number of CPU cores, but it may significantly slow down on systems with many CPU cores because of the following reasons:

sync.Pool.Get(), its contents may be still missing in local CPU cache. So further work with this object may be much slower compared to work with a P-local object.

sync.Pool increases with GOMAXPROCS (e.g. the number of CPU cores), since all the P-local caches will have at least a single object only after all the P workers simultaneously execute the same CPU-bound code between Get() and Put() calls. In other words, the sync.Pool inefficiency grows with the number of CPU cores.

It looks like the way to go is to disallow stealing objects from other P workers. But this may slow down some valid use cases for sync.Pool, when Pool.Get() and Pool.Put() are called for the same pool from different sets of goroutines, which run on different sets of P. Disallowing stealing objects from other P workers will result in excess memory allocations on the Pool.Get() call side, since the P-local cache will always be empty.

Given these facts, it would be great to add a GetLocal() method to sync.Pool, which must return nil if the object isn't found in the P-local cache, without an attempt to steal the object from other P caches. Then the GetLocal() method can be used instead of Get() in performance-critical code, which calls Put() for the object returned from the pool on the same goroutine (and P) most of the time.

Update: the current implementation of sync.Pool.Put() already puts the object only to P-local storage, but this isn't documented and can be changed in the future, so it may be worth adding a PutLocal() method in order to make the API more consistent.

Summary

sync.Pool scalability and performance on systems with a high number of CPU cores, by switching from Pool.Get to Pool.GetLocal in CPU-bound code where the object is returned to the pool on the same goroutine where it has been obtained from the pool.
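The difference between Get() and the proposed GetLocal() can be illustrated with a toy model. This is only a sketch under loud assumptions: shardedPool and all of its methods are hypothetical, it is single-threaded, and it is not how sync.Pool is implemented. An explicit shard index stands in for the P id, which Go does not expose to user code.

```go
package main

import "fmt"

// shardedPool is a hypothetical, single-threaded model of a pool that keeps
// one cache per P. The shard index plays the role of the P id.
type shardedPool struct {
	shards [][]any
}

func newShardedPool(n int) *shardedPool {
	return &shardedPool{shards: make([][]any, n)}
}

// Put stores v in the given shard only.
func (p *shardedPool) Put(shard int, v any) {
	p.shards[shard] = append(p.shards[shard], v)
}

// GetLocal returns an object from the given shard, or nil if that shard is
// empty. It never looks at other shards.
func (p *shardedPool) GetLocal(shard int) any {
	s := p.shards[shard]
	if len(s) == 0 {
		return nil
	}
	v := s[len(s)-1]
	p.shards[shard] = s[:len(s)-1]
	return v
}

// Get tries the local shard first and then steals from the other shards.
// It returns nil only if every shard is empty.
func (p *shardedPool) Get(shard int) any {
	if v := p.GetLocal(shard); v != nil {
		return v
	}
	for i := range p.shards {
		if v := p.GetLocal(i); v != nil {
			return v
		}
	}
	return nil
}

func main() {
	p := newShardedPool(4)
	p.Put(0, "buf")
	fmt.Println(p.GetLocal(1)) // <nil>: shard 1 is empty, no stealing
	fmt.Println(p.Get(1))      // buf: stolen from shard 0
	fmt.Println(p.Get(1))      // <nil>: all shards are empty now
}
```

In the model, a caller that always puts and gets on the same shard never pays for the scan over other shards when it uses GetLocal, at the cost of allocating whenever its own shard is empty.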