proposal: sync: add PLocalCache #69229
Comments
Sounds like a dup of #8281.
This idea is interesting, but I think it stops just short of being able to cover many more use-cases, without much of an increase in complexity. For example, with a hook into where two values are in conflict (similar to the Some other notes:
I think this proposal is closely related to #18802 actually, and not far from where I wanted to explore next with being able to access local values without synchronization.
I'd prefer leaving the P-local cache as simple as possible, so that it solves the original issue described above: providing an efficient and CPU-scalable mechanism for caching the state of various CPU-bound parsers and encoders. It is expected that the state may be lost at any time (for example, when GOMAXPROCS changes, when the goroutine is re-scheduled to another P, or when somebody forgets to return the state to the cache), so it can be re-constructed when needed. Other use cases like #8281 and #18802 should be covered separately, since they are more complicated and have no clear solution yet.
Proposal Details
The issue
High-performance code which scales linearly with the number of CPU cores usually needs per-CPU caches for holding some per-CPU state in order to avoid costly inter-CPU synchronization. The state can be re-computed at any time, but the computation takes additional CPU time and other resources, so it is more efficient to cache the computed state per CPU core and then re-use it.
sync.Pool can be used as a per-CPU cache, but it has the following issues in this context:
- sync.Pool.Get() tries stealing an object from other CPUs if the object is missing in the current P. This leads to costly inter-CPU synchronization. The cost of this synchronization increases with the number of available CPU cores.
- sync.Pool.Put() may store multiple objects at the same P. This leads to excess memory usage when at most one object is needed per P.
- sync.Pool.Put() also triggers expensive inter-CPU synchronization if P already contains an object.
- sync.Pool may drop cached objects at every GC cycle, so the caller needs to spend additional CPU time for re-creating the object.
The solution
Add a sync.PLocalCache struct with the following API:
Implementation details
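To make the Get()/Put() semantics described in this proposal concrete, here is a minimal user-space sketch. It is not the proposed implementation: the names slotCache and newSlotCache are hypothetical, and because user code cannot ask the runtime which P it is running on, the caller passes a slot index explicitly as a stand-in for the current P. Only the behaviour is modelled (at most one object per slot, no stealing from other slots, no cleanup on GC), not the performance.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// slotCache is a hypothetical stand-in for the proposed sync.PLocalCache.
// A runtime implementation would pick the slot of the current P by itself;
// here the slot index is an explicit parameter (an assumption of this sketch).
type slotCache struct {
	slots []atomic.Pointer[any]
}

// newSlotCache creates a cache with n independent slots,
// playing the role of GOMAXPROCS per-P slots.
func newSlotCache(n int) *slotCache {
	return &slotCache{slots: make([]atomic.Pointer[any], n)}
}

// Get removes and returns the object cached in the given slot,
// or nil if the slot is empty. It never inspects other slots.
func (c *slotCache) Get(slot int) any {
	p := c.slots[slot].Swap(nil)
	if p == nil {
		return nil
	}
	return *p
}

// Put stores v in the given slot only if the slot is empty.
// Otherwise v is dropped, so a slot holds at most one object.
func (c *slotCache) Put(slot int, v any) {
	c.slots[slot].CompareAndSwap(nil, &v)
}

func main() {
	c := newSlotCache(2)
	fmt.Println(c.Get(0) == nil) // empty slot
	c.Put(0, "first")
	c.Put(0, "second") // ignored: slot 0 is already occupied
	fmt.Println(c.Get(0))
	fmt.Println(c.Get(1) == nil) // slot 1 never sees objects from slot 0
}
```

The atomics are needed here only because this sketch cannot pin a goroutine to a P; they say nothing about how a runtime implementation would synchronize access.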
sync.PLocalCache may be implemented in a way similar to sync.Pool, but without the following abilities:
- Storing multiple objects per P. If Put() is called on a storage that already holds a P-local object, the new object is simply ignored.
- Periodic cleanup. The number of objects in sync.PLocalCache doesn't exceed GOMAXPROCS, i.e. it is bounded, and it is expected that the user continuously accesses the cached objects. So there is little sense in periodic cleanup of the cache. All the cached objects are removed after the corresponding sync.PLocalCache is destroyed by the garbage collector.
The property of having at most one P-local object in the cache narrows the applicability of Get() ... Put() to CPU-bound code without context switches (e.g. without IO, expensive syscalls and CGO calls). This minimizes the chances of a context switch during the execution of the code between Get() and Put(), so the cached objects will be successfully re-used by this code. For example, sync.PLocalCache is a good fit for a scalable random number generator with per-P (aka per-CPU) state. It is also a good fit for various CPU-bound parsers, encoders and compressors with some non-trivial state, which can be cached in the P-local cache.
On the other hand, if the chances of a context switch between Get() and Put() calls are high, then Get() is likely to return nil most of the time. This forces the user's code to spend additional CPU time on object re-creation. The re-created object will be dropped most of the time on the Put() call, since it is likely that another P-local object has already been put in the cache by concurrently running goroutines. In such cases it is better to use sync.Pool instead of sync.PLocalCache.
Example usage
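A hedged sketch of the intended calling pattern: take the state with Get(), re-create it when Get() returns nil, do the CPU-bound work, then hand the state back with Put(). Since sync.PLocalCache does not exist, the pLocalCache type below is a hypothetical stand-in with a single slot for the whole process (a buffered channel of capacity 1) rather than one slot per P. The encoder type and encodeInt function are illustrative names as well.

```go
package main

import (
	"fmt"
	"strconv"
)

// pLocalCache is a hypothetical single-slot stand-in for sync.PLocalCache.
// It models one P only: at most one cached object, and Put drops extras.
type pLocalCache struct{ slot chan any }

func newPLocalCache() *pLocalCache {
	return &pLocalCache{slot: make(chan any, 1)}
}

// Get returns the cached object, or nil if nothing is cached.
func (c *pLocalCache) Get() any {
	select {
	case v := <-c.slot:
		return v
	default:
		return nil
	}
}

// Put caches v unless the slot is already occupied.
func (c *pLocalCache) Put(v any) {
	select {
	case c.slot <- v:
	default: // occupied: v is dropped
	}
}

// encoder is illustrative CPU-bound state that is worth re-using.
type encoder struct{ buf []byte }

func (e *encoder) encode(n int) string {
	e.buf = strconv.AppendInt(e.buf[:0], int64(n), 10)
	return string(e.buf)
}

var encoders = newPLocalCache()

// encodeInt shows the Get ... Put pattern around CPU-bound work.
func encodeInt(n int) string {
	e, _ := encoders.Get().(*encoder)
	if e == nil {
		// Cache miss: the state may be lost at any time, so re-create it.
		e = &encoder{buf: make([]byte, 0, 32)}
	}
	s := e.encode(n)
	encoders.Put(e) // e must not be used after this call
	return s
}

func main() {
	fmt.Println(encodeInt(42))
	fmt.Println(encodeInt(-7))
}
```

The code between Get() and Put() performs no IO or blocking calls, which matches the CPU-bound use case the proposal targets.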
See also #65104. Now I think it is better to provide a separate entity with clear semantics than to complicate the semantics of sync.Pool and/or try to efficiently cover multiple different cases with sync.Pool.
Generics
It may be good to provide a generic sync.PLocalCache[T any], but it is OK to provide a non-generic implementation to be consistent with sync.Pool.
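For comparison, a generic variant could look roughly like the sketch below. The typedCache name, the choice of *T as the element type, and the single-slot model (one slot instead of one per P) are all assumptions made for illustration; the proposal does not specify the generic signature beyond sync.PLocalCache[T any].

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// typedCache is a hypothetical single-slot model of a generic
// sync.PLocalCache[T any]. It models one P only.
type typedCache[T any] struct {
	slot atomic.Pointer[T]
}

// Get returns the cached *T, or nil when the slot is empty.
func (c *typedCache[T]) Get() *T {
	return c.slot.Swap(nil)
}

// Put caches v unless the slot is already occupied; nil is ignored.
func (c *typedCache[T]) Put(v *T) {
	if v != nil {
		c.slot.CompareAndSwap(nil, v)
	}
}

// counter is illustrative cached state.
type counter struct{ n int }

func main() {
	var c typedCache[counter]
	s := c.Get()
	if s == nil {
		s = &counter{} // cache miss: re-create the state
	}
	s.n++
	c.Put(s)
	fmt.Println(c.Get().n) // no type assertion is needed at the call site
}
```

The main practical difference from the non-generic form is that callers get a typed pointer back and avoid the any-to-concrete type assertion on every Get().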