proposal: cacheline-diverse allocation #19025
Comments
We've been getting along OK by just making things >=1 cache line in size. It would help here to have concrete examples where this more general operation is necessary.
What I wrote two weeks ago is still true:
@bcmills, do you have data showing the need here? If not, I would suggest we close this issue and leave it for when a real need arises, so that we can do a more accurate cost-benefit analysis. What's here today seems to be all costs (new, complex, subtle API) and no benefits. It's also not clear that a general solution is needed. As Keith said, we already have a strategy for avoiding shared cache lines in a struct. Someone who wants N bytes.Buffers can already do
and allocate an array of those. For the specific case of bytes.Buffer, the `tmp [64]byte` may end up serving well enough as padding anyway, assuming these goroutines are writing enough to overflow into new memory (maybe we should move `lastRead` up above `tmp` to be more cache-friendly).

Any large enough allocation is cache-line-aligned already, so most map buckets are already in separate cache lines (for example, a `map[*T]*U` bucket is 8*(8+8) = 128 bytes on amd64, where we believe a cache line is 64 bytes). A map header is 48 bytes; we could possibly just add two words to every header if we knew there were important performance wins. Similarly, a channel header is 96 bytes, so 1.5 cache lines. We could expand it to 2 cache lines if that had important benefits.

All of these possible changes could be done directly, given a clearly demonstrated need, without new API.
I had thought that I had a good use-case, but when I dug into the details it turned out not to be necessary for the moment. I'm fine with closing this for now (and reopening if/when there is more concrete data supporting a need).
For certain applications (e.g. most applications of #18802, some possible implementations for #18177, and basically anything that farms out work to multiple goroutines in parallel), it is useful to be able to allocate a set of values with the constraint that they occupy disjoint cache lines.
For example, one might want to allocate `runtime.GOMAXPROCS(0)` worker goroutines, each encoding to a `bytes.Buffer`. In that case, we want to ensure that none of those buffers induce false sharing across caches.

I can think of two fairly simple APIs for this: one using finalizers and maps and supporting a resizable set of values, and another, simpler API using slices but supporting only a fixed set of allocations.
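The proposal's actual API sketches did not survive in this page. Purely as illustration, the two shapes described might look like the stubs below; every name and signature here is a guess, not the author's proposal:

```go
package main

import (
	"fmt"
	"unsafe"
)

// Allocator models the resizable-set API: every value it hands out is
// meant to occupy cache lines disjoint from all other live values it
// has allocated, with finalizers retiring dead values' lines.
type Allocator struct {
	used map[uintptr]bool // cache lines occupied by live allocations
}

// New returns a pointer to size bytes. Stubbed: a real version would
// check `used` for collisions before returning.
func (a *Allocator) New(size uintptr) unsafe.Pointer {
	buf := make([]byte, size)
	return unsafe.Pointer(&buf[0])
}

// NewDiverse models the fixed-set API: n allocations of size bytes,
// intended to be pairwise cacheline-diverse. Stubbed to plain
// allocations here.
func NewDiverse(n int, size uintptr) []unsafe.Pointer {
	ps := make([]unsafe.Pointer, n)
	for i := range ps {
		buf := make([]byte, size)
		ps[i] = unsafe.Pointer(&buf[0])
	}
	return ps
}

func main() {
	a := &Allocator{used: make(map[uintptr]bool)}
	fmt.Println(a.New(8) != nil)      // true
	fmt.Println(len(NewDiverse(3, 8))) // 3
}
```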
I have an implementation of the former using a `map[uintptr]bool` to record the first and last cache line for each value allocated (and setting finalizers to retire entries in the map). It resolves collisions by keeping the colliding values live in a slice and allocating in a loop until it gets a non-collision; with runtime support, it could avoid that overallocation entirely.

The latter API is trivial to implement in terms of the former, but could be more memory-efficient with a standalone implementation (it wouldn't need to store the collision map over the long term).
With runtime support, it would be possible to extend this API to resizable built-in types (maps and slices). I do not see how to make that extension without the cooperation of the runtime.