proposal: sync: support for sharded values #18802
Comments
My own inclination is towards the non-blocking API with a bounded overflow list. A blocking API seems antithetical to the goal of reducing contention and may lead to performance anomalies if a goroutine or OS thread is descheduled while it has a shard checked out, and a non-blocking API with a required combiner may prevent certain use cases (e.g., large structures, or uses that never read the whole sharded value). It also devolves to the blocking API if the bound is 0.
The proposal as written is rather abstract. I think it would help to examine the specific use cases that people have for such a thing. For example, it's clear that one use case is collecting metrics. Presumably the idea is that you have some sort of server, and it wants to log various metrics for each request. The metrics only need to be accumulated when they are reported, and reporting happens much less often than collection. Using a lock;update;unlock sequence will lead to lock contention. But (let's say) we need the metrics to be accurate. So the idea of sharding for this case is a lock;update;unlock sequence with a sharded lock, and an accumulate step that does lock;collect;zero;unlock for each sharded metric. That gives us the values we need while minimizing lock contention. One way to implement this use case is for the …
For typical metrics the … With this outline, we see that there is no need for … What other uses are there for …?
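To make the pattern above concrete, here is a minimal sketch of the lock;update;unlock / lock;collect;zero;unlock idea. Everything here (the Metrics type, the explicit shard-hint parameter, and the fixed shard count) is my own illustration rather than part of the proposal; a real sync.Sharded would pick the shard automatically.

package metrics

import "sync"

type shard struct {
    mu       sync.Mutex
    requests int64
    errors   int64
    _        [40]byte // padding to reduce false sharing between shards
}

// Metrics is a hypothetical sharded metrics collector.
type Metrics struct {
    shards []shard
}

func New(nShards int) *Metrics {
    return &Metrics{shards: make([]shard, nShards)}
}

// Record is the per-request lock;update;unlock step on a single shard.
func (m *Metrics) Record(shardHint int, isErr bool) {
    s := &m.shards[shardHint%len(m.shards)]
    s.mu.Lock()
    s.requests++
    if isErr {
        s.errors++
    }
    s.mu.Unlock()
}

// Report is the accumulate step: lock;collect;zero;unlock per shard.
// No updates are lost, and each shard lock is held only briefly.
func (m *Metrics) Report() (requests, errors int64) {
    for i := range m.shards {
        s := &m.shards[i]
        s.mu.Lock()
        requests += s.requests
        errors += s.errors
        s.requests, s.errors = 0, 0
        s.mu.Unlock()
    }
    return
}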
I had been considering a somewhat narrower API containing only … The semantics would be similar to the non-blocking proposal. … Because of the lack of exclusiveness, application code would still have to use … That approach has a few advantages over the alternatives in the current proposal.
It has one disadvantage that I'm aware of:
Are there other tradeoffs for or against the narrower API?
It doesn't even require immutability: "externally synchronized" and/or "atomic" would suffice, although "externally synchronized" carries the risk of lock-ordering issues.
Anything that reduces values seems tricky to get right: you'd have to ensure that … I don't immediately see how to provide that property for …
For a consistent Do, …
That essentially makes … Ideally, … I guess that means I'm in favor of an inconsistent Do.
For some usages there should be strict knowledge of the bound on the number of allocated "values", i.e. the number of allocated values should not change. And preferably, values should be allocated at a predictable time, for example, at container construction. Probably, it should be a separate container:

// NewFixSharded preallocates all values by calling the alloc function, and returns a new FixSharded.
// FixSharded never changes its size, i.e. it never allocates a new value after construction.
func NewFixSharded(alloc func() interface{}) *FixSharded {}

// NewFixShardedN preallocates exactly n values by calling the alloc function, and returns a new FixSharded.
func NewFixShardedN(n int, alloc func() interface{}) *FixSharded {}

func (a *FixSharded) Get() interface{} {}

If the size never changes, there is no need for … Rationale: GOMAXPROCS changes rarely (almost never), so dynamic allocation is excessive. I could be mistaken about GOMAXPROCS constancy.
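For illustration only, a per-shard counter on top of the FixSharded API sketched above might look like this (Counter, counterShard, and the use of GOMAXPROCS for the size are my own assumptions; FixSharded itself is only proposed here, and its Get is presumably only a CPU-affinity hint). Assumes "runtime" and "sync" are imported.

type counterShard struct {
    mu sync.Mutex
    n  int64
}

var counters = NewFixShardedN(runtime.GOMAXPROCS(0), func() interface{} {
    return &counterShard{}
})

func inc() {
    c := counters.Get().(*counterShard) // Get returns (a hint at) this CPU's preallocated shard
    c.mu.Lock()
    c.n++
    c.mu.Unlock()
}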
@bcmills Well, as I said earlier, I think we need to look at specific use cases. For the specific use case I was discussing, I assert that the cost of a consistent Do … What specific use case do you have in mind?
@ianlancetaylor I'm specifically thinking about counting (as in #8281) and CPU-local caching (e.g. buffering unconditional stores to a shared map, a potential optimization avenue for #18177).
I'm thinking about stat-collectors and high-performance RPC.
@bcmills For counting, it seems to me you would use an inconsistent Do. I don't actually see how to write a consistent … One approach for buffering stores to a shared map would be a …
@funny-falcon Can you expand on what you mean by "high-performance RPC"? I don't see why you need a global distributed value for RPC.
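The sentence about buffering stores to a shared map is cut off above; purely as an illustration of that general idea (not necessarily what was originally written), per-shard write buffers could be folded into the shared map in one critical section, roughly like this (all names here are hypothetical):

type shardBuf struct {
    mu      sync.Mutex
    pending map[string]int
}

type BufferedMap struct {
    shards []shardBuf // one buffer per shard, however the final API allocates them
    mu     sync.Mutex
    m      map[string]int
}

func NewBufferedMap(nShards int) *BufferedMap {
    return &BufferedMap{shards: make([]shardBuf, nShards), m: make(map[string]int)}
}

// Store buffers the write in one shard; only that shard's lock is contended.
func (b *BufferedMap) Store(shardHint int, k string, v int) {
    s := &b.shards[shardHint%len(b.shards)]
    s.mu.Lock()
    if s.pending == nil {
        s.pending = make(map[string]int)
    }
    s.pending[k] = v
    s.mu.Unlock()
}

// Flush folds every shard's buffered stores into the shared map,
// roughly what an (inconsistent) Do over the shards would do.
func (b *BufferedMap) Flush() {
    b.mu.Lock()
    defer b.mu.Unlock()
    for i := range b.shards {
        s := &b.shards[i]
        s.mu.Lock()
        for k, v := range s.pending {
            b.m[k] = v
        }
        s.pending = nil
        s.mu.Unlock()
    }
}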
Perhaps stating the obvious, but one slightly tricky thing is when to GC stale per-thread values when GOMAXPROCS is decreased. For some use cases (e.g. distributed mutexes), they will presumably have a reference keeping the stale values alive. For others (e.g. counters), you'd need to keep the value around until it's been accumulated. Also, in the pony category: if I want a distributed int64 counter, the per-thread values would have sufficient padding to avoid false sharing, but if I allocate multiple such counters, they could be instantiated within the padding, so to speak. I think this could maybe be built in user space on top of a more low-level API, but if it's possible for the API to provide it directly, that'd be great.
@ianlancetaylor I maintain a connector to an in-memory transactional database capable of serving more than 1M requests per second. To be able to send that rate of requests, and to be able to scale smoothly with CPU cores, I have to shard the connector's internal data structures. (And I need to build a custom hash table and custom timers. But sharding is the basis of the improvement.) Without sharding there is too much lock contention. If there were shard-to-CPU alignment (even if not strict), it would help further reduce lock contention and improve CPU-cache utilization. As I understand it, most users don't change GOMAXPROCS on the fly, so I'd prefer a fixed number of preallocated shards, because then I can easily map responses from the server back to shards. I still think a simple low-level "ProcHint" API (as proposed in #18590 (comment)) would be sufficient. But if we want the API to look "higher level", then I'd be satisfied with …
Link to improved …
Excuse me for being a bit off-topic: …
Programs might change GOMAXPROCS in response to getting more or less of a machine as co-tenancy changes.
I'll document some concrete use case examples:
I believe with @aclements 's API we would implement this as:
I believe with @aclements 's API we would implement this as:
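The code samples for these examples did not survive in this copy of the thread. As a stand-in (my own sketch, not the original snippet), a naive sharded counter under the proposed Get/Put API might look like this, which sets up the allocation question raised just below:

type Counter struct {
    shards sync.Sharded // the type proposed in this issue
}

func (c *Counter) Add(n int) {
    v := c.shards.Get()
    cur, _ := v.(int) // zero if this shard is empty
    c.shards.Put(cur + n)
}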
@aclements does that all look right?
... allocating an integer for every increment? (ints into interfaces cause an allocation)
And my experience with … So we'd probably actually want to write it as:

p := c.shards.Get()
if p == nil {
    p = new(int)
}
*(p.(*int)) += n
c.shards.Put(p)

(Or else we'll want to fix the extra allocation for the …)
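For the read side, presumably an (inconsistent) Do would sum the shards; a sketch matching the *int shards above (the method name Value is my own choice):

func (c *Counter) Value() int {
    total := 0
    c.shards.Do(func(v interface{}) {
        if p, ok := v.(*int); ok {
            total += *p
        }
    })
    return total
}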
I wonder if this could also be used to build a better multi-core RWLock similar to my drwmutex. From the proposal thus far, it sounds like it might be tricky to implement something like "as a writer, take all locks, and disallow new locks to be added while you hold those locks".
Tricky but possible, I think. You can add a mutex (m.add below) that must be held while registering a new reader lock, so a writer can block the addition of new locks. The harder part is that if you want to satisfy the existing RWMutex API, RUnlock cannot be told which shard was locked, so it may have to search for the lock it holds. A sketch with the blocking version of Get/Put:

type readerLock struct {
    locked bool
    mu     sync.Mutex
}

func (m *RWMutex) RLock() {
    i := m.readers.Get()
    l, _ := i.(*readerLock)
    if l != nil && !l.locked {
        // Fast path: this shard's reader lock is free; take it.
        l.locked = true
        l.mu.Lock()
        m.readers.Put(l)
        return
    }
    if l != nil {
        m.readers.Put(i) // Put this one back and allocate a new one.
    }
    l = &readerLock{locked: true}
    l.mu.Lock()
    m.add.Lock()
    m.readers.Put(l)
    m.add.Unlock()
}

func (m *RWMutex) RUnlock() {
    i := m.readers.Get()
    l, _ := i.(*readerLock)
    if l != nil && l.locked {
        // Fast path: we hold this shard's reader lock; release it.
        l.locked = false
        l.mu.Unlock()
        m.readers.Put(l)
        return
    }
    if l != nil {
        m.readers.Put(i)
    }
    // Slow path: we were migrated to another shard; find the locked
    // reader lock and release it.
    unlocked := false
    m.readers.Do(func(i interface{}) {
        if unlocked {
            return
        }
        l := i.(*readerLock)
        if l.locked {
            l.locked = false
            l.mu.Unlock()
            unlocked = true
        }
    })
}
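For completeness (my addition, not part of the comment above), the writer side of this sketch would presumably take m.add to stop new reader locks from being registered and then take every reader lock via Do:

func (m *RWMutex) Lock() {
    m.add.Lock() // no new reader locks may be registered while the writer holds the lock
    m.readers.Do(func(i interface{}) {
        i.(*readerLock).mu.Lock()
    })
}

func (m *RWMutex) Unlock() {
    m.readers.Do(func(i interface{}) {
        i.(*readerLock).mu.Unlock()
    })
    m.add.Unlock()
}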
The technique used by @valyala to improve timers at https://go-review.googlesource.com/#/c/34784/ shows exactly why … I agree with @valyala that … So, even if you decided to stick with …
That approach looks like it would be just as easy to implement (and perhaps with less cross-core contention) using … Note that there's nothing stopping a user of the proposed …
@bcmills, well, yes: if the allocator function returns pointers to preallocated values, then it looks like FixSharded. You are right.
Please don't call this sharding, which is either an obscure English term or a term of art for distributed computing. Neither fits. It's a per-CPU thing, so call it something like perCPU or percpu.
I find sharding to be a useful term here since the operations on these shards are similar to those we see in parallel data processing systems like mapreduce: shards are partitions of a larger dataset; they can be operated on in parallel; they can be combined into fewer shards; and they can be reduced into a final value. But this may just be my distributed systems background influencing the way I think about this problem. If the only meaningful implementation of this functionality is one value per CPU, then naming it PerCPU is fine.
I think the discussion of whether there is a merge method actually implies this question: what is the number of shards? I recommend that the number of shards be bounded by GOMAXPROCS, because then the number of shards will not be too large, so there is no need to merge.
https://github.com/jonhoo/drwmutex does communicate the general idea. But to summarize: if you have a read-mostly but contended use case, and are willing to pay the additional memory cost, you can replace a single … The approach also works if the number of shards is dynamically growing, but it is a bit more complicated. When a shard is first allocated, the first reader must notice that its assigned lock is previously unused and effectively perform a … Some more use-cases that come to mind:
For a performance-oriented API, …
Thanks, that's helpful.
Right, I suppose what I never really stated is that I think a dynamically-sized pool is the right choice. Ps in the runtime have so many different types of caches because it's scalable and effective as a performance tool. But because they're tied to Ps, which are very fixed resources, there is a ton of complexity around making sure you have a P, what to do when you don't have one, and trying to prevent resources from getting "stuck" to Ps that don't run. A dynamic pool would help with all those things, because shards can be created and destroyed at will, though most of the time they would stay exactly as they are, still retaining the vast majority of the scalability benefit. I admit, this is a bit abstract, and how well that translates into regular Go code remains to be seen. I have a gut feeling that offering a more powerful and flexible choice (dynamic pool management) is going to be the right choice in the long term without sacrificing expressivity for the immediate use-cases. On that note, https://github.com/jonhoo/drwmutex actually made me realize something, and I think there's actually a generalization we can make to allow the effective implementation of statically sized pools where it makes sense. (See further down.)
Thanks, that makes sense. I agree that is an awkward fit to the API I proposed.
Right. If you tried to use the API for this use-case very directly, it's going to force you to do the much more complicated thing. I sympathize with that. However (and this is the compromise I mentioned above) I think you can always implement a statically-sized shard pool using the dynamic pool by using the dynamic pool to manage indices into the static pool. The core problem with implementing the distributed read-write mutex today is trying to associate a read-lock with a P/thread/whatever. When I look at https://github.com/jonhoo/drwmutex, the main thing it seems to struggle with is finding an index that has some affinity to the current scheduling resource. What …
Thanks, I think those are good use-cases and something that should definitely be considered in the future as part of a corpus of possible use-cases that designs here are tested against. The ID allocation example made me go "oh, duh"; there are at least two of those in the runtime.
I think it depends a lot on the semantics of Value. I agree that the way I proposed it, where the pool is actually drained, is probably too much.
Is there a maximum dynamic size?
If the use case is an http server. |
Not one that would be documented, but I think in any reasonable implementation the hard upper bound would be the number of threads in the program (so long as the application didn't recursively call Update within the callback[1]). This might happen if every goroutine blocks in a syscall in the Update callback that cannot be deduplicated with netpoll. I think in general my advice would be "try to avoid that," but I also don't think blocking in the Update callback should be disallowed, necessarily. In practice, however, it would stay quite close to GOMAXPROCS if it's being used actively. [1] You could create an unbounded number of shards if you called Update recursively, but the reentrancy is really just supposed to allow for one or two levels deep at most. If you can guarantee that it only goes one level deep, then the worst-case maximum would be 2x the number of threads.
It seems that this means the Merge method can be opt-in, because the usage scenarios of sync.ShardedValue often do not require recursive calls to Update in the callback.
To summarize my point of view, I will list some areas where there are different opinions on this proposal, to help people who read this proposal more easily understand its progress.
By the way, with so much discussion, it seems that all we need to do is come up with a sufficiently universal API to solve the problem; then this proposal can reach consensus and be accepted.
The Merge method really goes beyond recursive calls; that's just one example. It's really about giving the implementation and user-level code flexibility. What happens when a goroutine gets preempted in the Update function? We don't want to pin the P or thread the entire time it's in there, which means that another goroutine might run on the same P later and call back into Update. What happens then? If the runtime is able to Merge shards, it can just generate new shards on the fly and we can be confident everything will be accounted for later. Otherwise calls into Update may need to block or spin or something, all of which is going to be complex and difficult to make really efficient. A rare slow path to create a new shard is going to be much simpler to reason about, and the Update function never has to block or spin (depending on the semantics of Range). IMO this also makes the API simpler to use, because you don't have to be quite so careful about keeping your Update callback small. It's certainly better if you do, but the only downside in most cases is slightly (and transiently) more memory use. Lastly, IMO, making the Merge method opt-in seems like the most complicated choice. Two different sets of semantics and limitations in the same data structure seems like it'll make the API confusing.
I don't really know what you mean by this. In the log buffer scenario, I'm imagining sharding fixed-size buffers. Merging them would not involve actually concatenating them; that seems like it would be too expensive regardless. To be totally clear, Merge does not mean that you have to actually merge the two values in some way. It's purely conceptual, giving you the choice as to how to handle the fact that the implementation will only let you put back one of two values into the pool. In the log scenario I'm imagining you would flush a buffer to a global list and keep the other, for example.
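To ground the Update/Merge/Range discussion in code, a counter under the ShardedValue API being discussed might look roughly like this; the exact constructor and method signatures are not settled, so treat this as a sketch (Range here follows the func(T) T shape used in the StaticShard example further below):

type Count int64

// Merge is how two shards are folded into one when the implementation
// needs to retire a shard (e.g. one created on a rare slow path).
func (a Count) Merge(b Count) Count { return a + b }

var hits ShardedValue[Count] // hypothetical sync.ShardedValue

func hit() {
    hits.Update(func(c Count) Count { return c + 1 })
}

func total() Count {
    var sum Count
    hits.Range(func(c Count) Count {
        sum += c
        return c // leave each shard's value unchanged
    })
    return sum
}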
Thank you for your reply. This is very useful information. Maybe our use cases are different, so our perspectives are different.
See #51317 (comment): I found a new use case for sync.ShardedValue; it needs to use the dynamic pool to manage indices into a static pool. This path currently has all goroutines competing for atomic addition operations on the same memory. Combined with other hot paths, this results in performance dropping from about 11ns for a single goroutine to about 32ns at 16P.
Considering https://github.com/jonhoo/drwmutex and the idea of managing indices into a static pool:

type Int int

func (a Int) Merge(b Int) Int {
    return a + b
}

type value[T any] struct {
    v  T
    mu sync.Mutex // ensure the check-out/check-in model
    _  [64]byte   // keep each value on its own cache line
}

// StaticShard is the equivalent of a ShardedValue with a static size.
type StaticShard[T any] struct {
    idx   ShardedValue[Int] // per-shard slot index, assigned lazily
    pool  []value[T]
    count atomic.Int64
}

func NewStaticShard[T any](size uint) *StaticShard[T] {
    if size == 0 {
        panic("size == 0")
    }
    ret := new(StaticShard[T])
    ret.pool = make([]value[T], size)
    return ret
}

// index returns this shard's slot in the static pool, assigning one on
// first use. The stored value is index+1 so that 0 means "unassigned".
func (s *StaticShard[T]) index() int {
    var index int
    s.idx.Update(func(i Int) Int {
        if i != 0 {
            index = int(i) - 1
            return i
        }
        index = int(s.count.Add(1)-1) % len(s.pool)
        return Int(index + 1)
    })
    return index
}

func (s *StaticShard[T]) Update(f func(value T) T) {
    ref := &s.pool[s.index()]
    ref.mu.Lock()
    defer ref.mu.Unlock()
    ref.v = f(ref.v)
}

func (s *StaticShard[T]) Range(f func(value T) T) {
    for i := range s.pool {
        s.pool[i].mu.Lock()
    }
    defer func() {
        for i := range s.pool {
            s.pool[i].mu.Unlock()
        }
    }()
    for i := range s.pool {
        s.pool[i].v = f(s.pool[i].v)
    }
}
Implementation golang#18802 (comment) This CL is for a better understanding of the API based on the check-out/check-in model. Change-Id: I7fdef164291cbb064f593faabee53e5221d008da
An alternative to …
ShardedInt provides an int type expvar.Var that supports more efficient writes at high frequencies (one order of magnitude on an M1 Max, much more on NUMA systems). There are two implementations of ShardValue, one that abuses sync.Pool and will work on current public Go versions, and one that takes a dependency on a runtime.TailscaleP function exposed in Tailscale's Go fork. The sync.Pool variant has about 10x the throughput of a single atomic integer on an M1 Max, and the runtime.TailscaleP variant is about 10x faster than the sync.Pool variant. Neither variant has perfect distribution or perfectly avoids cross-CPU sharing, as there is no locking or affinity to ensure that the time of yield is on the same core as the time of core biasing, but in the average case the distributions are good enough to provide substantially better performance. See golang/go#18802 for a related upstream proposal. Updates tailscale/go#109 Updates tailscale/corp#25450 Signed-off-by: James Tucker <james@tailscale.com>
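The commit message above mentions a variant that "abuses sync.Pool" to get loose per-P affinity on stock Go. A rough sketch of that trick (my own reconstruction, not Tailscale's actual code; assumes "sync" and "sync/atomic" are imported):

type intShard struct {
    n int64
    _ [56]byte // pad so each shard sits on its own cache line
}

type ShardedInt struct {
    pool sync.Pool   // holds *intShard; Get is loosely biased to the current P
    mu   sync.Mutex  // guards all
    all  []*intShard // every shard ever created, so counts are never lost
}

func (s *ShardedInt) Add(delta int64) {
    v, _ := s.pool.Get().(*intShard)
    if v == nil {
        // Pool empty (or cleared by GC): create a new shard and remember it.
        v = new(intShard)
        s.mu.Lock()
        s.all = append(s.all, v)
        s.mu.Unlock()
    }
    atomic.AddInt64(&v.n, delta)
    s.pool.Put(v)
}

func (s *ShardedInt) Value() int64 {
    var sum int64
    s.mu.Lock()
    for _, v := range s.all {
        sum += atomic.LoadInt64(&v.n)
    }
    s.mu.Unlock()
    return sum
}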
Per-CPU sharded values are a useful and common way to reduce contention on shared write-mostly values. However, this technique is currently difficult or impossible to use in Go (though there have been attempts, such as @jonhoo's https://github.com/jonhoo/drwmutex and @bcmills' https://go-review.googlesource.com/#/c/35676/).
We propose providing an API for creating and working with sharded values. Sharding would be encapsulated in a type, say sync.Sharded, that would have Get() interface{}, Put(interface{}), and Do(func(interface{})) methods. Get and Put would always have to be paired to make Do possible. (This is actually the same API that was proposed in #8281 (comment) and rejected, but perhaps we have a better understanding of the issues now.) This idea came out of off-and-on discussions between at least @rsc, @hyangah, @RLH, @bcmills, @Sajmani, and myself.

This is a counter-proposal to various proposals to expose the current thread/P ID as a way to implement sharded values (#8281, #18590). These have been turned down as exposing low-level implementation details, tying Go to an API that may be inappropriate or difficult to support in the future, being difficult to use correctly (since the ID may change at any time), being difficult to specify, and as being broadly susceptible to abuse.

There are several dimensions to the design of such an API.

Get and Put can be blocking or non-blocking:

With non-blocking Get and Put, sync.Sharded behaves like a collection. Get returns immediately with the current shard's value or nil if the shard is empty. Put stores a value for the current shard if the shard's slot is empty (which may be different from where Get was called, but would often be the same). If the shard's slot is not empty, Put could either put to some overflow list (in which case the state is potentially unbounded), or run some user-provided combiner (which would bound the state).

With blocking Get and Put, sync.Sharded behaves more like a lock. Get returns and locks the current shard's value, blocking further Gets from that shard. Put sets the shard's value and unlocks it. In this case, Put has to know which shard the value came from, so Get can either return a put function (though that would require allocating a closure) or some opaque value that must be passed to Put that internally identifies the shard.

It would also be possible to combine these behaviors by using an overflow list with a bounded size. Specifying 0 would yield lock-like behavior, while specifying a larger value would give some slack where Get and Put remain non-blocking without allowing the state to become completely unbounded.

Do could be consistent or inconsistent:

If it's consistent, then it passes the callback a snapshot at a single instant. I can think of two ways to do this: block until all outstanding values are Put and also block further Gets until the Do can complete; or use the "current" value of each shard even if it's checked out. The latter requires that shard values be immutable, but it makes Do non-blocking.

If it's inconsistent, then it can wait on each shard independently. This is faster and doesn't affect Get and Put, but the caller can only get a rough idea of the combined value. This is fine for uses like approximate statistics counters.

It may be that we can't make this decision at the API level and have to provide both forms of Do.

I think this is a good base API, but I can think of a few reasonable extensions:

Provide Peek and CompareAndSwap. If a user of the API can be written in terms of these, then Do would always be able to get an immediate consistent snapshot.

Provide a Value operation that uses the user-provided combiner (if we go down that API route) to get the combined value of the sync.Sharded.