-
Notifications
You must be signed in to change notification settings - Fork 17.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
sync: Map method suggestions #21199
Comments
OK, my apologies for not having checked the previous discussions. I understand your motivation not to include something like Modify, but DeleteOrStore is something we actually really need... |
|
So let's focus on I suspect it's more likely that we would add the more general func RemoveOrStore(m *sync.Map, k, v interface{}) (prev interface{}, deleted bool) {
for {
prev, ok := m.Load(k)
if ok {
if m.CompareAndDelete(k, prev) {
return prev, true
}
} else {
if m.CompareAndSwap(k, nil, v) {
return nil, false
}
}
{
} That said, |
It's quite late to have this thought, but I wonder whether we should rename sync.Map to sync.Cache or something else narrower. That might reduce these common feature requests and also leaves open the door for a future sync.Map with a different feature set. |
@josharian The I do want to try to encourage more-specialized maps outside the standard library, though: the range of "concurrent map" use-cases is very wide, and |
Yep. In fact, that'd be really helpful, to get a sense for what the set of common needs are, for future stdlib/language additions. |
That's not so different from having a function with a primary return value, and a secondary error value. If the secondary value is not nil, the primary return value is probably meaningless.
I'm not sure I understand this remark correctly. I care that the storage can be reclaimed by the garbage collector, so the map shouldn't keep a reference to the value that prevents this from happening. (If there are other references to the value being held by the code that invoked DeleteOrStore, that's fine, it should just be in control of that code.) This is also why a solution based on some compare-and-swap functionality is not ideal, because compare-and-swap cannot observe concurrent deletes, it can only observe swaps to some conventional "value-missing" value which doesn't allow the value to be garbage collected. Sidenote: I can live with the functionality not being added to sync.Map, because it's not too hard to define a custom Map that implements what is needed. However, my understanding was that sync.Map is supposed to be a concurrent version of the Go's built-in map type, and then I believe it should add the functionality I describe above to be complete. Maybe it's easier to rename sync.Map to avoid the false impression? |
Here are more details about the use case: DNA sequencers use chemical processes to create smaller DNA sequences, which can then be read and stored in files. Each "read" consists of two fragments, one at the left end and one at the right end of the read, which are initially stored in FASTQ files, and eventually in SAM files. For some processing steps in DNA sequencing pipelines, it is important to find the two fragments that belong to a single read. They can be identified by two fields, most importantly the QNAME field (see section 1.4 in the SAM file format). It is known that a particular QNAME only occurs twice in a SAM file, so one way to identify a pair of fragments is by using the QNAME as key in a map and store it the first time you encounter it, and retrieve and delete it again the second time you encounter it. By that time you have both fragments that belong to the pair. A typical SAM file for human DNA samples can be around 50 GB compressed and 500 GB uncompressed data, so ensuring that values that are not used anymore can be gc'ed is quite important. |
The built-in
|
If you know that the QNAME only occurs twice, func Lookup(qname string, offset int64) (start, end int64, ok bool) {
prev, ok := m.LoadOrStore(qname, offset)
if !ok {
return -1, -1, false
}
m.Delete(qname)
if prev > offset {
start, end = offset, prev
} else {
start, end = prev, offset
}
return start, end, true
} If you've already seen the second occurrence, then you know that there will not be any subsequent |
Just out of curiosity, what would happen if one did an atomic.AddUint64(&length, 1) on add, and atomic.AddUint64(&length, -1) upon delete? Seems quite efficient. Return the length when queried with an atomic read. Would that not be correct? |
@bcmills Yes, this solution would work for this particular use case. I'd still be in favor to add DeleteOrStore, if not for performance reasons (potentially less synchronization necessary). But I have to admit I'm running out of good arguments. @glycerine An issue with a length method, or similar, on concurrent data structures is that the information is potentially obsolete once you have it, so it's difficult to defend it's usefulness. |
@glycerine It is not obvious to me that Even if you could find a correct algorithm, every |
Correctness: everyone knows that length is at a single snapshot in time, that it is subject to races, and that it may be stale, but this is the right of the user to decide how useful this is. For many applications this snapshot may be absolutely correct and exactly what is needed. Efficiency: What's the alternative? When one needs the length, one must iterate through all available keys of the sync.Map and count them manually? An atomic store would be radically faster that all that data copying. Reading all stored keys from all shards will incur the cost of much more cache coherency logic than copying a single number, or even a single number per shard. Assuming you did a lazy update of the global length, and only updated shard lengths upon insert/delete, this would be a "only-pay-if-you-use-it" design for Len(). |
Your efficiency argument assumes that you need the length. It's clear from the current uses that many uses of this data structure do not need the length. You need to compare, not the efficiency of obtaining the length one way or another, but the efficiency of obtaining the length versus not obtaining the length. For example, people who need the length could always insert and delete values using a helper function that maintains the length. |
That's my point, though: without additional synchronization, it is not obviously a valid snapshot in time. If you have two concurrent Similarly, if you have two concurrent It is not obvious to me how you intend to maintain that property without additional locking (and the associated cache contention). |
The current implementation does not use sharding at all (see https://golang.org/issue/20360#issuecomment-301372644). (Or, you can equivalently think of it as having a shard per key, in which case your "lazy update of the global length" is exactly what you get from using |
From #22247 func (m *Map) Len() int The following is a reference: https://github.com/henrylee2cn/goutil/blob/master/map.go#L541 |
We aren't going to make any changes here, so closing. Please comment if you disagree, but please address the above comments. |
Please answer these questions before submitting your issue. Thanks!
What version of Go are you using (
go version
)?1.9rc1
What operating system and processor architecture are you using (
go env
)?macOS, x86_64
What did you do?
If possible, provide a recipe for reproducing the error.
A complete runnable program is good.
A link on play.golang.org is best.
This is not an error report, but a report that I am convinced there are methods that still need to be added to the sync.Map data structure.
It's important that the key-value-pair currently stored in the map is not concurrently changed while the user-supplied function is being executed.
actual, loaded := m.DeleteOrStore(key, value interface{})
// loaded = true means there was already a value in the map, which is now deleted
// loaded = false means there wasn't a value in the map, and value is now stored (== actual)
DeleteOrStore is useful if you are looking for pairs of entries that share the same key in a stream of entries: When the first entry arrives, it is stored in the map because no entry in the map with the same value already existed; when the second entry arrives, the first entry is retrieved and deleted from the map, and the pair of entries can now be processed outside of the map. A real-world use case is the processing of read pairs in DNA sequencing pipelines. (I can provide more details and code if there is interest.)
m.LoadOrCompute(key, func() interface{} { return some computed value })
This could be useful if computing a new value is expensive for some reason. Likewise, a DeleteOrCompute is potentially useful.
I don't know good use cases for LoadOrCompute and DeleteOrCompute, though, unlike for Modify and DeleteOrStore.
What did you expect to see?
I expected to see at least methods Modify and DeleteOrStore on sync.Map
What did you see instead?
These methods are not defined.
The text was updated successfully, but these errors were encountered: