CompareAndSet overrides the value when passed counter is 0 #113
Comments
I looked at the code and understand. Looking at newCASCounter, I understand CAS counters can never be 0. As a side point, I noticed the difference in error handling in
CAS can fail if the counter doesn't match. Set would only fail if there's some internal issue, like a disk write failure. Does that clarify things?
The current CAS implementation uses a random number between 0 and 65535. But what happens if there are multiple writers racing on the same key using CAS, and the random number generator picks the same number for both (a 1 in 65535 chance)? That would allow multiple writers to set their value, even though only one writer should win.
Your comment clarifies most things, except the difference in error handling in
I looked at the code and learned the reason for the difference by rewriting the code and running the test. I looked at Lines 739 to 747 in 2fb2716
and learned that the error returned by BatchSet is the last error set on reqs.
I tried to rewrite
in
like the code in . Then I ran the test
and learned the
@cardbot Just a nitpick: actually it is a random number between 1 and 65535. Lines 151 to 153 in 5ae0851
I am also worried about an accidental
Is this low enough?
Using [1, 2**32-1] is also no good, for the same reason. Using [1, 2**64-1] would be fine. Instead of random generation you could also use a single global 64-bit counter, though that has tradeoffs of its own. Another option is to get rid of CompareAndSwap entirely. It looks like we store the CASCounter in the value log anyway, so if users want CAS they can do it at the application layer without performance loss, if Badger exposes something like
Passing the old value does not cause the old value to be written to the log, which is good; however, the application must pass the old value to Badger, so if the old value is large (say, 1 GB) it causes a performance loss, right? Off topic: what is the recommended maximum size for keys and values that does not cause a serious performance loss? How about using time.UnixNano() for the casCounter?
If you use time.UnixNano() to generate the casCounter, and you have two threads writing the key once per millisecond, it will only take about 1000 seconds for them to generate the same CAS value, with big potential for bugs, because the collisions happen at exactly the moments the writes race. If we could guarantee that two threads aren't generating a casCounter at the same time, you could instead just increment a counter. For all I know, that might be possible; it might be a super-easy fix.
This is just an idea: how about using atomic.AddUint64 for the casCounter?
The reason we didn't just go with an increasing counter per key is that we want to avoid doing a read for every write. Note that for CAS to work, every Set must also have stored a counter. Hence, every write would then involve at least one read to find the last counter, so we can store a new one. This obviously gets expensive.

Using 64 bits for CAS is expensive as well. Currently, we store the cas counter in the LSM tree, which we suggest folks keep in RAM for great performance. Instead of storing only 2 bytes, if we now store 8 bytes per key, that's definitely going to decrease the number of keys we can store per
We could consider doing
I don't see much issue with the current approach -- we'd need multiple CAS operations on the same key, and the random generator would have to produce the same number during that data race, to potentially cause an issue. Note that this random generator is shared across all keys, so just spewing out the same number twice won't really cut it.
The current approach makes it a completely useless feature. An incrementing counter can be done on a per-store basis, so there's no read cost. |
That's an overstatement.
Please explain. Do you mean
Imagine if atomic.CompareAndSwapInt64() silently misbehaved once in every 65535 calls. It'd be a major bug that the Go developers would patch immediately. To address another part of your statement: races can happen in other cases as well, not just when the RNG produces the same number twice in a row:
This is just one scenario; many others are possible. In scenarios with a lot of concurrent access, this will fail from time to time and result in data corruption. I believe this needs to be fixed. In the current code, you're already "getting" the value in order to check CAS:
Why can't you move that code up into
That way, you could use an incrementing counter, which would be better (though still not perfect, since it can roll over). In the case where CASCounterCheck = 0, I'm not sure you have to worry about setting a new value of CASCounter: the caller doesn't care about CAS in that case, so I don't see a need to fetch the old value in order to set the new one. I'm struggling to think of a scenario where a caller would want to mix CAS calls with non-CAS calls on the same value.
I think if we want to fix this with no chance of failure, the only solution I can think of would involve doing a read to get the cas counter for every write. Even when you're doing a plain Set, without CAS, that Set needs to update the counter. Otherwise, if one uses Set and CAS on the same key, the counter has to represent the latest value, not some older value.
We can't assume that one would never mix these calls. Even if we avoid this mixing, to know that this particular key should not have a CAS, we'd still have to do a get. I'm still not sure whether @srh has a better solution here. But if the LSM tree is already in memory, then doing a read before every write might be OK. If not, this would severely reduce our write throughput.
Maybe I could imagine a scenario where some sort of lossy CAS operation was acceptable, where corrupting data once every 65535 CAS races was OK and avoiding the corruption was also generally useful. If such a use case exists, it's a very slim portion of them, and even then such users would probably much prefer 32-bit or 64-bit random CAS values, or 32-bit incrementing values. It would be difficult to write a test suite for such a program.
I meant a 32- or 64-bit global incrementing counter (or several partitioned ones). You don't have to read to generate CAS values. A 64-bit rng would have the same effect, but a 64-bit incrementing counter could have ancillary uses (like efficient snapshot retrieval for Dgraph). |
Maybe the best way to resolve this is by enabling your callers to CAS themselves:
Doing it this way also means you don't need a separate SetIfNotExists method. The existing |
@srh makes a great point: given that writes happen serially in Badger, we can use a uint64 counter, which even on a 4 GHz server incrementing four times per nanosecond would take about 146 years to reach the uint64 limit. So a global incrementing counter should be a great solution here: it avoids the read overhead while giving us all the wins expected from CAS.
Closing this issue, as @srh is tracking the required changes in another issue.
First of all, thanks for creating and sharing great software!
I noticed CompareAndSet overrides the value when the passed counter is 0. Is this the expected behavior? If so, I think it should be documented.
I also wondered whether the casCounter can ever become zero through a call to Set.
I created test code to reproduce this problem:
https://github.com/hnakamur/badger/commit/9b4439446e999cada58ae17869e3ffdc056b5a3e
I got the following test result.