Replace Batch operation in Cassandra Delete() #4054
Conversation
This fixes failing Cassandra backend tests. It is also probably the better route, since Batch operations aren't the preferred approach unless necessary (https://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html).
```go
for _, bucket := range buckets {
	go func(bucket string) {
		results <- c.sess.Query(stmt, bucket, key).Exec()
	}(bucket)
}
```
Doing it this way means that the delete operation is not really atomic, but I see that Put() is also done this way.
Yes. The alternative approach, which also fixes the test, is:

```go
batch := c.sess.NewBatch(gocql.LoggedBatch)
for _, bucket := range c.buckets(key) {
	batch.Query(stmt, bucket, key)
}
```
This is the Batch creation process shown in: https://github.com/gocql/gocql/blob/master/batch_test.go
I would prefer the above. I think that batch.Query() is an append on the list of queries that will be executed, but we should double check.
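As an illustration only (this is a toy model, not gocql's actual implementation), the append semantics being assumed here look like this:

```go
package main

import "fmt"

// batchEntry and toyBatch are hypothetical types illustrating the assumed
// semantics: each Query() call appends one statement (plus its bound
// arguments) to a list that is later executed as a single batch.
type batchEntry struct {
	stmt string
	args []interface{}
}

type toyBatch struct {
	entries []batchEntry
}

// Query appends a statement to the batch rather than executing it.
func (b *toyBatch) Query(stmt string, args ...interface{}) {
	b.entries = append(b.entries, batchEntry{stmt: stmt, args: args})
}

func main() {
	b := &toyBatch{}
	for _, bucket := range []string{"a", "a/b"} {
		b.Query("DELETE FROM store WHERE bucket = ? AND key = ?", bucket, "k")
	}
	fmt.Println(len(b.entries)) // 2
}
```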
It is. So that change is just leveraging the lib function, but we were doing the same thing I believe. c.sess.NewBatch() instead of gocql.NewBatch() has some significant differences though.
What are those differences/why should/would we pick one over the other? It feels way nicer to use a lib's batch operations than goroutines so if this works (one way or another via c.sess.NewBatch or gocql.NewBatch) probably Put should be fixed.
I now understand this more fully. c.sess.NewBatch() creates a batch that uses all of the session default parameters, whereas gocql.NewBatch() is just a Batch zero value (and is deprecated). The critical member is defaultTimestamp, which defaults to true in the session, but is false if the Batch is created with gocql.NewBatch(). When true, the client specifies the query timestamp. All of the other queries (none of which use Batch) would be using client time, but the Batch before this PR would use server time:

> If the client does not specify any timestamp, the server will assign one based on the time it receives the query. This can be a problem when the order of the writes matter

(https://docs.datastax.com/en/developer/java-driver/2.1/manual/query_timestamps/)
I agree Batch seems like the simpler approach, but both datastax docs (see link in commit message) and @obeattie have raised concerns about it. Specifically: if there are more than 65535 statements in the batch, it will simply fail. More generally, it places a heavier load on the cluster.
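One common mitigation for the hard statement cap, if batching were kept, is to chunk statements across multiple smaller batches. A hypothetical helper (not part of gocql) sketching that:

```go
package main

import "fmt"

// maxBatchStatements reflects the protocol-level cap on statements per batch
// mentioned above (a batch with more than 65535 statements simply fails).
const maxBatchStatements = 65535

// chunkStatements splits stmts into slices of at most limit entries, so each
// slice could be submitted as its own batch. Hypothetical helper for
// illustration only.
func chunkStatements(stmts []string, limit int) [][]string {
	var chunks [][]string
	for len(stmts) > limit {
		chunks = append(chunks, stmts[:limit])
		stmts = stmts[limit:]
	}
	if len(stmts) > 0 {
		chunks = append(chunks, stmts)
	}
	return chunks
}

func main() {
	stmts := make([]string, 70000)
	chunks := chunkStatements(stmts, maxBatchStatements)
	fmt.Println(len(chunks), len(chunks[0]), len(chunks[1])) // 2 65535 4465
}
```

Of course, splitting into several batches also gives up the atomicity that was the main argument for batching in the first place.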
While the statement limit does exist as you pointed out, the more problematic limit I've run into is batch_size_fail_threshold_in_kb, which by default is set to 50KB.
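That size threshold can be approximated client-side by grouping statements by their summed payload. A rough, hypothetical sketch (real batch-size accounting in Cassandra also counts bound values and protocol overhead, so this understates the true size):

```go
package main

import "fmt"

// batchFailThresholdBytes mirrors the batch_size_fail_threshold_in_kb
// default of 50KB mentioned above.
const batchFailThresholdBytes = 50 * 1024

// chunkBySize groups statements so each group's summed statement length
// stays under maxBytes. Hypothetical helper for illustration only.
func chunkBySize(stmts []string, maxBytes int) [][]string {
	var groups [][]string
	var cur []string
	size := 0
	for _, s := range stmts {
		if size+len(s) > maxBytes && len(cur) > 0 {
			groups = append(groups, cur)
			cur, size = nil, 0
		}
		cur = append(cur, s)
		size += len(s)
	}
	if len(cur) > 0 {
		groups = append(groups, cur)
	}
	return groups
}

func main() {
	// Three ~30KB statements can never pair up under a 50KB cap,
	// so each lands in its own group.
	big := string(make([]byte, 30*1024))
	groups := chunkBySize([]string{big, big, big}, batchFailThresholdBytes)
	fmt.Println(len(groups)) // 3
}
```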
👋 Hi, I'm the original author of this plugin and we use it extensively in production. What is the specific test failure this fixes? I can't seem to get Travis to show me the output properly.
Hi @obeattie. The errors were varied; they were seen when running the Cassandra backend tests. I played a little with disabling certain parts of tests and it seemed like Deletes were always the contributor, and that they were effectively happening sometime after their Go invocation (and after other DB ops). For example, disabling a Delete command that should have been a no-op (there was no entry yet) would allow a subsequent Get test to pass. I noticed that both updating the Batch syntax and just switching it out for the syntax you used in Put() allowed tests to pass.
That is quite odd and I'm not sure I understand how removing batching would fix that specific problem. Cassandra uses wallclock timestamps to sequence operations, and deletes are effectively "just" writes – is it possible there was clock skew between the Docker VM and your host Mac? Separately, I don't think removing the batching here is a bad thing – batching has some other slightly unexpected behaviours (for example, batch sizes are limited aggressively and will fail if the statements sum to larger than the limit) – but I am sceptical that it is a fix for the root cause of whatever behaviour you were seeing.
Now that you mentioned it, it's very likely that the issue is caused by clock skew between the VM and the host.
I know @chrishoffman has been seeing massive skew problems on Mac OS lately. It seems like batching isn't the best idea; however, @obeattie, thoughts on having delete use the same goroutine-based approach as Put? Basically: is it a bad idea, even if it's not usually necessary? Also, I was going to make a comment about Cassandra relying on client-side timestamps but apparently it defaults to server-side, which makes this problem a bit more mystifying to me; even if skew is getting worse, as long as the delete requests are sent in order I would think from the server perspective it'd still maintain that order.
@jefferai re: the timestamps, what I observed in the tests was effectively: a test may do [DELETE, PUT, GET], but what is happening is [PUT, DELETE, GET]. Only DELETE is getting server-side timestamps; the PUT & GET are using the session, not the batch, so they're getting client timestamps. I'd suspect that a series of DELETEs probably retains order, and a series of {PUT | GET} retains order, but the mix of DELETE into the other ops is skewed.
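The reordering described above can be reproduced with a toy last-write-wins model, where each write (or delete tombstone) carries a timestamp and the highest timestamp wins. This is a simplified illustration of the conflict-resolution rule, not real Cassandra internals:

```go
package main

import "fmt"

// cell models a single Cassandra cell under last-write-wins resolution:
// a write or delete only takes effect if its timestamp is newer than the
// cell's current one. Toy model for illustration only.
type cell struct {
	value   string
	deleted bool
	ts      int64
}

func (c *cell) write(value string, ts int64) {
	if ts > c.ts {
		c.value, c.deleted, c.ts = value, false, ts
	}
}

func (c *cell) del(ts int64) {
	if ts > c.ts {
		c.value, c.deleted, c.ts = "", true, ts
	}
}

func main() {
	var c cell
	// The test intends [DELETE, PUT], but the DELETE (batched) gets a
	// server-side timestamp while the PUT gets a client-side one. If the
	// server clock runs ahead of the client clock, the tombstone's
	// timestamp beats the later PUT.
	c.del(100)        // DELETE stamped with server time (ahead)
	c.write("v1", 50) // PUT stamped with client time (behind)
	fmt.Println(c.deleted) // true — the later PUT is silently lost
}
```

With the goroutine-based Delete in this PR, every operation gets a client-side timestamp, so a single client's writes resolve in the order it issued them.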
I read https://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks about this. I'm not super convinced. The fact that Cassandra does updates column-wise is great -- it's almost like RDBMSes were onto something -- but the fact that conflict resolution is dependent on a mix of server and client timestamps feels like asking for trouble. The article claims timestamps are nice to have but not critical, but it didn't take much to see unintended behavior. At a minimum you could imagine that a server holding open a session could figure out at the start of the session what the expected skew from that client is, and maybe heartbeat regularly to keep that skew value updated.