Replace Batch operation in Cassandra Delete() #4054
Conversation
This fixes failing Cassandra backend tests. It is also probably the better route, since Batch operations aren't the preferred approach unless necessary (https://docs.datastax.com/en/cql/3.3/cql/cql_using/useBatch.html).
```go
for _, bucket := range buckets {
	go func(bucket string) {
		results <- c.sess.Query(stmt, bucket, key).Exec()
	}(bucket)
}
```
Doing it this way means that the delete operation is not really atomic, but I see that Put() is also done this way.
Yes. The alternative approach, which also fixes the test, is:

```go
batch := c.sess.NewBatch(gocql.LoggedBatch)
for _, bucket := range c.buckets(key) {
	batch.Query(stmt, bucket, key)
}
```
This is the Batch creation process shown in: https://github.com/gocql/gocql/blob/master/batch_test.go
I would prefer the above. I think that batch.Query() is an append on the list of queries that will be executed, but we should double check.
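As an illustration only (this is a toy model, not gocql's actual implementation), the append semantics being assumed here look like this:

```go
package main

import "fmt"

// batchEntry and toyBatch are hypothetical types illustrating the assumed
// semantics: each Query() call appends one statement (plus its bound
// arguments) to a list that is later executed as a single batch.
type batchEntry struct {
	stmt string
	args []interface{}
}

type toyBatch struct {
	entries []batchEntry
}

// Query appends a statement to the batch rather than executing it.
func (b *toyBatch) Query(stmt string, args ...interface{}) {
	b.entries = append(b.entries, batchEntry{stmt: stmt, args: args})
}

func main() {
	b := &toyBatch{}
	for _, bucket := range []string{"a", "a/b"} {
		b.Query("DELETE FROM store WHERE bucket = ? AND key = ?", bucket, "k")
	}
	fmt.Println(len(b.entries)) // 2
}
```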
It is. So that change is just leveraging the lib function, but we were doing the same thing I believe. c.sess.NewBatch() instead of gocql.NewBatch() has some significant differences though.
What are those differences/why should/would we pick one over the other? It feels way nicer to use a lib's batch operations than goroutines so if this works (one way or another via c.sess.NewBatch or gocql.NewBatch) probably Put should be fixed.
I now understand this more fully. c.sess.NewBatch() creates a batch that uses all of the session default parameters, whereas gocql.NewBatch() is just a Batch zero value (and is deprecated). The critical member is defaultTimestamp, which defaults to true in the session, but is false if the Batch is created with gocql.NewBatch(). When true, the client specifies the query timestamp. All of the other queries (none of which use Batch) would be using client time, but the Batch before this PR would use server time:

> If the client does not specify any timestamp, the server will assign one based on the time it receives the query. This can be a problem when the order of the writes matter

(https://docs.datastax.com/en/developer/java-driver/2.1/manual/query_timestamps/)
I agree Batch seems like the simpler approach, but both datastax docs (see link in commit message) and @obeattie have raised concerns about it. Specifically: if there are more than 65535 statements in the batch, it will simply fail. More generally, it places a heavier load on the cluster.
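One common mitigation for the hard statement cap, if batching were kept, is to chunk statements across multiple smaller batches. A hypothetical helper (not part of gocql) sketching that:

```go
package main

import "fmt"

// maxBatchStatements reflects the protocol-level cap on statements per batch
// mentioned above (a batch with more than 65535 statements simply fails).
const maxBatchStatements = 65535

// chunkStatements splits stmts into slices of at most limit entries, so each
// slice could be submitted as its own batch. Hypothetical helper for
// illustration only.
func chunkStatements(stmts []string, limit int) [][]string {
	var chunks [][]string
	for len(stmts) > limit {
		chunks = append(chunks, stmts[:limit])
		stmts = stmts[limit:]
	}
	if len(stmts) > 0 {
		chunks = append(chunks, stmts)
	}
	return chunks
}

func main() {
	stmts := make([]string, 70000)
	chunks := chunkStatements(stmts, maxBatchStatements)
	fmt.Println(len(chunks), len(chunks[0]), len(chunks[1])) // 2 65535 4465
}
```

Of course, splitting into several batches also gives up the atomicity that was the main argument for batching in the first place.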
While the statement limit does exist as you pointed out, the more problematic limit I've run into is batch_size_fail_threshold_in_kb, which by default is set to 50KB.
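That size threshold can be approximated client-side by grouping statements by their summed payload. A rough, hypothetical sketch (real batch-size accounting in Cassandra also counts bound values and protocol overhead, so this understates the true size):

```go
package main

import "fmt"

// batchFailThresholdBytes mirrors the batch_size_fail_threshold_in_kb
// default of 50KB mentioned above.
const batchFailThresholdBytes = 50 * 1024

// chunkBySize groups statements so each group's summed statement length
// stays under maxBytes. Hypothetical helper for illustration only.
func chunkBySize(stmts []string, maxBytes int) [][]string {
	var groups [][]string
	var cur []string
	size := 0
	for _, s := range stmts {
		if size+len(s) > maxBytes && len(cur) > 0 {
			groups = append(groups, cur)
			cur, size = nil, 0
		}
		cur = append(cur, s)
		size += len(s)
	}
	if len(cur) > 0 {
		groups = append(groups, cur)
	}
	return groups
}

func main() {
	// Three ~30KB statements can never pair up under a 50KB cap,
	// so each lands in its own group.
	big := string(make([]byte, 30*1024))
	groups := chunkBySize([]string{big, big, big}, batchFailThresholdBytes)
	fmt.Println(len(groups)) // 3
}
```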
👋 Hi, I'm the original author of this plugin and we use it extensively in production. What is the specific test failure this fixes? I can't seem to get Travis to show me the output properly.
Hi @obeattie. The errors were varied; they were seen when running the Cassandra backend tests. I played a little with disabling certain parts of tests and it seemed like Deletes were always the contributor, and that they were effectively happening sometime after their Go invocation (and after other DB ops). For example, disabling a Delete command that should have been a no-op (there was no entry yet) would allow a subsequent Get test to pass. I noticed that both updating the Batch syntax and just switching it out for the syntax you used in Put() allowed tests to pass.
That is quite odd and I'm not sure I understand how removing batching would fix that specific problem. Cassandra uses wallclock timestamps to sequence operations, and deletes are effectively "just" writes – is it possible there was clock skew between the Docker VM and your host Mac? Separately, I don't think removing the batching here is a bad thing – batching has some other slightly unexpected behaviours (for example, batch sizes are limited aggressively and will fail if the statements sum to larger than the limit) – but I am sceptical that it is a fix for the root cause of whatever behaviour you were seeing.
Now that you mentioned it, it's very likely that the issue is caused by clock skew between the VM and the host.
I know @chrishoffman has been seeing massive skew problems on Mac OS lately. It seems like batching isn't the best idea; however, @obeattie, thoughts on having delete use the same goroutine-based approach as Put? Basically: is it a bad idea, even if it's not usually necessary? Also, I was going to make a comment about Cassandra relying on client-side timestamps but apparently it defaults to server-side, which makes this problem a bit more mystifying to me; even if skew is getting worse, as long as the delete requests are sent in order I would think from the server perspective it'd still maintain that order.
@jefferai re: the timestamps, what I observed in the tests was effectively: a test may do [DELETE, PUT, GET], but what is happening is [PUT, DELETE, GET]. Only DELETE is getting server-side timestamps; the PUT & GET are using the session, not the batch, so they're getting client timestamps. I'd suspect that a series of DELETEs probably retains order, and a series of {PUT | GET} retains order, but the mix of DELETE into the other ops is skewed.
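The reordering described above can be reproduced with a toy last-write-wins model, where each write (or delete tombstone) carries a timestamp and the highest timestamp wins. This is a simplified illustration of the conflict-resolution rule, not real Cassandra internals:

```go
package main

import "fmt"

// cell models a single Cassandra cell under last-write-wins resolution:
// a write or delete only takes effect if its timestamp is newer than the
// cell's current one. Toy model for illustration only.
type cell struct {
	value   string
	deleted bool
	ts      int64
}

func (c *cell) write(value string, ts int64) {
	if ts > c.ts {
		c.value, c.deleted, c.ts = value, false, ts
	}
}

func (c *cell) del(ts int64) {
	if ts > c.ts {
		c.value, c.deleted, c.ts = "", true, ts
	}
}

func main() {
	var c cell
	// The test intends [DELETE, PUT], but the DELETE (batched) gets a
	// server-side timestamp while the PUT gets a client-side one. If the
	// server clock runs ahead of the client clock, the tombstone's
	// timestamp beats the later PUT.
	c.del(100)        // DELETE stamped with server time (ahead)
	c.write("v1", 50) // PUT stamped with client time (behind)
	fmt.Println(c.deleted) // true — the later PUT is silently lost
}
```

With the goroutine-based Delete in this PR, every operation gets a client-side timestamp, so a single client's writes resolve in the order it issued them.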
I read https://www.datastax.com/dev/blog/why-cassandra-doesnt-need-vector-clocks about this. I'm not super convinced. The fact that Cassandra does updates column-wise is great -- it's almost like RDBMSes were onto something -- but the fact that conflict resolution is dependent on a mix of server and client timestamps feels like asking for trouble. The article claims timestamps are nice to have but not critical, but it didn't take much to see unintended behavior. At a minimum you could imagine that a server holding open a session could figure out at the start of the session what the expected skew from that client is, and maybe heartbeat regularly to keep that skew value updated.