storage/engine: introduce a datadriven framework to run MVCC tests #42250

knz · 2019-11-06T21:23:30Z

Previously MVCC tests were hand-coded in Go.
This was making it hard(er) to introduce new tests or modify existing
tests.

This commit improves upon this situation by introducing a new
datadriven test, TestMVCCHistories, which runs MVCC
tests written using a DSL:

begin_txn      t=<name> [ts=<int>[,<int>]]
remove_txn     t=<name>
resolve_intent t=<name> k=<key> [status=<txnstatus>]
restart_txn    t=<name>
update_txn     t=<name> t2=<name>
step_txn       t=<name> [n=<int>]
advance_txn    t=<name> ts=<int>[,<int>]
txn_status     t=<name> status=<txnstatus>

check_intent   k=<key> [none]

put       [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> v=<string> [raw]
cput      [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> v=<string> [raw] [cond=<string>]
increment [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [inc=<val>]
del       [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key>
get       [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [inconsistent] [tombstones]
scan      [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [end=<key>] [inconsistent] [tombstones] [reverse]

merge     [ts=<int>[,<int>]] k=<key> v=<string> [raw]

clear_range    k=<key> end=<key>

Where <key> can be a simple string, or a string
prefixed by the following characters:

=foo means exactly key foo
+foo means Key(foo).Next()
-foo means Key(foo).PrefixEnd()

Additionally, the pseudo-command with enables sharing
a group of arguments between multiple commands, for example:

with t=A
   begin_txn
   with k=a
     put v=b
     resolve_intent

Release note: None

cockroach-teamcity · 2019-11-06T21:23:38Z

This change is

petermattis

This is much more fleshed out that I was expecting. Thanks for getting the ball rolling!

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, @nvanbenschoten, and @tbg)

pkg/storage/engine/testdata/mvcc_histories/clear_range, line 12 at r1 (raw file):

  mvcc_put  k=b/123 v=abc
  mvcc_put  k=c v=abc
  commit_txn

The functionality here is nifty. I'm going to propose an alternate syntax: we can make this look Go-like. The advantage of doing so is that syntax will be more familiar to the casual reader. Something like:

txnA = NewTxn()
txnA.MVCCPut(k="a", v="abc")
txnA.MVCCPut(...)
txnA.Commit()

I did this in the Pebble metamorphic test. See https://github.com/cockroachdb/pebble/blob/master/internal/metamorphic/testdata/parser#L36-L39 and https://github.com/cockroachdb/pebble/blob/master/internal/metamorphic/parser.go. The parser uses Go's scanner, and then adds a simple bit of parsing on top.

We've been thinking about adding an MVCC metamorphic test which I was imagining would have a similar structure to the Pebble one. Seems feasible to share the syntax between such an MVCC metamorphic test and MVCC datadriven test framework. Thoughts?

knz

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, @petermattis, and @tbg)

pkg/storage/engine/testdata/mvcc_histories/clear_range, line 12 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The functionality here is nifty. I'm going to propose an alternate syntax: we can make this look Go-like. The advantage of doing so is that syntax will be more familiar to the casual reader. Something like:
txnA = NewTxn()
txnA.MVCCPut(k="a", v="abc")
txnA.MVCCPut(...)
txnA.Commit()
I did this in the Pebble metamorphic test. See https://github.com/cockroachdb/pebble/blob/master/internal/metamorphic/testdata/parser#L36-L39 and https://github.com/cockroachdb/pebble/blob/master/internal/metamorphic/parser.go. The parser uses Go's scanner, and then adds a simple bit of parsing on top.

We've been thinking about adding an MVCC metamorphic test which I was imagining would have a similar structure to the Pebble one. Seems feasible to share the syntax between such an MVCC metamorphic test and MVCC datadriven test framework. Thoughts?

Thank you to pointing me to that code. It's interesting. Funnily I even briefly considered something looking like Go myself initially.
But I think you over-engineered that. I rejected the Go idea after thinking hard about it (and not knowing about your code), and I believe I have found the clearly superior design.

Consider:

Aspect	Simple DSL defined here	Pebble's metamorphic DSL
Familiarity to the CRL eng reader	✓ (unix shell)	✓ (Go)
Reuses existing parsing code	✓ (`datadriven`)	✓ (Go)
Supports arguments in arbitrary order, makes it easier to edit	✓	❌
Supports argument reuse across commands, makes it easier to type and read	✓	❌
Single case, makes it easier to type and read	✓	❌
No punctuation, makes it easier to type and read	✓	❌
No unnecessary mandatory quoting, makes it easier to type and read	✓	❌

Moreover, your solution has created a false sense of simplicity. Because the basic operators in your language have a 1-to-1 mapping to Go function/method calls, you've placed the burden on the test implementer (and also the reader) to ensure the call protocol is correct. The test spec has to allocate (New) and Close explicitly, even if the test runner could be able to do that automatically from context. All the necessary arguments must be present, even if the test runner could be able to derive them automatically.
So I'd argue I have succeeded in eliminating more boilerplate in my test language than you did.

That being said, I'd gladly volunteer looking at your Pebble test code and apply the best practices I've applied here to there as well (i.e. instead of converting my code to use your ideas, which I think would be a mistake, I'd be OK converting your code to use my ideas, which I think would be an improvement).

petermattis

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @ajkr, @itsbilal, @nvanbenschoten, @sumeerbhola, and @tbg)

pkg/storage/engine/testdata/mvcc_histories/clear_range, line 12 at r1 (raw file):

Previously, knz (kena) wrote…

Thank you to pointing me to that code. It's interesting. Funnily I even briefly considered something looking like Go myself initially.
But I think you over-engineered that. I rejected the Go idea after thinking hard about it (and not knowing about your code), and I believe I have found the clearly superior design.

Consider:

Aspect Simple DSL defined here Pebble's metamorphic DSL

Familiarity to the CRL eng reader ✓ (unix shell) ✓ (Go)

Reuses existing parsing code ✓ (datadriven) ✓ (Go)

Supports arguments in arbitrary order, makes it easier to edit ✓ ❌

Supports argument reuse across commands, makes it easier to type and read ✓ ❌

Single case, makes it easier to type and read ✓ ❌

No punctuation, makes it easier to type and read ✓ ❌

No unnecessary mandatory quoting, makes it easier to type and read ✓ ❌

Moreover, your solution has created a false sense of simplicity. Because the basic operators in your language have a 1-to-1 mapping to Go function/method calls, you've placed the burden on the test implementer (and also the reader) to ensure the call protocol is correct. The test spec has to allocate (New) and Close explicitly, even if the test runner could be able to do that automatically from context. All the necessary arguments must be present, even if the test runner could be able to derive them automatically.
So I'd argue I have succeeded in eliminating more boilerplate in my test language than you did.

That being said, I'd gladly volunteer looking at your Pebble test code and apply the best practices I've applied here to there as well (i.e. instead of converting my code to use your ideas, which I think would be a mistake, I'd be OK converting your code to use my ideas, which I think would be an improvement).

Heh, the unix shell-like syntax is exactly what initially caught my eye here. I would respectfully disagree about your familiarity argument for that syntax.

I'd like to take a step away from arguing that either solution is perfect. Are there useful ideas we can take from both? I very much like the use of Go lexical constructs as it provides built-in support for comments, quoted strings, and other literals. I think you have a good point about keyword arguments which is even more powerful in the context of MVCC operations where the API is filled with methods containing numerous parameters. The with construct is also interesting in reducing boilerplate, though that runs against the explicitness philosophy that frequently runs through Go. Of course, we don't need to adopt that philosophy for the syntax here.

If I were to adopt this test case to Go-like syntax but with an extension for keyword arguments it would look like:

a = BeginTxn(ts=44)
a.MVCCPut(k:"a", v:"abc")
a.MVCCPut(k:"a/123", v:"abc")
a.MVCCPut(k:"b", v:"abc")
a.MVCCPut(k:"b/123", v:"abc")
a.MVCCPut(k:"c", v:"abc")
a.Commit()

I'm not sure how I'd make the with construct fit into this scheme. I'm not fond of some of the magic there (e,g. I'd always want it to be clear what txn is being operated on). Perhaps something like:

a = BeginTxn(ts=44)
args = {v:"abc"}
a.MVCCPut(k:"a", args...)
...

That doesn't help much if there is only one argument, but it could if there are many common ones.

Note that I'm not wedded to the syntax used by the Pebble metamorphic test, but I think you're overstating your claims of superiority for the syntax here. Also, the two of us are likely too invested in our proposals. Some of the other #storage folks should weigh in on this topic as they'll be supporting and extending these tests in the future. Cc @ajkr, @itsbilal, @sumeerbhola.

Lastly, I want to reiterate my appreciation for you getting the ball rolling here. I'm quibbling about surface details, but find the overall direction of this PR to be super exciting.

knz · 2019-11-08T15:22:25Z

I'd still like to point out from the specific example snippet:

a = BeginTxn(ts=44)
a.MVCCPut(k:"a", v:"abc")
a.MVCCPut(k:"a/123", v:"abc")
a.MVCCPut(k:"b", v:"abc")
a.MVCCPut(k:"b/123", v:"abc")
a.MVCCPut(k:"c", v:"abc")
a.Commit()

This snippet contains 169 characters.

there are 67 non-whitespace punctuation characters, that's about 40% of visual noise
a. is repeated 6 times, none of which is strictly necessary
v:"abc" is repeated 5 times, just one of which is strictly necessary
with would thus save you 57 more characters, so in reality you actually have 34% more visual noise.

74% of visual noise that's a bit much.

petermattis · 2019-11-08T18:00:19Z

This snippet contains 169 characters.

Is optimizing for the number of characters a goal? It wasn't for me. Your arguments here are also knocks on the Go syntax. On the one hand, those arguments are fine and reasonable, but the Go syntax is also the one CRL engineers spend the most time interacting with. Surely that day-to-day familiarity counts for something.

Is there a way we can get this discussion out of bikeshed territory? I'm going to reiterate my desire to hear what others think of the two alternatives or if there are ideas for something even better. I don't want to over-engineer this syntax, but I also don't want to quickly adopt something that we live with for years.

knz · 2019-11-08T18:40:37Z

I agree let's let the discussion of specifics sleep for a while.

In the meantime I extended the PR with the tests for intent histories, as needed by savepoints. I'll rebase the MVCC seqnum ignore list PR on top of this now.

nvanbenschoten · 2019-11-11T18:23:05Z

@knz one of my bigger confusions with this is why it contains transaction state management. In other words, why is this testing begin_txn, commit_txn, etc? None of that lives at the MVCC level, and that shows because the handlers for these operations are completely mocked out.

Is the idea there just to register transactions with the data-driven framework? If so, I would stay as far away from the names used by the corresponding Requests as possible.

This all does point towards an opportunity to introduce datadriven tests into storage/batcheval as well.

knz · 2019-11-11T20:18:38Z

why is this testing begin_txn, commit_txn, etc?

begin_txn is needed because the MVCC operations take a roachpb.Transaction as argument.

After checking I agree that we can dump abort_txn/commit_txn and have the mvcc tests resolve their intents explicitly.

knz · 2019-11-12T11:20:25Z

Simplified the code as suggested.

knz

I have responded to the review comments, see below. Will welcome your reactions and willing to change the code further based on them.

I also tentatively rebased this on top of #42736. Depending on the outcome of cockroachdb/datadriven#8 I might cancel this rebase.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @petermattis, @sumeerbhola, and @tbg)

pkg/storage/engine/mvcc_history_test.go, line 193 at r2 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

May wanna change the example now that commit_txn does not exist.

Done.

pkg/storage/engine/mvcc_history_test.go, line 48 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Does this belong in datadriven.Walk?

Done.

pkg/storage/engine/mvcc_history_test.go, line 72 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The meta: and data: prefixes feel redundant to me as the key itself also contains this info.

I agree it's redundant with the RHS, but it's also helpful to disambiguate this trace output from the individual commands' output, for example the get: ... prefix.

pkg/storage/engine/mvcc_history_test.go, line 148 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

The txn is only updated in response to the *_txn commands. Seems a little odd to print out its value as that is deterministic on the *_txn command implementations which are not MVCC code. Am I missing something here? (Very possible, this is a large change).

I introduced this dump to emphasize the current state of the txn object at each step of a script. This was in reaction to a troubleshooting session where I had mistakenly started with a txn in the wrong state, but I hadn't realized. My MVCC operation wasn't behaving like I wanted, and I wasn't sure where the problem came from. By adding the txn's currebt state to the trace, the reader can verify their assumptions about the MVCC op's input at each step.

pkg/storage/engine/mvcc_history_test.go, line 233 at r3 (raw file):

Was tracing too verbose to always leave on?

I found that it was, but that was subjective. I don' t have much arguments either way on this point.

have each operation output its results, which may be an error. This unifies the ok and error cases and allows progress to be made after an error occurs.

I think for expected errors this is reasonable. However from experience I have found our team too eager to accept a change in reference output when using TESTFLAGS=-rewrite. I think it is good hygiene to strongly distinguish when a test expects an error to occur (and have that error's in the reference output), from when no error is expected (in which case the error's occurrence will prevent datadriven from proceeding even when -rewrite is given).

pkg/storage/engine/mvcc_history_test.go, line 248 at r3 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Rather than calling reportResults here, perhaps there should be an explicit dump_data command. Btw, these suggestions are just that. This PR is so large that it is hard to know the repercussions of all of the suggestions.

I have considered this. Again the choice to dump the current state at the end of every directive was the result of a learning experience, where I found in many cases that a previous test was incorrectly written and was leaving the storage in a bad initial state for the next state. Without this automatic state dump, it's possible to miss that mistake and spend (too much) time focusing on the wrong place.

pkg/storage/engine/mvcc_history_test.go, line 368 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This seems to get status from the command invocation itself, not from the transaction passed in. This has the unintended side effect of calling MVCCResolveWriteIntent with Status: COMMITTED by default, even if the transaction itself is PENDING. Why not use status = txn.Status here?

Because the most common use case is to resolve intents (and remove them) as if the txn had committed. Mandating what you say would force the trace to use one more command in the common case.

pkg/storage/engine/mvcc_history_test.go, line 424 at r3 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

It would be nice if we could get the MVCCKey timestamp at which this read is happening, but the way the MVCCGet API works, that's not too straightforward.

I think it works? the value's timestamp is available in val. Done, PTAL?

petermattis

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @knz, @nvanbenschoten, @sumeerbhola, and @tbg)

pkg/storage/engine/mvcc_history_test.go, line 233 at r3 (raw file):

Previously, knz (kena) wrote…

Was tracing too verbose to always leave on?

I found that it was, but that was subjective. I don' t have much arguments either way on this point.

have each operation output its results, which may be an error. This unifies the ok and error cases and allows progress to be made after an error occurs.

I think for expected errors this is reasonable. However from experience I have found our team too eager to accept a change in reference output when using TESTFLAGS=-rewrite. I think it is good hygiene to strongly distinguish when a test expects an error to occur (and have that error's in the reference output), from when no error is expected (in which case the error's occurrence will prevent datadriven from proceeding even when -rewrite is given).

Ack. I don't feel strongly about this. I haven't noticed a problem with integrating the ok and error cases in Pebble, but that might be accidental. The more likely occurrence in my experience is that a test case generates wrong data that we encode into the expectation and nobody notices. Nothing much can be done about that with this style of testing.

pkg/storage/engine/mvcc_history_test.go, line 248 at r3 (raw file):

Previously, knz (kena) wrote…

I have considered this. Again the choice to dump the current state at the end of every directive was the result of a learning experience, where I found in many cases that a previous test was incorrectly written and was leaving the storage in a bad initial state for the next state. Without this automatic state dump, it's possible to miss that mistake and spend (too much) time focusing on the wrong place.

Ack.

petermattis

Looks good to me modulo a few minor comments.

Big remaining question: how did you convert the tests? How should I review the new test cases to make sure that you accurately converted them? Or perhaps @itsbilal already did this.

I'm looking forward to seeing this merged.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @knz, @nvanbenschoten, @sumeerbhola, and @tbg)

pkg/storage/engine/mvcc_history_test.go, line 34 at r4 (raw file):

// TestMVCCHistories verifies that sequences of MVCC reads and writes
// perform properly.

I think you should copy the commit message explaining the format of the test cases here. That message is super helpful, but commit messages are easy to miss after code is checked in.

pkg/storage/engine/mvcc_history_test.go, line 319 at r4 (raw file):

// commands is the list of all supported script commands.
var commands = map[string]cmd{
	"begin_txn":      {true, false, cmdBegin},

Nit: the two booleans here are a bit hard to read. There seem to be 3 states: txn,!data, !txn,data and !txn,!data. I wonder if the two booleans should be collapsed to a single enum:

type cmdType int

const (
  cmdTypeNone cmdType = iota
  cmdTypeTxn
  cmdTypeData
)

On the other hand, this is the only place the booleans are used, so this isn't terribly important.

pkg/storage/engine/mvcc_history_test.go, line 326 at r4 (raw file):

	"update_txn":     {true, false, cmdUpdateTxn},
	"restart_txn":    {true, false, cmdRestartTxn},
	"step_txn":       {true, false, cmdStepTxn},

All of the *txn* commands are special in the they aren't really MVCC operations. I wonder if they should be grouped by prefixing them with txn_. So instead of step_txn we'd have txn_step.

pkg/storage/engine/mvcc_history_test.go, line 335 at r4 (raw file):

	"scan":           {false, false, cmdScan},
	"increment":      {false, true, cmdIncrement},
	"check_intent":   {false, false, cmdCheckIntent},

Nit: I'd sort these members, and the corresponding functions below, to make it clear where additions should be added. That might just be my OCD speaking, though.

pkg/storage/engine/mvcc_history_test.go, line 481 at r4 (raw file):

	vals, _, intents, err := MVCCScan(e.ctx, e.engine, key, endKey, max, ts, opts)
	if err != nil {
		return err

Here and elsewhere there is an error, I think you should fmt.Fprintf the error, and fall through and allow it to be returned below. This will allow the tests to verify that when an error is returned from MVCCScan, nothing else is.

pkg/storage/engine/mvcc_history_test.go, line 529 at r4 (raw file):

	resolve, resolveStatus := e.getResolve()

	return e.withWriter("put", func(engine ReadWriter) error {

Shouldn't this be "increment" or "inc"?

pkg/storage/engine/testdata/mvcc_histories/clear_range, line 12 at r1 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Heh, the unix shell-like syntax is exactly what initially caught my eye here. I would respectfully disagree about your familiarity argument for that syntax.

I'd like to take a step away from arguing that either solution is perfect. Are there useful ideas we can take from both? I very much like the use of Go lexical constructs as it provides built-in support for comments, quoted strings, and other literals. I think you have a good point about keyword arguments which is even more powerful in the context of MVCC operations where the API is filled with methods containing numerous parameters. The with construct is also interesting in reducing boilerplate, though that runs against the explicitness philosophy that frequently runs through Go. Of course, we don't need to adopt that philosophy for the syntax here.

If I were to adopt this test case to Go-like syntax but with an extension for keyword arguments it would look like:
a = BeginTxn(ts=44)
a.MVCCPut(k:"a", v:"abc")
a.MVCCPut(k:"a/123", v:"abc")
a.MVCCPut(k:"b", v:"abc")
a.MVCCPut(k:"b/123", v:"abc")
a.MVCCPut(k:"c", v:"abc")
a.Commit()
I'm not sure how I'd make the with construct fit into this scheme. I'm not fond of some of the magic there (e,g. I'd always want it to be clear what txn is being operated on). Perhaps something like:
a = BeginTxn(ts=44)
args = {v:"abc"}
a.MVCCPut(k:"a", args...)
...
That doesn't help much if there is only one argument, but it could if there are many common ones.

Note that I'm not wedded to the syntax used by the Pebble metamorphic test, but I think you're overstating your claims of superiority for the syntax here. Also, the two of us are likely too invested in our proposals. Some of the other #storage folks should weigh in on this topic as they'll be supporting and extending these tests in the future. Cc @ajkr, @itsbilal, @sumeerbhola.

Lastly, I want to reiterate my appreciation for you getting the ball rolling here. I'm quibbling about surface details, but find the overall direction of this PR to be super exciting.

Apparently nobody else feels strongly about the syntax, so I'm going to give the thumbs up with proceeding as is.

andy-kimball · 2019-12-04T19:20:48Z

I love the datadriven approach to MVCC testing. Thoroughly testing at multiple component layers is a critical part of improving the stability and correctness of our product (the "onion model" of testing). And the datadriven approach encourages writing lots of tests, since it has nice syntax, and since you can easily keep results up-to-date by using --rewrite.

knz

how did you convert the tests?

I created one test file per converted test. The name of the test file is the same name as the original Go test, snake-cased. The order of operations in the test file is the same as the original go code.

How should I review the new test cases to make sure that you accurately converted them?

Put them side-by-side I suppose. If that makes your life easier, I can split the PR to have one commit per converted test.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @petermattis, @sumeerbhola, and @tbg)

pkg/storage/engine/mvcc_history_test.go, line 34 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I think you should copy the commit message explaining the format of the test cases here. That message is super helpful, but commit messages are easy to miss after code is checked in.

Done.

pkg/storage/engine/mvcc_history_test.go, line 319 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: the two booleans here are a bit hard to read. There seem to be 3 states: txn,!data, !txn,data and !txn,!data. I wonder if the two booleans should be collapsed to a single enum:
type cmdType int

const (
  cmdTypeNone cmdType = iota
  cmdTypeTxn
  cmdTypeData
)
On the other hand, this is the only place the booleans are used, so this isn't terribly important.

Done.

pkg/storage/engine/mvcc_history_test.go, line 326 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

All of the *txn* commands are special in the they aren't really MVCC operations. I wonder if they should be grouped by prefixing them with txn_. So instead of step_txn we'd have txn_step.

Good idea. Done.

pkg/storage/engine/mvcc_history_test.go, line 335 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: I'd sort these members, and the corresponding functions below, to make it clear where additions should be added. That might just be my OCD speaking, though.

Good idea. Done.

pkg/storage/engine/mvcc_history_test.go, line 481 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Here and elsewhere there is an error, I think you should fmt.Fprintf the error, and fall through and allow it to be returned below. This will allow the tests to verify that when an error is returned from MVCCScan, nothing else is.

I agree with returning the error below for the reason you explain. But there is no need to print it here - the main runner already prints the error when an error is expected by the test.

pkg/storage/engine/mvcc_history_test.go, line 529 at r4 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Shouldn't this be "increment" or "inc"?

d'oh. Yes. Fixed.

petermattis

Put them side-by-side I suppose. If that makes your life easier, I can split the PR to have one commit per converted test.

Splitting the PR up isn't necessary. I think this PR should be merged so we can make progress on reviewing @itsbilal's PR. Concurrently, I'll check the tests which I expect to take a day or two. The expectation is that if I find anything we can fix that up in a follow-on PR. Sounds good?

Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @sumeerbhola, and @tbg)

itsbilal

I did review tests a while back (not side-by-side but on the basis of what they seemed to be doing). Side-by-side with Go would get us more confidence in being able to remove the existing Go-written tests. Let me know if you're doing this exercise @petermattis, I will probably be doing it either way as a follow up.

Reviewed 24 of 26 files at r2, 1 of 1 files at r3.
Reviewable status: complete! 1 of 0 LGTMs obtained (and 1 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @sumeerbhola, and @tbg)

knz · 2019-12-05T15:48:15Z

I'll rebase after #42987 merges otherwise CI will never succeed.

Previously MVCC tests were hand-coded in Go. This was making it hard(er) to introduce new tests or modify existing tests. This commit improves upon this situation by introducing a new datadriven test, `TestMVCCHistories`, which runs MVCC tests written using a DSL: ``` begin_txn t=<name> [ts=<int>[,<int>]] remove_txn t=<name> resolve_intent t=<name> k=<key> [status=<txnstatus>] restart_txn t=<name> update_txn t=<name> t2=<name> step_txn t=<name> [n=<int>] advance_txn t=<name> ts=<int>[,<int>] txn_status t=<name> status=<txnstatus> check_intent k=<key> [none] put [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> v=<string> [raw] cput [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> v=<string> [raw] [cond=<string>] increment [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> [inc=<val>] del [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> get [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> [inconsistent] [tombstones] scan [t=<name>] [ts=<int>[,<int>]] [resolve [status=<txnstatus>]] k=<key> [end=<key>] [inconsistent] [tombstones] [reverse] merge [ts=<int>[,<int>]] k=<key> v=<string> [raw] clear_range k=<key> end=<key> ``` Where `<key>` can be a simple string, or a string prefixed by the following characters: - `=foo` means exactly key `foo` - `+foo` means `Key(foo).Next()` - `-foo` means `Key(foo).PrefixEnd()` Additionally, the pseudo-command `with` enables sharing a group of arguments between multiple commands, for example: ``` with t=A begin_txn with k=a put v=b resolve_intent ``` Release note: None

knz · 2019-12-05T22:06:15Z

bors r=itsbilal,petermattis

42250: storage/engine: introduce a datadriven framework to run MVCC tests r=itsbilal,petermattis a=knz Previously MVCC tests were hand-coded in Go. This was making it hard(er) to introduce new tests or modify existing tests. This commit improves upon this situation by introducing a new datadriven test, `TestMVCCHistories`, which runs MVCC tests written using a DSL: ``` begin_txn t=<name> [ts=<int>[,<int>]] remove_txn t=<name> resolve_intent t=<name> k=<key> [status=<txnstatus>] restart_txn t=<name> update_txn t=<name> t2=<name> step_txn t=<name> [n=<int>] advance_txn t=<name> ts=<int>[,<int>] txn_status t=<name> status=<txnstatus> check_intent k=<key> [none] put [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> v=<string> [raw] cput [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> v=<string> [raw] [cond=<string>] increment [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [inc=<val>] del [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> get [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [inconsistent] [tombstones] scan [t=<name>] [ts=<int>[,<int>]] [resolve] k=<key> [end=<key>] [inconsistent] [tombstones] [reverse] merge [ts=<int>[,<int>]] k=<key> v=<string> [raw] clear_range k=<key> end=<key> ``` Where `<key>` can be a simple string, or a string prefixed by the following characters: - `=foo` means exactly key `foo` - `+foo` means `Key(foo).Next()` - `-foo` means `Key(foo).PrefixEnd()` Additionally, the pseudo-command `with` enables sharing a group of arguments between multiple commands, for example: ``` with t=A begin_txn with k=a put v=b resolve_intent ``` Release note: None 42999: colexec: cfetcher errors on setup when indexed column contains unhandled type r=yuzefovich a=rohany Fixes #42994. Indexed columns are always decoded by the fetcher, even if they are unneeded. This PR adds the check to cfetcher initialization to error out if an indexed column is of an unneeded type. Release note (bug fix): Fixed a bug where scanning an index of an unsupported type with the vectorized engine would lead to an internal error. Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>

craig · 2019-12-05T23:07:06Z

Build succeeded

GitHub CI (Cockroach)

petermattis

A few random comments. Still working my way through the tests.

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajkr, @itsbilal, @knz, @nvanbenschoten, @sumeerbhola, and @tbg)

pkg/storage/engine/testdata/mvcc_histories/get_negative_timestamp, line 10 at r6 (raw file):

get k=k ts=-1
----
error: (*withstack.withStack:) cannot write to "k" at timestamp 0.-00000001,0

What's with the oddly formatted timestamp? 0.-0...?

pkg/storage/engine/testdata/mvcc_histories/put_with_txn, line 5 at r6 (raw file):

  txn_begin ts=0,1
  put  v=v
  get

Shouldn't this have a ts=0,1? Or am I misreading TestMVCCPutWithTxn? Or is that timestamp implicit if no timestamp is specified? I think it would be clearer to be explicit here.

pkg/storage/engine/testdata/mvcc_histories/put_with_txn, line 13 at r6 (raw file):

get: "k" -> /BYTES/v @0.000000000,1
>> at end:
txn: "A" meta={id=00000000 key="k" pri=0.00000000 epo=0 ts=0.000000000,1 min=0.000000000,0 seq=0} rw=true stat=PENDING rts=0.000000000,1 wto=false max=0.000000000,0

There are a lot of 0.00000000 values in these tests. It would massively reduce visual clutter if they could be abbreviated somehow.

knz

Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @ajkr, @itsbilal, @nvanbenschoten, @petermattis, @sumeerbhola, and @tbg)

pkg/storage/engine/testdata/mvcc_histories/get_negative_timestamp, line 10 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

What's with the oddly formatted timestamp? 0.-0...?

This is the standard ts printer. I agree it's suspicious but not specific to this PR. Filed #43183 to track.

pkg/storage/engine/testdata/mvcc_histories/put_with_txn, line 5 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Shouldn't this have a ts=0,1? Or am I misreading TestMVCCPutWithTxn? Or is that timestamp implicit if no timestamp is specified? I think it would be clearer to be explicit here.

I left the ts=0,1 out for the first one because it happens to also be the value of ReadTimestamp for the txn (txn1 in the original code). However I agree it's clearer to list it explicitly. I'll send a PR.

pkg/storage/engine/testdata/mvcc_histories/put_with_txn, line 13 at r6 (raw file):

Previously, petermattis (Peter Mattis) wrote…

There are a lot of 0.00000000 values in these tests. It would massively reduce visual clutter if they could be abbreviated somehow.

These are also the standard printers. I agree these could be improved. Also mentioned in #43183.

@petermattis

43184: storage/engine: clarify the MVCC history tests r=knz a=knz Suggested by @petermattis in #42250 (review). This clarifies the `get` timestamps in `put_with_txn`. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>

@petermattis

43185: hlc: improve/fix the representation of timestamps r=knz a=knz Requested by @petermattis in #42250 (review) Informs #43183. Previously, a negative timestamp would be printed with a negative sign in the middle, for example `0.-000000123` or `-123.-00000456`. This patch fixes this by only emitting the sign once. Additionally, this patch simplifies zero timestamps to just `"0"`. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>

knz requested review from tbg and nvanbenschoten November 6, 2019 21:23

knz force-pushed the 20191106-mvcc-histories branch 2 times, most recently from b837257 to a69d2ef Compare November 6, 2019 21:28

petermattis requested a review from itsbilal November 7, 2019 14:54

knz force-pushed the 20191106-mvcc-histories branch from a69d2ef to 0264f46 Compare November 7, 2019 15:34

petermattis reviewed Nov 7, 2019

View reviewed changes

knz commented Nov 8, 2019

View reviewed changes

knz force-pushed the 20191106-mvcc-histories branch from 0264f46 to ba07d22 Compare November 8, 2019 09:55

knz mentioned this pull request Nov 8, 2019

storage/engine: TestMVCCPutNewEpochLowerSequence does not do what it says #42310

Closed

knz force-pushed the 20191106-mvcc-histories branch 2 times, most recently from 7407590 to 6e909df Compare November 8, 2019 12:01

petermattis reviewed Nov 8, 2019

View reviewed changes

knz force-pushed the 20191106-mvcc-histories branch 2 times, most recently from 652e133 to 96942da Compare November 8, 2019 16:48

knz force-pushed the 20191106-mvcc-histories branch from 96942da to 5f997d5 Compare November 8, 2019 18:37

knz force-pushed the 20191106-mvcc-histories branch from 5f997d5 to f0adb43 Compare November 8, 2019 23:19

knz mentioned this pull request Nov 8, 2019

libroach,storage: extend MVCC to support ignored seqnum ranges #42152

Closed

knz force-pushed the 20191106-mvcc-histories branch from f0adb43 to 67fcad2 Compare November 11, 2019 10:38

knz force-pushed the 20191106-mvcc-histories branch from 67fcad2 to 1690cc6 Compare November 11, 2019 19:12

knz force-pushed the 20191106-mvcc-histories branch from 1690cc6 to 8da7a1f Compare November 12, 2019 11:20

knz force-pushed the 20191106-mvcc-histories branch from 8da7a1f to 037261e Compare November 12, 2019 11:21

knz mentioned this pull request Nov 14, 2019

Misc. datadriven improvements cockroachdb/datadriven#2

Merged

knz requested review from a team as code owners November 25, 2019 20:26

knz commented Nov 25, 2019

View reviewed changes

knz force-pushed the 20191106-mvcc-histories branch 2 times, most recently from 4310860 to 28eeee7 Compare November 26, 2019 15:24

petermattis reviewed Nov 26, 2019

View reviewed changes

knz force-pushed the 20191106-mvcc-histories branch from 28eeee7 to 255a7ae Compare November 28, 2019 09:20

knz mentioned this pull request Nov 28, 2019

Implement subtest sections in input files cockroachdb/datadriven#11

Merged

knz force-pushed the 20191106-mvcc-histories branch from 255a7ae to 79071de Compare November 29, 2019 16:43

petermattis reviewed Dec 4, 2019

View reviewed changes

knz force-pushed the 20191106-mvcc-histories branch from 79071de to ec0a79a Compare December 5, 2019 14:35

knz commented Dec 5, 2019

View reviewed changes

petermattis approved these changes Dec 5, 2019

View reviewed changes

itsbilal approved these changes Dec 5, 2019

View reviewed changes

knz force-pushed the 20191106-mvcc-histories branch from ec0a79a to f7918fd Compare December 5, 2019 15:54

knz force-pushed the 20191106-mvcc-histories branch from f7918fd to 09b8355 Compare December 5, 2019 22:05

craig bot merged commit 09b8355 into cockroachdb:master Dec 5, 2019

petermattis reviewed Dec 16, 2019

View reviewed changes

knz mentioned this pull request Dec 16, 2019

storage: improve the txnmeta and ts pretty-printers #43183

Open

knz commented Dec 16, 2019

View reviewed changes

This was referenced Dec 16, 2019

storage/engine: clarify the MVCC history tests #43184

Merged

hlc: improve/fix the representation of timestamps #43185

Merged

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

storage/engine: introduce a datadriven framework to run MVCC tests #42250

storage/engine: introduce a datadriven framework to run MVCC tests #42250

knz commented Nov 6, 2019 •

edited

Loading

cockroach-teamcity commented Nov 6, 2019

petermattis left a comment

knz left a comment

petermattis left a comment

knz commented Nov 8, 2019 •

edited

Loading

petermattis commented Nov 8, 2019

knz commented Nov 8, 2019

nvanbenschoten commented Nov 11, 2019

knz commented Nov 11, 2019

knz commented Nov 12, 2019

knz left a comment

petermattis left a comment

petermattis left a comment

andy-kimball commented Dec 4, 2019

knz left a comment

petermattis left a comment

itsbilal left a comment

knz commented Dec 5, 2019

knz commented Dec 5, 2019

craig bot commented Dec 5, 2019

petermattis left a comment

knz left a comment

storage/engine: introduce a datadriven framework to run MVCC tests #42250

storage/engine: introduce a datadriven framework to run MVCC tests #42250

Conversation

knz commented Nov 6, 2019 • edited Loading

cockroach-teamcity commented Nov 6, 2019

petermattis left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

petermattis left a comment

Choose a reason for hiding this comment

knz commented Nov 8, 2019 • edited Loading

petermattis commented Nov 8, 2019

knz commented Nov 8, 2019

nvanbenschoten commented Nov 11, 2019

knz commented Nov 11, 2019

knz commented Nov 12, 2019

knz left a comment

Choose a reason for hiding this comment

petermattis left a comment

Choose a reason for hiding this comment

petermattis left a comment

Choose a reason for hiding this comment

andy-kimball commented Dec 4, 2019

knz left a comment

Choose a reason for hiding this comment

petermattis left a comment

Choose a reason for hiding this comment

itsbilal left a comment

Choose a reason for hiding this comment

knz commented Dec 5, 2019

knz commented Dec 5, 2019

craig bot commented Dec 5, 2019

Build succeeded

petermattis left a comment

Choose a reason for hiding this comment

knz left a comment

Choose a reason for hiding this comment

knz commented Nov 6, 2019 •

edited

Loading

knz commented Nov 8, 2019 •

edited

Loading