Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

libroach,storage: extend MVCC to support ignored seqnum ranges #42152

Closed
wants to merge 0 commits into from

Conversation

knz
Copy link
Contributor

@knz knz commented Nov 4, 2019

Fixes #41612. (At least, aims to)

The MVCC code already had rudimentary understanding of sequence
numbers to allow reads to ignore writes at greater seqnums.

To implement SQL savepoint rollbacks, we must also support
ignoring writes that fall in ignored ranges of seqnums.

To achieve this, this commit extends the mvccScanner for
RocksDB (Pebble code remains to be done) to account for
ignored seqnum ranges, and also extends MVCCResolveWriteIntent
to collapse an intent to the last write that has not been marked
to be ignored by a savepoint rollback.

Release note: None

@cockroach-teamcity
Copy link
Member

This change is Reviewable

@knz
Copy link
Contributor Author

knz commented Nov 4, 2019

Here's an embryo of an approach, with several questions open:

  • unit tests are missing. I will add those shortly
  • is the algorithm correct. The unit tests will help determine this, of course, but an analytical review would not hurt.
  • is the binding with C++ adequate?
  • is the behavior wrt the revert of deletes in MVCCResolveWriteIntent OK?


// Otherwise, we place back the write at that history entry
// back into the intent.
meta.Deleted = len(meta.IntentHistory[i].Value) == 0
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This line in particular makes me a little bit afraid.
The intent history does not distinguish between an empty value array, and a Delete intent. Does this matter?
@nvanbenschoten

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, that doesn't matter. An empty value is a delete.

Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good so far, though I'm less confident about how a rollback affects the intent history. It's possible it works out fine in practice, though.

Reviewed 12 of 12 files at r1.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz and @petermattis)


c-deps/libroach/mvcc.h, line 235 at r1 (raw file):

           // try the search again, using the current position as new
           // upper bound.
           end = intent_pos;

Given the intent history is sorted by sequence numbers anyway, it might be faster and simpler to just iterate back from intent_pos until we hit an unignored sequence number, instead of doing a binary/range search again.


pkg/storage/engine/mvcc.go, line 2776 at r1 (raw file):

	// has been rolled back.
	if i < 0 {
		err := engine.Clear(latestKey)

You would also need to clear the intent, which lives at MVCCMetadataKey(latestKey.Key).


pkg/storage/engine/mvcc.go, line 2783 at r1 (raw file):

	// back into the intent.
	meta.Deleted = len(meta.IntentHistory[i].Value) == 0
	meta.ValBytes = int64(len(meta.IntentHistory[i].Value))

The fact that we just switch up the current value and roll back a part of the intent history, while also leaving the history unmodified, feels a bit weird. It's probably correct, still, but I'd want to explore how this code interacts with getFromIntentHistory more closely. Did you consider either 1) slicing away the history to only include the part up to i (then doing a putMeta to update it), or 2) appending the existing value to the intent history (even if it's an ignored seqnum), putMeta-ing that, then sliding in the new value at latestKey ? Approach 2 is similar to how mvccPuts overwrite existing values in the same txn.


pkg/storage/engine/mvcc.go, line 2784 at r1 (raw file):

	meta.Deleted = len(meta.IntentHistory[i].Value) == 0
	meta.ValBytes = int64(len(meta.IntentHistory[i].Value))
	meta.RawBytes = meta.IntentHistory[i].Value

meta.RawBytes is meant for inlined values at the 0 timestamp. I don't think that's what you want to set here. It may make the read path shortcircuit early and return this even when it's not the right response.


pkg/storage/engine/mvcc.go, line 2786 at r1 (raw file):

	meta.RawBytes = meta.IntentHistory[i].Value
	// And also overwrite whatever was there in storage.
	err = engine.Put(latestKey, meta.RawBytes)

Do a put at the MVCC meta timestamp before this, to update the metadata there. See how MVCCPut updates the metadata, or the one case in mvccResolveWriteIntents that does a putMeta.

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


c-deps/libroach/mvcc.h, line 235 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Given the intent history is sorted by sequence numbers anyway, it might be faster and simpler to just iterate back from intent_pos until we hit an unignored sequence number, instead of doing a binary/range search again.

This was Nathan's idea. I'll let you two optimize this for performance once I get the correctness nailed down.


pkg/storage/engine/mvcc.go, line 2776 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

You would also need to clear the intent, which lives at MVCCMetadataKey(latestKey.Key).

Done.


pkg/storage/engine/mvcc.go, line 2783 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The fact that we just switch up the current value and roll back a part of the intent history, while also leaving the history unmodified, feels a bit weird. It's probably correct, still, but I'd want to explore how this code interacts with getFromIntentHistory more closely. Did you consider either 1) slicing away the history to only include the part up to i (then doing a putMeta to update it), or 2) appending the existing value to the intent history (even if it's an ignored seqnum), putMeta-ing that, then sliding in the new value at latestKey ? Approach 2 is similar to how mvccPuts overwrite existing values in the same txn.

Done.


pkg/storage/engine/mvcc.go, line 2784 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

meta.RawBytes is meant for inlined values at the 0 timestamp. I don't think that's what you want to set here. It may make the read path shortcircuit early and return this even when it's not the right response.

Done.


pkg/storage/engine/mvcc.go, line 2786 at r1 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Do a put at the MVCC meta timestamp before this, to update the metadata there. See how MVCCPut updates the metadata, or the one case in mvccResolveWriteIntents that does a putMeta.

not needed, I rearranged the code to capitalize on the regular path instead.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, @nvanbenschoten, and @petermattis)


pkg/storage/engine/rocksdb.go, line 2637 at r2 (raw file):

		r.sequence = C.int32_t(txn.Sequence)
		r.max_timestamp = goToCTimestamp(txn.MaxTimestamp)
		r.ignored_seqnums = goToCIgnoredSeqNums(txn.IgnoredSeqNums)

I'll let @itsbilal continue with the majority of this review. Just wanted to chime in to say I took a look at the way you're passing the ignored seqnum ranges to C++ and everything looks copacetic. Let me know if there are other bits you want me to scrutinize.

@knz knz force-pushed the 20191104-mvcc-rollback branch 2 times, most recently from 2172e5d to eea6608 Compare November 6, 2019 17:23
Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks good to me so far. I assume we'll get more confident with this as it's better tested, but I don't see any obvious issues with the way it's structured now. Good work!

Reviewed 1 of 6 files at r2, 9 of 9 files at r3.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @knz)

@knz
Copy link
Contributor Author

knz commented Nov 8, 2019

The testing I am starting to implement is revealing that

  1. get and scan work fine (yay!)
  2. put and cput are heavily broken (bleh)
  3. haven't tested delete yet

The logic for cput makes all kinds of assumptions for optimization purposes, and these assumptions break if a seqnum has been reverted.

@knz
Copy link
Contributor Author

knz commented Nov 8, 2019

Ok I succeeded in hammering CPut into behaving properly. See the ignored_seq_nums history tests.

This required reworking the "check previous value" path on both Put and CPut.
Also the tests revealed a small mistake in my previous C++ change.

A new thorough review is probably required.

(Still remains to be done: Delete and perhaps other operators that are used transactionally in SQL).

@knz knz force-pushed the 20191104-mvcc-rollback branch 2 times, most recently from 5cc6968 to 011e4b3 Compare November 8, 2019 23:20
@knz
Copy link
Contributor Author

knz commented Nov 8, 2019

@itsbilal: this is about the point where I would enjoy you picking up the remainder of the work.

This is because the PR/branch as-is is now approximately at a point where I can use it as basis for further KV/SQL changes. So I would like to do just this: detach my attention away from storage and simply assume that the branch as-is is "good enough" to start prototyping savepoints elsewhere.

From here onward you could then proceed with pulling the PR forward to a point of completion that makes you happy about correctness, initial performance and pebble compatibility.

(What I can do for you, before you pick it up, is producing reference unit tests for the desired Delete behavior.)

@knz
Copy link
Contributor Author

knz commented Nov 9, 2019

@andreimatei maybe you can help me investigate the TxnCoordSenderRetries error that my PR introduced? That's the test we were talking about yesterday.

@andreimatei
Copy link
Contributor

@andreimatei maybe you can help me investigate the TxnCoordSenderRetries error that my PR introduced?

Well, superficially, the test fails because the CPut below apparently doesn't see the value set by the beforeTxnStart() function above it.

b.CPut("c", "cput", strToValue("value"))

And the CPut should see that value... It's supposed to be written below the transaction's read timestamp, so things are clear - the value should be seen.
I've verified that if the CPut txn doesn't get its write timestamp pushed, then the CPut sees the value fine. (I've verified by both removing the DelRange and separately by removing the afterTxnStart() Get that causes the DelRange to be pushed).

So, I guess, your patch somehow causes a CPut operating at a pushed timestamp to misbehave. Why that is, I haven't looked into. But I can if you want me to.

@knz
Copy link
Contributor Author

knz commented Nov 11, 2019

I was able to repro the TxnCoordSender error inside the mvcc package directly. Investigating.

@knz
Copy link
Contributor Author

knz commented Nov 11, 2019

d'oh. Found it. I was forgetting to reset the list of ignored seqnums upon txn restarts.

@knz
Copy link
Contributor Author

knz commented Nov 11, 2019

Well that was one thing, but there is something else beyond that. Still investigating...

@andreimatei
Copy link
Contributor

andreimatei commented Nov 11, 2019 via email

@knz
Copy link
Contributor Author

knz commented Nov 11, 2019

FWIW, the TCS test was not restarting a transaction.

Yes it was, if my reading is correct (Also I've reproduced the problem, with a restart involved.)

either we keep the epoch number, in which case we don't need to reset the ignored list,

we don't need but it doesn't hurt either (it won't be used if the intent is from a different epoch).

or we overwrite the ignore list with 1.. current seq num and we get rid of the epoch field. Right?

Yes, agreed.

@knz
Copy link
Contributor Author

knz commented Nov 11, 2019

Further discussion with Nathan reveals the PR was incomplete as it did not properly extend mvccGetInternal. Investigating.

@knz knz force-pushed the 20191104-mvcc-rollback branch 2 times, most recently from a946ae5 to fd34e08 Compare November 11, 2019 22:44
Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)


c-deps/libroach/mvcc.h, line 171 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I'd leave a TODO mentioning that you could use binary search here to improve the complexity.

Done


c-deps/libroach/include/libroach.h, line 67 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/TxnMeta_IgnoredSeqNumRange/IgnoredSeqNumRange/

Done.


pkg/roachpb/data.proto, line 484 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Keep this at the bottom.

Done


pkg/roachpb/data.proto, line 492 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: rewrap this line

Done


pkg/storage/engine/mvcc.go, line 1258 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What's up with all of the spacing added to this file?

Where I was doing logging during development.


pkg/storage/engine/mvcc.go, line 1524 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I don't understand what "last write inside txn" means and how that relates to transaction epochs. Do you mean that the last write was written in the current epoch of txn? If so, I'd make that more specific.

Done


pkg/storage/engine/mvcc.go, line 1525 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Make a note that this is the common case.

Done


pkg/storage/engine/mvcc.go, line 2786 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"MVCC read"?

write. Fixed.


pkg/storage/engine/mvcc.go, line 2821 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/no new value any more/mvccRewriteIntentHistory already rewrote the key at the correct timestamp and adjusted the stats/

This brings up the question, did mvccRewriteIntentHistory adjust the stats? It doesn't look like it.

I think it does. I think the control path ensures the stats get adjusted properly in this case. Hopefully Bilal can double check.


pkg/storage/engine/mvcc.go, line 2942 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Add a bit more here. Rolls it back to what?

Clarified.


pkg/storage/engine/mvcc.go, line 2950 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: s/clear/cleared/

In fact, why don't we add a sentence about this to the function's comment.

Done


pkg/storage/engine/mvcc.go, line 2951 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

If we're going to push this in here then the function should be called mvccMaybeRewriteIntentHistory.

Done


pkg/storage/engine/mvcc.go, line 2982 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What if what was in storage is at a different timestamp? I don't think this is correctly handling the case where the transaction commits at a different timestamp than it wrote its intents at.

I conceptually understand what you say but fuzzy on the details. Bilal please pick this up.


pkg/storage/engine/mvcc_history_test.go, line 421 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Is it expected in this framework for commands like these to combine or for them to overwrite each other?

For this command it overwrites.


pkg/storage/engine/enginepb/mvcc3.proto, line 120 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"inclusive on both ends"

Done


pkg/storage/intentresolver/intent_resolver.go, line 372 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We're missing similar logic in resolveLocalIntents, which is a fast-path for resolving intents on a transaction record's local range in the same Raft entry as the one that moves the transaction record to committed.

Fixed

@knz knz force-pushed the 20191104-mvcc-rollback branch 4 times, most recently from 7d0cb02 to 6ce7b1c Compare December 18, 2019 14:30
@knz
Copy link
Contributor Author

knz commented Dec 18, 2019

I have removed the "debugging" field IgnoredSeqNumsInitialized and added the missing tests.

@nvanbenschoten can you check whether the intent resolution and batch eval tests I have added are OK.

In this process I have also deprecated direct construction of roachpb.Intent{} and introduced two new constructors instead, which will constitute a more robust guardrail in the future.

(@itsbilal this is also good for rebase)

@knz knz force-pushed the 20191104-mvcc-rollback branch 3 times, most recently from 5b635a7 to 6fe53e7 Compare December 18, 2019 21:14
Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you check whether the intent resolution and batch eval tests I have added are OK.

The ones you added are good, though I think a few other ones are needed. See my comments below.

Reviewed 36 of 37 files at r7.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)


c-deps/libroach/mvcc.h, line 171 at r6 (raw file):

Previously, knz (kena) wrote…

Done

s/nathan/nvanbenschoten/`


pkg/roachpb/data.go, line 2130 at r7 (raw file):

// MakeIntent makes an intent from the given span and txn.
func MakeIntent(txn Transaction, span Span) Intent {

Let's change all of these functions to take their transaction protos by reference. None of them will cause them to escape. That said, they should all be inlinable anyway, but I think it's still worth doing.


pkg/roachpb/data.go, line 2139 at r7 (raw file):

// span and txn. This is suitable for use when constructing
// WriteIntentError.
func MakeErrorIntent(txn enginepb.TxnMeta, span Span) Intent {

I would rename this to MakePendingIntent. I have no clue what an "error intent" is.


pkg/roachpb/data.proto, line 489 at r7 (raw file):

  // where a range's end seqnum is equal to the next range's start
  // seqnum), and sorted in seqnum order.
  repeated storage.engine.enginepb.IgnoredSeqNumRange ignored_seqnums = 18

We want this to be stored in the transaction record as well, right? So it should be in the TransactionRecord proto as well.

This indicates that we're missing a test where a transaction commits but doesn't resolve its intents. Another transaction then pushes it and is forced to resolve its intents for it. This intent resolution needs to know about the sequence numbers that were rolled back. I believe that most of the tests at this level are in replica_test.go.


pkg/storage/batcheval/cmd_end_transaction_test.go, line 1059 at r7 (raw file):

}

func TestPartialRollbackOnEndTransaction(t *testing.T) {

Give this test a quick description.


pkg/storage/batcheval/cmd_end_transaction_test.go, line 1084 at r7 (raw file):

	// Write a first value at key.
	v.SetString("a")
	txn.Sequence = 0

nit: the first write will always have seq #1, not #0.


pkg/storage/batcheval/cmd_end_transaction_test.go, line 1110 at r7 (raw file):

			desc: &desc,
			canCreateTxnFn: func() (bool, hlc.Timestamp, roachpb.TransactionAbortedReason) {
				//t.Fatal("CanCreateTxnRecord unexpectedly called")

Remove this.


pkg/storage/batcheval/cmd_resolve_intent_test.go, line 239 at r7 (raw file):

}

func TestResolveIntentAfterPartialRollback(t *testing.T) {

Same comment about leaving a short description here.


pkg/storage/engine/mvcc.go, line 2982 at r6 (raw file):

Previously, knz (kena) wrote…

I conceptually understand what you say but fuzzy on the details. Bilal please pick this up.

Reminder that this still needs to be looked at.


pkg/storage/engine/mvcc.go, line 2775 at r7 (raw file):

	// epoch and the state is still in progress but the intent was not pushed
	// to a larger timestamp, and if the rollback code did not modify or collapse
	// the intent.

Do we have any tests that try to resolve an intent with a PENDING status but where the current value has been collapsed? In other words, do we test the inProgress && !pushed && rolledBackVal == nil && collapsedIntent case?


pkg/storage/engine/enginepb/mvcc_test.go, line 78 at r7 (raw file):

func TestTxnSeqIsIgnored(t *testing.T) {
	type s = enginepb.TxnSeq
	type r = enginepb.IgnoredSeqNumRange

I haven't seen that pattern better. I like it!

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)


pkg/roachpb/data.proto, line 489 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

We want this to be stored in the transaction record as well, right? So it should be in the TransactionRecord proto as well.

This indicates that we're missing a test where a transaction commits but doesn't resolve its intents. Another transaction then pushes it and is forced to resolve its intents for it. This intent resolution needs to know about the sequence numbers that were rolled back. I believe that most of the tests at this level are in replica_test.go.

wut watnow

What is this TransactionRecord thing that I've never seen nor heard about before.

I find it informative that I was able to come this far and see so many tests succeed without even knowing about this. Is it possible that that thing is actually useless and could be removed?

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)


c-deps/libroach/mvcc.h, line 171 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

s/nathan/nvanbenschoten/`

Done.


pkg/roachpb/data.go, line 2130 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Let's change all of these functions to take their transaction protos by reference. None of them will cause them to escape. That said, they should all be inlinable anyway, but I think it's still worth doing.

Done.

nit: You using "let's" tells me you want to participate, so I must wait until you push an amend to this commit. I doubt this is really what you want.

If that is what you want, say "please do this". It's more likely than not that I will follow your lead.
If you're in doubt, use "I would prefer" or "I would rather". I would always consider your preference carefully.

Either is clearer for me and more to the point.


pkg/roachpb/data.go, line 2139 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I would rename this to MakePendingIntent. I have no clue what an "error intent" is.

Done.


pkg/roachpb/data.go, line 2148 at r7 (raw file):

// SetTxn updates the transaction details in the intent.
func (i *Intent) SetTxn(txn Transaction) {

@nvanbenschoten SetTxn is called in updateIntentTxnStatus (intent_resolver.go) using a Transaction object pulled from pushedTxns map[...]roachpb.Transaction.

Using a reference in the argument risks forcing a heap allocation, wouldn't it?


pkg/roachpb/data.proto, line 489 at r7 (raw file):

Previously, knz (kena) wrote…

wut watnow

What is this TransactionRecord thing that I've never seen nor heard about before.

I find it informative that I was able to come this far and see so many tests succeed without even knowing about this. Is it possible that that thing is actually useless and could be removed?

Ok so I performed the change on TransactionRecord.
I the test about its persistence as part of the testing in cmd_end_transaction_test.go to verify the ignore list is persisted. PTAL at that.

Regarding the other thing you say: "a transaction commits but doesn't resolve its intents. Another transaction then pushes it and is forced to resolve its intents for it."
Can you point me to the code where another txn resolves intents after a push? I want to review this code.

As to how to do this testing. As per the explanation in TestPushTxnAlreadyCommittedOrAborted I see the following 3 cases:

  • the pusher finds the pushee to be aborted. In this case we don't care about anything because the writes will be discarded anyway.

  • the pusher finds the pushee to be committed. In this case:

    • either there is a persisted TransactionRecord, in which case the ignore list will be populated properly. I suppose I could add a test for this (please confirm).
    • or, there is no persisted TransactionRecord, in which case we're fine: this is only possible if the pushee has performed intent resolution synchronously itself (accounting for the ignore list).
  • the pusher finds the pushee to be pending. In this case I believe we're fine too because the pushee will not attempt intent resolution in this case. Can you confirm?


pkg/storage/batcheval/cmd_end_transaction_test.go, line 1059 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Give this test a quick description.

I like the "Give" here. Good 👍

Done.


pkg/storage/batcheval/cmd_end_transaction_test.go, line 1084 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: the first write will always have seq #1, not #0.

Thanks, fixed.


pkg/storage/batcheval/cmd_end_transaction_test.go, line 1110 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Remove this.

I 💖 the imperative.
Done.


pkg/storage/batcheval/cmd_end_transaction_test.go, line 1110 at r8 (raw file):

		// list is properly persisted in the stored transaction record.
		txnKey := keys.TransactionKey(txn.Key, txn.ID)
		if storeTxnBeforeEndTxn {

@nvanbenschoten this is new, PTAL.


pkg/storage/batcheval/cmd_resolve_intent_test.go, line 239 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Same comment about leaving a short description here.

Done.


pkg/storage/batcheval/transaction.go, line 156 at r8 (raw file):

// TxnMeta. Proceeding to KV reads or intent resolution without this
// information would cause a partial rollback, if any, to be reverted
// and yield inconsistent data.

@nvanbenschoten can you check this comment and confirm the following: a synthetized Transaction is currently never used to perform intent resolution or KV reads, or is it?


pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums, line 1 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

When are you planning to address this limitation?

This is lifted in #42582.


pkg/storage/intentresolver/intent_resolver.go, line 372 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

This doesn't appear to be tested to me. We have a lot of low-level MVCC tests, but I think we also need some replica level tests that issue EndTransaction requests and verify that intents are partially rolled back as expected. These can be written either at the pkg/storage level (see replica_test.go) or at the pkg/storage/batcheval level (see cmd_end_transaction_test.go).

Wasn't this precisely what the new test in cmd_end_transaction_test.go is checking?

Copy link
Contributor Author

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itsbilal there are still 3 points below for you to look into for #42582 when you come back. Thanks.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)


pkg/storage/engine/mvcc.go, line 2821 at r6 (raw file):

Previously, knz (kena) wrote…

I think it does. I think the control path ensures the stats get adjusted properly in this case. Hopefully Bilal can double check.

@itsbilal can you double-check this again.


pkg/storage/engine/mvcc.go, line 2982 at r6 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Reminder that this still needs to be looked at.

@itsbilal can you review this.


pkg/storage/engine/mvcc.go, line 2775 at r7 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

Do we have any tests that try to resolve an intent with a PENDING status but where the current value has been collapsed? In other words, do we test the inProgress && !pushed && rolledBackVal == nil && collapsedIntent case?

I don't know. @itsbilal can you look into this? Thanks.

Copy link
Member

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)


pkg/storage/engine/mvcc.go, line 2821 at r6 (raw file):

Previously, knz (kena) wrote…

@itsbilal can you double-check this again.

Updated the control path in #42582 for the collapsedIntent case so that it now falls down to the intent abort case, where stats are also updated.


pkg/storage/engine/mvcc.go, line 2982 at r6 (raw file):

Previously, knz (kena) wrote…

@itsbilal can you review this.

This code is run before the MVCC value moves timestamps, so writing at the old timestamp (i.e. latestKey) is fine. There was one caveat where the caller wasn't seeing restoredVal, so (in #42582) I updated this code to return the restoredVal in that case.


pkg/storage/engine/mvcc.go, line 2775 at r7 (raw file):

Previously, knz (kena) wrote…

I don't know. @itsbilal can you look into this? Thanks.

Added a test for this in #42582, as well as one for collapsedIntent && pushed which we weren't handling well earlier.

craig bot pushed a commit that referenced this pull request Jan 11, 2020
42582: storage: Update MVCC on {Rocks,Pebble} to support ignored seqnum ranges and rollbacks r=itsbilal a=itsbilal

Builds on Raphael's PR #42152 and adds two commits, the second of which improves testing and the first of which adds the pebble MVCC scanner part of the ignore list functionality described in #41612, as well as restructures code in `mvccResolveWriteIntent` for better readability and to make it usable in the intent-history-cleanup-only case.

Fixes #41612 . 

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
@knz knz closed this Jan 13, 2020
@knz knz deleted the 20191104-mvcc-rollback branch January 13, 2020 10:02
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 20, 2020
The Intent message type has been a frequent source of complexity and
confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to
add an IgnoredSeqNums field to the message type. This led to a series
of questions that were not easily answered by simply reading about
the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- is this Intent message containing the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was
documented in its comment. It both referenced on-disk write intents and
served as a way to talk about updates that should be made to these write
intents.

This commit addresses this issue by splitting Intent into two
(wire-compatible) messages with distinct purposes - Intent and LockUpdate.
Intent's new sole purpose is now to reference persistent on-disk write
intents on return paths of MVCC operations. They are only ever created
when on-disk intents are encountered. LockUpdate's sole purpose is to
represent updates that are intended for all locks held by the transaction
within its span. They are only ever created after their transaction's
record has been consulted, either after an EndTxn or a PushTxn request.
This split simplifies the use of each of these types and helps make
the questions above easier to answer or impossible to even ask (e.g.
Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is
used to inform the message's recipient about the durability of the
lock(s) it is being instructed to update. This will be important soon
when cockroachdb#44976 formally introduces unreplicated locks to the key-value
layer.

I would suggest that reviewers start with the proto changes in `pkg/roachpb`
and fan out from there. Everything else is essentially fallout from the
typing change. This is still a bit of a WIP as there is more renaming
necessary before this could be merged, but I want to get people's opinions
on this sooner rather than later because it's partly on the critical path
for cockroachdb#44976 and I'm feeling a little pressed for time with that change and
the impending stability period.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 20, 2020
The Intent message type has been a frequent source of complexity and
confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to
add an IgnoredSeqNums field to the message type. This led to a series
of questions that were not easily answered by simply reading about
the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- is this Intent message containing the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was
documented in its comment. It both referenced on-disk write intents and
served as a way to talk about updates that should be made to these write
intents.

This commit addresses this issue by splitting Intent into two
(wire-compatible) messages with distinct purposes - Intent and LockUpdate.
Intent's new sole purpose is now to reference persistent on-disk write
intents on return paths of MVCC operations. They are only ever created
when on-disk intents are encountered. LockUpdate's sole purpose is to
represent updates that are intended for all locks held by the transaction
within its span. They are only ever created after their transaction's
record has been consulted, either after an EndTxn or a PushTxn request.
This split simplifies the use of each of these types and helps make
the questions above easier to answer or impossible to even ask (e.g.
Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is
used to inform the message's recipient about the durability of the
lock(s) it is being instructed to update. This will be important soon
when cockroachdb#44976 formally introduces unreplicated locks to the key-value
layer.

I would suggest that reviewers start with the proto changes in `pkg/roachpb`
and fan out from there. Everything else is essentially fallout from the
typing change. This is still a bit of a WIP as there is more renaming
necessary before this could be merged, but I want to get people's opinions
on this sooner rather than later because it's partly on the critical path
for cockroachdb#44976 and I'm feeling a little pressed for time with that change and
the impending stability period.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 21, 2020
The Intent message type has been a frequent source of complexity and
confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to
add an IgnoredSeqNums field to the message type. This led to a series
of questions that were not easily answered by simply reading about
the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- is this Intent message containing the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was
documented in its comment. It both referenced on-disk write intents and
served as a way to talk about updates that should be made to these write
intents.

This commit addresses this issue by splitting Intent into two
(wire-compatible) messages with distinct purposes - Intent and LockUpdate.
Intent's new sole purpose is now to reference persistent on-disk write
intents on return paths of MVCC operations. They are only ever created
when on-disk intents are encountered. LockUpdate's sole purpose is to
represent updates that are intended for all locks held by the transaction
within its span. They are only ever created after their transaction's
record has been consulted, either after an EndTxn or a PushTxn request.
This split simplifies the use of each of these types and helps make
the questions above easier to answer or impossible to even ask (e.g.
Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is
used to inform the message's recipient about the durability of the
lock(s) it is being instructed to update. This will be important soon
when cockroachdb#44976 formally introduces unreplicated locks to the key-value
layer.

I would suggest that reviewers start with the proto changes in `pkg/roachpb`
and fan out from there. Everything else is essentially fallout from the
typing change. This is still a bit of a WIP as there is more renaming
necessary before this could be merged, but I want to get people's opinions
on this sooner rather than later because it's partly on the critical path
for cockroachdb#44976 and I'm feeling a little pressed for time with that change and
the impending stability period.
craig bot pushed a commit that referenced this pull request Feb 21, 2020
45235: roachpb: split Intent proto message into Intent and LockUpdate messages r=nvanbenschoten a=nvanbenschoten

The Intent message type has been a frequent source of complexity and confusion. @knz found this out first-hand in #42152/#42582 when he tried to add an `IgnoredSeqNums` field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- is this Intent message containing the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents.

This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when #44976 formally introduces unreplicated locks to the key-value layer.

I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan-out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for #44976 and I'm feeling a little pressed for time with that change and the impending stability period.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: introduce an "ignore list" for seqnums in MVCC reads
6 participants