Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

storage: Update MVCC on {Rocks,Pebble} to support ignored seqnum ranges and rollbacks #42582

Merged
merged 3 commits into from
Jan 11, 2020

Conversation

itsbilal
Copy link
Member

Builds on Raphael's PR #42152 and adds two commits, the second of which improves testing and the first of which adds the pebble MVCC scanner part of the ignore list functionality described in #41612, as well as restructures code in mvccResolveWriteIntent for better readability and to make it usable in the intent-history-cleanup-only case.

Fixes #41612 .

@cockroach-teamcity
Copy link
Member

This change is Reviewable

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @nvanbenschoten, and @petermattis)


pkg/storage/engine/mvcc.go, line 2982 at r4 (raw file):

	// the read path in MVCCScan does not consult the intent history if the read
	// is at or above this sequence number.
	meta.Txn.Sequence = meta.IntentHistory[i].Sequence

This is most likely bad, and probably needs to be changed. For some reason, it still seems to work from my limited testing. I couldn't find the code where the TxnMeta inside an intent gets cloned, so I'm not sure where else we unintentionally end up modifying the sequence number.

Open to other suggestions on how to do this rollback; one possibility is to include the ith value in the IntentHistory as well, so that the read path does the right thing either way, and trim away all sequences after i.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose we/I should review #42250 first.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @knz, @nvanbenschoten, and @petermattis)

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)


pkg/storage/engine/mvcc.go, line 2982 at r4 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

This is most likely bad, and probably needs to be changed. For some reason, it still seems to work from my limited testing. I couldn't find the code where the TxnMeta inside an intent gets cloned, so I'm not sure where else we unintentionally end up modifying the sequence number.

Open to other suggestions on how to do this rollback; one possibility is to include the ith value in the IntentHistory as well, so that the read path does the right thing either way, and trim away all sequences after i.

I would encourage you to double-check this. The function modifies the TxnMeta in-place on purpose. This ensures that the remainder of mvccResolveWriteIntent operates on the modified meta "as if" the intent was in the rolled back state from the start.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@itsbilal Can you rebase this now that #42250 is in?

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)

@itsbilal
Copy link
Member Author

Rebased on the latest commit on Raphael's branch, and updated tests. The tests caught an additional bug in the value rewriting logic that I hadn't observed last month. I'll now go through comments on Raphael's PR to see which ones apply here too.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)


pkg/roachpb/data.proto, line 542 at r5 (raw file):

  // the status is neither PENDING nor ABORTED.
  // This is provided as a guardrail against missing initializations.
  bool ignored_seqnums_initialized = 5 [(gogoproto.customname) = "IgnoredSeqNumsInitialized"];

@itsbilal FYI we're not going to keep this in the final PR. Nathan requested that we add all the missing tests for the errors that this (pseudo-) field has helped me identify.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)


c-deps/libroach/mvcc.h, line 62 at r7 (raw file):

        txn_sequence_(txn.sequence),
        txn_max_timestamp_(txn.max_timestamp),
		txn_ignored_seqnums_(txn.ignored_seqnums),

Nit: reviewable is claiming the above line and some other lines in the C++ code are using tabs instead of spaces. We use spaces for indentation in C++ code.


pkg/roachpb/data.go, line 1121 at r7 (raw file):

	}
	if ni := len(t.IgnoredSeqNums); ni > 0 {
		fmt.Fprintf(&buf, " isn=%d", ni)

Should this print out the ignored seqnum list, or will that be too large? Probably a question for @knz or @nvanbenschoten.


pkg/roachpb/data.proto, line 486 at r7 (raw file):

  //
  // The user code must guarantee this list to be non-overlapping,
  // non-continuous (i.e. it must coalesce ranges to avoid situations

s/continuous/contiguous/g?


pkg/roachpb/data.proto, line 488 at r7 (raw file):

  // non-continuous (i.e. it must coalesce ranges to avoid situations
  // where a range's end seqnum is equal to the next range's start
  // seqnum), and sorted in seqnum order.

I was expecting to find some code to do the coalescing. Did I miss it? Is that coming in a future PR?


pkg/storage/engine/mvcc.go, line 1343 at r7 (raw file):

			// If the previous value at the key wasn't written by this transaction,
			// or it was hidden by a rolled back seqnum,
			// we look at last committed value on the key.

Nit: this comment paragraph should be re-flowed. The line breaking is odd.


pkg/storage/engine/pebble_mvcc_scanner.go, line 226 at r7 (raw file):

	// TODO(itsbilal): Explore if this iteration can be improved through binary
	// search.
	for upIdx > 0 && enginepb.TxnSeqIsIgnored(p.meta.IntentHistory[upIdx-1].Sequence, p.txnIgnoredSeqNums) {

This loop and the one in mvcc.h do not look the same. The one in mvcc.h is doing a binary search of intentHistory on every iteration. This one is not. I haven't determined if the loops are equivalent. Regardless, I feel strongly the loops should mirror each other.

The pebble version of this code seems easy to detach from the rest of MVCC scanning and to test in isolation.


pkg/storage/engine/enginepb/mvcc.go, line 53 at r7 (raw file):

// TxnSeqIsIgnored returns true iff the sequence number overlaps with
// any range in the ignored array.
func TxnSeqIsIgnored(seq TxnSeq, ignored []IgnoredSeqNumRange) bool {

This function deserves a test in the enginepb package.


pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums_commit, line 2 at r7 (raw file):

# Pebble does not support ignored seqnums for now.
skip pebble

Eh? I thought this PR added the Pebble support.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Discussed with Bilal elsewhere:

  • I'll add the missing test on TxnSeqIsIgnored
  • as discussed with Nathan, the new field IgnoredSeqNumsInitialized must be removed before the PR merges, but also all the missing initializations of the other field IgnoredSeqNums that the boolean helped identified must be complemented by suitable unit tests. This remains to be done and I was planning to look at this the remainder of this week.
  • I'm not planning on updating anything else further — in particular nothing in storage/engine — the functionality so far was sufficient to move the other savepoint PRs further.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)


pkg/roachpb/data.go, line 1121 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Should this print out the ignored seqnum list, or will that be too large? Probably a question for @knz or @nvanbenschoten.

We could possibly print out the first few entries? I'd be wary of being indiscriminate. Nathan suggested to (later) use this mechanism in lieu of the epoch field for txn restarts, in which case the list could grow large.


pkg/storage/engine/enginepb/mvcc.go, line 53 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This function deserves a test in the enginepb package.

@itsbilal I'll add it.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


pkg/roachpb/data.proto, line 488 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I was expecting to find some code to do the coalescing. Did I miss it? Is that coming in a future PR?

It's in a separate PR https://github.com/cockroachdb/cockroach/pull/43047/files#diff-394ae7af82d24368215610c44809ba42R15
There's also a test in there.

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, @nvanbenschoten, and @petermattis)


c-deps/libroach/mvcc.h, line 62 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: reviewable is claiming the above line and some other lines in the C++ code are using tabs instead of spaces. We use spaces for indentation in C++ code.

Resolved in my first commit (i.e. this PR's second commit). Hopefully that won't cause too many rebase conflicts with knz's PR going forward.


pkg/roachpb/data.proto, line 542 at r5 (raw file):

Previously, knz (kena) wrote…

@itsbilal FYI we're not going to keep this in the final PR. Nathan requested that we add all the missing tests for the errors that this (pseudo-) field has helped me identify.

Sounds good. I'll let you get rid of that field then. None of my changes here rely on that field.


pkg/storage/engine/mvcc.go, line 2982 at r4 (raw file):

Previously, knz (kena) wrote…

I would encourage you to double-check this. The function modifies the TxnMeta in-place on purpose. This ensures that the remainder of mvccResolveWriteIntent operates on the modified meta "as if" the intent was in the rolled back state from the start.

Marking as resolved - we looked into this and modifying the TxnMeta in place is correct.


pkg/storage/engine/pebble_mvcc_scanner.go, line 226 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

This loop and the one in mvcc.h do not look the same. The one in mvcc.h is doing a binary search of intentHistory on every iteration. This one is not. I haven't determined if the loops are equivalent. Regardless, I feel strongly the loops should mirror each other.

The pebble version of this code seems easy to detach from the rest of MVCC scanning and to test in isolation.

I resolved it by changing the one in mvcc.h to match this one. /cc @knz @nvanbenschoten , happy to hear thoughts on this otherwise, but since TxnSeqIsIgnored returns just a bool, and the binary search only looks for the first seq that's > transaction's sequence each time, repeating the binary search is unnecessary - all successive iterations will select the rightmost (greatest) element in the search range since the entire search range is less than the transaction's sequence.

There's value in repeating the binary search if TxnSeqIsIgnored returns the start point of the ignored seqnum range that we collided with, and setting that value as the new sequence number to search on (in addition to truncating the range), but we don't do that yet. Plus there's a chance that the current reverse iteration ends up being more efficient either way - especially if the ignore range or intent history is small.


pkg/storage/engine/enginepb/mvcc.go, line 53 at r7 (raw file):

Previously, knz (kena) wrote…

@itsbilal I'll add it.

Thanks!


pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums_commit, line 2 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Eh? I thought this PR added the Pebble support.

Whoops, forgot to remove this here. Fixed.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


pkg/roachpb/data.proto, line 486 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

s/continuous/contiguous/g?

Fixed.


pkg/storage/engine/pebble_mvcc_scanner.go, line 226 at r7 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

I resolved it by changing the one in mvcc.h to match this one. /cc @knz @nvanbenschoten , happy to hear thoughts on this otherwise, but since TxnSeqIsIgnored returns just a bool, and the binary search only looks for the first seq that's > transaction's sequence each time, repeating the binary search is unnecessary - all successive iterations will select the rightmost (greatest) element in the search range since the entire search range is less than the transaction's sequence.

There's value in repeating the binary search if TxnSeqIsIgnored returns the start point of the ignored seqnum range that we collided with, and setting that value as the new sequence number to search on (in addition to truncating the range), but we don't do that yet. Plus there's a chance that the current reverse iteration ends up being more efficient either way - especially if the ignore range or intent history is small.

No comment from me. As I explained in my initial PR I'll leave this to the storage/perf teams to optimize.


pkg/storage/engine/enginepb/mvcc.go, line 53 at r7 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

Thanks!

Done.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed the underlying PR #42152 and rebased this one on top. RFAL.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


c-deps/libroach/mvcc.h, line 206 at r10 (raw file):

    auto end = meta_.intent_history().end();
    cockroach::storage::engine::enginepb::MVCCMetadata_SequencedIntent intent;
    // Look for the intent with the sequence number less than or equal to the

@petermattis can you have a look at both the base commit and the 2nd commit in this PR for this file.

The initial implementation (mine) in the 1st commit was performed under @nvanbenschoten 's guidance and (as far as I understand) was an iterated binary search.
I personally think it was overkill (we expect the relevant writes to occur always near the end), but it was simple enough to write.

The second (2nd commit) by Bilal is different and I don't fully understand it. I assume it is correct because it passes the tests, but can you confirm it is also adequate given the problem to solve?


pkg/storage/engine/mvcc.go, line 1343 at r7 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: this comment paragraph should be re-flowed. The line breaking is odd.

Done.


pkg/storage/engine/pebble_mvcc_scanner.go, line 226 at r7 (raw file):

Previously, knz (kena) wrote…

No comment from me. As I explained in my initial PR I'll leave this to the storage/perf teams to optimize.

See my comment above on mvcc.h.

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)


c-deps/libroach/mvcc.h, line 206 at r10 (raw file):

Previously, knz (kena) wrote…

@petermattis can you have a look at both the base commit and the 2nd commit in this PR for this file.

The initial implementation (mine) in the 1st commit was performed under @nvanbenschoten 's guidance and (as far as I understand) was an iterated binary search.
I personally think it was overkill (we expect the relevant writes to occur always near the end), but it was simple enough to write.

The second (2nd commit) by Bilal is different and I don't fully understand it. I assume it is correct because it passes the tests, but can you confirm it is also adequate given the problem to solve?

The key point to note is that the intent history is already sorted by sequence numbers, and the iterative binary search doesn't really change the value you're searching for, just the bounds on which the search is happening. So after the first search and bound truncation, the binary search will always return the last element in the search range.

My implementation just takes that observation and exploits it by iterating backward from the binary search result until we find a value in the intent history that isn't ignored. Hope that helps?

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)


c-deps/libroach/mvcc.h, line 206 at r10 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

The key point to note is that the intent history is already sorted by sequence numbers, and the iterative binary search doesn't really change the value you're searching for, just the bounds on which the search is happening. So after the first search and bound truncation, the binary search will always return the last element in the search range.

My implementation just takes that observation and exploits it by iterating backward from the binary search result until we find a value in the intent history that isn't ignored. Hope that helps?

Ah! It took me 30mn and a subway ride, and eventually it clicked. Yes you're right of course. I will enhance the comment to explain with an example. Thanks for taking the time.

@knz
Copy link
Contributor

knz commented Dec 28, 2019

I performed the requested change and rebased the PR on top of #42152 again.
@itsbilal when you come back to this don't forget to refresh your local copy of your branch, which I have pushed to to amend. There are also points of attention left for you in #42152.

@itsbilal itsbilal force-pushed the mvcc-seqnum-rollback branch 2 times, most recently from b99deec to 0dbd99c Compare January 2, 2020 20:17
@itsbilal
Copy link
Member Author

itsbilal commented Jan 2, 2020

@knz thanks for attending to this branch while I was away! I've addressed the points in #42152 here, and will comment there referencing this PR. The most significant change is that, in the collapsedIntent case, we no longer use the commit path, instead falling to the intent clear / intent abort case that's shared with txn aborts. This fixes a couple minor issues in the previous code:

  1. Stats are calculated correctly, we should have been calling updateStatsOnClear when we were instead calling updateStatsOnResolve earlier
  2. if we had a case where !(commit || pushed) but collapsedIntent == true, we'd call the right stats code, but not if pushed or commit was true.
  3. If pushed && collapsedIntent, we'd be rewriting the intent with a new timestamp, but deleting the versioned value underneath. I've added a test to exercise this combination.

@itsbilal
Copy link
Member Author

itsbilal commented Jan 7, 2020

Rebased and fixed the conflicts, this is ready for another look.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed 7 of 13 files at r16, 2 of 3 files at r17, 3 of 4 files at r18.
Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, @nvanbenschoten, and @petermattis)


pkg/storage/batcheval/cmd_end_transaction.go, line 1140 at r13 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

There are cases where this function is called with both true and false, and where the return value is discarded. Maybe the real fix should be to set txnAutoGC to !to, but I'm not too familiar with this code to begin with so I'd lean towards leaving it as-is.

The typical way this is done is something like:

func TestingSetTxnAutoGC(to bool) func() {
        prev := txnAutoGC
	txnAutoGC = to
	return func() { txnAutoGC = prev }
}

I'd at least do that.


pkg/storage/engine/mvcc.go, line 2762 at r13 (raw file):

Previously, itsbilal (Bilal Akhtar) wrote…

"Cleared" seems clearer (hah), but it may hint that the intent has already been deleted? Since "clear" is a storage engine level operation while "collapse" isn't. I don't feel strongly either way, just that we should be consistent with our terminology between here and mvccMaybeRewriteIntentHistory. For now I've updated that method to also say collapsed.

What about "removed"? I personally don't find collapsed to be very useful. In fact, I assumed it meant the exact opposite of what it does multiple times - resolving an intent "collapses" all of the sequences in the history down to just one.


pkg/storage/engine/mvcc.go, line 2786 at r14 (raw file):

can you point me to where the epoch / ignored seq num equivalence is outlined?

It might be in the RFC. The general idea though is that a txn restart is equivalent to a full-txn rollback and so we may eventually be able to implement restarts using only ignored seq nums.

Doesn't that mean we could potentially rewrite a collapsed intent if pushed == true?

I'm not totally sure I understand what you mean. Who could potentially rewrite a collapsed intent? This function? I guess you'd also want to set pushed = false as well so that you hit the abort path.

Copy link
Contributor

@knz knz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


pkg/storage/engine/mvcc.go, line 2786 at r14 (raw file):

It might be in the RFC.

It's in the tech note on the TxnCoordSender.

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


pkg/storage/engine/mvcc.go, line 2762 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

What about "removed"? I personally don't find collapsed to be very useful. In fact, I assumed it meant the exact opposite of what it does multiple times - resolving an intent "collapses" all of the sequences in the history down to just one.

Replaced it with the present tense remove to denote that it hasn't been removed yet, but has been marked for removal.


pkg/storage/engine/mvcc.go, line 2786 at r14 (raw file):

Previously, knz (kena) wrote…

It might be in the RFC.

It's in the tech note on the TxnCoordSender.

Makes sense. Updated accordingly - simplifies the if condition pretty significantly.

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)


pkg/storage/batcheval/cmd_end_transaction.go, line 1140 at r13 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

The typical way this is done is something like:

func TestingSetTxnAutoGC(to bool) func() {
        prev := txnAutoGC
	txnAutoGC = to
	return func() { txnAutoGC = prev }
}

I'd at least do that.

Done.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

Reviewed 1 of 7 files at r19, 3 of 3 files at r20, 3 of 3 files at r21.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


pkg/storage/engine/mvcc.go, line 2783 at r21 (raw file):

	// to a larger timestamp, and if the rollback code did not modify or mark
	// the intent for removal.
	if inProgress && !pushed && rolledBackVal == nil && !removeIntent {

I think it would make sense to also set inProgress to false when removing intents as well. inProgress also takes the epoch into account, so this would be semantically correct. Then you don't need this removeIntent var at all.


pkg/storage/engine/mvcc.go, line 2951 at r21 (raw file):

// stored value, ignoring all values from the history that have an
// ignored seqnum.
// The rempve return value, when true, indicates that

"remove"

Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the review! I'll leave this here for the rest of the day if anyone else wants to chime in, and if comments come in after it's merged later today I'm happy to address them in follow up PRs.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @petermattis)


pkg/storage/engine/mvcc.go, line 2783 at r21 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

I think it would make sense to also set inProgress to false when removing intents as well. inProgress also takes the epoch into account, so this would be semantically correct. Then you don't need this removeIntent var at all.

Done. Thanks for identifying that!


pkg/storage/engine/mvcc.go, line 2951 at r21 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

"remove"

Done. Whoops.

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Still :lgtm:. Nice job on this @itsbilal!

Reviewed 1 of 4 files at r22, 3 of 3 files at r23.
Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @petermattis)

Copy link
Member

@nvanbenschoten nvanbenschoten left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewable status: :shipit: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @petermattis)


pkg/storage/engine/mvcc.go, line 2763 at r23 (raw file):

	// If only part of the intent history was rolled back, but the intent still
	// remains, the rolledBackVal is set to a non-nil value.
	removeIntent := false

nit: pull this down into the if len(intent.IgnoredSeqNums) > 0 { scope and declare it as var removeIntent bool.

Copy link
Collaborator

@petermattis petermattis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:lgtm:

I mostly scrutinized the new tests. The datadriven tests were a pleasure to read. Mostly nits, with a suggestion for two additional tests (though perhaps I just missed the cases I'm suggesting to add).

Reviewable status: :shipit: complete! 2 of 0 LGTMs obtained (waiting on @itsbilal and @petermattis)


c-deps/libroach/mvcc.h, line 173 at r23 (raw file):

    // see if the current intent seqnum is ignored.
    //
    // TODO(nvanbenschoten): this can use use binary search to improve

It should be straightforward to rewrite this using std::lower_bound() (see https://en.cppreference.com/w/cpp/algorithm/lower_bound). There was a time when I could have told you the precise code to write. I'd leave such a change to a future PR, though.


pkg/storage/engine/mvcc.go, line 1552 at r23 (raw file):

			}
			if existingVal == nil {

Nit: there are a handful of cases of spurious blank lines where I normally wouldn't expect them.


pkg/storage/engine/mvcc.go, line 1632 at r23 (raw file):

			// intent, blow away the intent history.
			if txn.Epoch == meta.Txn.Epoch {

Nit: another spurious blank line.


pkg/storage/engine/testdata/mvcc_histories/conditional_put_with_txn, line 67 at r23 (raw file):

error: (*roachpb.WriteTooOldError:) WriteTooOldError: write at timestamp 0.000000123,0 too old; wrote at 0.000000124,1

# Reset for next test

Perhaps we should add an explicit reset command. Doing a clear_range like this requires knowledge of the previous test and what it wrote.


pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums, line 95 at r23 (raw file):

run ok
with t=A
  txn_ignore_seqs seqs=(15-25)

I might have missed it, but I don't see an example where the start range is the same as a seqnum. For example, seqs=(20-21). Ditto for the end range: seqs=(19-20).

knz and others added 3 commits January 10, 2020 19:55
The MVCC code already had rudimentary understanding of sequence
numbers to allow reads to ignore writes at greater seqnums.

To implement SQL savepoint rollbacks, we must also support
ignoring writes that fall in ignored ranges of seqnums.

To achieve this, this commit extends the `mvccScanner` for
RocksDB (Pebble code remains to be done) to account for
ignored seqnum ranges, and also extends `MVCCResolveWriteIntent`
to collapse an intent to the last write that has not been marked
to be ignored by a savepoint rollback.

Release note: None
This commit implements ignoreSeqNums functionality in the pebble
MVCC scanner to mimic that found in the RocksDB one. In addition,
it also restructures code and comments in mvccResolveWriteIntent
to allow for it to be called in cases where just a rollback is
happening (but not a timestamp push or a commit) and the intent
history could be cleaned up for better performance.

There's also a small bugfix in mvccRewriteIntentHistory that
ensures it rewrites the correct value in cases where the
intent timestamp is being pushed forward, and the latest
value is ignored.

Release note: None
The previous change made mvccResolveIntentHistory more complex
in some subtle ways, such as the larger variety of cases where
they can expect to be called (such as around cleaning up intent
histories).

This change tests those cases and ensures that the resolution
code does the appropriate thing in all expected combinations of
those conditions.

Release note: None
Copy link
Member Author

@itsbilal itsbilal left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the reviews! Addressed the nits pointed out, as well as the test suggestion. Merging on green CI.

Reviewable status: :shipit: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal, @nvanbenschoten, and @petermattis)


pkg/storage/engine/mvcc.go, line 1552 at r23 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: there are a handful of cases of spurious blank lines where I normally wouldn't expect them.

Done.


pkg/storage/engine/mvcc.go, line 1632 at r23 (raw file):

Previously, petermattis (Peter Mattis) wrote…

Nit: another spurious blank line.

Done.


pkg/storage/engine/mvcc.go, line 2763 at r23 (raw file):

Previously, nvanbenschoten (Nathan VanBenschoten) wrote…

nit: pull this down into the if len(intent.IgnoredSeqNums) > 0 { scope and declare it as var removeIntent bool.

Done.


pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums, line 95 at r23 (raw file):

Previously, petermattis (Peter Mattis) wrote…

I might have missed it, but I don't see an example where the start range is the same as a seqnum. For example, seqs=(20-21). Ditto for the end range: seqs=(19-20).

Done. You were right, those edges weren't being tested. Now they are.

@itsbilal
Copy link
Member Author

bors r+

craig bot pushed a commit that referenced this pull request Jan 11, 2020
42582: storage: Update MVCC on {Rocks,Pebble} to support ignored seqnum ranges and rollbacks r=itsbilal a=itsbilal

Builds on Raphael's PR #42152 and adds two commits, the second of which improves testing and the first of which adds the pebble MVCC scanner part of the ignore list functionality described in #41612, as well as restructures code in `mvccResolveWriteIntent` for better readability and to make it usable in the intent-history-cleanup-only case.

Fixes #41612 . 

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
@craig
Copy link
Contributor

craig bot commented Jan 11, 2020

Build succeeded

@craig craig bot merged commit 29f34cd into cockroachdb:master Jan 11, 2020
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 20, 2020
The Intent message type has been a frequent source of complexity and
confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to
add an IgnoredSeqNums field to the message type. This led to a series
of questions that were not easily answered by simply reading about
the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- is this Intent message containing the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was
documented in its comment. It both referenced on-disk write intents and
served as a way to talk about updates that should be made to these write
intents.

This commit addresses this issue by splitting Intent into two
(wire-compatible) messages with distinct purposes - Intent and LockUpdate.
Intent's new sole purpose is now to reference persistent on-disk write
intents on return paths of MVCC operations. They are only ever created
when on-disk intents are encountered. LockUpdate's sole purpose is to
represent updates that are intended for all locks held by the transaction
within its span. They are only ever created after their transaction's
record has been consulted, either after an EndTxn or a PushTxn request.
This split simplifies the use of each of these types and helps make
the questions above easier to answer or impossible to even ask (e.g.
Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is
used to inform the message's recipient about the durability of the
lock(s) it is being instructed to update. This will be important soon
when cockroachdb#44976 formally introduces unreplicated locks to the key-value
layer.

I would suggest that reviewers start with the proto changes in `pkg/roachpb`
and fan out from there. Everything else is essentially fallout from the
typing change. This is still a bit of a WIP as there is more renaming
necessary before this could be merged, but I want to get people's opinions
on this sooner rather than later because it's partly on the critical path
for cockroachdb#44976 and I'm feeling a little pressed for time with that change and
the impending stability period.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 20, 2020
The Intent message type has been a frequent source of complexity and
confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to
add an IgnoredSeqNums field to the message type. This led to a series
of questions that were not easily answered by simply reading about
the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- is this Intent message containing the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was
documented in its comment. It both referenced on-disk write intents and
served as a way to talk about updates that should be made to these write
intents.

This commit addresses this issue by splitting Intent into two
(wire-compatible) messages with distinct purposes - Intent and LockUpdate.
Intent's new sole purpose is now to reference persistent on-disk write
intents on return paths of MVCC operations. They are only ever created
when on-disk intents are encountered. LockUpdate's sole purpose is to
represent updates that are intended for all locks held by the transaction
within its span. They are only ever created after their transaction's
record has been consulted, either after an EndTxn or a PushTxn request.
This split simplifies the use of each of these types and helps make
the questions above easier to answer or impossible to even ask (e.g.
Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is
used to inform the message's recipient about the durability of the
lock(s) it is being instructed to update. This will be important soon
when cockroachdb#44976 formally introduces unreplicated locks to the key-value
layer.

I would suggest that reviewers start with the proto changes in `pkg/roachpb`
and fan out from there. Everything else is essentially fallout from the
typing change. This is still a bit of a WIP as there is more renaming
necessary before this could be merged, but I want to get people's opinions
on this sooner rather than later because it's partly on the critical path
for cockroachdb#44976 and I'm feeling a little pressed for time with that change and
the impending stability period.
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this pull request Feb 21, 2020
The Intent message type has been a frequent source of complexity and
confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to
add an IgnoredSeqNums field to the message type. This led to a series
of questions that were not easily answered by simply reading about
the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- is this Intent message containing the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was
documented in its comment. It both referenced on-disk write intents and
served as a way to talk about updates that should be made to these write
intents.

This commit addresses this issue by splitting Intent into two
(wire-compatible) messages with distinct purposes - Intent and LockUpdate.
Intent's new sole purpose is now to reference persistent on-disk write
intents on return paths of MVCC operations. They are only ever created
when on-disk intents are encountered. LockUpdate's sole purpose is to
represent updates that are intended for all locks held by the transaction
within its span. They are only ever created after their transaction's
record has been consulted, either after an EndTxn or a PushTxn request.
This split simplifies the use of each of these types and helps make
the questions above easier to answer or impossible to even ask (e.g.
Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is
used to inform the message's recipient about the durability of the
lock(s) it is being instructed to update. This will be important soon
when cockroachdb#44976 formally introduces unreplicated locks to the key-value
layer.

I would suggest that reviewers start with the proto changes in `pkg/roachpb`
and fan out from there. Everything else is essentially fallout from the
typing change. This is still a bit of a WIP as there is more renaming
necessary before this could be merged, but I want to get people's opinions
on this sooner rather than later because it's partly on the critical path
for cockroachdb#44976 and I'm feeling a little pressed for time with that change and
the impending stability period.
craig bot pushed a commit that referenced this pull request Feb 21, 2020
45235: roachpb: split Intent proto message into Intent and LockUpdate messages r=nvanbenschoten a=nvanbenschoten

The Intent message type has been a frequent source of complexity and confusion. @knz found this out first-hand in #42152/#42582 when he tried to add an `IgnoredSeqNums` field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like:
- why would an Intent ever have a non-PENDING status?
- why would an Intent ever span multiple keys?
- where are Intents created?
- is this Intent message containing the authoritative state of the txn?
- does this Intent message have a populated IgnoredSeqNums slice?

The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents.

This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field).

While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when #44976 formally introduces unreplicated locks to the key-value layer.

I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan-out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for #44976 and I'm feeling a little pressed for time with that change and the impending stability period.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

storage: introduce an "ignore list" for seqnums in MVCC reads
5 participants