-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
storage: Update MVCC on {Rocks,Pebble} to support ignored seqnum ranges and rollbacks #42582
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @nvanbenschoten, and @petermattis)
pkg/storage/engine/mvcc.go, line 2982 at r4 (raw file):
// the read path in MVCCScan does not consult the intent history if the read // is at or above this sequence number. meta.Txn.Sequence = meta.IntentHistory[i].Sequence
This is most likely bad, and probably needs to be changed. For some reason, it still seems to work from my limited testing. I couldn't find the code where the TxnMeta
inside an intent gets cloned, so I'm not sure where else we unintentionally end up modifying the sequence number.
Open to other suggestions on how to do this rollback; one possibility is to include the ith value in the IntentHistory as well, so that the read path does the right thing either way, and trim away all sequences after i.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suppose we/I should review #42250 first.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz, @nvanbenschoten, and @petermattis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, and @nvanbenschoten)
pkg/storage/engine/mvcc.go, line 2982 at r4 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
This is most likely bad, and probably needs to be changed. For some reason, it still seems to work from my limited testing. I couldn't find the code where the
TxnMeta
inside an intent gets cloned, so I'm not sure where else we unintentionally end up modifying the sequence number.Open to other suggestions on how to do this rollback; one possibility is to include the ith value in the IntentHistory as well, so that the read path does the right thing either way, and trim away all sequences after i.
I would encourage you to double-check this. The function modifies the TxnMeta in-place on purpose. This ensures that the remainder of mvccResolveWriteIntent
operates on the modified meta "as if" the intent was in the rolled back state from the start.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@itsbilal Can you rebase this now that #42250 is in?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)
e33b30a
to
ea215a9
Compare
Rebased on the latest commit on Raphael's branch, and updated tests. The tests caught an additional bug in the value rewriting logic that I hadn't observed last month. I'll now go through comments on Raphael's PR to see which ones apply here too. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)
pkg/roachpb/data.proto, line 542 at r5 (raw file):
// the status is neither PENDING nor ABORTED. // This is provided as a guardrail against missing initializations. bool ignored_seqnums_initialized = 5 [(gogoproto.customname) = "IgnoredSeqNumsInitialized"];
@itsbilal FYI we're not going to keep this in the final PR. Nathan requested that we add all the missing tests for the errors that this (pseudo-) field has helped me identify.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)
c-deps/libroach/mvcc.h, line 62 at r7 (raw file):
txn_sequence_(txn.sequence), txn_max_timestamp_(txn.max_timestamp), txn_ignored_seqnums_(txn.ignored_seqnums),
Nit: reviewable
is claiming the above line and some other lines in the C++ code are using tabs instead of spaces. We use spaces for indentation in C++ code.
pkg/roachpb/data.go, line 1121 at r7 (raw file):
} if ni := len(t.IgnoredSeqNums); ni > 0 { fmt.Fprintf(&buf, " isn=%d", ni)
Should this print out the ignored seqnum list, or will that be too large? Probably a question for @knz or @nvanbenschoten.
pkg/roachpb/data.proto, line 486 at r7 (raw file):
// // The user code must guarantee this list to be non-overlapping, // non-continuous (i.e. it must coalesce ranges to avoid situations
s/continuous/contiguous/g
?
pkg/roachpb/data.proto, line 488 at r7 (raw file):
// non-continuous (i.e. it must coalesce ranges to avoid situations // where a range's end seqnum is equal to the next range's start // seqnum), and sorted in seqnum order.
I was expecting to find some code to do the coalescing. Did I miss it? Is that coming in a future PR?
pkg/storage/engine/mvcc.go, line 1343 at r7 (raw file):
// If the previous value at the key wasn't written by this transaction, // or it was hidden by a rolled back seqnum, // we look at last committed value on the key.
Nit: this comment paragraph should be re-flowed. The line breaking is odd.
pkg/storage/engine/pebble_mvcc_scanner.go, line 226 at r7 (raw file):
// TODO(itsbilal): Explore if this iteration can be improved through binary // search. for upIdx > 0 && enginepb.TxnSeqIsIgnored(p.meta.IntentHistory[upIdx-1].Sequence, p.txnIgnoredSeqNums) {
This loop and the one in mvcc.h
do not look the same. The one in mvcc.h
is doing a binary search of intentHistory
on every iteration. This one is not. I haven't determined if the loops are equivalent. Regardless, I feel strongly the loops should mirror each other.
The pebble version of this code seems easy to detach from the rest of MVCC scanning and to test in isolation.
pkg/storage/engine/enginepb/mvcc.go, line 53 at r7 (raw file):
// TxnSeqIsIgnored returns true iff the sequence number overlaps with // any range in the ignored array. func TxnSeqIsIgnored(seq TxnSeq, ignored []IgnoredSeqNumRange) bool {
This function deserves a test in the enginepb
package.
pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums_commit, line 2 at r7 (raw file):
# Pebble does not support ignored seqnums for now. skip pebble
Eh? I thought this PR added the Pebble support.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Discussed with Bilal elsewhere:
- I'll add the missing test on TxnSeqIsIgnored
- as discussed with Nathan, the new field
IgnoredSeqNumsInitialized
must be removed before the PR merges, but also all the missing initializations of the other fieldIgnoredSeqNums
that the boolean helped identified must be complemented by suitable unit tests. This remains to be done and I was planning to look at this the remainder of this week. - I'm not planning on updating anything else further — in particular nothing in storage/engine — the functionality so far was sufficient to move the other savepoint PRs further.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal and @nvanbenschoten)
pkg/roachpb/data.go, line 1121 at r7 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Should this print out the ignored seqnum list, or will that be too large? Probably a question for @knz or @nvanbenschoten.
We could possibly print out the first few entries? I'd be wary of being indiscriminate. Nathan suggested to (later) use this mechanism in lieu of the epoch field for txn restarts, in which case the list could grow large.
pkg/storage/engine/enginepb/mvcc.go, line 53 at r7 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This function deserves a test in the
enginepb
package.
@itsbilal I'll add it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/roachpb/data.proto, line 488 at r7 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I was expecting to find some code to do the coalescing. Did I miss it? Is that coming in a future PR?
It's in a separate PR https://github.com/cockroachdb/cockroach/pull/43047/files#diff-394ae7af82d24368215610c44809ba42R15
There's also a test in there.
ea215a9
to
6bb088e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, @nvanbenschoten, and @petermattis)
c-deps/libroach/mvcc.h, line 62 at r7 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit:
reviewable
is claiming the above line and some other lines in the C++ code are using tabs instead of spaces. We use spaces for indentation in C++ code.
Resolved in my first commit (i.e. this PR's second commit). Hopefully that won't cause too many rebase conflicts with knz's PR going forward.
pkg/roachpb/data.proto, line 542 at r5 (raw file):
Previously, knz (kena) wrote…
@itsbilal FYI we're not going to keep this in the final PR. Nathan requested that we add all the missing tests for the errors that this (pseudo-) field has helped me identify.
Sounds good. I'll let you get rid of that field then. None of my changes here rely on that field.
pkg/storage/engine/mvcc.go, line 2982 at r4 (raw file):
Previously, knz (kena) wrote…
I would encourage you to double-check this. The function modifies the TxnMeta in-place on purpose. This ensures that the remainder of
mvccResolveWriteIntent
operates on the modified meta "as if" the intent was in the rolled back state from the start.
Marking as resolved - we looked into this and modifying the TxnMeta in place is correct.
pkg/storage/engine/pebble_mvcc_scanner.go, line 226 at r7 (raw file):
Previously, petermattis (Peter Mattis) wrote…
This loop and the one in
mvcc.h
do not look the same. The one inmvcc.h
is doing a binary search ofintentHistory
on every iteration. This one is not. I haven't determined if the loops are equivalent. Regardless, I feel strongly the loops should mirror each other.The pebble version of this code seems easy to detach from the rest of MVCC scanning and to test in isolation.
I resolved it by changing the one in mvcc.h to match this one. /cc @knz @nvanbenschoten , happy to hear thoughts on this otherwise, but since TxnSeqIsIgnored returns just a bool, and the binary search only looks for the first seq that's > transaction's sequence each time, repeating the binary search is unnecessary - all successive iterations will select the rightmost (greatest) element in the search range since the entire search range is less than the transaction's sequence.
There's value in repeating the binary search if TxnSeqIsIgnored
returns the start point of the ignored seqnum range that we collided with, and setting that value as the new sequence number to search on (in addition to truncating the range), but we don't do that yet. Plus there's a chance that the current reverse iteration ends up being more efficient either way - especially if the ignore range or intent history is small.
pkg/storage/engine/enginepb/mvcc.go, line 53 at r7 (raw file):
Previously, knz (kena) wrote…
@itsbilal I'll add it.
Thanks!
pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums_commit, line 2 at r7 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Eh? I thought this PR added the Pebble support.
Whoops, forgot to remove this here. Fixed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/roachpb/data.proto, line 486 at r7 (raw file):
Previously, petermattis (Peter Mattis) wrote…
s/continuous/contiguous/g
?
Fixed.
pkg/storage/engine/pebble_mvcc_scanner.go, line 226 at r7 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
I resolved it by changing the one in mvcc.h to match this one. /cc @knz @nvanbenschoten , happy to hear thoughts on this otherwise, but since TxnSeqIsIgnored returns just a bool, and the binary search only looks for the first seq that's > transaction's sequence each time, repeating the binary search is unnecessary - all successive iterations will select the rightmost (greatest) element in the search range since the entire search range is less than the transaction's sequence.
There's value in repeating the binary search if
TxnSeqIsIgnored
returns the start point of the ignored seqnum range that we collided with, and setting that value as the new sequence number to search on (in addition to truncating the range), but we don't do that yet. Plus there's a chance that the current reverse iteration ends up being more efficient either way - especially if the ignore range or intent history is small.
No comment from me. As I explained in my initial PR I'll leave this to the storage/perf teams to optimize.
pkg/storage/engine/enginepb/mvcc.go, line 53 at r7 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
Thanks!
Done.
6bb088e
to
e3a5c38
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fixed the underlying PR #42152 and rebased this one on top. RFAL.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
c-deps/libroach/mvcc.h, line 206 at r10 (raw file):
auto end = meta_.intent_history().end(); cockroach::storage::engine::enginepb::MVCCMetadata_SequencedIntent intent; // Look for the intent with the sequence number less than or equal to the
@petermattis can you have a look at both the base commit and the 2nd commit in this PR for this file.
The initial implementation (mine) in the 1st commit was performed under @nvanbenschoten 's guidance and (as far as I understand) was an iterated binary search.
I personally think it was overkill (we expect the relevant writes to occur always near the end), but it was simple enough to write.
The second (2nd commit) by Bilal is different and I don't fully understand it. I assume it is correct because it passes the tests, but can you confirm it is also adequate given the problem to solve?
pkg/storage/engine/mvcc.go, line 1343 at r7 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: this comment paragraph should be re-flowed. The line breaking is odd.
Done.
pkg/storage/engine/pebble_mvcc_scanner.go, line 226 at r7 (raw file):
Previously, knz (kena) wrote…
No comment from me. As I explained in my initial PR I'll leave this to the storage/perf teams to optimize.
See my comment above on mvcc.h.
e3a5c38
to
dd7709d
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
c-deps/libroach/mvcc.h, line 206 at r10 (raw file):
Previously, knz (kena) wrote…
@petermattis can you have a look at both the base commit and the 2nd commit in this PR for this file.
The initial implementation (mine) in the 1st commit was performed under @nvanbenschoten 's guidance and (as far as I understand) was an iterated binary search.
I personally think it was overkill (we expect the relevant writes to occur always near the end), but it was simple enough to write.The second (2nd commit) by Bilal is different and I don't fully understand it. I assume it is correct because it passes the tests, but can you confirm it is also adequate given the problem to solve?
The key point to note is that the intent history is already sorted by sequence numbers, and the iterative binary search doesn't really change the value you're searching for, just the bounds on which the search is happening. So after the first search and bound truncation, the binary search will always return the last element in the search range.
My implementation just takes that observation and exploits it by iterating backward from the binary search result until we find a value in the intent history that isn't ignored. Hope that helps?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
c-deps/libroach/mvcc.h, line 206 at r10 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
The key point to note is that the intent history is already sorted by sequence numbers, and the iterative binary search doesn't really change the value you're searching for, just the bounds on which the search is happening. So after the first search and bound truncation, the binary search will always return the last element in the search range.
My implementation just takes that observation and exploits it by iterating backward from the binary search result until we find a value in the intent history that isn't ignored. Hope that helps?
Ah! It took me 30mn and a subway ride, and eventually it clicked. Yes you're right of course. I will enhance the comment to explain with an example. Thanks for taking the time.
dd7709d
to
047f5b7
Compare
b99deec
to
0dbd99c
Compare
@knz thanks for attending to this branch while I was away! I've addressed the points in #42152 here, and will comment there referencing this PR. The most significant change is that, in the
|
0dbd99c
to
b531d9d
Compare
Rebased and fixed the conflicts, this is ready for another look. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 7 of 13 files at r16, 2 of 3 files at r17, 3 of 4 files at r18.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @knz, @nvanbenschoten, and @petermattis)
pkg/storage/batcheval/cmd_end_transaction.go, line 1140 at r13 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
There are cases where this function is called with both true and false, and where the return value is discarded. Maybe the real fix should be to set txnAutoGC to
!to
, but I'm not too familiar with this code to begin with so I'd lean towards leaving it as-is.
The typical way this is done is something like:
func TestingSetTxnAutoGC(to bool) func() {
prev := txnAutoGC
txnAutoGC = to
return func() { txnAutoGC = prev }
}
I'd at least do that.
pkg/storage/engine/mvcc.go, line 2762 at r13 (raw file):
Previously, itsbilal (Bilal Akhtar) wrote…
"Cleared" seems clearer (hah), but it may hint that the intent has already been deleted? Since "clear" is a storage engine level operation while "collapse" isn't. I don't feel strongly either way, just that we should be consistent with our terminology between here and mvccMaybeRewriteIntentHistory. For now I've updated that method to also say collapsed.
What about "removed"? I personally don't find collapsed to be very useful. In fact, I assumed it meant the exact opposite of what it does multiple times - resolving an intent "collapses" all of the sequences in the history down to just one.
pkg/storage/engine/mvcc.go, line 2786 at r14 (raw file):
can you point me to where the epoch / ignored seq num equivalence is outlined?
It might be in the RFC. The general idea though is that a txn restart is equivalent to a full-txn rollback and so we may eventually be able to implement restarts using only ignored seq nums.
Doesn't that mean we could potentially rewrite a collapsed intent if pushed == true?
I'm not totally sure I understand what you mean. Who could potentially rewrite a collapsed intent? This function? I guess you'd also want to set pushed = false
as well so that you hit the abort path.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/storage/engine/mvcc.go, line 2786 at r14 (raw file):
It might be in the RFC.
It's in the tech note on the TxnCoordSender.
eb43678
to
f5eea88
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/storage/engine/mvcc.go, line 2762 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
What about "removed"? I personally don't find collapsed to be very useful. In fact, I assumed it meant the exact opposite of what it does multiple times - resolving an intent "collapses" all of the sequences in the history down to just one.
Replaced it with the present tense remove
to denote that it hasn't been removed yet, but has been marked for removal.
pkg/storage/engine/mvcc.go, line 2786 at r14 (raw file):
Previously, knz (kena) wrote…
It might be in the RFC.
It's in the tech note on the TxnCoordSender.
Makes sense. Updated accordingly - simplifies the if condition pretty significantly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten and @petermattis)
pkg/storage/batcheval/cmd_end_transaction.go, line 1140 at r13 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
The typical way this is done is something like:
func TestingSetTxnAutoGC(to bool) func() { prev := txnAutoGC txnAutoGC = to return func() { txnAutoGC = prev } }
I'd at least do that.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 1 of 7 files at r19, 3 of 3 files at r20, 3 of 3 files at r21.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/storage/engine/mvcc.go, line 2783 at r21 (raw file):
// to a larger timestamp, and if the rollback code did not modify or mark // the intent for removal. if inProgress && !pushed && rolledBackVal == nil && !removeIntent {
I think it would make sense to also set inProgress
to false when removing intents as well. inProgress
also takes the epoch into account, so this would be semantically correct. Then you don't need this removeIntent
var at all.
pkg/storage/engine/mvcc.go, line 2951 at r21 (raw file):
// stored value, ignoring all values from the history that have an // ignored seqnum. // The rempve return value, when true, indicates that
"remove"
f5eea88
to
fe17ea1
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the review! I'll leave this here for the rest of the day if anyone else wants to chime in, and if comments come in after it's merged later today I'm happy to address them in follow up PRs.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 1 stale) (waiting on @nvanbenschoten and @petermattis)
pkg/storage/engine/mvcc.go, line 2783 at r21 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
I think it would make sense to also set
inProgress
to false when removing intents as well.inProgress
also takes the epoch into account, so this would be semantically correct. Then you don't need thisremoveIntent
var at all.
Done. Thanks for identifying that!
pkg/storage/engine/mvcc.go, line 2951 at r21 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
"remove"
Done. Whoops.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Still . Nice job on this @itsbilal!
Reviewed 1 of 4 files at r22, 3 of 3 files at r23.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @petermattis)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 1 of 0 LGTMs obtained (waiting on @itsbilal and @petermattis)
pkg/storage/engine/mvcc.go, line 2763 at r23 (raw file):
// If only part of the intent history was rolled back, but the intent still // remains, the rolledBackVal is set to a non-nil value. removeIntent := false
nit: pull this down into the if len(intent.IgnoredSeqNums) > 0 {
scope and declare it as var removeIntent bool
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I mostly scrutinized the new tests. The datadriven tests were a pleasure to read. Mostly nits, with a suggestion for two additional tests (though perhaps I just missed the cases I'm suggesting to add).
Reviewable status: complete! 2 of 0 LGTMs obtained (waiting on @itsbilal and @petermattis)
c-deps/libroach/mvcc.h, line 173 at r23 (raw file):
// see if the current intent seqnum is ignored. // // TODO(nvanbenschoten): this can use use binary search to improve
It should be straightforward to rewrite this using std::lower_bound()
(see https://en.cppreference.com/w/cpp/algorithm/lower_bound). There was a time when I could have told you the precise code to write. I'd leave such a change to a future PR, though.
pkg/storage/engine/mvcc.go, line 1552 at r23 (raw file):
} if existingVal == nil {
Nit: there are a handful of cases of spurious blank lines where I normally wouldn't expect them.
pkg/storage/engine/mvcc.go, line 1632 at r23 (raw file):
// intent, blow away the intent history. if txn.Epoch == meta.Txn.Epoch {
Nit: another spurious blank line.
pkg/storage/engine/testdata/mvcc_histories/conditional_put_with_txn, line 67 at r23 (raw file):
error: (*roachpb.WriteTooOldError:) WriteTooOldError: write at timestamp 0.000000123,0 too old; wrote at 0.000000124,1 # Reset for next test
Perhaps we should add an explicit reset
command. Doing a clear_range
like this requires knowledge of the previous test and what it wrote.
pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums, line 95 at r23 (raw file):
run ok with t=A txn_ignore_seqs seqs=(15-25)
I might have missed it, but I don't see an example where the start range is the same as a seqnum. For example, seqs=(20-21)
. Ditto for the end range: seqs=(19-20)
.
fe17ea1
to
75e22d0
Compare
The MVCC code already had rudimentary understanding of sequence numbers to allow reads to ignore writes at greater seqnums. To implement SQL savepoint rollbacks, we must also support ignoring writes that fall in ignored ranges of seqnums. To achieve this, this commit extends the `mvccScanner` for RocksDB (Pebble code remains to be done) to account for ignored seqnum ranges, and also extends `MVCCResolveWriteIntent` to collapse an intent to the last write that has not been marked to be ignored by a savepoint rollback. Release note: None
This commit implements ignoreSeqNums functionality in the pebble MVCC scanner to mimic that found in the RocksDB one. In addition, it also restructures code and comments in mvccResolveWriteIntent to allow for it to be called in cases where just a rollback is happening (but not a timestamp push or a commit) and the intent history could be cleaned up for better performance. There's also a small bugfix in mvccRewriteIntentHistory that ensures it rewrites the correct value in cases where the intent timestamp is being pushed forward, and the latest value is ignored. Release note: None
The previous change made mvccResolveIntentHistory more complex in some subtle ways, such as the larger variety of cases where they can expect to be called (such as around cleaning up intent histories). This change tests those cases and ensures that the resolution code does the appropriate thing in all expected combinations of those conditions. Release note: None
75e22d0
to
29f34cd
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the reviews! Addressed the nits pointed out, as well as the test suggestion. Merging on green CI.
Reviewable status: complete! 0 of 0 LGTMs obtained (and 2 stale) (waiting on @itsbilal, @nvanbenschoten, and @petermattis)
pkg/storage/engine/mvcc.go, line 1552 at r23 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: there are a handful of cases of spurious blank lines where I normally wouldn't expect them.
Done.
pkg/storage/engine/mvcc.go, line 1632 at r23 (raw file):
Previously, petermattis (Peter Mattis) wrote…
Nit: another spurious blank line.
Done.
pkg/storage/engine/mvcc.go, line 2763 at r23 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
nit: pull this down into the
if len(intent.IgnoredSeqNums) > 0 {
scope and declare it asvar removeIntent bool
.
Done.
pkg/storage/engine/testdata/mvcc_histories/ignored_seq_nums, line 95 at r23 (raw file):
Previously, petermattis (Peter Mattis) wrote…
I might have missed it, but I don't see an example where the start range is the same as a seqnum. For example,
seqs=(20-21)
. Ditto for the end range:seqs=(19-20)
.
Done. You were right, those edges weren't being tested. Now they are.
bors r+ |
42582: storage: Update MVCC on {Rocks,Pebble} to support ignored seqnum ranges and rollbacks r=itsbilal a=itsbilal Builds on Raphael's PR #42152 and adds two commits, the second of which improves testing and the first of which adds the pebble MVCC scanner part of the ignore list functionality described in #41612, as well as restructures code in `mvccResolveWriteIntent` for better readability and to make it usable in the intent-history-cleanup-only case. Fixes #41612 . Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Bilal Akhtar <bilal@cockroachlabs.com>
Build succeeded |
The Intent message type has been a frequent source of complexity and confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to add an IgnoredSeqNums field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like: - why would an Intent ever have a non-PENDING status? - why would an Intent ever span multiple keys? - where are Intents created? - is this Intent message containing the authoritative state of the txn? - does this Intent message have a populated IgnoredSeqNums slice? The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents. This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field). While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when cockroachdb#44976 formally introduces unreplicated locks to the key-value layer. I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for cockroachdb#44976 and I'm feeling a little pressed for time with that change and the impending stability period.
The Intent message type has been a frequent source of complexity and confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to add an IgnoredSeqNums field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like: - why would an Intent ever have a non-PENDING status? - why would an Intent ever span multiple keys? - where are Intents created? - is this Intent message containing the authoritative state of the txn? - does this Intent message have a populated IgnoredSeqNums slice? The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents. This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field). While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when cockroachdb#44976 formally introduces unreplicated locks to the key-value layer. I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for cockroachdb#44976 and I'm feeling a little pressed for time with that change and the impending stability period.
The Intent message type has been a frequent source of complexity and confusion. cockroachdb#42152/cockroachdb#42582 found this out first hand when it tried to add an IgnoredSeqNums field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like: - why would an Intent ever have a non-PENDING status? - why would an Intent ever span multiple keys? - where are Intents created? - is this Intent message containing the authoritative state of the txn? - does this Intent message have a populated IgnoredSeqNums slice? The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents. This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field). While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when cockroachdb#44976 formally introduces unreplicated locks to the key-value layer. I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for cockroachdb#44976 and I'm feeling a little pressed for time with that change and the impending stability period.
45235: roachpb: split Intent proto message into Intent and LockUpdate messages r=nvanbenschoten a=nvanbenschoten The Intent message type has been a frequent source of complexity and confusion. @knz found this out first-hand in #42152/#42582 when he tried to add an `IgnoredSeqNums` field to the message type. This led to a series of questions that were not easily answered by simply reading about the message type, like: - why would an Intent ever have a non-PENDING status? - why would an Intent ever span multiple keys? - where are Intents created? - is this Intent message containing the authoritative state of the txn? - does this Intent message have a populated IgnoredSeqNums slice? The root problem was that the message type was serving two roles, as was documented in its comment. It both referenced on-disk write intents and served as a way to talk about updates that should be made to these write intents. This commit addresses this issue by splitting Intent into two (wire-compatible) messages with distinct purposes - Intent and LockUpdate. Intent's new sole purpose is now to reference persistent on-disk write intents on return paths of MVCC operations. They are only ever created when on-disk intents are encountered. LockUpdate's sole purpose is to represent updates that are intended for all locks held by the transaction within its span. They are only ever created after their transaction's record has been consulted, either after an EndTxn or a PushTxn request. This split simplifies the use of each of these types and helps make the questions above easier to answer or impossible to even ask (e.g. Intent no longer has a Status field). While here, the commit adds a `durability` field to LockUpdate. This is used to inform the message's recipient about the durability of the lock(s) it is being instructed to update. This will be important soon when #44976 formally introduces unreplicated locks to the key-value layer. I would suggest that reviewers start with the proto changes in `pkg/roachpb` and fan-out from there. Everything else is essentially fallout from the typing change. This is still a bit of a WIP as there is more renaming necessary before this could be merged, but I want to get people's opinions on this sooner rather than later because it's partly on the critical path for #44976 and I'm feeling a little pressed for time with that change and the impending stability period. Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Builds on Raphael's PR #42152 and adds two commits, the second of which improves testing and the first of which adds the pebble MVCC scanner part of the ignore list functionality described in #41612, as well as restructures code in
mvccResolveWriteIntent
for better readability and to make it usable in the intent-history-cleanup-only case.Fixes #41612 .