-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
roachpb,kv: new TxnCoordSender API for step-wise execution #42854
Conversation
b86bf3d
to
62c40e4
Compare
b6caee1
to
efe19d0
Compare
@nvanbenschoten I think I will need your help on this one. I have tried two different approaches and each of them breaks for an "interesting" reason.
I think I really want to take approach (1) but I don't understand how/where to preserve the 1PC optimization. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you spell out why 1PC batches need sequence number 0? Is there a good reason?
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @nvanbenschoten)
No — I can't spell it out because I don't understand the optimization completely and I did not design it. However my working theory is that the thing checks that there was just 1 batch by verifying that the operation that sets the sequence number (e.g. 1) is also present in the batch — with the logic proposed here, it won't find it. However I don't know where this code lives. I really hope you or Nathan can help. |
Oh I see. Check this out: Would making it expect 2 instead of 1 fix your problem? |
Maybe. My problem is that my code currently lets But I could change the default to become 1. The problem I see with that is cross-version clusters. If I change the code you point me to, then during an upgrade the optimization will be disabled for all requests coming from pre-upgrade nodes. Is that acceptable? |
efe19d0
to
d03d50b
Compare
I have updated the PR after chatting with Nathan and Andrei:
|
Ok this and #42862 now do the job properly. RFAL |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
pkg/kv/txn_interceptor_seq_num_allocator.go, line 67 at r1 (raw file):
writeSeq enginepb.TxnSeq // readSeqPlusOne is either:
This plus one business is a bit confusing. I'd try grouping writeSeq
and readSeq
into a separate seqCounter
struct, with a IncWriteSeq()
, LockReadSeq()
, GetReadSeq()
methods, have a bool for whether the stepping is enabled, and have GetReadSeq()
return writeSeq
or readSeq
depending on the bool.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 104 at r1 (raw file):
// enabled, then we want the read operation to read at the read // seqnum, not the latest write seqnum. oldHeader.Sequence = s.readSeqPlusOne - 1
Let's make put a note here about read+write requests operating at writeSeq because they're special, to make it clear that it's intentional.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 139 at r1 (raw file):
func (s *txnSeqNumAllocator) stepLocked() error { if s.readSeqPlusOne-1 > s.writeSeq { return errors.AssertionFailedf("cannot step() after mistaken initialization (%d,%d)", s.writeSeq, s.readSeqPlusOne)
long line
pkg/kv/txn_interceptor_seq_num_allocator.go, line 139 at r1 (raw file):
func (s *txnSeqNumAllocator) stepLocked() error { if s.readSeqPlusOne-1 > s.writeSeq { return errors.AssertionFailedf("cannot step() after mistaken initialization (%d,%d)", s.writeSeq, s.readSeqPlusOne)
nit: I'd just say "invalid read seq" or such
pkg/kv/txn_interceptor_seq_num_allocator.go, line 148 at r1 (raw file):
// restores read-latest-write behavior. // Used by the TxnCoordSender's DisableStepping() method. func (s *txnSeqNumAllocator) disableSteppingLocked() error {
get rid of the error return
pkg/kv/txn_interceptor_seq_num_allocator.go, line 157 at r1 (raw file):
s.commandCount = 0 s.writeSeq = 0 if s.readSeqPlusOne > 0 {
Wouldn't a more natural behavior be for epochBumpedLocked()
to imply disableSteppingLocked()
? Like, let's start every epoch with stepping disabled in the spirit of forgetting everything that was done on the transaction before the restart.
As it stands, the behavior is inconsistent with what happens to a txn after a TransactionAbortedError
where I don't see code to enable stepping on the new txn.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 159 at r1 (raw file):
if s.readSeqPlusOne > 0 { s.readSeqPlusOne = 1 } else {
I think the else is unnecessary.
pkg/roachpb/data.proto, line 636 at r1 (raw file):
// Current read seqnum, plus 1. The special value of zero indicates synchronous // read-own-writes, where every KV read is able to observe the latest writes. // As soon as the value is increased past 0, (value - 1) becomes the sequence number
nit: bad wrapping
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
pkg/storage/replica_evaluate.go, line 246 at r1 (raw file):
args := union.GetInner() if baHeader.Txn != nil { // Set the Request's sequence number on the TxnMeta for this
TxnMeta
will no longer be accurate here, right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewed 9 of 9 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @knz)
pkg/internal/client/sender.go, line 210 at r1 (raw file):
SerializeTxn() *roachpb.Transaction // Step creates a sequencing point in the current transaction. A
Could you replace "read operations" with "read-only operations" throughout these descriptions?
pkg/internal/client/sender.go, line 228 at r1 (raw file):
// effect remains disabled until the next call to Step(). // The method is idempotent. DisableStepping() error
Make it clear somewhere that transactions start with stepping disabled.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 67 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
This plus one business is a bit confusing. I'd try grouping
writeSeq
andreadSeq
into a separateseqCounter
struct, with aIncWriteSeq()
,LockReadSeq()
,GetReadSeq()
methods, have a bool for whether the stepping is enabled, and haveGetReadSeq()
returnwriteSeq
orreadSeq
depending on the bool.
Yeah, I like this idea. The "plus one" business seems to spread across a lot of places.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 100 at r1 (raw file):
// Default case: operate at the current seqnum. oldHeader.Sequence = s.writeSeq if s.readSeqPlusOne > 0 && roachpb.IsReadOnly(req) {
How do you feel about restructuring this to something like:
var seq enginepb.TxnSeq
if roachpb.IsTransactionWrite(req) || req.Method() == roachpb.EndTransaction {
s.writeSeq++
seq = s.writeSeq
} else /* if roachpb.IsReadOnly(req) */ {
if s.readSeqPlusOne > 0 {
seq = s.readSeqPlusOne - 1
} else {
seq = s.writeSeq
}
}
Guys if you want me to do something else than |
discussed with Andrei: the proposed structure works because we only care about Root->Leaf conversion |
831afb0
to
11045da
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/internal/client/sender.go, line 210 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Could you replace "read operations" with "read-only operations" throughout these descriptions?
Done.
pkg/internal/client/sender.go, line 228 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Make it clear somewhere that transactions start with stepping disabled.
Done.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 67 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
Yeah, I like this idea. The "plus one" business seems to spread across a lot of places.
I simplified this. PTAL.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 100 at r1 (raw file):
Previously, nvanbenschoten (Nathan VanBenschoten) wrote…
How do you feel about restructuring this to something like:
var seq enginepb.TxnSeq if roachpb.IsTransactionWrite(req) || req.Method() == roachpb.EndTransaction { s.writeSeq++ seq = s.writeSeq } else /* if roachpb.IsReadOnly(req) */ { if s.readSeqPlusOne > 0 { seq = s.readSeqPlusOne - 1 } else { seq = s.writeSeq } }
I can't do this: The complement of IsTransactionWrite
is not IsReadOnly
.
Anyway with the new field usage, the code is slightly easier on the eye.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 104 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Let's make put a note here about read+write requests operating at writeSeq because they're special, to make it clear that it's intentional.
Done.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 139 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
long line
Done.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 148 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
get rid of the error return
I'd rather not. I want to add an assert in here when #43032 lands.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 157 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
Wouldn't a more natural behavior be for
epochBumpedLocked()
to implydisableSteppingLocked()
? Like, let's start every epoch with stepping disabled in the spirit of forgetting everything that was done on the transaction before the restart.
As it stands, the behavior is inconsistent with what happens to a txn after aTransactionAbortedError
where I don't see code to enable stepping on the new txn.
Thanks for explaining. Done.
pkg/roachpb/data.proto, line 636 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: bad wrapping
Done.
pkg/storage/replica_evaluate.go, line 246 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
TxnMeta
will no longer be accurate here, right?
What do you mean?
11045da
to
ef43241
Compare
ef43241
to
cef4620
Compare
Rebased this on top of now-ready #43032. RFAL. @andreimatei check that the API definition works for you (and complies to RFC) @nvanbenschoten (optionally) check that the sequence number allocation is correct. I think you've checked that before and I haven't changed the logic since your last review. |
43032: kv: remove `GetMeta`, `AugmentMeta` and `TxnCoordMeta` r=knz a=knz Recommended/requested by @nvanbenschoten. Discussed with @andreimatei. Prerequisite to completing #42854. Prior to this patch, the same data structure `TxnCoordMeta` was used both to initialize a LeafTxn from a RootTxn, and a RootTxn from a LeafTxn. Moreover, the same method on `TxnCoordSender` (`AugmentMeta`) was used to "configure" a txn into a root or a leaf, and to update a root from the final state of leaves. This was causing difficult questions when adding features (all the fields in TxnCoordMeta needed to produce effects in one direction and no-ops in the other). It was also making it hard to read and understand the API. This patch alleviates this problem by separating the two protocols: ```go // From roots: func (txn *Txn) GetLeafTxnInputStateOrRejectClient(context.Context) (roachpb.LeafTxnInputState, error) // to create a new leaf: func NewLeafTxn(context.Context, *DB, roachpb.NodeID, *roachpb.LeafTxnInputState) *Txn // From leaves, at end of use: func (txn *Txn) GetLeafTxnFinalState(context.Context) roachpb.LeafTxnFinalState // Back into roots: func (txn *Txn) UpdateRootWithLeafFinalState(context.Context, tfs *roachpb.LeafTxnFinalState) ``` Additionally, this patch: - removes the general-purpose `Serialize()` method, and replaces it by `TestingCloneTxn()` specifically purposed for use in testing. - removes direct access to the TxnMeta `WriteTimestamp` in the SQL conn executor (to establish a high water mark for table lease expiration), and replaces it by a call to a new method `ProvisionalCommitTimestamp()`. Release note: None 43296: sql: support EXPLAIN with AS OF SYSTEM TIME r=RaduBerinde a=RaduBerinde We apparently can't stick an `EXPLAIN` in front of a query that uses AOST. The fix is very easy, we need an extra case for the logic that figures out the statement-wide timestamp. Note that if we want to do `SELECT FROM [EXPLAIN ...]`, in that case we still need to add AS OF SYSTEM TIME to the outer clause as usual. Fixes #43294. Release note (bug fix): EXPLAIN can now be used with statements that use AS OF SYSTEM TIME. 43300: blobs: Stat method bug fix r=g3orgia a=g3orgia The stat method has a typo/bug, it return nil instead of err. This PR fixes it. Release note: None 43302: scripts: fix the release note script to pick up more backports r=knz a=knz Prior to this patch, the release note script was only recognizing PR merges from either of two formats: - from Bors, starting with `Merge #xxx #yyy` - from Github, starting with `Merge pull request #xxx from ...` Sometime in 2019, Github has started populating merge commits using a different format, using instead the title of the PR followed by the PR number between parentheses. This new format was not recognized by the script and caused it to skip over these merges (and all their underlying commits) and list them in the section "changes without a release note annotation". This patch fixes it. Release note: None Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net> Co-authored-by: Radu Berinde <radu@cockroachlabs.com> Co-authored-by: Georgia Hong <georgiah@cockroachlabs.com>
cef4620
to
27b4181
Compare
Friendly ping. |
b796571
to
eb8f4e5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @nvanbenschoten)
pkg/internal/client/sender.go, line 228 at r2 (raw file):
// Step creates a sequencing point in the current transaction. A // sequencing point establishes a snapshot baseline for subsequent // read-only operations: until the next sequencing point, read-only operations
pretty please wrap the lines
pkg/internal/client/sender.go, line 244 at r2 (raw file):
// effect remains disabled until the next call to Step(). // The method is idempotent. // Note that a TxnCoordSender is initially in the non-
I don't think we should mention the TCS in this comment. All implementors of this interface must be in "non-stepping mode" initially.
pkg/kv/txn_coord_sender.go, line 337 at r2 (raw file):
// Load the in-flight writes in the pipeliner. tcs.interceptorAlloc.txnPipeliner.initializeLeaf(tis) // Load the read seqnum into the seq num allocator.
nit: I think this comment and the one above are bound to go stale. They also duplicate the comments on the respective methods.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 148 at r1 (raw file):
Previously, knz (kena) wrote…
I'd rather not. I want to add an assert in here when #43032 lands.
#43032 and I see no assertion...
pkg/kv/txn_interceptor_seq_num_allocator.go, line 62 at r2 (raw file):
wrapped lockedSender // writeSeq is the current write seqnum, or the value last assigned
nit: the "or" is ambiguous, it'd replace it with a "(...)" to make it clear that what follows is an explanation.
pkg/roachpb/data.proto, line 623 at r2 (raw file):
// regardless of the current seqnum generated for writes. This is // updated via the (client.TxnSender).Step() operation. int32 read_seq_num_plus_one = 9 [
you don't want to add the bool here too instead of the plus one business?
pkg/storage/replica_evaluate.go, line 246 at r1 (raw file):
Previously, knz (kena) wrote…
What do you mean?
I don't know what I meant :). nvm.
eb8f4e5
to
4d5f829
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @nvanbenschoten)
pkg/internal/client/sender.go, line 228 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
pretty please wrap the lines
Done.
pkg/internal/client/sender.go, line 244 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
I don't think we should mention the TCS in this comment. All implementors of this interface must be in "non-stepping mode" initially.
Done.
pkg/kv/txn_coord_sender.go, line 337 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: I think this comment and the one above are bound to go stale. They also duplicate the comments on the respective methods.
Thanks. Changed the comment to hint about what to do in the future.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 148 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
#43032 and I see no assertion...
Done.
pkg/kv/txn_interceptor_seq_num_allocator.go, line 62 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
nit: the "or" is ambiguous, it'd replace it with a "(...)" to make it clear that what follows is an explanation.
Done.
pkg/roachpb/data.proto, line 623 at r2 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
you don't want to add the bool here too instead of the plus one business?
I don't have strong feelings for either. What is best?
This patch introduces the following methods on `client.TxnSender`: ```go // Step creates a sequencing point in the current transaction. A // sequencing point establishes a snapshot baseline for subsequent // read operations: until the next sequencing point, read operations // observe the data at the time the snapshot was established and // ignore writes performed since. // // Before the first step is taken, the transaction operates as if // there was a step after every write: each read to a key is able to // see the latest write before it. This makes the step behavior // opt-in and backward-compatible with existing code which does not // need it. // The method is idempotent. Step() error // DisableStepping disables the sequencing point behavior and // ensures that every read can read the latest write. The // effect remains disabled until the next call to Step(). // The method is idempotent. DisableStepping() error ``` Additionally it implements it in the `TxnCoordSender`. `Step()` is the most important and will be used to introduce sequence points between SQL statements. `DisableStepping()` is a convenience function, so as to avoid introducing many `Step()` calls in the code that performs schema changes. Release note: none
4d5f829
to
a67a68e
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei and @nvanbenschoten)
pkg/roachpb/data.proto, line 623 at r2 (raw file):
Previously, knz (kena) wrote…
I don't have strong feelings for either. What is best?
Separated the fields as per previous suggestion.
TFYRs! bors r+ |
42854: roachpb,kv: new TxnCoordSender API for step-wise execution r=knz a=knz Supports #42864. This patch introduces the following methods on `client.TxnSender`: ```go // Step creates a sequencing point in the current transaction. A // sequencing point establishes a snapshot baseline for subsequent // read operations: until the next sequencing point, read operations // observe the data at the time the snapshot was established and // ignore writes performed since. // // Before the first step is taken, the transaction operates as if // there was a step after every write: each read to a key is able to // see the latest write before it. This makes the step behavior // opt-in and backward-compatible with existing code which does not // need it. // The method is idempotent. Step() error // DisableStepping disables the sequencing point behavior and // ensures that every read can read the latest write. The // effect remains disabled until the next call to Step(). // The method is idempotent. DisableStepping() error ``` Additionally it implements it in the `TxnCoordSender`. `Step()` is the most important and will be used to introduce sequence points between SQL statements. `DisableStepping()` is a convenience function, so as to avoid introducing many `Step()` calls in the code that performs schema changes. Release note: none Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Build succeeded |
Supports #42864.
This patch introduces the following methods on
client.TxnSender
:Additionally it implements it in the
TxnCoordSender
.Step()
is the most important and will be used to introduce sequencepoints between SQL statements.
DisableStepping()
is a convenience function, so as to avoidintroducing many
Step()
calls in the code that performs schemachanges.
Release note: none