-
Notifications
You must be signed in to change notification settings - Fork 3.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
docs: new tech note on TxnCoordSender #42116
Conversation
@tbg you wanted to have a look, I think this is mature enough for you to read (and dump additional knowledge into if you have it). @nvanbenschoten I acknowledge that this tech note, as-is, is not doing justice to all the logic and intelligence present in the various interceptors. If you have reference links or extra contents to add, I'll take it. |
6188c65
to
b65e00a
Compare
TODO(self):
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice write up. I added a few comments.
Reviewed 19 of 19 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @nvanbenschoten)
docs/tech-notes/txn_coord_sender.md, line 453 at r1 (raw file):
object and the TCS to become unusable. Indeed, there is no good reason for that. It is actually silly, as a SQL client may legitimately want to continue using the txn object after detecting a logical error (eg
I wouldn't call this silly, just poor design. Whenever we extract information from an error that counts as a read, we have to make sure that read is accounted for by the KV store. For example, a ConditionFailedError from a CPut is in fact a successful read; if this isn't accounted for properly at the KV layer (timestamp cache update), there will be anomalies (CPut has code to do exactly that here). I generally find it unfortunate that we're relying on errors to return what are really logical results, and I hope that we buy out of that as much as we can. For CPut, for example, we'd have ConditionalPutResponse
carry a flag that tells us the actual value and whether a write was carried out. I suspect we're using errors for some of these only because errors eagerly halt the processing of the current batch.
Continuing past errors generally needs a good amount of idempotency (for example, getting a timeout during a CPut and retrying the CPut without seqnos could read-your-own-write). We had no way of doing that prior to the intent history and seqnos.
By the way, in case you haven't stumbled upon this yet, the txn span refresher has a horrible contract with DistSender, where DistSender returns a partially populated response batch on errors, from which the refresher then picks out spans for its refresh set. I'm wondering how this hasn't backfired yet.
docs/tech-notes/txn_coord_sender.md, line 486 at r1 (raw file):
complexity to process writes, concurrency would be limited by this counter, until/unless we can guarantee that two separate TCSs generate separate operation identifiers.
I wonder if we also need their read/write spans to be non-overlapping. There's all sort of weird stuff if they do overlap, though maybe it's not actually illegal. Either way, not something we're going to do today or tomorrow.
docs/tech-notes/txn_coord_sender.md, line 503 at r1 (raw file):
then at later instant t2 is augmented by a LeafTxn whose state was part of its "past" using the original txn object. How to combine the states from the two "generations" of the txn object?
The real problem I see here is similar to allowing leaves to write. You have to reason about the semantics in case the spans accessed overlap. If we've answered that, I'm not aware of any specific problem with updating the state in the way you're describing. As long as there is a distinguished error handling actor, this looks just like what txn.Update
already does (except we need to merge the whole state, like refresh spans, etc). Curious if you had a concrete problem in mind.
PS I'm totally fine not wanting to change this restriction, just working on my understanding.
docs/tech-notes/txn_coord_sender.md, line 549 at r1 (raw file):
- read operations operate "at" a particular seqnum: a MVCC read that encounters an intent ignores the values written at later seqnums and returns the most recent value at that seqnum instead.
Isn't this one-offy? If I want to retry a CPut idempotently (because the first one got context canceled, who knows), I'd want it to not see its own seqno.
I just realized it's worse than that, the cput has to ignore seqnums for its condition too (it shouldn't see a rolled back value when testing the condition). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
docs/tech-notes/txn_coord_sender.md, line 384 at r1 (raw file):
its "identity" (ID, epoch) is unchanged. For example, txn refreshes are processes automatically
are processed
docs/tech-notes/txn_coord_sender.md, line 413 at r1 (raw file):
This group contains 3 kinds of errors: 1. *transaction aborts* (`TransactionAbortedError`), which occurs when
TransactionAbortedError
trashes the TCS, but not the client.Txn. The client.Txn will recognize the error an instantiate a new TCS.
docs/tech-notes/txn_coord_sender.md, line 461 at r1 (raw file):
inside the TCS may become different "on the way out" (back to client.Txn and the SQL layer) from what it was "on the way in". It is currently the responsibility of the client (SQL layer), which may have
What are you referring to here? What state is SQL picking up on?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As I understand it, ConditionalPut demonstrates something we missed when introducing seqnos: commands that combine both a read and a write operation need two seqnums, one for its reads and one for its writes. Isn't the correct thing to use seqno-1
for the read portion? And of course, like any other read, it also needs to ignore the rolled back sequences, as you point out. I don't see a major problem here, just something we overlooked so far.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)
Release note: None
b65e00a
to
ee398a5
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Isn't the correct thing to use seqno-1 for the read portion?
Yes Nathan has convinced me this is true. Added a note to this effect.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)
docs/tech-notes/txn_coord_sender.md, line 384 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
are processed
Done.
docs/tech-notes/txn_coord_sender.md, line 413 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
TransactionAbortedError
trashes the TCS, but not the client.Txn. The client.Txn will recognize the error an instantiate a new TCS.
Your prompt invited me to investigate more and I found out the situation is more complicated than that. There's in fact two categories of abort errors.
docs/tech-notes/txn_coord_sender.md, line 453 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I wouldn't call this silly, just poor design. Whenever we extract information from an error that counts as a read, we have to make sure that read is accounted for by the KV store. For example, a ConditionFailedError from a CPut is in fact a successful read; if this isn't accounted for properly at the KV layer (timestamp cache update), there will be anomalies (CPut has code to do exactly that here). I generally find it unfortunate that we're relying on errors to return what are really logical results, and I hope that we buy out of that as much as we can. For CPut, for example, we'd have
ConditionalPutResponse
carry a flag that tells us the actual value and whether a write was carried out. I suspect we're using errors for some of these only because errors eagerly halt the processing of the current batch.
Continuing past errors generally needs a good amount of idempotency (for example, getting a timeout during a CPut and retrying the CPut without seqnos could read-your-own-write). We had no way of doing that prior to the intent history and seqnos.By the way, in case you haven't stumbled upon this yet, the txn span refresher has a horrible contract with DistSender, where DistSender returns a partially populated response batch on errors, from which the refresher then picks out spans for its refresh set. I'm wondering how this hasn't backfired yet.
Added your comments in the tech note.
docs/tech-notes/txn_coord_sender.md, line 461 at r1 (raw file):
Previously, andreimatei (Andrei Matei) wrote…
What are you referring to here? What state is SQL picking up on?
Added comment to clarify.
docs/tech-notes/txn_coord_sender.md, line 486 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
I wonder if we also need their read/write spans to be non-overlapping. There's all sort of weird stuff if they do overlap, though maybe it's not actually illegal. Either way, not something we're going to do today or tomorrow.
Added note.
docs/tech-notes/txn_coord_sender.md, line 503 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
The real problem I see here is similar to allowing leaves to write. You have to reason about the semantics in case the spans accessed overlap. If we've answered that, I'm not aware of any specific problem with updating the state in the way you're describing. As long as there is a distinguished error handling actor, this looks just like what
txn.Update
already does (except we need to merge the whole state, like refresh spans, etc). Curious if you had a concrete problem in mind.
PS I'm totally fine not wanting to change this restriction, just working on my understanding.
My opinion is that this is the wrong problem to solve. These complex questions, which may or may not have good answers, should not be necessary to answer in the first place. Enabling concurrency between Root and Leafs on the same spans is just a horrible mess.
docs/tech-notes/txn_coord_sender.md, line 549 at r1 (raw file):
Previously, tbg (Tobias Grieger) wrote…
Isn't this one-offy? If I want to retry a CPut idempotently (because the first one got context canceled, who knows), I'd want it to not see its own seqno.
Done.
Going to merge this as it now has enough content to base further work on. Also it's needed to ground the discussion on #42766. bors r+ |
42116: docs: new tech note on TxnCoordSender r=knz a=knz I am splitting off this tech note from #41569 so it can become a document that can be consumed standalone. Direct link for easier consumption/review: https://github.com/knz/cockroach/blob/20191101-tcs-tech-note/docs/tech-notes/txn_coord_sender.md Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>
Build succeeded |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, @nvanbenschoten, and @tbg)
docs/tech-notes/txn_coord_sender.md, line 413 at r1 (raw file):
Previously, knz (kena) wrote…
Your prompt invited me to investigate more and I found out the situation is more complicated than that. There's in fact two categories of abort errors.
Are there? I'm not sure what you're referring to here, I think it's wrong. AOST read under GC doesn't have anything to do with a TransactionAbortedError
, does it?
TransactionAbortedError
means one thing - some other transaction blew away some of our intents.
docs/tech-notes/txn_coord_sender.md, line 417 at r2 (raw file):
internal CRDB mechanism, but not due to a logical error. Intuitively, this corresponds to errors due to the distributed nature of CRDB which would not occur in a
Deadlock between multiple transactions is the case where transaction aborts happen because of "user fault". That's analogous to single-node systems - there too sometimes transactions are aborted and forced to release their locks.
Can you suggest an extension to the text to make it better? |
I am splitting off this tech note from #41569 so it can become a document that can be consumed standalone.
Direct link for easier consumption/review:
https://github.com/knz/cockroach/blob/20191101-tcs-tech-note/docs/tech-notes/txn_coord_sender.md