docs: new tech note on TxnCoordSender #42116

knz · 2019-11-01T16:57:47Z

I am splitting off this tech note from #41569 so it can become a document that can be consumed standalone.

Direct link for easier consumption/review:
https://github.com/knz/cockroach/blob/20191101-tcs-tech-note/docs/tech-notes/txn_coord_sender.md

knz · 2019-11-01T16:58:43Z

@tbg you wanted to have a look, I think this is mature enough for you to read (and dump additional knowledge into if you have it).

@nvanbenschoten I acknowledge that this tech note, as-is, is not doing justice to all the logic and intelligence present in the various interceptors. If you have reference links or extra contents to add, I'll take it.

knz · 2019-11-01T17:14:20Z

TODO(self):

integrate the discussion on WriteTooOldError from the mailing list.

tbg

Nice write up. I added a few comments.

Reviewed 19 of 19 files at r1.
Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, and @nvanbenschoten)

docs/tech-notes/txn_coord_sender.md, line 453 at r1 (raw file):

object and the TCS to become unusable. Indeed, there is no good reason
for that. It is actually silly, as a SQL client may legitimately want
to continue using the txn object after detecting a logical error (eg

I wouldn't call this silly, just poor design. Whenever we extract information from an error that counts as a read, we have to make sure that read is accounted for by the KV store. For example, a ConditionFailedError from a CPut is in fact a successful read; if this isn't accounted for properly at the KV layer (timestamp cache update), there will be anomalies (CPut has code to do exactly that here). I generally find it unfortunate that we're relying on errors to return what are really logical results, and I hope that we buy out of that as much as we can. For CPut, for example, we'd have ConditionalPutResponse carry a flag that tells us the actual value and whether a write was carried out. I suspect we're using errors for some of these only because errors eagerly halt the processing of the current batch.
Continuing past errors generally needs a good amount of idempotency (for example, getting a timeout during a CPut and retrying the CPut without seqnos could read-your-own-write). We had no way of doing that prior to the intent history and seqnos.
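A minimal sketch of what "return the logical result instead of an error" could look like for CPut. Everything here is hypothetical: `cputResult`, `store`, and `conditionalPut` are illustrative names over a toy in-memory map, not the actual `ConditionalPutResponse` or any CockroachDB API. It does not model the timestamp cache update that the read half would still need.

```go
package main

import "fmt"

// cputResult is a hypothetical response shape: it reports the value that was
// actually read and whether the write was applied, instead of signalling a
// failed condition through an error.
type cputResult struct {
	ActualValue string // value found under the key, if any
	Exists      bool   // whether the key existed at read time
	Written     bool   // whether the conditional write was carried out
}

// store is a toy in-memory map standing in for the KV layer.
type store map[string]string

// conditionalPut writes newVal only if the current value equals *expected.
// A nil expected means "the key must not exist".
func (s store) conditionalPut(key, newVal string, expected *string) cputResult {
	cur, exists := s[key]
	res := cputResult{ActualValue: cur, Exists: exists}
	matches := (expected == nil && !exists) ||
		(expected != nil && exists && cur == *expected)
	if matches {
		s[key] = newVal
		res.Written = true
	}
	return res
}

func main() {
	s := store{"k": "v1"}
	want := "v0"
	// The condition fails, but the caller still gets the value that was read.
	res := s.conditionalPut("k", "v2", &want)
	fmt.Printf("written=%t actual=%q\n", res.Written, res.ActualValue)
}
```

With a shape like this, a failed condition is an ordinary successful read from the caller's point of view, so there is no error to "continue past".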

By the way, in case you haven't stumbled upon this yet, the txn span refresher has a horrible contract with DistSender, where DistSender returns a partially populated response batch on errors, from which the refresher then picks out spans for its refresh set. I'm wondering how this hasn't backfired yet.

docs/tech-notes/txn_coord_sender.md, line 486 at r1 (raw file):

  complexity to process writes, concurrency would be limited by this
  counter, until/unless we can guarantee that two separate TCSs
  generate separate operation identifiers.

I wonder if we also need their read/write spans to be non-overlapping. There's all sorts of weird stuff if they do overlap, though maybe it's not actually illegal. Either way, not something we're going to do today or tomorrow.

docs/tech-notes/txn_coord_sender.md, line 503 at r1 (raw file):

  then at later instant t2 is augmented by a LeafTxn whose
  state was part of its "past" using the original txn object.
  How to combine the states from the two "generations" of the txn object?

The real problem I see here is similar to allowing leaves to write. You have to reason about the semantics in case the spans accessed overlap. If we've answered that, I'm not aware of any specific problem with updating the state in the way you're describing. As long as there is a distinguished error handling actor, this looks just like what txn.Update already does (except we need to merge the whole state, like refresh spans, etc). Curious if you had a concrete problem in mind.
PS I'm totally fine not wanting to change this restriction, just working on my understanding.

docs/tech-notes/txn_coord_sender.md, line 549 at r1 (raw file):

- read operations operate "at" a particular seqnum: a MVCC read that
  encounters an intent ignores the values written at later seqnums
  and returns the most recent value at that seqnum instead.

Isn't this one-offy? If I want to retry a CPut idempotently (because the first one got context canceled, who knows), I'd want it to not see its own seqno.

knz · 2019-11-04T16:10:49Z

If I want to retry a CPut idempotently (because the first one got context canceled, who knows), I'd want it to not see its own seqno.

I just realized it's worse than that, the cput has to ignore seqnums for its condition too (it shouldn't see a rolled back value when testing the condition).

andreimatei

LGTM

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)

docs/tech-notes/txn_coord_sender.md, line 384 at r1 (raw file):

     its "identity" (ID, epoch) is unchanged.

     For example, txn refreshes are processes automatically

are processed

docs/tech-notes/txn_coord_sender.md, line 413 at r1 (raw file):

  This group contains 3 kinds of errors:

  1. *transaction aborts* (`TransactionAbortedError`), which occurs when

TransactionAbortedError trashes the TCS, but not the client.Txn. The client.Txn will recognize the error and instantiate a new TCS.

docs/tech-notes/txn_coord_sender.md, line 461 at r1 (raw file):

inside the TCS may become different "on the way out" (back to
client.Txn and the SQL layer) from what it was "on the way in". It is
currently the responsibility of the client (SQL layer), which may have

What are you referring to here? What state is SQL picking up on?

tbg

As I understand it, ConditionalPut demonstrates something we missed when introducing seqnos: commands that combine both a read and a write operation need two seqnums, one for their reads and one for their writes. Isn't the correct thing to use seqno-1 for the read portion? And of course, like any other read, it also needs to ignore the rolled back sequences, as you point out. I don't see a major problem here, just something we overlooked so far.
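To make the proposal concrete, here is a small sketch of a seqnum-aware read over a per-key intent history. It assumes a simplified model: `seqValue`, `seqRange`, `readAt`, and `cputReadSeq` are hypothetical names, the history is a plain sorted slice, and rolled-back sequences are given as inclusive ranges. It is not the MVCC code.

```go
package main

import "fmt"

// seqRange is an inclusive range of sequence numbers that were rolled back.
type seqRange struct{ start, end int32 }

// seqValue is one entry in a (hypothetical) per-key intent history.
type seqValue struct {
	seq   int32
	value string
}

// isIgnored reports whether seq falls in any rolled-back range.
func isIgnored(seq int32, ignored []seqRange) bool {
	for _, r := range ignored {
		if seq >= r.start && seq <= r.end {
			return true
		}
	}
	return false
}

// readAt returns the most recent value written at or below readSeq, skipping
// writes at later seqnums and writes in rolled-back ranges.
// history must be sorted by ascending seq.
func readAt(history []seqValue, readSeq int32, ignored []seqRange) (string, bool) {
	for i := len(history) - 1; i >= 0; i-- {
		e := history[i]
		if e.seq > readSeq || isIgnored(e.seq, ignored) {
			continue
		}
		return e.value, true
	}
	return "", false
}

// cputReadSeq is the seqnum the read half of a read-write command would use
// under the "seqno-1" proposal: one below the seqnum of its own write.
func cputReadSeq(writeSeq int32) int32 { return writeSeq - 1 }

func main() {
	history := []seqValue{{1, "a"}, {2, "b"}, {3, "c"}}
	// A retried CPut at seq 3 evaluates its condition without seeing its own write.
	v, _ := readAt(history, cputReadSeq(3), nil)
	fmt.Println(v)
	// If seq 2 was rolled back, the condition falls through to the write at seq 1.
	v, _ = readAt(history, cputReadSeq(3), []seqRange{{2, 2}})
	fmt.Println(v)
}
```

Reading at `writeSeq - 1` is what makes the retry idempotent in this model: the first attempt's write at seq 3 is invisible to the second attempt's condition check.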

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @knz and @nvanbenschoten)

Release note: None

knz

Isn't the correct thing to use seqno-1 for the read portion?

Yes, Nathan has convinced me this is true. Added a note to this effect.

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @nvanbenschoten, and @tbg)

docs/tech-notes/txn_coord_sender.md, line 384 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

are processed

Done.

docs/tech-notes/txn_coord_sender.md, line 413 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

TransactionAbortedError trashes the TCS, but not the client.Txn. The client.Txn will recognize the error and instantiate a new TCS.

Your prompt invited me to investigate more and I found out the situation is more complicated than that. There are in fact two categories of abort errors.

docs/tech-notes/txn_coord_sender.md, line 453 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I wouldn't call this silly, just poor design. Whenever we extract information from an error that counts as a read, we have to make sure that read is accounted for by the KV store. For example, a ConditionFailedError from a CPut is in fact a successful read; if this isn't accounted for properly at the KV layer (timestamp cache update), there will be anomalies (CPut has code to do exactly that here). I generally find it unfortunate that we're relying on errors to return what are really logical results, and I hope that we buy out of that as much as we can. For CPut, for example, we'd have ConditionalPutResponse carry a flag that tells us the actual value and whether a write was carried out. I suspect we're using errors for some of these only because errors eagerly halt the processing of the current batch.
Continuing past errors generally needs a good amount of idempotency (for example, getting a timeout during a CPut and retrying the CPut without seqnos could read-your-own-write). We had no way of doing that prior to the intent history and seqnos.

By the way, in case you haven't stumbled upon this yet, the txn span refresher has a horrible contract with DistSender, where DistSender returns a partially populated response batch on errors, from which the refresher then picks out spans for its refresh set. I'm wondering how this hasn't backfired yet.

Added your comments in the tech note.

docs/tech-notes/txn_coord_sender.md, line 461 at r1 (raw file):

Previously, andreimatei (Andrei Matei) wrote…

What are you referring to here? What state is SQL picking up on?

Added comment to clarify.

docs/tech-notes/txn_coord_sender.md, line 486 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

I wonder if we also need their read/write spans to be non-overlapping. There's all sorts of weird stuff if they do overlap, though maybe it's not actually illegal. Either way, not something we're going to do today or tomorrow.

Added note.

docs/tech-notes/txn_coord_sender.md, line 503 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

The real problem I see here is similar to allowing leaves to write. You have to reason about the semantics in case the spans accessed overlap. If we've answered that, I'm not aware of any specific problem with updating the state in the way you're describing. As long as there is a distinguished error handling actor, this looks just like what txn.Update already does (except we need to merge the whole state, like refresh spans, etc). Curious if you had a concrete problem in mind.
PS I'm totally fine not wanting to change this restriction, just working on my understanding.

My opinion is that this is the wrong problem to solve. These complex questions, which may or may not have good answers, should not be necessary to answer in the first place. Enabling concurrency between Root and Leafs on the same spans is just a horrible mess.

docs/tech-notes/txn_coord_sender.md, line 549 at r1 (raw file):

Previously, tbg (Tobias Grieger) wrote…

Isn't this one-offy? If I want to retry a CPut idempotently (because the first one got context canceled, who knows), I'd want it to not see its own seqno.

Done.

knz · 2019-11-26T15:20:49Z

Going to merge this as it now has enough content to base further work on.

Also it's needed to ground the discussion on #42766.

bors r+

42116: docs: new tech note on TxnCoordSender r=knz a=knz

I am splitting off this tech note from #41569 so it can become a document that can be consumed standalone.

Direct link for easier consumption/review:
https://github.com/knz/cockroach/blob/20191101-tcs-tech-note/docs/tech-notes/txn_coord_sender.md

Co-authored-by: Raphael 'kena' Poss <knz@thaumogen.net>

craig · 2019-11-26T15:52:28Z

Build succeeded

GitHub CI (Cockroach)

andreimatei

Reviewable status: complete! 0 of 0 LGTMs obtained (waiting on @andreimatei, @knz, @nvanbenschoten, and @tbg)

docs/tech-notes/txn_coord_sender.md, line 413 at r1 (raw file):

Previously, knz (kena) wrote…

Your prompt invited me to investigate more and I found out the situation is more complicated than that. There are in fact two categories of abort errors.

Are there? I'm not sure what you're referring to here, I think it's wrong. AOST read under GC doesn't have anything to do with a TransactionAbortedError, does it?
TransactionAbortedError means one thing - some other transaction blew away some of our intents.

docs/tech-notes/txn_coord_sender.md, line 417 at r2 (raw file):

     internal CRDB mechanism, but not due to a logical
     error. Intuitively, this corresponds to errors due to the
     distributed nature of CRDB which would not occur in a

Deadlock between multiple transactions is the case where transaction aborts happen because of "user fault". That's analogous to single-node systems - there too sometimes transactions are aborted and forced to release their locks.

knz · 2019-11-26T16:26:54Z

Can you suggest an extension to the text to make it better?

knz requested review from andreimatei and nvanbenschoten November 1, 2019 16:57

knz force-pushed the 20191101-tcs-tech-note branch from 6188c65 to b65e00a Compare November 1, 2019 17:06

tbg approved these changes Nov 4, 2019

andreimatei approved these changes Nov 4, 2019

tbg reviewed Nov 5, 2019

docs: new tech note on TxnCoordSender

ee398a5

Release note: None

knz force-pushed the 20191101-tcs-tech-note branch from b65e00a to ee398a5 Compare November 26, 2019 15:00

craig bot merged commit ee398a5 into cockroachdb:master Nov 26, 2019

knz deleted the 20191101-tcs-tech-note branch November 26, 2019 15:54

andreimatei reviewed Nov 26, 2019
