storage: propagate errors from contentionQueue, catch stalls in roachtest #37199
Conversation
NOTE: I saw this get stuck once after bumping the …
Nice find! I know you're not planning on merging this yet but
Reviewed 3 of 3 files at r1, 2 of 2 files at r2.
Reviewable status: complete! 1 of 0 LGTMs obtained
I think I tracked down the other stall here. It has to do with a rare interleaving of requests in the …

The first two transactions are stuck in the … So what's going wrong? Why isn't this resolving itself? If
```go
// Determine whether the transaction record could ever actually be written
// in the future. We provide the TxnMeta's timestamp (which we read from an
// intent) as the upper bound on the transaction's minimum timestamp. This
// may be greater than the transaction's actual original epoch-zero
// timestamp, in which case we're subjecting ourselves to false positives
// where we don't discover that a transaction is uncommittable, but never
// false negatives where we think that a transaction is uncommittable even
// when it's not and could later complete.
```
and
cockroach/pkg/storage/batcheval/cmd_push_txn.go, lines 161 to 167 in 5c1497a
```go
// Performing this detection could have false positives where we determine
// that a record could still be written and conclude that we're in the first
// case. However, it cannot have false negatives where we determine that a
// record can not be written and conclude that we're in the second or third
// case. This is important, because it means that we may end up failing to
// push a finalized transaction but will never determine that a transaction
// is finalized when it still could end up committing.
```
and
cockroach/pkg/storage/replica_tscache.go, lines 286 to 292 in 5c1497a
```go
// for the provided transaction information. Callers should provide an upper
// bound on the transaction's minimum timestamp across all epochs (typically
// the original timestamp of its first epoch). If this is not exact then the
// method may return false positives (i.e. it determines that the record could
// be created when it actually couldn't) but will never return false negatives
// (i.e. it will never determine that the record could not be created when it
// actually could).
```
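To make the asymmetry in the comments above concrete, here is a minimal sketch, with hypothetical types (`ts`, `couldCreateTxnRecord` are illustrative, not the real CockroachDB definitions): a record can only be created if the transaction's minimum timestamp is newer than the timestamp cache's low-water mark, and substituting an upper bound on the minimum timestamp can only push the answer toward "could still be created".

```go
package main

import "fmt"

// ts stands in for a logical timestamp.
type ts int64

// couldCreateTxnRecord is the exact check: the record can still be
// written iff the txn's minimum timestamp is above the tscache
// low-water mark over the record's key.
func couldCreateTxnRecord(minTS, tscacheLowWater ts) bool {
	return minTS > tscacheLowWater
}

func main() {
	const lowWater ts = 10
	trueMinTS := ts(5)   // exact answer: the record can no longer be created
	upperBound := ts(15) // what a pusher reads off an intent (>= trueMinTS)

	fmt.Println(couldCreateTxnRecord(trueMinTS, lowWater))  // false: exact
	fmt.Println(couldCreateTxnRecord(upperBound, lowWater)) // true: a false positive

	// The other direction cannot happen: since upperBound >= trueMinTS,
	// a "false" with upperBound implies a "false" with trueMinTS too,
	// so the check never returns a false negative.
}
```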
It turns out that this scenario can cause issues when combined with the dependencies enforced by the contentionQueue. If the first transaction in the queue falls into this trap of falsely assuming a pushee is not aborted and later transactions don't, the later transactions can get stuck waiting on the first transaction to notice that the pushee has already been aborted. If the pushee itself is stuck in a second contentionQueue in the same situation then it will never roll back its transaction record and inform the first transaction in the queue that it is indeed aborted. So everything gets stuck. In other words, it is not safe for some pushers to consider a transaction to be pending while other pushers consider it to be aborted.
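The stall above can be reduced to a toy model (hypothetical code, not the real contentionQueue): only the head of the queue polls the pushee's record, everyone else waits on the head, so a single false-positive "PENDING" view at the head wedges the entire queue unless the pushee eventually writes an ABORTED record to correct it.

```go
package main

import "fmt"

// status is the head pusher's view of the pushee.
type status int

const (
	pending status = iota
	aborted
)

// headUnblocks reports whether the queue's head pusher will ever stop
// waiting, given its (possibly stale) view of the pushee and whether the
// pushee will ever write an ABORTED record to correct that view. Every
// pusher queued behind the head unblocks iff the head does.
func headUnblocks(headView status, pusheeWritesAbortRecord bool) bool {
	if headView == aborted {
		return true // head already knows the pushee is finalized; it moves on
	}
	// Head thinks the pushee is still pending: it only unblocks if the
	// pushee eventually writes an ABORTED record for it to observe.
	return pusheeWritesAbortRecord
}

func main() {
	// Healthy case: the pushee rolls back and writes its ABORTED record.
	fmt.Println(headUnblocks(pending, true)) // true: the queue drains

	// The stall from this thread: the head falsely sees PENDING and the
	// pushee, itself stuck in another contentionQueue, never writes an
	// ABORTED record. The head waits forever, and so does everyone
	// queued behind it.
	fmt.Println(headUnblocks(pending, false)) // false: everything is stuck
}
```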
The solution to this isn't immediately clear. I see two options at the moment:
- We begin storing the EpochZeroOrigTimestamp in intents. This allows all pushers to have perfect accuracy about whether or not a transaction has been aborted by any other pusher. In other words, it allows us to get rid of this nastiness. Interestingly, it would come for free with the proposal to change transaction IDs to be a combination of the epoch-zero-orig-timestamp and a gateway ID.
- We go back to always leaving an ABORTED transaction record before a pusher can consider a transaction to be aborted. This feels safer (although there is still some complication around QueryTxn requests that find a missing txn record), but it is also a large step backward away from a proposal like this. It also returns to letting pushers create transaction records as long as they are in the ABORTED state.
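For concreteness, option 1 can be sketched as follows, again with hypothetical types (this `txnMeta` is not the real TxnMeta proto): once the intent carries the transaction's exact minimum timestamp, every pusher evaluates the same exact check and no two pushers can disagree about whether the pushee is uncommittable.

```go
package main

import "fmt"

// ts stands in for a logical timestamp.
type ts int64

// txnMeta is a stand-in for the per-intent transaction metadata.
type txnMeta struct {
	timestamp ts // current provisional commit timestamp
	minTS     ts // exact minimum timestamp across all epochs (option 1)
}

// uncommittable is exact when given the true minimum timestamp: the txn
// record can no longer be written once the tscache low-water mark has
// advanced past it. No false positives means no pusher can conclude
// "pending" while another concludes "aborted".
func uncommittable(m txnMeta, tscacheLowWater ts) bool {
	return m.minTS <= tscacheLowWater
}

func main() {
	m := txnMeta{timestamp: 15, minTS: 5}
	// Every pusher reading the same intent reaches the same conclusion.
	fmt.Println(uncommittable(m, 10)) // true: record can no longer be written
	fmt.Println(uncommittable(m, 3))  // false: txn could still commit
}
```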
@tbg @andreimatei I'm curious whether either of you two has a strong preference here.
On the positive side, with QueryTxn requests updating the TxnWaitQueue when they observe a transaction to be aborted and with the first of these options implemented, the test seems to pass without any stalls (after about 6 hrs of running).
Great writeup! Before I dive in, I'm curious if any of the stuff @darinpp added to the txnWaitQ helped (#35921) or would've helped figure out where "something" was stuck. (Or, if you repro this scenario again, I'd be grateful if you gave it a try and reported back, in particular the "have been waiting" log message). I just took a look at the contentionQ code to remind myself how it works. Oof. I'd be surprised if this were the last bug.
We query the pushee periodically, but I just realized that this is based on the TxnExpiration, which is artificially high in this test. Just pointing this out for anyone else wondering.

I like option 1. We can optimize for the common case in which Timestamp == Epoch0Timestamp by adding the convention that the epoch 0 timestamp defaults to the timestamp, to avoid writing more than we do today (if we also decide to make the field nullable).
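The defaulting convention suggested above can be sketched like this (hypothetical types; in the real proto the field would be a nullable message field rather than a Go pointer): an unset field means "equal to Timestamp", so the common case writes no extra bytes into the intent.

```go
package main

import "fmt"

// ts stands in for a logical timestamp.
type ts int64

// txnMeta models the nullable-field encoding: minTS == nil means the
// epoch-zero timestamp equals timestamp (the common case, costing
// nothing extra on disk).
type txnMeta struct {
	timestamp ts
	minTS     *ts // nil => defaults to timestamp
}

// minTimestamp decodes the convention.
func minTimestamp(m txnMeta) ts {
	if m.minTS == nil {
		return m.timestamp
	}
	return *m.minTS
}

func main() {
	fmt.Println(minTimestamp(txnMeta{timestamp: 7})) // 7 (defaulted)
	five := ts(5)
	fmt.Println(minTimestamp(txnMeta{timestamp: 7, minTS: &five})) // 5 (explicit)
}
```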
The "have been waiting" log message in …

I also think the reason the "have been waiting" log message was more useful than the metrics is that it actually includes transaction IDs. Without that, it's not really possible to come to any conclusions about the state of a deadlock. With that in mind, a "have been waiting" log message in …
👍 I was leaning that way myself. It will be really nice to remove this uncertainty. I don't think the migration will be too difficult either, as the new field will need to be optional no matter what.
👍 that's about as good as I hoped it was going to get. The metrics can't do the debugging for you, but it looks like they gave you a good starting point to logspy in.
I like option 1 too. Option 2 (writing more ABORTED txn records) does seem like going backwards. I hope we get to where we never write any ABORTED records.
Friendly ping, was just looking through our roachtests and this seems like a nice wart to iron out.
Yep, this is on my plate for after we stabilize parallel commits. The two additional items that I need to polish off are: …
(force-pushed 93cdbb8 to b3bc9fd)

(force-pushed b3bc9fd to 541b3e3)
… observed

From cockroachdb#37199 (comment):

> There actually appear to be two ways for us to get in this deadlock scenario. The first is due to a timestamp cache rotation. Imagine that 8703f023 and ee447be3 queried each other's transaction records after f081497c queried their transaction records. Between this time, nothing was written. However, there was a timestamp cache rotation! Each of these transactions would consider the other aborted because of ABORT_REASON_TIMESTAMP_CACHE_REJECTED_POSSIBLE_REPLAY. If the abort was noticed during a QueryTxnRequest, no one would notify f081497c, who is still sitting in the TxnWaitQueue, and it would be stuck there until the pushee expired.

We can fix this case by informing the TxnWaitQueue if a QueryTxnRequest ever decides that a transaction is aborted.

Release note: None
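The fix this commit describes can be sketched as a broadcast on a per-pushee wait queue (hypothetical types, not the real TxnWaitQueue API): whichever request first observes the ABORTED state wakes every parked pusher instead of leaving them to wait out the pushee's expiration.

```go
package main

import "fmt"

// waitQueue models the pushers parked on a single pushee. Each waiter
// holds a channel that is closed once the pushee is known finalized.
type waitQueue struct {
	waiters []chan struct{}
}

// wait registers a pusher and returns the channel it blocks on.
func (q *waitQueue) wait() chan struct{} {
	ch := make(chan struct{})
	q.waiters = append(q.waiters, ch)
	return ch
}

// onQueryObservedAbort is what a QueryTxn-style request calls when it
// sees the pushee in the ABORTED state: wake everyone immediately.
func (q *waitQueue) onQueryObservedAbort() {
	for _, ch := range q.waiters {
		close(ch)
	}
	q.waiters = nil
}

func main() {
	var q waitQueue
	a, b := q.wait(), q.wait()

	// Some request's QueryTxn observes the abort...
	q.onQueryObservedAbort()

	// ...and both parked pushers unblock right away rather than waiting
	// for the pushee to expire.
	<-a
	<-b
	fmt.Println("all pushers notified")
}
```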
(force-pushed 541b3e3 to 01294fe)
I've updated this PR to address the first additional item that we need to fix. The second one is larger, so I'm going to put that into a different PR. I've downgraded this to no longer say "Fixes" because it doesn't include the full fix. PTAL @ajwerner.
(force-pushed 01294fe to 1df7a72)
Thanks for the great writeup. My mental model of the TxnWaitQueue is still fuzzier than I'd like, but this change is straightforward enough. Are there any downsides to notifying all of the pushers upon a single abort?
Reviewed 3 of 5 files at r3, 1 of 2 files at r4, 2 of 2 files at r5, 2 of 3 files at r6, 10 of 10 files at r8.
Reviewable status: complete! 1 of 0 LGTMs obtained
Informs cockroachdb#36089.

Before this commit, requests could get stuck repeatedly attempting to push a transaction only to repeatedly find that they themselves were already aborted. The error would not propagate up to the transaction coordinator and the request would get stuck. This commit fixes this behavior by correctly propagating errors observed by the contentionQueue.

Release note: None
…hold

Informs cockroachdb#36089.

This commit bumps the TxnLivenessThreshold for clusters running `kv/contention/nodes=4` to 10 minutes. This is sufficiently large such that if at any point a transaction is abandoned then all other transactions will begin waiting for it and the test will fail to achieve its minimum QPS requirement.

Release note: None
…sult

Drive-by fix.

Release note: None
The test was already called queue_test.go and the package is called txnwait, so the prefix on the name is unnecessary. Only code movement.

Release note: None

We had three different defaultHeartbeatInterval constants and it was unclear which values corresponded to which heartbeats without reading their documentation. This commit renames these constants to avoid confusion.

Release note: None

Until cockroachdb#36089 is fixed, we know this will fail regularly. This is especially true now that we have extended TxnLivenessThreshold.

Release note: None
(force-pushed 1df7a72 to 82aeef4)
I don't think so. Once the pushee is aborted, all of the pushers would like to know ASAP. Thanks for the review.

bors r+
37199: storage: propagate errors from contentionQueue, catch stalls in roachtest r=nvanbenschoten a=nvanbenschoten

Informs #36089.

The PR is split into a series of commits. The first fixes part of a bug that was causing #36089 to fail (thanks to #36748) and the second improves the test to have a more obvious failure condition for this class of bug in the future. The third, fifth, and sixth clean up code. Finally, the fourth fixes another bug that could cause issues with #36089.

Before the first commit, requests could get stuck repeatedly attempting to push a transaction only to repeatedly find that they themselves were already aborted. The error would not propagate up to the transaction coordinator and the request would get stuck. This commit fixes this behavior by correctly propagating errors observed by the `contentionQueue`.

The second commit bumps the TxnLivenessThreshold for clusters running `kv/contention/nodes=4` to 10 minutes. This is sufficiently large such that if at any point a transaction is abandoned then all other transactions will begin waiting for it, throughput will drop to 0 for 10 straight minutes, and the test will fail to achieve its minimum QPS requirement.

The fourth commit instructs pushers in the `txnwait.Queue` to inform all other pushers that are waiting for the same transaction when it observes an ABORTED transaction. I never saw this cause issues with #36089, but it seems very possible that it could given frequent tscache rotations.

38397: sql: deflake TestLogic//crdb_internal/max_retry_counter r=knz a=knz

Fixes #38062. Release note: None

38654: exec: Handle NULLS in TopK sorter r=rohany a=rohany

This commit fixes NULLs in the TopK sorter by avoiding use of the vec copy method, which has a bug. Instead, we add a set method to the vec comparator, and use the templatized comparator to perform the sets that the TopK sorter needs. To facilitate this, we add an UnsetNull method to the Nulls object. However, use of this method results in HasNull() maybe returning true even if the vector doesn't have nulls. This behavior already occurs when selection vectors are used. Based on discussions with @solongordon and @asubiotto, this behavior is OK, and future PRs will attempt to make this behavior better, and address the bugs within the Vec Copy method.

38725: cli/dump: more clearly inform the user upon tables with no visible columns r=knz a=knz

Informs #37768. Informs #28948.

This is coming up quite often on support, lately again on gitter and forum https://forum.cockroachlabs.com/t/error-while-dumping-core-backup/2901/3. This PR aims to lessen the burden on support and propose a clear "next action" for the user.

Before:

```
kena@kenax ~/cockroach % ./cockroach dump --insecure defaultdb
CREATE TABLE t (, FAMILY "primary" (rowid) );
Error: pq: at or near "from": syntax error
Failed running "dump"
```

After:

```
kena@kenax ~/cockroach % ./cockroach dump --insecure defaultdb
CREATE TABLE t (, FAMILY "primary" (rowid) );
Error: table "defaultdb.public.t" has no visible columns
HINT: To proceed with the dump, either omit this table from the list of tables to dump, drop the table, or add some visible columns.
--
See: #37768
Failed running "dump"
```

Release note (cli change): `cockroach dump` will now more clearly refer to issue #37768 when it encounters a table with no visible columns, which (currently) cannot be dumped successfully.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
Co-authored-by: Raphael 'kena' Poss <knz@cockroachlabs.com>
Co-authored-by: Rohan Yadav <rohany@alumni.cmu.edu>
Build succeeded
Fixes cockroachdb#36089. Informs cockroachdb#37199.

This commit addresses the second concern from cockroachdb#37199 (comment) by implementing its suggestion #1. It augments the TxnMeta struct to begin storing the transaction's minimum timestamp. This allows pushers to have perfect accuracy into whether an intent is part of a transaction that can eventually be committed or whether it has been aborted by any other pusher and is uncommittable. This allows us to get rid of the false positive cases where a pusher incorrectly detects that a transaction is still active and begins waiting on it. In the worst case, this could lead to transactions waiting for the entire transaction expiration for a contending txn.

Release note: None
38782: storage: persist minimum transaction timestamps in intents r=nvanbenschoten a=nvanbenschoten

Fixes #36089. Informs #37199.

This commit addresses the second concern from #37199 (comment) by implementing its suggestion #1. It augments the TxnMeta struct to begin storing the transaction's minimum timestamp. This allows pushers to have perfect accuracy into whether an intent is part of a transaction that can eventually be committed or whether it has been aborted by any other pusher and is uncommittable. This allows us to get rid of the false positive cases where a pusher incorrectly detects that a transaction is still active and begins waiting on it. In the worst case, this could lead to transactions waiting for the entire transaction expiration for a contending txn.

@tbg I'm assigning you because you reviewed most of the lazy transaction record stuff (especially #33523), but let me know if you'd like me to find a different reviewer.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>