[BACKPORT 2.20] Re-introduce: [#14165] DocDB: Rollback only the newest transaction in a deadlock

Summary:
Original commit: e8e7394 / D32542
Note: this revision was originally reverted in 9e07d79 because it caused a legitimate bug in serializable workloads, which had been masked by a test change. In particular, a single transaction could end up creating multiple deadlocks at the same time. The original change aborted only the youngest txn and processed only one deadlock per probe, so if the offending transaction was not the youngest transaction in the first deadlock detected, it would not be aborted, and the second deadlock would be ignored until the coordinator triggered another probe up to 60s later.

We are re-introducing the change here with the test change removed to un-mask the original issue, along with the following fixes:
1. When processing probes, do not drop ProbeResponses from old versions, which may not have the deadlock field populated. This was a bug in the initial implementation that was unrelated to the revert.
2. If a deadlock is detected and it is determined that a txn other than the originating txn should be aborted, trigger another probe for the originating transaction, in case it is involved in another deadlock that the original probe would later discover (and ignore).
3. In IsAnySubtxnActive, return false if the transaction is aborted. This was also a bug in the original implementation.
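
Fixes 2 and 3 above can be sketched in isolation as follows. This is a minimal illustration, not the code from this revision: `TxnState`, `DeadlockAction`, `ResolveDeadlock`, and the signature given to `IsAnySubtxnActive` here are all hypothetical stand-ins chosen for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <set>
#include <string>

// Hypothetical, simplified view of a transaction tracked by a coordinator.
enum class TxnStatus { kPending, kAborted };

struct TxnState {
  TxnStatus status = TxnStatus::kPending;
  std::set<uint32_t> aborted_subtxns;
};

// Fix 3 (sketch): an aborted transaction has no active subtransactions,
// regardless of which subtransaction ids the caller asks about.
inline bool IsAnySubtxnActive(const TxnState& txn, const std::set<uint32_t>& subtxns) {
  if (txn.status == TxnStatus::kAborted) {
    return false;
  }
  for (uint32_t id : subtxns) {
    if (txn.aborted_subtxns.count(id) == 0) {
      return true;
    }
  }
  return false;
}

// Hypothetical outcome of handling one detected deadlock.
struct DeadlockAction {
  std::string txn_to_abort;
  bool reprobe_origin;
};

// Fix 2 (sketch): when the victim is not the transaction that started the
// probe, ask for a fresh probe of the originating transaction, since it may
// still be part of a second deadlock that this probe would not resolve.
inline DeadlockAction ResolveDeadlock(const std::string& origin_txn_id,
                                      const std::string& youngest_txn_id) {
  assert(!origin_txn_id.empty() && !youngest_txn_id.empty());
  return DeadlockAction{youngest_txn_id, youngest_txn_id != origin_txn_id};
}
```

In this sketch, `ResolveDeadlock("a", "b")` aborts `b` and requests a re-probe of `a`, while `ResolveDeadlock("a", "a")` aborts the originating transaction and needs no re-probe.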

Original Summary:
Original commit: e8e7394 / D32542

This revision modifies the deadlock detector to abort only the youngest transaction in a detected deadlock.

Previously, each deadlock detector would simply add a txn_id to the deadlock probe response. We now include information about the status tablet and txn age.

Upon receiving a probe response, the probe origin would previously abort the transaction for which the probe was started, which was guaranteed to be local. We now abort the youngest transaction, which may require a remote AbortTransaction RPC.
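
The victim selection described above might look roughly like the sketch below. The struct is a hypothetical stand-in for the per-transaction data in the probe response; its field names, the use of a start timestamp in microseconds as the "age", and the tie-break on `txn_id` are assumptions made for illustration and are not taken from this revision.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical stand-in for the info each detector adds to a probe response.
struct DeadlockedTxnInfo {
  std::string txn_id;
  std::string status_tablet_id;
  uint64_t start_us;  // Assumed age measure: larger value means started later.
};

// Picks the youngest transaction in the deadlock. Ties are broken by txn_id
// (an assumption) so that every node looking at the same cycle picks the
// same victim.
inline const DeadlockedTxnInfo& PickVictim(const std::vector<DeadlockedTxnInfo>& deadlock) {
  assert(!deadlock.empty());
  const DeadlockedTxnInfo* victim = &deadlock.front();
  for (const auto& txn : deadlock) {
    if (txn.start_us > victim->start_us ||
        (txn.start_us == victim->start_us && txn.txn_id > victim->txn_id)) {
      victim = &txn;
    }
  }
  return *victim;
}

// The victim can be aborted locally only if this node hosts its status
// tablet; otherwise a remote abort request is needed.
inline bool NeedsRemoteAbort(const DeadlockedTxnInfo& victim,
                             const std::string& local_status_tablet_id) {
  return victim.status_tablet_id != local_status_tablet_id;
}
```

Carrying the status tablet alongside the txn id is what lets the probe origin route the abort: without it, the origin would know which transaction is youngest but not where to send the request.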

**Upgrade/Rollback safety:**
This revision adds a new "deadlock" field to ProbeTransactionDeadlockResponsePB which contains a new custom DeadlockedTxnInfo message type. This field is meant to replace the existing deadlocked_txn_ids field. The code in this revision updates both fields and handles both fields to ensure upgrade/downgrade safety without the use of any flags.
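
The dual-field approach can be illustrated with plain structs in place of the generated protobuf classes. This is a sketch under stated assumptions: `ProbeResponse` and `TxnInfo` are hypothetical simplifications of ProbeTransactionDeadlockResponsePB and DeadlockedTxnInfo, and the helper names are invented for the example.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <string>
#include <vector>

// Hypothetical simplification of the new per-transaction message.
struct TxnInfo {
  std::string txn_id;
  std::optional<std::string> status_tablet_id;  // Unknown for old senders.
  std::optional<uint64_t> start_us;             // Unknown for old senders.
};

// Hypothetical simplification of the probe response with both fields.
struct ProbeResponse {
  std::vector<std::string> deadlocked_txn_ids;  // Legacy field.
  std::vector<TxnInfo> deadlock;                // New field.
};

// Writer side: populate both fields so old and new readers both work.
inline void AppendTxn(ProbeResponse* resp, const TxnInfo& info) {
  assert(resp != nullptr);
  resp->deadlocked_txn_ids.push_back(info.txn_id);
  resp->deadlock.push_back(info);
}

// Reader side: prefer the new field; if an old sender left it empty, fall
// back to the legacy ids instead of dropping the response.
inline std::vector<TxnInfo> ReadDeadlock(const ProbeResponse& resp) {
  if (!resp.deadlock.empty()) {
    return resp.deadlock;
  }
  std::vector<TxnInfo> result;
  for (const auto& id : resp.deadlocked_txn_ids) {
    result.push_back(TxnInfo{id, std::nullopt, std::nullopt});
  }
  return result;
}
```

Because the writer always fills both fields and the reader accepts either, nodes on mixed versions can exchange probe responses during an upgrade or rollback without a gating flag.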
Jira: DB-3646

Test Plan: ybd --cxx-test pgwrapper_pg_wait_on_conflict-test --gtest_filter PgWaitQueuesTest.DeadlockResolvesYoungestTxn

Reviewers: bkolagani, pjain

Reviewed By: bkolagani

Subscribers: yql, ybase, bogdan

Tags: #jenkins-ready

Differential Revision: https://phorge.dev.yugabyte.com/D33495
robertsami committed Apr 11, 2024
1 parent 8c84383 commit 2226b7b
Showing 6 changed files with 302 additions and 43 deletions.
2 changes: 2 additions & 0 deletions src/yb/client/transaction.cc
@@ -1115,6 +1115,8 @@ class YBTransaction::Impl final : public internal::TxnBatcherIf {
} else {
metadata_.start_time = read_point_.Now();
}
// TODO(wait-queues): Consider using metadata_.pg_txn_start_us here for consistency with
// wait queues. https://github.com/yugabyte/yugabyte-db/issues/20976
start_.store(manager_->clock()->Now().GetPhysicalValueMicros(), std::memory_order_release);
}
