[docdb] Support transaction promotion for geo-partitioned workloads in use of WaitQueue and DeadlockDetector in Wait-on-Conflict concurrency control #13585

Closed
robertsami opened this issue Aug 11, 2022 · 0 comments
Assignees
Labels
area/docdb YugabyteDB core features kind/enhancement This is an enhancement of an existing feature priority/medium Medium priority issue

Comments

@robertsami
Contributor

robertsami commented Aug 11, 2022

Jira Link: DB-3161

@robertsami robertsami added the area/docdb YugabyteDB core features label Aug 11, 2022
@robertsami robertsami self-assigned this Aug 11, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Aug 11, 2022
@robertsami robertsami changed the title [dst] Support transaction promotion for geo-partitioned workloads [dst] Support transaction promotion for geo-partitioned workloads in use of WaitQueue and DeadlockDetector in pessimistic locking Aug 11, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Sep 3, 2022
robertsami added a commit that referenced this issue Sep 16, 2022
…n promotion

Summary:
In workloads where transaction promotion is possible, pessimistic locking and deadlock detection may
not work as expected. This revision disables pessimistic locking in such workloads until their
interaction is fixed.

This revision also renames the flag "enable_pessimistic_locking" to "enable_wait_queue_based_pessimistic_locking" for clarity.

Test Plan: Jenkins

Reviewers: esheng, pjain

Reviewed By: pjain

Subscribers: sergei, bogdan

Differential Revision: https://phabricator.dev.yugabyte.com/D19599
robertsami added a commit that referenced this issue Oct 5, 2022
…king is enabled

Summary: In 5c1941d, we disabled pessimistic locking if global transaction promotion was enabled. In this diff, we reverse the precedence. Since pessimistic locking is disabled by default, users who enable it explicitly would expect it to work rather than be silently ignored.

Test Plan: Jenkins

Reviewers: pjain, esheng, rthallam

Reviewed By: pjain, esheng, rthallam

Subscribers: rthallam, bogdan

Differential Revision: http://phabricator.dev.yugabyte.com/D19988
@rthallamko3 rthallamko3 changed the title [dst] Support transaction promotion for geo-partitioned workloads in use of WaitQueue and DeadlockDetector in pessimistic locking [docdb] Support transaction promotion for geo-partitioned workloads in use of WaitQueue and DeadlockDetector in pessimistic locking Oct 5, 2022
@pkj415 pkj415 changed the title [docdb] Support transaction promotion for geo-partitioned workloads in use of WaitQueue and DeadlockDetector in pessimistic locking [docdb] Support transaction promotion for geo-partitioned workloads in use of WaitQueue and DeadlockDetector in Wait-on-Conflict concurrency control Jan 6, 2023
@robertsami robertsami moved this from To do to GA Blocking in Wait-Queue Based Locking Jan 17, 2023
@robertsami robertsami moved this from GA Blocking to In progress in Wait-Queue Based Locking Feb 16, 2023
@robertsami robertsami assigned basavaraj29 and unassigned robertsami Feb 16, 2023
basavaraj29 added a commit that referenced this issue Mar 14, 2023
…oned workloads in use of WaitQueue and DeadlockDetector

Summary:
This diff enables wait-queues and deadlock detection for cross-region transactions.

When wait-queues are enabled and the wait-on-conflict policy is set, each transaction enters the wait queue if it finds blocker transactions after undergoing conflict resolution. The waiter txn records blocker_info (txn id, status tablet, conflicting subtxns, etc.) for each blocking transaction and registers itself with the deadlock detector. All this information is fetched from the tablet's transaction participant.
In case of transaction promotion, all the involved transaction participants are notified of the updated status tablet location. The transaction is promoted to global successfully only when all of the txn participants acknowledge this update; otherwise the transaction is aborted. If successful, the old status tablet might stay in kPending state until commit time, at which point it changes to kAborted state.
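
The all-or-nothing acknowledgement rule described above can be illustrated with a minimal sketch. This is not the actual YugabyteDB code: `PromotionOutcome`, `ParticipantAck`, and `PromoteToGlobal` are hypothetical names, and each participant is modeled as a simple callback that reports whether it acknowledged the new status tablet.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical outcome of a promotion attempt.
enum class PromotionOutcome { kPromoted, kAborted };

// Hypothetical model of a participant: returns true if it acknowledged the
// new status tablet location, false otherwise.
using ParticipantAck = std::function<bool(const std::string& new_status_tablet)>;

// Promotion succeeds only if every involved participant acknowledges the update.
PromotionOutcome PromoteToGlobal(const std::vector<ParticipantAck>& participants,
                                 const std::string& new_status_tablet) {
  for (const auto& notify : participants) {
    if (!notify(new_status_tablet)) {
      // A single missing acknowledgement aborts the transaction.
      return PromotionOutcome::kAborted;
    }
  }
  return PromotionOutcome::kPromoted;
}
```

In this sketch the participants are notified sequentially; a real implementation would presumably issue the notifications as asynchronous RPCs.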

In this diff, a `TransactionStatusListener` interface is introduced, and `WaitQueue` implements it. The wait queue is notified on each transaction promotion. When notified, the wait queue submits a task of type `UpdateWaitersOnBlockerPromotion` to a background threadpool. When a thread executes the submitted task, it does the following:
- It checks whether the promoted transaction was a waiter txn. If so, the transaction is made to re-enter the wait queue by re-running conflict resolution. This ensures that the waiter transaction fetches the updated status tablet and that the same is registered with the deadlock detector.
- It forces all waiter transactions blocked on the promoted transaction to re-enter the wait queue. This ensures that the wait-for dependency is updated with the blocker txn's latest status tablet and that the wait-for probes are forwarded to the right txn coordinator.
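
The two steps above could be sketched roughly as follows. This is a simplified, single-threaded illustration under stated assumptions, not the code from this diff: `StatusListener`, `ToyWaitQueue`, `OnPromoted`, and the string ids are hypothetical, the background threadpool is replaced by a direct call, and "re-running conflict resolution" is reduced to refreshing the recorded status tablet.

```cpp
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical listener interface: notified when a transaction is promoted
// and its status tablet location changes.
struct StatusListener {
  virtual ~StatusListener() = default;
  virtual void OnPromoted(const std::string& txn_id,
                          const std::string& new_status_tablet) = 0;
};

// Toy wait queue: for each waiter, remembers its blockers and the status
// tablet recorded for each blocker.
class ToyWaitQueue : public StatusListener {
 public:
  typedef std::map<std::string, std::string> Blockers;  // blocker txn id -> status tablet

  void Enqueue(const std::string& waiter, Blockers blockers) {
    waiters_[waiter] = std::move(blockers);
  }

  void OnPromoted(const std::string& txn_id,
                  const std::string& new_status_tablet) override {
    // Step 1: if the promoted transaction is itself a waiter, it re-enters the queue.
    if (waiters_.count(txn_id) != 0) {
      requeued_.insert(txn_id);
    }
    // Step 2: every waiter blocked on the promoted transaction re-enters the queue.
    for (auto& entry : waiters_) {
      auto it = entry.second.find(txn_id);
      if (it != entry.second.end()) {
        it->second = new_status_tablet;  // stand-in for re-running conflict resolution
        requeued_.insert(entry.first);
      }
    }
  }

  // Waiters that were forced to re-enter the queue so far.
  const std::set<std::string>& requeued() const { return requeued_; }

  // Status tablet currently recorded by `waiter` for `blocker`.
  const std::string& BlockerTablet(const std::string& waiter,
                                   const std::string& blocker) const {
    return waiters_.at(waiter).at(blocker);
  }

 private:
  std::map<std::string, Blockers> waiters_;
  std::set<std::string> requeued_;
};
```

Waiters that are unrelated to the promoted transaction are left untouched in this sketch, which matches the intent of limiting the extra conflict-resolution work to affected transactions.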

A race can occur between a waiter txn entering the queue with the blocker's old status tablet and the wait queue processing the promotion signal. If the signal is processed before the waiter enters the queue for the first time, the waiter would not re-enter the queue, and it follows that the deadlock detector would not be aware of the latest wait-for dependencies. To prevent this, on each insert to the wait queue we check with the transaction participant for the latest status tablet of the blocker transaction. This check is done while holding a mutex, which is also acquired while processing the promotion signal.
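
A minimal sketch of the race fix described above, under stated assumptions: `ToyParticipant` and `RaceSafeQueue` are hypothetical stand-ins, and the sketch assumes the participant already reflects the new status tablet by the time the promotion signal is processed. Because `Insert` and `ProcessPromotion` take the same mutex, a waiter ends up with the latest status tablet whichever of the two runs first.

```cpp
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical stand-in for the tablet's transaction participant: the source
// of truth for each transaction's current status tablet.
class ToyParticipant {
 public:
  void SetStatusTablet(const std::string& txn, const std::string& tablet) {
    std::lock_guard<std::mutex> lock(mutex_);
    tablets_[txn] = tablet;
  }

  std::string StatusTablet(const std::string& txn) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = tablets_.find(txn);
    return it == tablets_.end() ? std::string() : it->second;
  }

 private:
  mutable std::mutex mutex_;
  std::map<std::string, std::string> tablets_;
};

// Hypothetical wait queue that re-checks the blocker's status tablet on insert.
class RaceSafeQueue {
 public:
  explicit RaceSafeQueue(const ToyParticipant& participant)
      : participant_(participant) {}

  // Reads the blocker's latest status tablet from the participant while
  // holding the queue mutex, instead of trusting a possibly stale value.
  void Insert(const std::string& waiter, const std::string& blocker) {
    std::lock_guard<std::mutex> lock(mutex_);
    recorded_[std::make_pair(waiter, blocker)] = participant_.StatusTablet(blocker);
  }

  // Takes the same mutex, so it is ordered entirely before or entirely after
  // any Insert call.
  void ProcessPromotion(const std::string& blocker, const std::string& new_tablet) {
    std::lock_guard<std::mutex> lock(mutex_);
    for (auto& entry : recorded_) {
      if (entry.first.second == blocker) {
        entry.second = new_tablet;
      }
    }
  }

  std::string RecordedTablet(const std::string& waiter,
                             const std::string& blocker) const {
    std::lock_guard<std::mutex> lock(mutex_);
    auto it = recorded_.find(std::make_pair(waiter, blocker));
    return it == recorded_.end() ? std::string() : it->second;
  }

 private:
  const ToyParticipant& participant_;
  mutable std::mutex mutex_;
  std::map<std::pair<std::string, std::string>, std::string> recorded_;
};
```

If the promotion is processed first, the later `Insert` reads the new tablet from the participant; if `Insert` runs first, the later `ProcessPromotion` finds the waiter and updates it.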

Summarizing the changes, txn promotion results in either a success (which leads to an updated status tablet location) or a failure (aborted state). In either case, the wait queue receives a signal and the waiter transactions waiting on the promoted/aborted blocker re-run conflict resolution. (Currently the wait queue periodically polls for txn status, so the aborted/committed cases are taken care of. Rob is working on a parallel diff where this too is being changed to a notification mechanism.) Additionally, there is a backup mechanism in place where we force all waiter transactions in the wait queue to re-run conflict resolution periodically. It follows that the deadlock detector is updated with the latest wait-for dependencies periodically.

User-facing aspects: Since we don't execute the above promotion-handling logic in-line with the promotion request, there shouldn't be any noticeable increase in latency for transaction promotion. The only concerning aspect is that we re-run conflict resolution for the affected transactions when one of their blockers gets promoted, but this is necessary for maintaining up-to-date wait-for dependencies. And since the size of intentsdb should be considerably small, this shouldn't be much of a concern.

Test Plan:
Jenkins
```
./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter DeadlockDetectionWithTxnPromotionTest.TestBlockerPromotionWithDeadlock
./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter DeadlockDetectionWithTxnPromotionTest.TestBlockerPromotionWithoutDeadlock
./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter DeadlockDetectionWithTxnPromotionTest.TestDeadlockAmongstGlobalTransactions
./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter DeadlockDetectionWithTxnPromotionTest.TestWaiterPromotionWithoutDeadlock
./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter DeadlockDetectionWithTxnPromotionTest.TestWaiterPromotionWithDeadlock
./yb_build.sh --cxx-test pgwrapper_geo_transactions_promotion-test --gtest_filter DeadlockDetectionWithTxnPromotionTest.TestDelayedWaiterRegistrationInWaitQeue
./yb_build.sh --cxx-test pgwrapper_pg_wait_on_conflict-test --gtest_filter PgWaitQueuesTest.TestWaiterTxnReRunConflictResolution
```

Reviewers: esheng, sergei, rsami

Reviewed By: rsami

Subscribers: pjain, jenkins-bot, ybase

Differential Revision: https://phabricator.dev.yugabyte.com/D23215
Wait-Queue Based Locking automation moved this from In progress to Done Mar 14, 2023
Projects
Status: Done

3 participants