[YSQL] Integrate READ COMMITTED isolation with wait queue based Wait-on-Conflict concurrency control to detect deadlocks and for higher performance #13211
Labels
area/ysql
Yugabyte SQL (YSQL)
kind/enhancement
This is an enhancement of an existing feature
pg-compatibility
Label for issues that result in differences b/w YSQL and Pg semantics
priority/medium
Medium priority issue
Comments
pkj415
added
area/ysql
Yugabyte SQL (YSQL)
status/awaiting-triage
Issue awaiting triage
labels
Jul 7, 2022
yugabyte-ci
added
kind/bug
This issue is a bug
priority/medium
Medium priority issue
labels
Jul 7, 2022
pkj415
added
pg-compatibility
Label for issues that result in differences b/w YSQL and Pg semantics
and removed
kind/bug
This issue is a bug
status/awaiting-triage
Issue awaiting triage
labels
Jul 7, 2022
pkj415
changed the title
[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking
[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking to avoid deadlocks and higher performance
Sep 3, 2022
pkj415
changed the title
[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking to avoid deadlocks and higher performance
[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking to detect deadlocks and higher performance
Sep 5, 2022
yugabyte-ci
added
kind/enhancement
This is an enhancement of an existing feature
and removed
kind/bug
This issue is a bug
labels
Sep 5, 2022
Before you close this one, @pkj415, please make sure to remove the note in the docs on the Read Committed architecture page.
pkj415
added a commit
that referenced
this issue
Nov 29, 2022
Summary: READ COMMITTED isolation provides blocking (i.e., waiting) semantics during transaction conflicts by retrying a query indefinitely on kConflict errors with exponential backoff. Once all conflicting transactions end, the next retry of the query will run successfully. The exponential backoff delay between retries ensures that the YSQL backend doesn't overwhelm the system by retrying in a tight loop.

D17304 (dc81106) added the new wait-queue based implementation, which can be used if the tserver gflag enable_wait_queues is set to true. In this case, a read/write RPC from a YSQL backend to the transaction participant is blocked if conflicts are detected on the participant, and unblocked once all conflicting transactions have completed (either committed or aborted). There are two possible scenarios once the RPC is unblocked:
(1) It is still conflicting because some transaction that made a conflicting modification to the data has committed.
(2) No transaction with a conflicting modification to the data has committed.
For case (1), the kConflict error is returned to the YSQL backend. In case (2), the RPC makes progress and returns the appropriate result to the YSQL backend. The waiting is transparent to the YSQL backend (i.e., indistinguishable from an RPC that wasn't blocked).

So, if enable_wait_queues is set, there is no need for a READ COMMITTED transaction to sleep before retrying the query, because a kConflict error, if it occurs at all (case (1)), is sent only after all conflicting transactions have ended.

NOTE: REPEATABLE READ and SERIALIZABLE isolation levels also retry with exponential backoff when kConflict errors occur in the first statement of a transaction. This backoff is also unnecessary if enable_wait_queues is true, and this diff removes it in that case as well.

Other miscellaneous changes:
(1) SKIP LOCKED wasn't working for single-shard transactions earlier; fixed it.
(2) Changed naming in most places except docs from "pessimistic" and "optimistic" locking to "Wait-on-Conflict" and "Fail-on-Conflict". This is an effort to move away from the words "pessimistic" and "optimistic", since they are wrongly defined and used in internal discussions and their meanings cause confusion in external discussions. For example, when we say optimistic, people think we mean "optimistic concurrency control", which is widely known in literature and industry (like here - https://people.eecs.berkeley.edu/~fox/summaries/database/optimistic_concurrency.html).

Created GitHub issue #14935 for situations where nodes can have different values for enable_wait_queues during a rolling restart.

Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#isolationRegressWithWaitQueues
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#withDelayedTxnApplyWithWaitQueues
Reviewers: tvesely, sergei, rsami
Reviewed By: rsami
Subscribers: yql
Differential Revision: https://phabricator.dev.yugabyte.com/D20974
pkj415
added a commit
that referenced
this issue
Nov 30, 2022
… wait queues
Original commit: 80a34c0 / D20974
Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#isolationRegressWithWaitQueues
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#withDelayedTxnApplyWithWaitQueues
Reviewers: tvesely, sergei, rsami
Reviewed By: rsami
Subscribers: yql
Differential Revision: https://phabricator.dev.yugabyte.com/D21411
jayant07-yb
pushed a commit
to jayant07-yb/yugabyte-db
that referenced
this issue
Dec 7, 2022
pkj415
changed the title
[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking to detect deadlocks and higher performance
[YSQL] Integrate READ COMMITTED isolation with wait queue based Wait-on-Conflict concurrency control to detect deadlocks and for higher performance
Jan 6, 2023
Jira Link: DB-2879
Description
Currently, READ COMMITTED isolation achieves Wait-on-Conflict concurrency control semantics by retrying
conflicting statements internally with exponential backoff. Savepoints are leveraged to
clean up any partial work done by the statement before retrying.
There is no deadlock detection currently when two transactions like below are deadlocked:
Txn1                    Txn2
update ... k=1;
                        update ... k=2;
update ... k=2;
                        update ... k=1;
The second UPDATE in each transaction will retry indefinitely, since each transaction
is waiting on a row that the other has already updated.
Once we integrate the READ COMMITTED isolation level with the wait-queue based
Wait-on-Conflict concurrency control in #5680, deadlock detection will be performed as part of the wait
queues.
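The issue does not describe how the wait queues detect deadlocks. As a generic illustration only, and not YugabyteDB's actual distributed algorithm, the Txn1/Txn2 scenario above can be modeled as a cycle in a wait-for graph. The function name `find_deadlock` and the dictionary layout are hypothetical.

```python
def find_deadlock(waits_for):
    """Return a list of transactions forming a wait-for cycle, or None.

    `waits_for` maps a transaction id to the ids it is blocked on.
    This is a textbook depth-first search, used here only to illustrate
    what "deadlock detection" means for the scenario in this issue.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on current path, finished
    color = {}
    path = []

    def visit(txn):
        color[txn] = GREY
        path.append(txn)
        for blocker in waits_for.get(txn, ()):
            state = color.get(blocker, WHITE)
            if state == GREY:
                # The blocker is already on the current path: a cycle.
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle is not None:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in list(waits_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle is not None:
                return cycle
    return None


# Txn1 waits on Txn2 (row k=2) and Txn2 waits on Txn1 (row k=1).
deadlocked = find_deadlock({"Txn1": ["Txn2"], "Txn2": ["Txn1"]})
not_deadlocked = find_deadlock({"Txn1": ["Txn2"], "Txn2": []})
```

Once a cycle is found, a detector would typically abort one of the transactions in it so that the others can proceed; which one is chosen is a policy decision not covered here.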
Details -
READ COMMITTED isolation follows Wait-on-Conflict concurrency control semantics by retrying a
query indefinitely on kConflict errors with exponential backoff. Once all
conflicting transactions end, the next retry of the query will successfully run.
The exponential backoff delay between retries is to ensure that the backend
doesn't overwhelm the system by retrying in a tight loop.
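The retry behavior described above can be sketched roughly as follows. This is a minimal model, not the YSQL backend code: `ConflictError`, `run_with_backoff`, `rollback_to_savepoint`, and the delay parameters are all hypothetical names and values chosen for illustration.

```python
import time


class ConflictError(Exception):
    """Hypothetical stand-in for a kConflict error seen by the backend."""


def run_with_backoff(statement, rollback_to_savepoint, *,
                     initial_delay=0.001, max_delay=1.0, multiplier=2.0,
                     sleep=time.sleep):
    """Retry `statement` indefinitely until it stops raising ConflictError.

    Returns a (result, attempts) tuple. The delay values are assumptions,
    not the defaults used by YSQL.
    """
    delay = initial_delay
    attempts = 0
    while True:
        attempts += 1
        try:
            return statement(), attempts
        except ConflictError:
            # Undo any partial work done by the failed attempt.
            rollback_to_savepoint()
            # Back off so that retries do not form a tight loop.
            sleep(delay)
            delay = min(delay * multiplier, max_delay)
```

Note that the loop has no exit other than success, which is why two deadlocked transactions retry forever under this scheme.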
D17304 (dc81106) added the new wait-queue based
implementation for Wait-on-Conflict concurrency control, which can be used if the tserver gflag
enable_wait_queues is set to true. In this case, a read/write RPC from
a YSQL backend to the transaction participant is blocked if conflicts are
detected on the participant, and unblocked once all conflicting transactions have
completed (either committed or aborted). There are two possible scenarios once
the RPC is unblocked:
(1) It is still conflicting because some transaction that made a
conflicting modification to the data has committed.
(2) No transaction with a conflicting modification to the data has committed.
For case (1), the kConflict error is returned to the YSQL backend. In case (2), the
RPC makes progress and returns the appropriate result to the YSQL backend. The
waiting is transparent to the YSQL backend (i.e., indistinguishable from an RPC
that wasn't blocked).
So, if enable_wait_queues is set, there is no need for a
READ COMMITTED transaction to sleep before retrying the query, because a
kConflict error, if it occurs at all (case (1)), is sent only after all conflicting
transactions have ended.
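The two post-wait scenarios can be expressed as a small decision function. This is a simplified sketch of the description above, not the tserver implementation: the `Blocker` record, its `modified_conflicting_data` flag, and the string return values are assumptions made for illustration.

```python
from enum import Enum
from typing import NamedTuple


class Outcome(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"


class Blocker(NamedTuple):
    """Hypothetical record of a transaction the RPC had to wait for."""
    txn_id: str
    outcome: Outcome
    modified_conflicting_data: bool


def resolve_after_wait(blockers):
    """Decide what a blocked RPC does once every blocker has finished."""
    for blocker in blockers:
        if (blocker.outcome is Outcome.COMMITTED
                and blocker.modified_conflicting_data):
            # Case (1): a committed transaction changed the data, so the
            # conflict is real and is reported to the YSQL backend.
            return "kConflict"
    # Case (2): no committed conflicting modification; the RPC proceeds.
    return "proceed"
```

Because the RPC only returns after every blocker has finished, a `kConflict` result in this model never needs a sleep before the retry.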