[YSQL] Integrate READ COMMITTED isolation with wait queue based Wait-on-Conflict concurrency control to detect deadlocks and for higher performance #13211

Closed
pkj415 opened this issue Jul 7, 2022 · 1 comment
Assignees
Labels
area/ysql Yugabyte SQL (YSQL) kind/enhancement This is an enhancement of an existing feature pg-compatibility Label for issues that result in differences b/w YSQL and Pg semantics priority/medium Medium priority issue

Comments

@pkj415
Contributor

pkj415 commented Jul 7, 2022

Jira Link: DB-2879

Description

Currently, READ COMMITTED isolation achieves Wait-on-Conflict concurrency control semantics by
internally retrying conflicting statements with exponential backoff. Savepoints are used to
clean up any partial work done by the statement before each retry.

There is currently no deadlock detection when two transactions are deadlocked, as in the example below:

Txn1                    Txn2
update ... k=1;
                        update ... k=2;
update ... k=2;
                        update ... k=1;

The second UPDATE in each transaction will retry indefinitely since the transactions
are deadlocked.
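
For illustration only: a deadlock like the one above is commonly modeled as a cycle in a wait-for graph, where an edge from A to B means "A waits for B". The sketch below is not YugabyteDB's implementation; `find_deadlock` and the dictionary-based graph are hypothetical names chosen for this example.

```python
from typing import Dict, List, Optional, Set


def find_deadlock(wait_for: Dict[str, Set[str]]) -> Optional[List[str]]:
    """Return one cycle in the wait-for graph as a list of transactions, or None.

    wait_for maps a transaction to the set of transactions it is blocked on.
    This is a plain depth-first search, an assumed model for illustration.
    """
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current DFS path, finished
    color: Dict[str, int] = {}
    path: List[str] = []

    def visit(txn: str) -> Optional[List[str]]:
        color[txn] = GREY
        path.append(txn)
        for blocker in sorted(wait_for.get(txn, ())):
            state = color.get(blocker, WHITE)
            if state == GREY:
                # The blocker is already on the current path: a cycle exists.
                return path[path.index(blocker):]
            if state == WHITE:
                cycle = visit(blocker)
                if cycle is not None:
                    return cycle
        path.pop()
        color[txn] = BLACK
        return None

    for txn in sorted(wait_for):
        if color.get(txn, WHITE) == WHITE:
            cycle = visit(txn)
            if cycle is not None:
                return cycle
    return None


if __name__ == "__main__":
    # Txn1 holds k=1 and waits for k=2 (held by Txn2); Txn2 waits for k=1.
    print(find_deadlock({"Txn1": {"Txn2"}, "Txn2": {"Txn1"}}))
```

With plain retries, neither transaction ever learns that such a cycle exists, which is why both UPDATEs keep retrying.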

Once we integrate the READ COMMITTED isolation level with the wait-queue based
Wait-on-Conflict concurrency control in #5680, deadlock detection will be performed as part of the wait
queues.

Details -

READ COMMITTED isolation follows Wait-on-Conflict concurrency control semantics by retrying a
query indefinitely on kConflict errors with exponential backoff. Once all
conflicting transactions end, the next retry of the query will successfully run.
The exponential backoff delay between retries is to ensure that the backend
doesn't overwhelm the system by retrying in a tight loop.
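
As a rough sketch of the retry behaviour described above (not the actual YSQL backend code), the loop can be pictured as follows. `ConflictError`, `run_with_backoff`, and the delay parameters are hypothetical; the real backoff constants are not stated in this issue.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class ConflictError(Exception):
    """Hypothetical stand-in for a kConflict error returned to the backend."""


def run_with_backoff(
    statement: Callable[[], T],
    initial_delay_s: float = 0.001,  # assumed value, for illustration
    multiplier: float = 2.0,         # assumed value, for illustration
    max_delay_s: float = 1.0,        # assumed value, for illustration
    sleep: Callable[[float], object] = time.sleep,
) -> T:
    """Retry `statement` indefinitely on ConflictError, sleeping between tries.

    In the real system, partial work done by the failed attempt would be
    rolled back (via a savepoint) before the retry; that step is omitted here.
    """
    delay = initial_delay_s
    while True:
        try:
            return statement()
        except ConflictError:
            # Sleep so that retries do not run in a tight loop, then grow
            # the delay up to the configured cap.
            sleep(delay)
            delay = min(delay * multiplier, max_delay_s)
```

Note that the loop has no exit other than success, which matches the "retry indefinitely" behaviour and is also why a deadlock never resolves under this scheme.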

D17304 (dc81106) added the new wait-queue based
implementation of Wait-on-Conflict concurrency control, which is used if the tserver gflag
enable_wait_queues is set to true. In this case, a read/write RPC from
a YSQL backend to the transaction participant is blocked if conflicts are
detected on the participant, and is unblocked once all conflicting transactions have
completed (either committed or aborted). There are two possible scenarios once
the RPC is unblocked:

(1) It is still conflicting because some transaction has committed which made a
conflicting modification to the data.
(2) No transaction with a conflicting modification to the data has committed.

For case (1), the kConflict error is returned to the YSQL backend. In case (2), the
RPC makes progress and returns the appropriate result to the YSQL backend. The
waiting is transparent to the YSQL backend (i.e., indistinguishable from an RPC
that wasn't blocked).

So, if enable_wait_queues is set, a READ COMMITTED transaction does not need to
sleep before retrying the query, because a kConflict error, if any (case (1)),
is sent only after all conflicting transactions have ended.
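
The two cases, and the consequence for the retry delay, can be sketched as a small model. This is an assumed simplification, not the tserver logic: it treats every blocking transaction as having made a conflicting modification, and `result_after_unblock` and `delay_before_retry` are hypothetical names.

```python
from enum import Enum
from typing import Iterable


class Outcome(Enum):
    COMMITTED = "committed"
    ABORTED = "aborted"


class RpcResult(Enum):
    CONFLICT = "kConflict"  # case (1): returned to the YSQL backend
    OK = "ok"               # case (2): the RPC proceeds normally


def result_after_unblock(blocker_outcomes: Iterable[Outcome]) -> RpcResult:
    """Decide what an unblocked RPC returns.

    Assumption: each blocker made a conflicting modification, so any blocker
    that committed leaves the RPC still in conflict.
    """
    if any(outcome is Outcome.COMMITTED for outcome in blocker_outcomes):
        return RpcResult.CONFLICT
    return RpcResult.OK


def delay_before_retry(enable_wait_queues: bool, backoff_delay_s: float) -> float:
    """With wait queues, a kConflict arrives only after all conflicting
    transactions have ended, so the retry can be issued immediately."""
    return 0.0 if enable_wait_queues else backoff_delay_s
```

In this model the waiting has already happened inside the blocked RPC, so an additional sleep in the backend would only add latency.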

@pkj415 pkj415 added area/ysql Yugabyte SQL (YSQL) status/awaiting-triage Issue awaiting triage labels Jul 7, 2022
@yugabyte-ci yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Jul 7, 2022
@pkj415 pkj415 self-assigned this Jul 7, 2022
@pkj415 pkj415 added pg-compatibility Label for issues that result in differences b/w YSQL and Pg semantics and removed kind/bug This issue is a bug status/awaiting-triage Issue awaiting triage labels Jul 7, 2022
@yugabyte-ci yugabyte-ci added the kind/bug This issue is a bug label Aug 16, 2022
@pkj415 pkj415 changed the title from "[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking" to "[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking to avoid deadlocks and higher performance" Sep 3, 2022
@pkj415 pkj415 changed the title from "[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking to avoid deadlocks and higher performance" to "[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking to detect deadlocks and higher performance" Sep 5, 2022
@yugabyte-ci yugabyte-ci added kind/enhancement This is an enhancement of an existing feature and removed kind/bug This issue is a bug labels Sep 5, 2022
@polarweasel
Contributor

Before you close this one @pkj415 please make sure to remove the note in the docs on the Read Committed architecture page.

pkj415 added a commit that referenced this issue Nov 29, 2022
Summary:
READ COMMITTED isolation provides blocking i.e., waiting semantics during
transaction conflicts by retrying a query indefinitely on kConflict errors with
exponential backoff. Once all conflicting transactions end, the next retry of
the query will successfully run. The exponential backoff delay between retries
is to ensure that the YSQL backend doesn't overwhelm the system by retrying
in a tight loop.

D17304 (dc81106) added the new wait-queue
based implementation, which is used if the tserver gflag
enable_wait_queues is set to true. In this case, a read/write RPC from
a YSQL backend to the transaction participant is blocked if conflicts are
detected on the participant, and is unblocked once all conflicting transactions have
completed (either committed or aborted). There are two possible scenarios once
the RPC is unblocked:

(1) It is still conflicting because some transaction has committed which made a
conflicting modification to the data.
(2) No transaction with a conflicting modification to the data has committed.

For case (1), the kConflict error is returned to the YSQL backend. In case (2), the
RPC makes progress and returns the appropriate result to the YSQL backend. The
waiting is transparent to the YSQL backend (i.e., indistinguishable from an RPC
that wasn't blocked).

So, if enable_wait_queues is set, a READ COMMITTED transaction does not need to
sleep before retrying the query, because a kConflict error, if any (case (1)),
is sent only after all conflicting transactions have ended.

NOTE: REPEATABLE READ and SERIALIZABLE isolation levels also retry with
exponential backoff when kConflict errors occur in the first statement of a
transaction. This too is not needed if enable_wait_queues is true. This diff
ensures that as well.

Other miscellaneous changes -

(1) SKIP LOCKED wasn't working for single shard transactions earlier, fixed it.

(2) Changed naming at most places except docs from "pessimistic" and "optimistic"
locking to "Wait-on-Conflict" and "Fail-on-Conflict". This is an effort to move away
from the words "pessimistic" and "optimistic" since they are wrongly defined and
used in internal discussions and their meanings cause confusion in external
discussion. For example, when we say optimistic, people think we mean
"optimistic concurrency control" which is widely known in literature and
industry (like here - https://people.eecs.berkeley.edu/~fox/summaries/database/optimistic_concurrency.html)

Created GitHub issue #14935 to track situations where nodes can have different values for
enable_wait_queues during a rolling restart.

Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#isolationRegressWithWaitQueues
./yb_build.sh --java-test org.yb.pgsql.TestPgIsolationRegress#withDelayedTxnApplyWithWaitQueues

Reviewers: tvesely, sergei, rsami

Reviewed By: rsami

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D20974
pkj415 added a commit that referenced this issue Nov 30, 2022
… wait queues

Original commit: 80a34c0 / D20974

Differential Revision: https://phabricator.dev.yugabyte.com/D21411
@pkj415 pkj415 closed this as completed Nov 30, 2022
jayant07-yb pushed a commit to jayant07-yb/yugabyte-db that referenced this issue Dec 7, 2022
…ueues

Differential Revision: https://phabricator.dev.yugabyte.com/D20974
@pkj415 pkj415 changed the title from "[YSQL] Integrate READ COMMITTED isolation with wait queue based pessimistic locking to detect deadlocks and higher performance" to "[YSQL] Integrate READ COMMITTED isolation with wait queue based Wait-on-Conflict concurrency control to detect deadlocks and for higher performance" Jan 6, 2023
Projects
Status: Done
3 participants