[YSQL] Support READ COMMITTED isolation level semantics for DMLs #9468

Closed
pkj415 opened this issue Jul 26, 2021 · 0 comments
Labels
area/ysql Yugabyte SQL (YSQL) kind/bug This issue is a bug priority/medium Medium priority issue

pkj415 commented Jul 26, 2021

Jira Link: DB-3139
Functional spec for full feature - https://docs.google.com/document/d/1bayBT9H0acTFJPLAGcaO5B3MKI57Nra_oDxchYKIFTI
Design doc - https://docs.google.com/document/d/1yqnYJDYjotQXBzhe0kwxfB7c-PmAsrDM-Rs6uAVRgOU/edit

Tasks:
1. Use a new ConsistentReadPoint (i.e., use the current time) for every statement in a read committed txn.
2. Handle the kReadRestart error to ensure it is never thrown to the user (unless a statement's output exceeds ysql_output_buffer_size, a gflag with a default of 256KB).
3. Handle the kConflict error to ensure it is never thrown to the user (unless a statement's output exceeds ysql_output_buffer_size, a gflag with a default of 256KB). This will implicitly cover a basic implementation of pessimistic locking exclusively for READ COMMITTED isolation. #5680 tracks a more advanced pessimistic locking implementation that also works for other isolation levels.
pkj415 self-assigned this Jul 26, 2021
pkj415 added the area/ysql Yugabyte SQL (YSQL) label Jul 26, 2021
pkj415 added a commit to pkj415/yugabyte-db that referenced this issue Nov 11, 2021
Summary:
Support initial part of READ COMMITTED isolation level by using a new ConsistentReadPoint for every query in a read committed txn. The read point will be set to the current hybrid time on the txn manager (i.e., postgres).

Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadPointInReadCommittedIsolation
./yb_build.sh --java-test org.yb.pgsql.TestPgTransparentRestarts

Reviewers: dmitry

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D13794
pkj415 changed the title [YSQL] Support READ COMMITTED isolation level with optimistic locking [YSQL] Support READ COMMITTED isolation level Nov 11, 2021
pkj415 added a commit that referenced this issue Nov 11, 2021
Summary:
Support initial part of READ COMMITTED isolation level by using a new
ConsistentReadPoint for every statement in a read committed txn. The read point
will be set to the current hybrid time on the txn manager (i.e., postgres).

The feature is guarded under the tserver gflag yb_enable_read_committed_isolation -
1. A false value implies the existing behaviour of internally mapping "read committed"
   to "repeatable read".
2. A true value means that we treat "read committed" as a separate isolation
   level with the correct expected semantics.
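A minimal sketch of what the gflag toggles, assuming a simplified enum of isolation levels. `Isolation` and `effective_isolation` are hypothetical names for illustration, not YugabyteDB code; the boolean parameter stands in for yb_enable_read_committed_isolation.

```python
from enum import Enum


class Isolation(Enum):
    READ_COMMITTED = "read committed"
    REPEATABLE_READ = "repeatable read"
    SERIALIZABLE = "serializable"


def effective_isolation(requested: Isolation, enable_read_committed: bool) -> Isolation:
    """Return the isolation level a transaction actually runs with.

    enable_read_committed stands in for the tserver gflag
    yb_enable_read_committed_isolation (hypothetical helper, not YB code).
    """
    if requested is Isolation.READ_COMMITTED and not enable_read_committed:
        # Flag off: keep the old mapping of "read committed" to "repeatable read".
        return Isolation.REPEATABLE_READ
    # Flag on (or any other level requested): use the level as asked.
    return requested
```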

NOTE: To ensure we don't reset the read point to the current time when we are in a
kReadRestart retry, a new field "recently_restarted_read_point_" is introduced
in consistent_read_point.h. (An alternate solution involves two steps: i) not
restarting the read point in the restart wrapper and then ii) restarting it in
StartTransactionCommand(). That leads to too much code change and complexity,
which the new field avoids.)
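A toy model of the NOTE above, assuming integer hybrid times and an injectable clock. `ConsistentReadPointSketch` and its methods are hypothetical; the `recently_restarted` attribute only plays the role described for recently_restarted_read_point_, it is not the real class.

```python
import time
from typing import Callable


class ConsistentReadPointSketch:
    """Per-statement read point selection in READ COMMITTED (hypothetical)."""

    def __init__(self, clock: Callable[[], int] = time.monotonic_ns) -> None:
        self._clock = clock
        self.read_time = clock()
        self.recently_restarted = False  # role of recently_restarted_read_point_

    def restart(self, restart_time: int) -> None:
        # kReadRestart retry: move to the restart read point and remember it.
        self.read_time = restart_time
        self.recently_restarted = True

    def start_statement(self) -> None:
        # Normally each READ COMMITTED statement reads at "now", but a statement
        # re-run after a read restart must keep the restart read point.
        if self.recently_restarted:
            self.recently_restarted = False
            return
        self.read_time = self._clock()
```

For example, with a clock yielding 10, 20, 30: the first statement reads at 20, a restart to 25 is kept for the retried statement, and the next statement reads at 30.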

Test Plan:
Jenkins: urgent

./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadPointInReadCommittedIsolation
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadCommittedEnabledEnvVarCaching
./yb_build.sh --java-test org.yb.pgsql.TestPgTransparentRestarts

Reviewers: kgupta, smishra, dsrinivasan, alex, mihnea

Reviewed By: mihnea

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D13794
pkj415 added a commit that referenced this issue Nov 11, 2021
Summary:
Support initial part of READ COMMITTED isolation level by using a new
ConsistentReadPoint for every statement in a read committed txn. The read point
will be set to the current hybrid time on the txn manager (i.e., postgres).

The feature is guarded under the tserver gflag yb_enable_read_committed_isolation -
1. A false value implies the existing behaviour of internally mapping "read committed"
   to "repeatable read".
2. A true value means that we treat "read committed" as a separate isolation
   level with the correct expected semantics.

NOTE: To ensure we don't reset the read point to the current time when we are in a
kReadRestart retry, a new field "recently_restarted_read_point_" is introduced
in consistent_read_point.h. (An alternate solution involves two steps: i) not
restarting the read point in the restart wrapper and then ii) restarting it in
StartTransactionCommand(). That leads to too much code change and complexity,
which the new field avoids.)

Original diff: https://phabricator.dev.yugabyte.com/D13794, 0e0dcde

Test Plan:
Jenkins: urgent, rebase: 2.11.0

./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadPointInReadCommittedIsolation
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadCommittedEnabledEnvVarCaching
./yb_build.sh --java-test org.yb.pgsql.TestPgTransparentRestarts

Reviewers: dsrinivasan, mihnea

Reviewed By: mihnea

Subscribers: jenkins-bot, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D13880
pkj415 added a commit to pkj415/yugabyte-db that referenced this issue Dec 2, 2021
pkj415 added a commit that referenced this issue Dec 28, 2021
…tement in READ COMMITTED isolation (Part-2)

Summary:
For a REPEATABLE READ isolation transaction, the read point is picked as the
current hybrid time and the transaction's reads are supposed to include all
other transactions' data that committed before this transaction was issued. Any
transaction that committed before this transaction might have a commit time of
at most this txn's start time (as seen on the txn manager) + max clock skew.
This upper bound of transaction start time + max clock skew is called
global_limit. To be precise, we use the read point (chosen based on the current
hybrid time) as the txn start time when computing global_limit.
Data written by other txns with commit time within global_limit can be of two
types -

(1) the data committed before this transaction was issued but still has a commit
time greater than the read point. But since we want to read everything committed
before this txn was issued, one way to proceed is to transparently shift the
read point of the txn to a read point >= the commit time of such data, re-read
at the tablet server and send the new read point to the query layer for use as
the txn read point. But in case the txn had already read some data using the old
read point from other tablet servers, transparent shifting of the read point is
not possible at the tablet server and a kReadRestart error is thrown to the
query layer.

(2) the data committed after this transaction was issued, and can be ignored. But
there is no way to differentiate between such data and data of type 1. So the
tablet server takes a conservative approach and tries to either transparently
shift the read point as explained above, or if that isn't possible, throws a
kReadRestart error to the query layer.
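The three commit-time windows implied above can be sketched as follows, assuming hybrid times are plain integers (for example microseconds). `Visibility`, `global_limit` and `classify` are hypothetical names, and the local_limit optimization is deliberately left out here.

```python
from enum import Enum


class Visibility(Enum):
    VISIBLE = "visible"      # committed at or before the read point
    AMBIGUOUS = "ambiguous"  # in (read_point, global_limit]: type 1 or type 2
    IGNORED = "ignored"      # after global_limit: surely committed after txn start


def global_limit(read_point: int, max_clock_skew: int) -> int:
    # Upper bound on the commit time of any txn that committed before this one.
    return read_point + max_clock_skew


def classify(commit_time: int, read_point: int, max_clock_skew: int) -> Visibility:
    if commit_time <= read_point:
        return Visibility.VISIBLE
    if commit_time <= global_limit(read_point, max_clock_skew):
        # Cannot tell type 1 from type 2: shift the read point or throw kReadRestart.
        return Visibility.AMBIGUOUS
    return Visibility.IGNORED
```

With read_point = 1000 and max_clock_skew = 500, global_limit is 1500: a commit at 900 is visible, one at 1200 is ambiguous, and one at 1600 can be ignored.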

There is an optimization that still helps us differentiate between type 1 and
type 2 in a special case and avoid shifting the read point or throwing kReadRestart:
the committed data has a commit time within global_limit but the tablet server
knows that the commit was after this txn was issued because the corresponding
intents were written to intents db after the current transaction was issued
(based on the optimization in c784595). This is checked by comparing the encoded
intent write time in committed entries with the local_limit of the txn.
local_limit is a per-tablet-server limit, chosen as the safe time on the
first rpc to the tablet server as part of a txn.
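A sketch of how the local_limit check narrows the ambiguous window, assuming integer hybrid times and treating "written after local_limit" as a strict `>` comparison (the exact boundary is an assumption). `causes_read_restart` is a hypothetical helper, not the tablet server code.

```python
def causes_read_restart(commit_time: int, intent_write_time: int, read_point: int,
                        global_limit: int, local_limit: int) -> bool:
    """True if a committed entry forces a read point shift or kReadRestart."""
    if commit_time <= read_point:
        return False  # already visible at the read point
    if commit_time > global_limit:
        return False  # committed after this txn was issued: ignore
    if intent_write_time > local_limit:
        # The intents reached intents db after local_limit (safe time on this
        # txn's first rpc to this tablet server), so the commit happened after
        # this txn was issued: type 2, safe to ignore.
        return False
    return True  # genuinely ambiguous type 1/2 data
```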

So, barring the optimization, if a tablet server sees data ahead of read point
but within global_limit, it will either transparently shift the read point ahead
and inform the query layer about it (if no earlier reads have been performed by
the query layer at other tablet servers) (OR) throw a kReadRestart error
to the query layer. A tablet server might find data with commit time that is
within global_limit either in regular db or in intents db (if the data was
written as part of a recently committed txn whose intents are yet to move to
regular db).
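The tablet server's choice between a transparent shift and kReadRestart can be sketched like this, assuming the caller has already collected the ambiguous commit times (from regular db and intents db) that survive the local_limit check. `ReadRestartError` and `resolve_read_point` are hypothetical stand-ins, not YugabyteDB APIs.

```python
from typing import Sequence


class ReadRestartError(Exception):
    """Stand-in for a kReadRestart status carrying the restart read point."""

    def __init__(self, restart_read_point: int):
        super().__init__(f"kReadRestart: retry at read point {restart_read_point}")
        self.restart_read_point = restart_read_point


def resolve_read_point(read_point: int, ambiguous_commit_times: Sequence[int],
                       read_elsewhere_already: bool) -> int:
    """Pick the read point a tablet server uses for one read."""
    if not ambiguous_commit_times:
        return read_point  # nothing ambiguous: read at the existing read point
    # Smallest point that is >= every ambiguous commit time.
    new_read_point = max(ambiguous_commit_times)
    if read_elsewhere_already:
        # Earlier reads at other tablet servers used the old read point, so it
        # cannot be moved here silently; the query layer has to decide.
        raise ReadRestartError(new_read_point)
    # Shift transparently: re-read at new_read_point and report it to the
    # query layer, which adopts it as the txn read point.
    return new_read_point
```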

Apart from the transparent shifting of the read point, a second level of transparent
retries exists in the query layer: in case the tablet server throws a kReadRestart
error, the query layer either forwards it to the external client or in turn
transparently retries the whole txn if no response data has been sent to the
client, by picking a later read point called the "restart read point" that is late
enough to include all such commit times of transactions with data of type 1/2.
Retry at a "restart read point" isn't possible if any data has been sent to the
client as part of the txn because that might change older response data. Note
that transparent retries, if possible at all, can only be done in the first statement
since completion of a statement always results in sending some data to the
external client.
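A sketch of the query-layer retry loop for a REPEATABLE READ txn, assuming `execute` runs the txn's first statement at a given read point and `data_sent_to_client` reports whether any response data has left the server. All names, and the `max_attempts` cap, are hypothetical.

```python
from typing import Callable, List


class ReadRestartError(Exception):
    """Stand-in for a kReadRestart status carrying the restart read point."""

    def __init__(self, restart_read_point: int):
        super().__init__(f"kReadRestart: retry at read point {restart_read_point}")
        self.restart_read_point = restart_read_point


def run_with_transparent_restarts(execute: Callable[[int], List[str]], read_point: int,
                                  data_sent_to_client: Callable[[], bool],
                                  max_attempts: int = 3) -> List[str]:
    """Retry at the restart read point while nothing has reached the client.

    max_attempts is a hypothetical cap and must be at least 1.
    """
    last_error = None
    for _ in range(max_attempts):
        try:
            return execute(read_point)
        except ReadRestartError as err:
            if data_sent_to_client():
                # A retry could change results the client already has.
                raise
            last_error = err
            read_point = err.restart_read_point
    raise last_error
```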

One fact to note is: the first read of a key can result in kReadRestart, but
further reads of the same key via later rpcs to the same tablet server can
result in kReadRestart only due to data with commit time after read point but
within global_limit such that -

(1) it was committed after the first rpc. Because if it had committed before the
first rpc, a kReadRestart would have been thrown in the first rpc, resulting in
a new restart read point >= all commit timestamps as seen in the first
rpc.

(2) the intent of the key was written as part of the committed txn before the
first rpc (to be precise, before the local_limit that is picked as part of the
first rpc). This is because, other txns' committed data with intents written
after the local_limit won't result in kReadRestart given the optimization above.

In other words, narrower requirements must be met for a kReadRestart to occur
in a read of a key after the first read. So, the chances of a kReadRestart due to a read
of a given key decrease after the first rpc that reads that key.

In a READ COMMITTED isolation transaction, a new read point is picked for each
statement based on the current hybrid time. Each statement is supposed to
include all transactions that commit before the statement is issued. Due to
this, all of the above discussion now applies on a per statement level.
Even the fact above changes to: '"for each statement", the first read of a key
can result in kReadRestart, but further reads of the same key....'. This
increases the chances of kReadRestart in a single transaction. This can be
resolved by transparently retrying kReadRestart errors for each statement in the
query layer in case no data has been sent to the client for that statement (we
are allowed to do this in READ COMMITTED instead of worrying if any data has
been sent to the client for the txn because data sent before this statement had
an older read point and hence older responses wouldn't change).
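A sketch of the per-statement variant for READ COMMITTED, assuming each statement takes a fresh read point from `now()` and that rows count as "sent" as soon as they are handed to the client (in practice output is buffered up to ysql_output_buffer_size). `ClientOutput`, `run_read_committed_txn` and `max_attempts` are hypothetical names.

```python
from typing import Callable, List


class ReadRestartError(Exception):
    """Stand-in for a kReadRestart status carrying the restart read point."""

    def __init__(self, restart_read_point: int):
        super().__init__(f"kReadRestart: retry at read point {restart_read_point}")
        self.restart_read_point = restart_read_point


class ClientOutput:
    """Counts rows sent to the external client, per statement and in total."""

    def __init__(self) -> None:
        self.total_rows = 0
        self.statement_rows = 0

    def send(self, rows: List[str]) -> None:
        self.statement_rows += len(rows)
        self.total_rows += len(rows)


# A statement runs at a read point and streams its rows to the client.
Statement = Callable[[int, ClientOutput], None]


def run_read_committed_txn(statements: List[Statement], now: Callable[[], int],
                           client: ClientOutput, max_attempts: int = 3) -> None:
    for stmt in statements:
        read_point = now()  # fresh read point for every statement
        client.statement_rows = 0
        for attempt in range(max_attempts):
            try:
                stmt(read_point, client)
                break
            except ReadRestartError as err:
                # Only this statement's output matters: rows from earlier
                # statements were read at older read points and stay valid.
                if client.statement_rows > 0 or attempt == max_attempts - 1:
                    raise
                read_point = err.restart_read_point
```

Unlike the REPEATABLE READ case, the second statement below is retried even though the first one already sent a row to the client.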

Test Plan: ./yb_build.sh --java-test org.yb.pgsql.TestPgTransparentRestarts

Reviewers: alex

Reviewed By: alex

Subscribers: sergei, mihnea, yql

Differential Revision: https://phabricator.dev.yugabyte.com/D14397
pkj415 added a commit to pkj415/yugabyte-db that referenced this issue Feb 18, 2022
…n READ COMMITTED isolation (Part-3)

Summary:
In this third part, we ensure that we don't throw kConflict errors to external
ysql clients when using READ COMMITTED isolation level. We do this by -

1. Re-executing a statement when kConflict is seen: this is done by leveraging
   savepoints. An internal savepoint is created before execution of every
   statement, which is rolled back to on facing a kConflict. This helps get rid
   of any provisional writes that were written by the statement before the
   conflict and hence are no longer valid.

   The statement is retried indefinitely until statement timeout with configurable
   exponential backoff. This gives a feeling that pessimistic locking is also
   in place. Note that we also lazily rely only on the statement timeout to get
   rid of deadlocks, without proactively detecting them with a distributed
   deadlock detection algorithm. That will come in as a separate improvement
   with pessimistic locking.

2. Using the highest priority for READ COMMITTED txns: this helps ensure that no
   other txns can abort a READ COMMITTED txn.
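A sketch of step 1, assuming a toy transaction that stores provisional writes in a list and supports named savepoints. `Txn`, `ConflictError`, the savepoint name and the backoff parameters are hypothetical; they only illustrate rollback-to-internal-savepoint plus exponential backoff bounded by the statement timeout.

```python
import time
from typing import Callable, Dict, List, TypeVar

T = TypeVar("T")


class ConflictError(Exception):
    """Stand-in for a kConflict status."""


class Txn:
    """Toy txn with provisional writes and named savepoints (hypothetical)."""

    def __init__(self) -> None:
        self.writes: List[str] = []
        self._savepoints: Dict[str, int] = {}

    def savepoint(self, name: str) -> None:
        self._savepoints[name] = len(self.writes)

    def rollback_to(self, name: str) -> None:
        del self.writes[self._savepoints[name]:]


def run_statement_with_conflict_retries(
        txn: Txn, statement: Callable[[Txn], T], statement_timeout_s: float,
        initial_backoff_s: float = 0.01, max_backoff_s: float = 1.0,
        sleep: Callable[[float], None] = time.sleep,
        clock: Callable[[], float] = time.monotonic) -> T:
    deadline = clock() + statement_timeout_s
    backoff = initial_backoff_s
    while True:
        txn.savepoint("internal_statement_savepoint")
        try:
            return statement(txn)
        except ConflictError:
            # Drop provisional writes this attempt made before the conflict.
            txn.rollback_to("internal_statement_savepoint")
            if clock() + backoff > deadline:
                raise TimeoutError("statement timeout")
            sleep(backoff)
            backoff = min(backoff * 2, max_backoff_s)  # exponential backoff
```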

Test Plan:
Enabled Postgres's existing eval-plan-qual isolation test with appropriate
modifications to disable cases that require features yet to be implemented on
YB.

Added a bunch of new tests from the functional spec as well:
src/test/isolation/specs/yb_pb_eval-plan-qual.spec
src/test/isolation/specs/yb_read_committed_insert.spec
src/test/isolation/specs/yb_read_committed_test_internal_savepoint.spec
src/test/isolation/specs/yb_read_committed_update_and_explicit_locking.spec

Reviewers: mihnea, alex, rsami

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15383
pkj415 added a commit that referenced this issue Feb 24, 2022
…OMMITTED isolation (Part-3)

Summary:
In this third part, we ensure that we don't throw kConflict errors to external
ysql clients when using READ COMMITTED isolation level. We do this by -

(1) Re-executing a statement when kConflict is seen: this is done by leveraging
savepoints. An internal savepoint is created before execution of every
statement, which is rolled back to on facing a kConflict. This helps get rid
of any provisional writes that were written by the statement before the
conflict and hence are no longer valid.

The statement is retried indefinitely until statement timeout with configurable
exponential backoff. This gives a feeling that pessimistic locking is also
in place. Note that we also lazily rely only on the statement timeout to get
rid of deadlocks, without proactively detecting them with a distributed
deadlock detection algorithm. That will come in as a separate improvement
with pessimistic locking.

(2) Using the highest priority for READ COMMITTED txns: this helps ensure that
no other txns can abort a READ COMMITTED txn. Even other READ COMMITTED
txns can't.
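A simplified model of point (2), assuming that on a write conflict the requester aborts the holder only if its priority is strictly higher, and that a tie leaves the holder alone (the tie rule is an assumption). `HIGHEST_PRIORITY`, `pick_priority` and `resolve_conflict` are hypothetical names.

```python
import random

# Hypothetical top of the priority range; the real representation may differ.
HIGHEST_PRIORITY = 2**64 - 1


def pick_priority(isolation: str, rng: random.Random) -> int:
    if isolation == "read committed":
        return HIGHEST_PRIORITY
    # Other txns get a random priority strictly below the READ COMMITTED one.
    return rng.randrange(0, HIGHEST_PRIORITY)


def resolve_conflict(requester_priority: int, holder_priority: int) -> str:
    """Decide who gives way when a write conflicts with another txn."""
    if requester_priority > holder_priority:
        return "abort holder"
    # Ties (two READ COMMITTED txns) and lower priorities: the requester gets
    # kConflict and, under READ COMMITTED, retries its statement.
    return "requester gets kConflict"
```

Under this model nothing can abort a READ COMMITTED txn, including another READ COMMITTED txn, which instead waits out the conflict through the statement retries in (1).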

Test Plan:
Jenkins: urgent

Enabled Postgres's existing eval-plan-qual isolation test with appropriate
modifications to disable cases that require features yet to be implemented on
YB.

Added a bunch of new tests from the functional spec as well:
src/test/isolation/specs/yb_pb_eval-plan-qual.spec
src/test/isolation/specs/yb_read_committed_insert.spec
src/test/isolation/specs/yb_read_committed_test_internal_savepoint.spec
src/test/isolation/specs/yb_read_committed_update_and_explicit_locking.spec

Reviewers: mihnea, alex, rsami, mtakahara

Reviewed By: rsami, mtakahara

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15383
pkj415 added a commit that referenced this issue Feb 25, 2022
…errors in READ COMMITTED isolation (Part-3)

Summary:
In this third part, we ensure that we don't throw kConflict errors to external
ysql clients when using READ COMMITTED isolation level. We do this by -

(1) Re-executing a statement when kConflict is seen: this is done by leveraging
savepoints. An internal savepoint is created before execution of every
statement, which is rolled back to on facing a kConflict. This helps get rid
of any provisional writes that were written by the statement before the
conflict and hence are no longer valid.

The statement is retried indefinitely until statement timeout with configurable
exponential backoff. This gives a feeling that pessimistic locking is also
in place. Note that we also lazily rely only on the statement timeout to get
rid of deadlocks, without proactively detecting them with a distributed
deadlock detection algorithm. That will come in as a separate improvement
with pessimistic locking.

(2) Using the highest priority for READ COMMITTED txns: this helps ensure that
no other txns can abort a READ COMMITTED txn. Even other READ COMMITTED
txns can't.

Original commit: https://phabricator.dev.yugabyte.com/D15383, TBD

Test Plan:
Jenkins: urgent, rebase: 2.12

Enabled Postgres's existing eval-plan-qual isolation test with appropriate
modifications to disable cases that require features yet to be implemented on
YB.

Added a bunch of new tests from the functional spec as well:
src/test/isolation/specs/yb_pb_eval-plan-qual.spec
src/test/isolation/specs/yb_read_committed_insert.spec
src/test/isolation/specs/yb_read_committed_test_internal_savepoint.spec
src/test/isolation/specs/yb_read_committed_update_and_explicit_locking.spec

Reviewers: mihnea, alex, rsami, mtakahara

Reviewed By: mtakahara

Subscribers: yql

Differential Revision: https://phabricator.dev.yugabyte.com/D15571
pkj415 closed this as completed Mar 7, 2022
jayant07-yb pushed a commit to jayant07-yb/yugabyte-db that referenced this issue Mar 8, 2022
…n READ COMMITTED isolation (Part-3)

pkj415 changed the title [YSQL] Support READ COMMITTED isolation level [YSQL] Support READ COMMITTED isolation level semantics for DMLs Aug 9, 2022
yugabyte-ci added kind/bug This issue is a bug priority/medium Medium priority issue labels Aug 9, 2022
Projects
Status: Done