[YSQL] Support READ COMMITTED isolation level semantics for DMLs #9468

pkj415 · 2021-07-26T22:02:32Z

Jira Link: DB-3139
Functional spec for full feature - https://docs.google.com/document/d/1bayBT9H0acTFJPLAGcaO5B3MKI57Nra_oDxchYKIFTI
Design doc - https://docs.google.com/document/d/1yqnYJDYjotQXBzhe0kwxfB7c-PmAsrDM-Rs6uAVRgOU/edit

Status	Task
✅	Use a new ConsistentReadPoint (i.e., use current time) for every statement in a read committed txn.
✅	Handle kReadRestart error to ensure it is never thrown to the user (unless a statement's output exceeds ysql_output_buffer_size, a gflag with a default of 256KB).
✅	Handle kConflict error to ensure it is never thrown to the user (unless a statement's output exceeds ysql_output_buffer_size, a gflag with a default of 256KB). This will implicitly cover a basic implementation of pessimistic locking exclusively for READ COMMITTED isolation. #5680 tracks a more advanced pessimistic locking implementation that also works for other isolation levels.

Summary: Support initial part of READ COMMITTED isolation level by using a new ConsistentReadPoint for every query in a read committed txn. The read point will be set to the current hybrid time on the txn manager (i.e., postgres).

Test Plan:
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadPointInReadCommittedIsolation
./yb_build.sh --java-test org.yb.pgsql.TestPgTransparentRestarts

Reviewers: dmitry
Subscribers: yql
Differential Revision: https://phabricator.dev.yugabyte.com/D13794

Summary: Support initial part of READ COMMITTED isolation level by using a new ConsistentReadPoint for every statement in a read committed txn. The read point will be set to the current hybrid time on the txn manager (i.e., postgres).

The feature is guarded under the tserver gflag yb_enable_read_committed_isolation -
1. A false value implies the existing behaviour of internally mapping "read committed" to "repeatable read".
2. A true value means that we treat "read committed" as a separate isolation level with the correct expected semantics.

NOTE: To ensure we don't reset read point to current time in case we are in a kReadRestart retry, a new field "recently_restarted_read_point_" is introduced in consistent_read_point.h. (An alternate solution involves two steps - i) not restarting read point during restart wrapper and then ii) restarting in StartTransactionCommand(). This leads to too much code change and complexity. The new field helps get rid of that complexity).

Test Plan: Jenkins: urgent
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadPointInReadCommittedIsolation
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadCommittedEnabledEnvVarCaching
./yb_build.sh --java-test org.yb.pgsql.TestPgTransparentRestarts

Reviewers: kgupta, smishra, dsrinivasan, alex, mihnea
Reviewed By: mihnea
Subscribers: yql
Differential Revision: https://phabricator.dev.yugabyte.com/D13794

Summary: Support initial part of READ COMMITTED isolation level by using a new ConsistentReadPoint for every statement in a read committed txn. The read point will be set to the current hybrid time on the txn manager (i.e., postgres).

The feature is guarded under the tserver gflag yb_enable_read_committed_isolation -
1. A false value implies the existing behaviour of internally mapping "read committed" to "repeatable read".
2. A true value means that we treat "read committed" as a separate isolation level with the correct expected semantics.

NOTE: To ensure we don't reset read point to current time in case we are in a kReadRestart retry, a new field "recently_restarted_read_point_" is introduced in consistent_read_point.h. (An alternate solution involves two steps - i) not restarting read point during restart wrapper and then ii) restarting in StartTransactionCommand(). This leads to too much code change and complexity. The new field helps get rid of that complexity).

Original diff: https://phabricator.dev.yugabyte.com/D13794, 0e0dcde

Test Plan: Jenkins: urgent, rebase: 2.11.0
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadPointInReadCommittedIsolation
./yb_build.sh --java-test org.yb.pgsql.TestPgTransactions#testReadCommittedEnabledEnvVarCaching
./yb_build.sh --java-test org.yb.pgsql.TestPgTransparentRestarts

Reviewers: dsrinivasan, mihnea
Reviewed By: mihnea
Subscribers: jenkins-bot, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D13880
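The read point behaviour described in these commit messages can be illustrated with a small sketch. This is a toy Python model, not the actual code in consistent_read_point.h: the class name `ReadPointSketch`, the method names, and the use of plain integers for hybrid time are all assumptions made for illustration. It shows the gflag's two modes and why a "recently restarted" marker keeps a kReadRestart retry from being reset to the current time.

```python
from dataclasses import dataclass


@dataclass
class ReadPointSketch:
    """Toy stand-in for a consistent read point (hypothetical, not the real class)."""

    read_time: int = 0
    # Mirrors the idea behind the recently_restarted_read_point_ field.
    recently_restarted: bool = False

    def restart(self, restart_time: int) -> None:
        # A kReadRestart retry moved the read point forward; remember that.
        self.read_time = restart_time
        self.recently_restarted = True

    def start_statement(self, now: int, read_committed_enabled: bool,
                        first_statement: bool) -> None:
        if self.recently_restarted:
            # We are retrying after a restart: keep the restart read point
            # instead of resetting it to the current time, and consume the flag.
            self.recently_restarted = False
            return
        # Flag on: every statement gets a fresh read point.
        # Flag off: behave like "repeatable read" and pick it only once.
        if read_committed_enabled or first_statement:
            self.read_time = now
```

With the flag enabled, each `start_statement` call picks up the new `now`, except for the call that immediately follows `restart`.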

…tement in READ COMMITTED isolation (Part-2)

Summary: For a REPEATABLE READ isolation transaction, the read point is picked as the current hybrid time and the transaction's reads are supposed to include all other transactions' data that committed before this transaction was issued. Any transaction that committed before this transaction might have a commit time of at most this txn's start time (as seen on the txn manager) + max clock skew. This upper bound of max clock skew + transaction start time is called global_limit. To be precise, we use the read point (chosen based on current hybrid time) as the txn start time to compute the upper bound of global_limit.

Data written by other txns with commit time within global_limit can be of two types -

(1) the data committed before this transaction was issued but still has a commit time greater than read point. But since we want to read everything committed before this txn was issued, one way to proceed is to transparently shift the read point of the txn to a read point >= the commit time of such data, re-read at the tablet server and send the new read point to the query layer for using as the txn read point. But in case the txn had already read some data using the old read point from other tablet servers, transparent shifting of the read point is not possible at the tablet server and a kReadRestart error is thrown to the query layer.

(2) the data committed after this transaction was issued, and can be ignored. But there is no way to differentiate between such data and data of type 1. So the tablet server takes a conservative approach and tries to either transparently shift the read point as explained above, or if that isn't possible, throws a kReadRestart error to the query layer.
There is an optimization that still helps us differentiate between type 1 and type 2 in a special case and avoid shifting the read point or throwing kReadRestart: the committed data has a commit time within global_limit but the tablet server knows that the commit was after this txn was issued because the corresponding intents were written to intents db after the current transaction was issued (based on optimization in c784595). This is checked by comparing the encoded intent write time in committed entries with the local_limit of the txn. local_limit is a per tablet server limit and chosen as the safe time on the first rpc to the tablet server as part of a txn.

So, barring the optimization, if a tablet server sees data ahead of read point but within global_limit, it will either transparently shift the read point ahead and inform the query layer about it (if no earlier reads have been performed by the query layer at other tablet servers) (OR) throw a kReadRestart error to the query layer. A tablet server might find data with commit time that is within global_limit either in regular db or in intents db (if the data was written as part of a recently committed txn whose intents are yet to move to regular db).

Apart from the transparent shifting of read point, a second level of transparent retries exists in the query layer: In case the tablet server throws a kReadRestart error, the query layer either forwards it to the external client or in turn transparently retries the whole txn if no response data has been sent to the client, by picking a later read point called "restart read point" that is late enough to include all such commit times of transactions with data of type 1/2. Retry at a "restart read point" isn't possible if any data has been sent to the client as part of the txn because that might change older response data.
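The tablet-server decision described above can be summarized in a short sketch. This is a simplified Python model under stated assumptions: hybrid times are plain integers, the function and enum names (`classify_committed_record`, `Outcome`) are hypothetical, and the exact inclusive/exclusive boundaries are a guess for illustration rather than taken from the real implementation.

```python
from enum import Enum


class Outcome(Enum):
    VISIBLE = "visible"                    # committed at or before the read point
    IGNORE = "ignore"                      # certainly committed after the txn was issued
    SHIFT_READ_POINT = "shift_read_point"  # tablet server moves the read point itself
    READ_RESTART = "read_restart"          # kReadRestart goes to the query layer


def classify_committed_record(commit_time: int, intent_write_time: int,
                              read_time: int, max_clock_skew: int,
                              local_limit: int, read_elsewhere: bool) -> Outcome:
    # global_limit is the read point plus the maximum clock skew.
    global_limit = read_time + max_clock_skew
    if commit_time <= read_time:
        return Outcome.VISIBLE
    if commit_time > global_limit:
        return Outcome.IGNORE
    # Ambiguous window: read_time < commit_time <= global_limit.
    if intent_write_time > local_limit:
        # Optimization: intents written after local_limit mean the commit
        # happened after this txn was issued, so the data can be skipped.
        return Outcome.IGNORE
    if not read_elsewhere:
        # No earlier reads at other tablet servers: shift transparently.
        return Outcome.SHIFT_READ_POINT
    return Outcome.READ_RESTART
```

For example, with `read_time=100` and `max_clock_skew=50`, a record committed at 120 whose intent was written before `local_limit` is ambiguous and leads to a shift or a restart, depending on `read_elsewhere`.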
Note that transparent retries, if at all, can only be done in the first statement since completion of a statement results in surely sending some data to the external client.

One fact to note is: the first read of a key can result in kReadRestart, but further reads of the same key via later rpcs to the same tablet server can result in kReadRestart only due to data with commit time after read point but within global_limit such that -

(1) it was committed after the first rpc. Because if it had committed before the first rpc, a kReadRestart would have been thrown in the first rpc resulting in the new restart read point to be >= all commit timestamps as seen in the first rpc.

(2) the intent of the key was written as part of the committed txn before the first rpc (to be precise, before the local_limit that is picked as part of the first rpc). This is because other txns' committed data with intents written after the local_limit won't result in kReadRestart given the optimization above.

In other words, narrower requirements are to be met for a kReadRestart to occur in read of a key after the first read. So, chances of a kReadRestart due to read of a given key decrease after the first rpc that reads that key.

In a READ COMMITTED isolation transaction, a new read point is picked for each statement based on the current hybrid time. Each statement is supposed to include all transactions that commit before the statement is issued. Due to this, all of the above discussion now applies on a per statement level. Even the fact above changes to: '"for each statement", the first read of a key can result in kReadRestart, but further reads of the same key....'. This increases the chances of kReadRestart in a single transaction.
This can be resolved by transparently retrying kReadRestart errors for each statement in the query layer in case no data has been sent to the client for that statement (we are allowed to do this in READ COMMITTED instead of worrying if any data has been sent to the client for the txn because data sent before this statement had an older read point and hence older responses wouldn't change).

Test Plan: ./yb_build.sh --java-test org.yb.pgsql.TestPgTransparentRestarts

Reviewers: alex
Reviewed By: alex
Subscribers: sergei, mihnea, yql
Differential Revision: https://phabricator.dev.yugabyte.com/D14397
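A minimal sketch of the per-statement retry described in this commit message, in Python. Everything here is hypothetical and simplified: `ReadRestartError`, `run_statement`, and the demo executors are illustrative names, hybrid times are integers, and the output buffer is modeled as a byte counter that loosely stands in for ysql_output_buffer_size. The point it shows is that a statement is retried at the restart read point only while none of its output has reached the client.

```python
class ReadRestartError(Exception):
    """Toy stand-in for a kReadRestart error (hypothetical)."""

    def __init__(self, restart_read_time: int):
        super().__init__(f"read restart, retry at {restart_read_time}")
        self.restart_read_time = restart_read_time


def run_statement(execute, read_time: int, output_buffer_size: int):
    """Run execute(read_time), an iterable of rows (bytes), retrying on restart.

    Rows are buffered; once the buffer limit is exceeded they count as sent
    to the client, after which a transparent retry is no longer possible.
    """
    while True:
        buffered, buffered_bytes, sent_to_client = [], 0, []
        try:
            for row in execute(read_time):
                buffered.append(row)
                buffered_bytes += len(row)
                if buffered_bytes > output_buffer_size:
                    sent_to_client.extend(buffered)
                    buffered, buffered_bytes = [], 0
            return sent_to_client + buffered
        except ReadRestartError as err:
            if sent_to_client:
                raise  # older output would change; surface the error
            read_time = err.restart_read_time  # retry just this statement


attempted_read_times = []


def flaky_statement(read_time):
    # Demo executor: restarts once, then yields two rows.
    attempted_read_times.append(read_time)
    if read_time < 150:
        raise ReadRestartError(150)
    yield b"row1"
    yield b"row2"


def large_output_statement(read_time):
    # Demo executor: overflows a small buffer before hitting a restart.
    yield b"x" * 10
    raise ReadRestartError(read_time + 1)
```

Calling `run_statement(flaky_statement, 100, 1024)` retries once at read time 150, whereas `run_statement(large_output_statement, 100, 4)` re-raises because output already left the buffer.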

…n READ COMMITTED isolation (Part-3)

Summary: In this third part, we ensure that we don't throw kConflict errors to external ysql clients when using READ COMMITTED isolation level. We do this by -

1. Re-executing a statement when kConflict is seen: this is done by leveraging savepoints. An internal savepoint is created before execution of every statement, which is rolled back to on facing a kConflict. This helps get rid of any provisional writes that were written by the statement before the conflict and hence are no longer valid. The statement is retried indefinitely till statement timeout with configurable exponential backoff. This gives a feeling that pessimistic locking is also in place. Note that we also lazily rely only on the statement timeout to get rid of deadlocks, without proactively detecting them with a distributed deadlock detection algorithm. That will come in as a separate improvement with pessimistic locking.

2. Using the highest priority for READ COMMITTED txns: this helps ensure that no other txns can abort a txn.

Test Plan: Enabled Postgres's existing eval-plan-qual isolation test with appropriate modifications to disable cases that require features yet to be implemented on YB. Added a bunch of new tests from the functional spec as well:
src/test/isolation/specs/yb_pb_eval-plan-qual.spec
src/test/isolation/specs/yb_read_committed_insert.spec
src/test/isolation/specs/yb_read_committed_test_internal_savepoint.spec
src/test/isolation/specs/yb_read_committed_update_and_explicit_locking.spec

Reviewers: mihnea, alex, rsami
Subscribers: yql
Differential Revision: https://phabricator.dev.yugabyte.com/D15383

…OMMITTED isolation (Part-3)

Summary: In this third part, we ensure that we don't throw kConflict errors to external ysql clients when using READ COMMITTED isolation level. We do this by -

(1) Re-executing a statement when kConflict is seen: this is done by leveraging savepoints. An internal savepoint is created before execution of every statement, which is rolled back to on facing a kConflict. This helps get rid of any provisional writes that were written by the statement before the conflict and hence are no longer valid. The statement is retried indefinitely until statement timeout with configurable exponential backoff. This gives a feeling that pessimistic locking is also in place. Note that we also lazily rely only on the statement timeout to get rid of deadlocks, without proactively detecting them with a distributed deadlock detection algorithm. That will come in as a separate improvement with pessimistic locking.

(2) Using the highest priority for READ COMMITTED txns: this helps ensure that no other txns can abort a READ COMMITTED txn. Even other READ COMMITTED txns can't.

Test Plan: Jenkins: urgent
Enabled Postgres's existing eval-plan-qual isolation test with appropriate modifications to disable cases that require features yet to be implemented on YB. Added a bunch of new tests from the functional spec as well:
src/test/isolation/specs/yb_pb_eval-plan-qual.spec
src/test/isolation/specs/yb_read_committed_insert.spec
src/test/isolation/specs/yb_read_committed_test_internal_savepoint.spec
src/test/isolation/specs/yb_read_committed_update_and_explicit_locking.spec

Reviewers: mihnea, alex, rsami, mtakahara
Reviewed By: rsami, mtakahara
Subscribers: yql
Differential Revision: https://phabricator.dev.yugabyte.com/D15383

…errors in READ COMMITTED isolation (Part-3)

Summary: In this third part, we ensure that we don't throw kConflict errors to external ysql clients when using READ COMMITTED isolation level. We do this by -

(1) Re-executing a statement when kConflict is seen: this is done by leveraging savepoints. An internal savepoint is created before execution of every statement, which is rolled back to on facing a kConflict. This helps get rid of any provisional writes that were written by the statement before the conflict and hence are no longer valid. The statement is retried indefinitely until statement timeout with configurable exponential backoff. This gives a feeling that pessimistic locking is also in place. Note that we also lazily rely only on the statement timeout to get rid of deadlocks, without proactively detecting them with a distributed deadlock detection algorithm. That will come in as a separate improvement with pessimistic locking.

(2) Using the highest priority for READ COMMITTED txns: this helps ensure that no other txns can abort a READ COMMITTED txn. Even other READ COMMITTED txns can't.

Original commit: https://phabricator.dev.yugabyte.com/D15383, TBD

Test Plan: Jenkins: urgent, rebase: 2.12
Enabled Postgres's existing eval-plan-qual isolation test with appropriate modifications to disable cases that require features yet to be implemented on YB. Added a bunch of new tests from the functional spec as well:
src/test/isolation/specs/yb_pb_eval-plan-qual.spec
src/test/isolation/specs/yb_read_committed_insert.spec
src/test/isolation/specs/yb_read_committed_test_internal_savepoint.spec
src/test/isolation/specs/yb_read_committed_update_and_explicit_locking.spec

Reviewers: mihnea, alex, rsami, mtakahara
Reviewed By: mtakahara
Subscribers: yql
Differential Revision: https://phabricator.dev.yugabyte.com/D15571

…n READ COMMITTED isolation (Part-3)

Summary: In this third part, we ensure that we don't throw kConflict errors to external ysql clients when using READ COMMITTED isolation level. We do this by -

(1) Re-executing a statement when kConflict is seen: this is done by leveraging savepoints. An internal savepoint is created before execution of every statement, which is rolled back to on facing a kConflict. This helps get rid of any provisional writes that were written by the statement before the conflict and hence are no longer valid. The statement is retried indefinitely until statement timeout with configurable exponential backoff. This gives a feeling that pessimistic locking is also in place. Note that we also lazily rely only on the statement timeout to get rid of deadlocks, without proactively detecting them with a distributed deadlock detection algorithm. That will come in as a separate improvement with pessimistic locking.

(2) Using the highest priority for READ COMMITTED txns: this helps ensure that no other txns can abort a READ COMMITTED txn. Even other READ COMMITTED txns can't.

Test Plan: Jenkins: urgent
Enabled Postgres's existing eval-plan-qual isolation test with appropriate modifications to disable cases that require features yet to be implemented on YB. Added a bunch of new tests from the functional spec as well:
src/test/isolation/specs/yb_pb_eval-plan-qual.spec
src/test/isolation/specs/yb_read_committed_insert.spec
src/test/isolation/specs/yb_read_committed_test_internal_savepoint.spec
src/test/isolation/specs/yb_read_committed_update_and_explicit_locking.spec

Reviewers: mihnea, alex, rsami, mtakahara
Reviewed By: rsami, mtakahara
Subscribers: yql
Differential Revision: https://phabricator.dev.yugabyte.com/D15383
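The savepoint-based retry from these Part-3 commit messages can be sketched roughly as follows. This is an illustrative Python model, not the YSQL implementation: `ToyTxn`, `ConflictError`, `StatementTimeoutError`, `run_with_conflict_retries`, and the backoff parameter names and defaults are all assumptions. The clock and sleep functions are injectable so the loop can be exercised without real waiting.

```python
import time


class ConflictError(Exception):
    """Toy stand-in for a kConflict error (hypothetical)."""


class StatementTimeoutError(Exception):
    """Raised when the statement timeout ends the retry loop."""


class ToyTxn:
    """Minimal transaction that tracks provisional writes and one savepoint."""

    def __init__(self):
        self.writes = []
        self._savepoint = 0

    def savepoint(self):
        self._savepoint = len(self.writes)

    def rollback_to_savepoint(self):
        # Discard provisional writes made since the savepoint.
        del self.writes[self._savepoint:]


def run_with_conflict_retries(txn, statement, statement_timeout_s,
                              initial_backoff_s=0.001, backoff_multiplier=2.0,
                              max_backoff_s=0.1,
                              clock=time.monotonic, sleep=time.sleep):
    deadline = clock() + statement_timeout_s
    backoff = initial_backoff_s
    while True:
        txn.savepoint()  # internal savepoint before every attempt
        try:
            return statement(txn)
        except ConflictError:
            txn.rollback_to_savepoint()
            if clock() + backoff > deadline:
                # Only the statement timeout ends the loop (and any deadlock).
                raise StatementTimeoutError("statement timeout during retries")
            sleep(backoff)
            backoff = min(backoff * backoff_multiplier, max_backoff_s)
```

A statement that performs a provisional write and then conflicts leaves no trace in `txn.writes` after the rollback, so each retry starts from the same state as the first attempt.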

pkj415 self-assigned this Jul 26, 2021

pkj415 added the area/ysql Yugabyte SQL (YSQL) label Jul 26, 2021

This was referenced Jul 26, 2021

[YSQL] Postgres compatible transaction semantics #5683

Open

READ COMMITTED required for Camunda #9464

Closed

pkj415 changed the title ~~[YSQL] Support READ COMMITTED isolation level with optimistic locking~~ [YSQL] Support READ COMMITTED isolation level Nov 11, 2021

pkj415 added a commit to pkj415/yugabyte-db that referenced this issue Dec 2, 2021

[yugabyte#9468] [Docs] YSQL: Update docs to include read committed is…

47386e8

…olation level

pkj415 added a commit that referenced this issue Jan 10, 2022

[#9468] [Docs] YSQL: Update docs to include read committed isolation …

518049a

…level (#10745)

pkj415 mentioned this issue Feb 23, 2022

[YSQL] Handle conflicts in READ COMMITTED by performing Pg style “READ COMMITTED Update Checking” #11573

Open

pkj415 closed this as completed Mar 7, 2022

pkj415 changed the title ~~[YSQL] Support READ COMMITTED isolation level~~ [YSQL] Support READ COMMITTED isolation level semantics for DMLs Aug 9, 2022

yugabyte-ci added the kind/bug (This issue is a bug) and priority/medium (Medium priority issue) labels Aug 9, 2022

pkj415 mentioned this issue Aug 9, 2022

[YSQL] Support READ COMMITTED isolation level #13557

Open
