sql/opt: add implicit SELECT FOR UPDATE support for UPSERT statements #50180
Labels
A-sql-optimizer: SQL logical planning and optimizations.
C-enhancement: Solution expected to add code/behavior + preserve backward-compat (pg compat issues are exception).
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue on Jun 25, 2020:
kv: unset CanForwardReadTimestamp flag on batches that span ranges

Fixes cockroachdb#50202.

In cockroachdb#50202, we saw that we would accidentally allow batches to be split on range boundaries and continue to carry the `CanForwardReadTimestamp` flag. This can lead to serializability violations under very specific conditions:

1. the first operation that the transaction performs is a batch of locking read requests (due to implicit or explicit SFU)
2. this batch of locking reads spans multiple ranges
3. this batch of locking reads is issued in parallel by the DistSender
4. the locking read hits contention and is bumped on at least one of the ranges due to a WriteTooOld error
5. an unreplicated lock from one of the non-refreshed sub-batches is lost during a lease transfer

It turns out that the `kv/contention` roachtest meets these requirements perfectly when implicit SFU support is added to UPSERT statements (cockroachdb#50180). It creates a tremendous amount of contention and issues a batch of locking ScanRequests during a LookupJoin as its first operation. The bug materializes as ConditionFailedErrors (which should be impossible) in the CPuts that the UPSERT issues to maintain the table's secondary index.

This commit fixes the bug by ensuring that if a batch is going to be split across ranges and any of its requests would need to refresh on read timestamp bumps, it does not have its CanForwardReadTimestamp flag set. It would be incorrect to allow part of a batch to perform a server-side refresh if another part of the batch might have returned a different result at the higher timestamp, which is a fancy way of saying that it needs to refresh because it is using optimistic locking. Such behavior could cause a transaction to observe an inconsistent snapshot and violate serializability.

Release note (bug fix): fix a rare bug where a multi-Range SELECT FOR UPDATE statement containing an IN clause could fail to observe a consistent snapshot and violate serializability.
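As a rough illustration of conditions 1-3 above, the following hypothetical SQL is one shape that could produce a transaction whose first operation is a multi-range locking read. The table name, split point, and key values are invented for this sketch; they are not taken from the issue, and actually triggering the bug also depended on heavy contention and lease-transfer timing.

```sql
-- Hypothetical schema; all names and values here are illustrative.
CREATE TABLE kv (k INT PRIMARY KEY, v INT);

-- Force the table to span two ranges, so the locking read below is split
-- on a range boundary and fanned out in parallel by the DistSender.
ALTER TABLE kv SPLIT AT VALUES (500000);

-- The transaction's first operation is a batch of locking reads (explicit
-- SFU here; an UPSERT's implicit SFU scan has the same shape) spanning
-- both ranges. Before the fix, under heavy contention, this shape could
-- lose an unreplicated lock and violate serializability.
BEGIN;
SELECT v FROM kv WHERE k IN (1, 1000000) FOR UPDATE;
COMMIT;
```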
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue on Jun 29, 2020; its message is identical to the Jun 25 commit above.
craig bot pushed a commit that referenced this issue on Jun 30, 2020:
50633: kv: unset CanForwardReadTimestamp flag on batches that span ranges r=nvanbenschoten a=nvanbenschoten

Fixes #50202. The analysis and fix are as described in the Jun 25 commit message above.

The PR also adds support for locking scans to kvnemesis, which would have caught the bug fairly easily.

Finally, it fixes a KV API UX issue around locking scans and retry errors. Before this change, it was possible for a non-transactional locking scan (which itself doesn't make much sense) to hit a WriteTooOld retry error. This was caused by eager propagation of WriteTooOld errors from MVCC when FailOnMoreRecent was enabled for an MVCCScan. I'd appreciate it if @itsbilal could give that last commit a review.

Release note (bug fix): fix a rare bug where a multi-Range SELECT FOR UPDATE statement containing an IN clause could fail to observe a consistent snapshot and violate serializability.

Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
nvanbenschoten added a commit to nvanbenschoten/cockroach that referenced this issue on Jun 30, 2020; its message is identical to the Jun 25 commit above.
craig bot pushed a commit that referenced this issue on Aug 20, 2020:
50053: Script for the PublishRelease TC build configuration r=pbardea a=jlinder

Before, the script wasn't implemented. Now, as part of the new release process, this script:
- tags the selected SHA to the provided name
- compiles the binaries and archive and uploads them to S3 as the versioned name
- uploads the docker image to docker.io/cockroachdb/cockroach
- pushes the tag to github.com/cockroachdb/cockroach
- pushes all the artifacts to their respective `latest` locations as appropriate

Release note: None

51567: kvserver: Allow rebalances between stores on the same nodes. r=lunevalex a=lunevalex

Closes #6782

This change modifies the replica_queue to allow rebalances between multiple stores within a single node. This is possible because we previously introduced atomic rebalances in #12768.

The first step was to remove the constraints in the allocator that prevented same-node rebalances and to update the validation in the replica_queue to accept these rebalance proposals. There is one caveat: with 1x replication an atomic rebalance is not possible, so we now support adding multiple replicas of the range to the same node under this condition.

With the constraints removed, there would be nothing in the allocator to prevent it from placing multiple replicas of a range on the same node across multiple stores. This is not desired, because in a simple 3x replication scenario a single node crash may result in a whole range becoming unavailable. The allocator uses locality tags to model failure domains, but a node was not considered to be a locality. It is thus natural to extend the failure-domain definition to the node and model it as a locality tier. Now stores on the same node are factored into the diversity score and repel each other, just like nodes in the same datacenter do in a multi-region setup.

Release note (performance improvement): This change removes the last roadblock to running CockroachDB with multiple stores (i.e. disks) per node. The allocation algorithm now supports intra-node rebalances, which means CRDB can fully utilize the additional stores on the same node.

52754: importccl: speed up revert of IMPORT INTO empty table r=dt a=dt

When IMPORT INTO fails, it reverts the tables to their pre-IMPORT state. Typically this requires running a somewhat expensive RevertRange operation that finds the keys written by the IMPORT amongst all the table data and deletes just those keys. This is somewhat expensive: we need to iterate the keys in the target table and check them to see if they need to be reverted.

Non-INTO style IMPORTs create the table into which they will IMPORT, and thus can just drop it wholesale on failure instead of doing this expensive revert. However, INTO-style IMPORTs could use a similarly fast/cheap wholesale delete *if they knew the table was empty* when the IMPORT started. This change tracks which tables were empty when the IMPORT started and then deletes, rather than reverts, the table span on failure.

Release note (performance improvement): Cleaning up after a failure during IMPORT INTO a table which was empty is now faster.

53023: opt: add index acceleration support for ~ and && bounding box operators r=rytaft a=rytaft

This commit adds index acceleration support for the bounding box comparison operators, `~` and `&&`. It maps `~` to Covers and `&&` to Intersects.

Release note (performance improvement): The ~ and && geospatial bounding box operations can now benefit from index acceleration if one of the operands is an indexed geometry column.

53049: bulkio: Fix transaction semantics in job scheduler. r=miretskiy a=miretskiy

Fixes #53033
Fixes #52959

Use a transaction when querying for the schedules to run. In addition, ensure that a single bad schedule does not cause all of the previous work to be wasted, by using transaction savepoints.

Release note: None

53132: sql/opt: add implicit SELECT FOR UPDATE support for UPSERT statements r=nvanbenschoten a=nvanbenschoten

Fixes #50180.

This commit adds implicit SELECT FOR UPDATE support for UPSERT statements with a VALUES clause. This should improve throughput and latency for contended UPSERT statements in much the same way that 435fa43 did for UPDATE statements. However, it only has an effect on UPSERT statements into tables with multiple indexes, because UPSERT statements into single-index tables hit a fast path where they perform a blind write without doing an initial row scan.

Conceptually, if we picture an UPSERT statement as the composition of a SELECT statement and an INSERT statement (with loosened semantics around existing rows), then this change performs the following transformation:

```
UPSERT t = SELECT FROM t + INSERT INTO t
  =>
UPSERT t = SELECT FROM t FOR UPDATE + INSERT INTO t
```

I plan to test this out on a contended `indexes` workload at some point in the future.

Release note (sql change): UPSERT statements now acquire locks using the FOR UPDATE locking mode during their initial row scan, which improves performance for contended workloads when UPSERTing into tables with multiple indexes. This behavior is configurable using the enable_implicit_select_for_update session variable and the sql.defaults.implicit_select_for_update.enabled cluster setting.

Co-authored-by: James H. Linder <jamesl@cockroachlabs.com>
Co-authored-by: Alex Lunev <alexl@cockroachlabs.com>
Co-authored-by: David Taylor <tinystatemachine@gmail.com>
Co-authored-by: Rebecca Taft <becca@cockroachlabs.com>
Co-authored-by: Yevgeniy Miretskiy <yevgeniy@cockroachlabs.com>
Co-authored-by: Nathan VanBenschoten <nvanbenschoten@gmail.com>
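For concreteness, here is a small, hypothetical example of an UPSERT that takes the newly locking row scan, together with the two knobs the release note mentions. The schema and values are invented for this sketch; only the session variable and cluster setting names come from the PR.

```sql
-- Hypothetical schema. The secondary index is what matters: it keeps the
-- UPSERT off the blind-write fast path, so an initial row scan is needed.
CREATE TABLE accounts (
    id      INT PRIMARY KEY,
    balance INT,
    INDEX balance_idx (balance)
);

-- With implicit SELECT FOR UPDATE enabled, the initial row scan of this
-- UPSERT acquires FOR UPDATE locks on the rows it reads, so concurrent
-- UPSERTs of the same key queue on the lock instead of retrying.
UPSERT INTO accounts VALUES (1, 100);

-- The behavior can be toggled per session or cluster-wide:
SET enable_implicit_select_for_update = false;
SET CLUSTER SETTING sql.defaults.implicit_select_for_update.enabled = false;
```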
Address this TODO:

cockroach/pkg/sql/opt/exec/execbuilder/mutation.go, lines 981 to 989 at 01a780b

Specifically, UPSERT statements can occasionally hit a fast path where they perform a blind PutRequest. However, for those that can't (e.g. ones that write a subset of columns or need to update a secondary index), an initial row scan must be used, just as with UPDATE statements. We should apply a FOR UPDATE row-level locking mode to the initial row scan of an UPSERT statement, in accordance with the enable_implicit_select_for_update session variable.
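The fast-path distinction above can be sketched as follows; the table definitions are invented for illustration and are not from the issue.

```sql
-- Hypothetical tables; names are illustrative.

-- Fast path: a single-index table where the UPSERT writes every column
-- can be executed as blind PutRequests, with no initial row scan to lock.
CREATE TABLE fast (k INT PRIMARY KEY, v INT);
UPSERT INTO fast VALUES (1, 10);

-- Off the fast path: the secondary index forces an initial row scan to
-- maintain index entries. That scan is where the FOR UPDATE locking mode
-- applies, gated by the enable_implicit_select_for_update session variable.
CREATE TABLE slow (k INT PRIMARY KEY, v INT, INDEX (v));
UPSERT INTO slow VALUES (1, 10);
```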