forked from delta-io/delta
Commit
[DELTA-OSS-EXTERNAL] Improved Delta concurrency with finer-grained conflict detection in OptTxnImpl

This is a modified PR from the original PR delta-io#114 by `tomasbartalos` (kudos, it was a very good PR!). This PR tracks transaction changes at a finer granularity (no new columns required in RemoveFile action) thus allowing more concurrent operations to succeed.

closes delta-io#228 and delta-io#72

This PR improves the conflict detection logic in OptTxn using the following strategy.
- OptTxn tracks two additional things
  - All the partitions read by the query using the OptTxn
  - All the files read by the query
- When committing a txn, it checks this txn's actions against the actions of concurrently committed txns using the following strategy:
  1. If any of the concurrently added files are in the partitions read by this txn, then fail because this txn should have read them.
     - It's okay for files to have been removed from the partitions read by this txn as long as this txn never read those files. This is checked by the next rule.
  2. If any of the files read by this txn have already been removed by concurrent txns, then fail.
  3. If any of the files removed by this txn have already been removed by concurrent txns, then fail.
- In addition, I have made another change where setting `dataChange` to `false` in all the actions (enabled by delta-io#223) will ensure the txn will not conflict with any other concurrent txn based on predicates.

Tests written by `tomasbartalos` in the original PR. Some tests were changed because some scenarios that were blocked in the original PR are now allowed, thanks to more granular and permissive conflict detection logic. Some test names tweaked to ensure clarity.

GitOrigin-RevId: f02a8f48838f86d256a86cd40241cdbfa74addb4
Lead-authored-by: Tathagata Das <tathagata.das1565@gmail.com>
Co-authored-by: Tomas Bartalos <tomas.bartalos@nike.sk>
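
The three commit-time rules in the commit message can be sketched roughly as follows. This is a simplified illustration, not the code from this commit: every name here (`ConflictCheckSketch`, `FileEntry`, `TxnState`, `ConcurrentCommit`, `findConflict`) is hypothetical, partitions are modeled as plain strings rather than the predicates a real query would read with, and the `dataChange = false` exemption is not modeled.

```scala
// Hypothetical, simplified model of the rules described in the commit message.
// None of these names come from the Delta source.
object ConflictCheckSketch {
  // A data file and the partition it lives in (partition modeled as a string).
  final case class FileEntry(path: String, partition: String)

  // What the committing transaction tracked while it ran.
  final case class TxnState(
      readPartitions: Set[String],
      readFiles: Set[String],
      removedFiles: Set[String])

  // Actions of one transaction that committed concurrently.
  final case class ConcurrentCommit(added: Seq[FileEntry], removed: Seq[String])

  /** Returns Some(reason) if the transaction must fail, None if it may commit. */
  def findConflict(txn: TxnState, winners: Seq[ConcurrentCommit]): Option[String] = {
    val added = winners.flatMap(_.added)
    val removed = winners.flatMap(_.removed).toSet

    // Rule 1: a concurrently added file falls in a partition this txn read.
    val addedInReadPartition = added.find(f => txn.readPartitions.contains(f.partition))
    // Rule 2: a file this txn read was concurrently removed.
    val readButRemoved = txn.readFiles.find(removed.contains)
    // Rule 3: a file this txn removes was already removed concurrently.
    val removedTwice = txn.removedFiles.find(removed.contains)

    addedInReadPartition.map(f => s"concurrent append: ${f.path}")
      .orElse(readButRemoved.map(p => s"concurrent delete of read file: $p"))
      .orElse(removedTwice.map(p => s"concurrent delete of removed file: $p"))
  }
}
```

Note that a concurrent removal inside a read partition does not trigger rule 1 in this sketch; it only fails the commit if the removed file is one the transaction actually read or removed, which matches the note under rule 1.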
Showing 4 changed files with 805 additions and 50 deletions.
91 changes: 91 additions & 0 deletions
src/main/scala/org/apache/spark/sql/delta/isolationLevels.scala
@@ -0,0 +1,91 @@
/*
 * Copyright 2019 Databricks, Inc.
 *
 * Licensed under the Apache License, Version 2.0 (the "License");
 * you may not use this file except in compliance with the License.
 * You may obtain a copy of the License at
 *
 * http://www.apache.org/licenses/LICENSE-2.0
 *
 * Unless required by applicable law or agreed to in writing, software
 * distributed under the License is distributed on an "AS IS" BASIS,
 * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 * See the License for the specific language governing permissions and
 * limitations under the License.
 */

package org.apache.spark.sql.delta

/**
 * Trait that defines the level of consistency guarantee that is going to be provided by
 * `OptimisticTransaction.commit()`. [[Serializable]] is the most
 * strict level and [[SnapshotIsolation]] is the least strict one.
 *
 * @see [[IsolationLevel.allLevelsInDescOrder]] for all the levels in the descending order
 *      of strictness and [[IsolationLevel.DEFAULT]] for the default table isolation level.
 */
sealed trait IsolationLevel {
  override def toString: String = this.getClass.getSimpleName.stripSuffix("$")
}

/**
 * This isolation level will ensure serializability between all read and write operations.
 * Specifically, for write operations, this mode will ensure that the result of
 * the table will be perfectly consistent with the visible history of operations, that is,
 * as if all the operations were executed sequentially one by one.
 */
case object Serializable extends IsolationLevel

/**
 * This isolation level will ensure snapshot isolation consistency guarantee between write
 * operations only. In other words, if only the write operations are considered, then
 * there exists a serializable sequence between them that would produce the same result
 * as seen in the table. However, if both read and write operations are considered, then
 * there may not exist a serializable sequence that would explain all the observed reads.
 *
 * This provides a lower consistency guarantee than [[Serializable]] but a higher
 * availability than that. For example, unlike [[Serializable]], this level allows an UPDATE
 * operation to be committed even if there was a concurrent INSERT operation that has already
 * added data that should have been read by the UPDATE. It will be as if the UPDATE was executed
 * before the INSERT even if the former was committed after the latter. As a side effect,
 * the visible history of operations may not be consistent with the
 * result expected if these operations were executed sequentially one by one.
 */
case object WriteSerializable extends IsolationLevel

/**
 * This isolation level will ensure that all reads will see a consistent
 * snapshot of the table and any transactional write will successfully commit only
 * if the values updated by the transaction have not been changed externally since
 * the snapshot was read by the transaction.
 *
 * This provides a lower consistency guarantee than [[WriteSerializable]] but a higher
 * availability than that. For example, unlike [[WriteSerializable]], this level allows two
 * concurrent UPDATE operations reading the same data to be committed successfully as long as
 * they don't modify the same data.
 *
 * Note that for operations that do not modify data in the table, Snapshot isolation is the same
 * as Serializability. Hence such operations can be safely committed with Snapshot isolation level.
 */
case object SnapshotIsolation extends IsolationLevel

object IsolationLevel {

  val DEFAULT = WriteSerializable

  /** All possible isolation levels in descending order of guarantees provided */
  val allLevelsInDescOrder: Seq[IsolationLevel] = Seq(
    Serializable,
    WriteSerializable,
    SnapshotIsolation)

  /** All the valid isolation levels that can be specified as the table isolation level */
  val validTableIsolationLevels = Set[IsolationLevel](Serializable, WriteSerializable)

  def fromString(s: String): IsolationLevel = {
    allLevelsInDescOrder.find(_.toString.equalsIgnoreCase(s)).getOrElse {
      throw new IllegalArgumentException(s"invalid isolation level '$s'")
    }
  }
}
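
As an illustration of how the new file might be used, the sketch below parses a user-supplied string into a level and rejects levels that are not valid at the table level. The type definitions are a condensed copy of the diff above (doc comments dropped) so the snippet runs standalone. `IsolationLevelDemo` and `parseTableLevel` are hypothetical helpers that are not part of this commit, and how the commit itself validates table settings is not shown on this page.

```scala
import scala.util.Try

// Condensed copy of the definitions from isolationLevels.scala above.
sealed trait IsolationLevel {
  override def toString: String = this.getClass.getSimpleName.stripSuffix("$")
}
case object Serializable extends IsolationLevel
case object WriteSerializable extends IsolationLevel
case object SnapshotIsolation extends IsolationLevel

object IsolationLevel {
  val DEFAULT = WriteSerializable
  val allLevelsInDescOrder: Seq[IsolationLevel] =
    Seq(Serializable, WriteSerializable, SnapshotIsolation)
  val validTableIsolationLevels = Set[IsolationLevel](Serializable, WriteSerializable)

  def fromString(s: String): IsolationLevel = {
    allLevelsInDescOrder.find(_.toString.equalsIgnoreCase(s)).getOrElse {
      throw new IllegalArgumentException(s"invalid isolation level '$s'")
    }
  }
}

// Hypothetical helper, not part of the commit.
object IsolationLevelDemo {
  /** Parses a level name and accepts it only if it is allowed as a table isolation level. */
  def parseTableLevel(s: String): Either[String, IsolationLevel] =
    Try(IsolationLevel.fromString(s)).toEither match {
      case Left(e) => Left(e.getMessage)
      case Right(level) if IsolationLevel.validTableIsolationLevels.contains(level) =>
        Right(level)
      case Right(level) =>
        Left(s"'$level' cannot be used as a table isolation level")
    }
}
```

Matching is case-insensitive because `fromString` compares with `equalsIgnoreCase`. `SnapshotIsolation` parses successfully but is rejected by the helper, since it is absent from `validTableIsolationLevels`.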