[SPARK-49363][SS][TESTS] Add unit tests for potential RocksDB state store SST file mismatch #47850
Changes from all commits
@@ -25,6 +25,7 @@ import scala.collection.mutable
 import scala.concurrent.{ExecutionContext, Future}
 import scala.concurrent.duration._
 import scala.language.implicitConversions
+import scala.util.Random

 import org.apache.commons.io.FileUtils
 import org.apache.hadoop.conf.Configuration
@@ -1770,6 +1771,138 @@ class RocksDBSuite extends AlsoTestWithChangelogCheckpointingEnabled with Shared
    }
  }

testWithChangelogCheckpointingEnabled("reloading the same version") { | ||
// Keep executing the same batch for two or more times. Some queries with ForEachBatch | ||
// will cause this behavior. | ||
// The test was accidentally fixed by SPARK-48586 (https://github.com/apache/spark/pull/47130) | ||
val remoteDir = Utils.createTempDir().toString | ||
val conf = dbConf.copy(minDeltasForSnapshot = 2, compactOnCommit = false) | ||
new File(remoteDir).delete() // to make sure that the directory gets created | ||
withDB(remoteDir, conf = conf) { db => | ||
// load the same version of pending snapshot uploading | ||
// This is possible because after committing version x, we can continue to x+1, and replay | ||
// x+1. The replay will load a checkpoint by version x. At this moment, the snapshot | ||
// uploading may not be finished. | ||
// Previously this generated a problem: new files generated by reloading are added to | ||
// local -> cloud file map and the information is used to skip some files uploading, which is | ||
// wrong because these files aren't a part of the RocksDB checkpoint. | ||
// This test was accidentally fixed by | ||
// SPARK-48931 (https://github.com/apache/spark/pull/47393) | ||
|
||
db.load(0) | ||
Review comment: Sorry to ask for more effort, but shall we leave a "walkthrough" code comment for future readers? I think it'd be much easier to understand than reading the scenario described above and having to think it through themselves. Let's make sure the test is understandable for most readers. Please refer to the test suites for stateful operators, where we track the watermark value (and state rows for complicated cases) - we put a code comment per microbatch to walk through the scenario.
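As an illustration, the per-microbatch walkthrough style the reviewer is asking for might look like this when applied to the first microbatch of this test (the comments are hypothetical, not part of the PR):

// Batch 0: write a single row. Commit produces version 1; with minDeltasForSnapshot = 2,
// no snapshot is created yet, only a changelog file (changelog checkpointing is enabled).
db.load(0)
db.put("foo", "bar")
db.commit()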
      db.put("foo", "bar")
      // Snapshot checkpoint not needed
      db.commit()

      // Continue using the local DB
      db.load(1)
      db.put("foo", "bar")
      // Should create a local RocksDB snapshot
      db.commit()
      // Upload the local RocksDB snapshot to the cloud as 2.zip
      db.doMaintenance()

      // This will reload the DB from the cloud.
      db.load(1)
      db.put("foo", "bar")
      // Should create another local snapshot
      db.commit()

      // Continue using the local DB
      db.load(2)
      db.put("foo", "bar")
      // Snapshot checkpoint not needed
      db.commit()

      // Reload the DB from the cloud, loading from 2.zip
      db.load(2)
      db.put("foo", "bar")
      // Snapshot checkpoint not needed
      db.commit()

      // Will upload the local snapshot and overwrite 2.zip
      db.doMaintenance()

      // Reload the new 2.zip just uploaded to validate that it is not corrupted.
      db.load(2)
      db.put("foo", "bar")
      db.commit()

      // Test that the maintenance thread can be delayed even after the next snapshot is
      // created. There will be two outstanding snapshots.
      for (batchVersion <- 3 to 6) {
        db.load(batchVersion)
        db.put("foo", "bar")
        // In batchVersion 3 and 5, a local snapshot is generated but not uploaded.
        db.commit()
      }
      db.doMaintenance()

      // Test that maintenance is called after each batch. This tests the common case where
      // maintenance tasks finish quickly.
      for (batchVersion <- 7 to 10) {
        for (j <- 0 to 1) {
          db.load(batchVersion)
          db.put("foo", "bar")
          db.commit()
          db.doMaintenance()
        }
      }
    }
  }

  for (randomSeed <- 1 to 8) {
    for (ifTestSkipBatch <- 0 to 1) {
      testWithChangelogCheckpointingEnabled(
        s"randomized snapshotting $randomSeed ifTestSkipBatch $ifTestSkipBatch") {
        // This unit test simulates the case where batches can be reloaded and maintenance
        // tasks can be delayed. After each batch, we randomly decide whether to move on to
        // the next batch and whether the maintenance task is executed.
        val remoteDir = Utils.createTempDir().toString
        val conf = dbConf.copy(minDeltasForSnapshot = 3, compactOnCommit = false)
        new File(remoteDir).delete() // to make sure that the directory gets created
        withDB(remoteDir, conf = conf) { db =>
          // A second DB is opened to simulate another executor that runs some batches that
          // are skipped in the current DB.
          withDB(remoteDir, conf = conf) { db2 =>
            val random = new Random(randomSeed)
            var curVer: Int = 0
            for (i <- 1 to 100) {
              db.load(curVer)
              db.put("foo", "bar")
              db.commit()
              // With a one-in-five chance, the maintenance task is executed. This simulates
              // the case where a snapshot isn't immediately uploaded, possibly delayed until
              // the next snapshot is ready. We create a snapshot every 3 batches, so the 1/5
              // chance tends to create longer maintenance delays.
              if (random.nextInt(5) == 0) {
                db.doMaintenance()
              }
              // With a one-in-two chance, we move to the next version; otherwise we keep the
              // same version. When the same version is kept, the DB will be reloaded.
              if (random.nextInt(2) == 0) {
                val inc = if (ifTestSkipBatch == 1) {
                  random.nextInt(3)
                } else {
                  1
                }
                if (inc > 1) {
                  // Create changelog files in the gap
                  for (j <- 1 to inc - 1) {
                    db2.load(curVer + j)
                    db2.put("foo", "bar")
                    db2.commit()
                  }
                }
                curVer = curVer + inc
              }
            }
          }
        }
      }
    }
  }

  test("validate Rocks DB SST files do not have a VersionIdMismatch" +
    " when metadata file is not overwritten - scenario 1") {
    val fmClass = "org.apache.spark.sql.execution.streaming.state." +
Review comment: I don't get how this test simulates the scenario described here. We are talking about a race condition, but this test does not trigger the race condition because doMaintenance() is synchronous. Same with the randomized test below. Neither test fails when SPARK-48931 is reverted.
Reply: Sorry, the ticket I gave was the wrong one. I updated it, and it should now work.
The commit cannot be easily reverted now, but I applied the test to the parent hash of the commit (b5a55e4) and confirmed that it would fail; after applying the commit 40ad829, the test passes.
I can explain why it fails in person, but it has nothing to do with a data race (I hope I didn't say that in the comments). It's related to the sequence of committing a version and then reloading it while the snapshot upload is still pending, and this uploading is problematic.
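To make the failing sequence concrete, here is a condensed sketch of it using the same RocksDBSuite helpers that appear in the diff above (illustrative only; the comments paraphrase the scenario described in the test, and remoteDir/dbConf are the suite's fixtures):

withDB(remoteDir, conf = dbConf.copy(minDeltasForSnapshot = 2, compactOnCommit = false)) { db =>
  db.load(1)
  db.put("foo", "bar")
  db.commit()        // commits version 2 and creates a local snapshot
  db.doMaintenance() // uploads the snapshot as 2.zip

  db.load(1)         // replay: reload the version-1 checkpoint from the cloud
  db.put("foo", "bar")
  db.commit()        // creates another local snapshot for version 2
  db.doMaintenance() // overwrites 2.zip; before SPARK-48931, files created by the reload
                     // were kept in the local -> cloud file map, so this upload could skip
                     // SST files that are not actually part of the checkpoint
}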
Reply: Yeah, now I get the picture. I was confused by the incorrect JIRA ticket. Thanks for the detailed explanation!