
[SPARK-48589][SQL][SS] Add option snapshotStartBatchId and snapshotPartitionId to state data source #46944

Closed

Conversation

eason-yuchen-liu (Contributor)

@eason-yuchen-liu eason-yuchen-liu commented Jun 11, 2024

What changes were proposed in this pull request?

This PR defines two new options, snapshotStartBatchId and snapshotPartitionId, for the existing state reader. Both of them must be provided at the same time.

  1. When there is no snapshot file for snapshotStartBatchId (note there is an off-by-one difference between version and batch id), throw an exception.
  2. Otherwise, the reader rebuilds the state by reading delta files only, ignoring all later snapshot files.
  3. If a batchId option is also specified, that batchId is the ending batch, so the replay ends there.
  4. This feature supports state generated by the HDFS state store provider and by the RocksDB state store provider with changelog checkpointing enabled. It does not support RocksDB with changelog checkpointing disabled, which is the default for RocksDB.
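As a rough sketch of how the two options are meant to be used together (the checkpoint path and batch numbers below are hypothetical, not from this PR):

```scala
// Rebuild partition 0 of the state starting from the snapshot taken for
// batch 5, applying only delta files up to batch 10 and ignoring any
// snapshot files after the starting one. Remember the off-by-one note:
// the state version for batch N is N + 1.
val stateDf = spark.read
  .format("statestore")
  .option("path", "/tmp/checkpoint")   // hypothetical checkpoint location
  .option("snapshotStartBatchId", 5)   // batch whose snapshot we start from
  .option("snapshotPartitionId", 0)    // only this state partition is replayed
  .option("batchId", 10)               // optional: the ending batch
  .load()
```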

Why are the changes needed?

Sometimes when a snapshot is corrupted, users want to bypass it when reading a later state. This PR gives users the ability to specify the starting snapshot version and partition, which can be useful for debugging.

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Created test cases covering edge cases in the input of the new options. Created a test for the new public function replayReadStateFromSnapshot. Created integration tests for the new options against four stateful operators: limit, aggregation, deduplication, and stream-stream join. Instead of generating state within the tests, which is unstable, I prepared golden files for the integration tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@eason-yuchen-liu eason-yuchen-liu changed the title [SPARK-48588][SS] Add option snapshotStartBatchId and snapshotPartitionId to state data source [SPARK-48589][SS] Add option snapshotStartBatchId and snapshotPartitionId to state data source Jun 11, 2024
@eason-yuchen-liu eason-yuchen-liu changed the title [SPARK-48589][SS] Add option snapshotStartBatchId and snapshotPartitionId to state data source [SPARK-48589][SQL][SS] Add option snapshotStartBatchId and snapshotPartitionId to state data source Jun 11, 2024
@eason-yuchen-liu eason-yuchen-liu marked this pull request as ready for review June 12, 2024 20:23

eason-yuchen-liu commented Jun 12, 2024

Is it necessary to add an end-to-end test for the options? If so, I can create another PR. It would probably be constructed by sleeping long enough for the maintenance task to run. @anishshri-db @HeartSaVioR

throw QueryExecutionErrors.failedToReadSnapshotFileNotExistsError(
snapshotFile(startVersion), toString(), null)
}
synchronized { putStateIntoStateCacheMap(startVersion, startVersionMap.get) }
Contributor

is it possible to refactor this with the existing loadMap function? Or add a helper function for the shared logic?

Contributor Author

For HDFS it is hard because the common part is really small, but for RocksDB there is room for refactoring. For example, PR #46927 tests whether we can extract a common part of both load functions.

* @param endVersion checkpoint version to end with
*/
def getStore(startVersion: Long, endVersion: Long): StateStore =
throw new SparkUnsupportedOperationException("getStore with startVersion and endVersion " +
Contributor

can we just put nothing here? like

def getStore(version: Long): StateStore

Contributor Author

It seems that we cannot: to make this method optional, it has to have a default implementation; otherwise a build error is thrown.

Contributor

Hmm, what error do you see here? Can you paste it please?

Contributor Author

Building on the assumption that users create a custom state store provider and do not implement this method because it is optional, they will see errors like

Missing implementation for member of trait StateStoreProvider
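The constraint being discussed can be sketched as follows: giving the new overload a concrete (throwing) default body keeps third-party providers that only implement the original abstract method compiling, while leaving it abstract would produce the build error quoted above. This trait is a simplified stand-in for Spark's actual StateStoreProvider, not the merged code:

```scala
// Simplified sketch; StateStore stands in for the real Spark trait.
trait StateStoreProvider {
  def getStore(version: Long): StateStore   // existing abstract method

  // New overload. A concrete default body makes it effectively optional:
  // providers written before this PR still compile, and only callers that
  // actually need fine-grained replay hit the exception.
  def getStore(startVersion: Long, endVersion: Long): StateStore =
    throw new SparkUnsupportedOperationException(
      "getStore with startVersion and endVersion is not supported by " +
        this.getClass.getName)
}
```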

import org.apache.spark.io.CompressionCodec
import org.apache.spark.sql.{AnalysisException, DataFrame, Encoders, Row}
import org.apache.spark.sql.catalyst.expressions.{BoundReference, GenericInternalRow}
import org.apache.spark.sql.catalyst.plans.physical.HashPartitioning
import org.apache.spark.sql.execution.datasources.v2.state.utils.SchemaUtil
import org.apache.spark.sql.execution.streaming.{CommitLog, MemoryStream, OffsetSeqLog}
import org.apache.spark.sql.execution.streaming.state.{HDFSBackedStateStoreProvider, RocksDBStateStoreProvider, StateStore}
import org.apache.spark.sql.execution.streaming.state._
Contributor

is this because these three are everything in that pkg?

Contributor Author

No. The reason is that I use three new classes from this pkg, and I think it would be too long to list them all. What do you think?

Contributor

Yea this should be good


WweiL commented Jun 13, 2024

Tagging myself so it shows on my dashboard


case Some(snapshotStartBatchId) =>
if (!provider.isInstanceOf[SupportsFineGrainedReplayFromSnapshot]) {
StateStoreErrors.stateStoreProviderDoesNotSupportFineGrainedReplay(
Contributor

I guess we've usually thrown the exception explicitly at the call site rather than having the method throw it?

eason-yuchen-liu (Contributor Author), Jun 27, 2024

This error is thrown in two places (the other is in JoinStateManager), so I created a function for it. This way the error also gets its own error class.

* @param endVersion checkpoint version to end with
* @return [[HDFSBackedStateStore]]
*/
override def replayStateFromSnapshot(startVersion: Long, endVersion: Long): StateStore = {
Contributor

Please apply the same for all methods in this PR: if the meaning of startVersion is actually the snapshot version to begin with, let's use snapshotVersion, which is clearer about the intention.

if (startVersion < 1) {
throw QueryExecutionErrors.unexpectedStateStoreVersion(startVersion)
}
if (endVersion < startVersion || endVersion < 0) {
Contributor

I guess we'd like to give a different error for an invalid value (negative) vs. the criteria (endVersion has to be equal to or later than startVersion). The error message wouldn't give context on why it failed. Users could check the option value themselves, but ideally we should tell them.

Contributor

Also, the former check already covers the latter one: startVersion has to be at least 1, so endVersion also has to be at least 1. The latter check is only needed if we want to produce a different error for different patterns of invalid values.

eason-yuchen-liu (Contributor Author), Jun 27, 2024

I guess we'd like to give a different error for an invalid value (negative) vs. the criteria (endVersion has to be equal to or later than startVersion). The error message wouldn't give context on why it failed. Users could check the option value themselves, but ideally we should tell them.

There is a better error message in StateDataSource, where the users' input is verified, and that is the only usage of this function. I think the error message here will not matter too much, since I would not expect users to call this method directly.

if (endVersion < startVersion) {
throw QueryExecutionErrors.unexpectedStateStoreVersion(endVersion)
}
rocksDB.loadFromSnapshot(startVersion, endVersion)
Contributor

just to double check, readOnly flag is not needed unlike the path of load(), do I understand correctly? If then shall we just implement one of two and call other to reduce redundant code?

Contributor Author

Yes. Good catch.

*/
trait SupportsFineGrainedReplayFromSnapshot {
/**
* Used by snapshotStartBatchId option when reading state generated by join operation as data
Contributor

Let's not couple too much with implementation details, especially the current implementation of the Spark codebase. A third-party state store provider does not need to know about this. If we think this note is needed to help them implement the two different methods, let's just leave this method at the interface level and wrap the state store as read-only on the caller side.

* @param snapshotVersion checkpoint version of the snapshot to start with
* @param endVersion checkpoint version to end with
*/
def replayReadStateFromSnapshot(snapshotVersion: Long, endVersion: Long): ReadStateStore
Contributor

Shall we just provide a default implementation that wraps the read-write state store as read-only? Then state store providers wouldn't need to implement this except when they can optimize specifically for the read-only case.

Contributor

You can update the comment to explain why a provider may want to implement this or just leave the default, instead of describing where this method will be called.
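The default implementation the reviewer suggests could be sketched like this. It assumes Spark's WrappedReadStateStore adapter (which wraps a read-write store behind the read-only interface); this is the reviewer's proposal, not necessarily the merged code:

```scala
// Simplified sketch of the trait with the suggested default.
trait SupportsFineGrainedReplayFromSnapshot {
  // Required: rebuild a read-write store from a snapshot plus delta files.
  def replayStateFromSnapshot(snapshotVersion: Long, endVersion: Long): StateStore

  // Default: reuse the read-write replay and wrap the result read-only.
  // A provider overrides this only if it has a cheaper read-only path.
  def replayReadStateFromSnapshot(
      snapshotVersion: Long,
      endVersion: Long): ReadStateStore =
    new WrappedReadStateStore(replayStateFromSnapshot(snapshotVersion, endVersion))
}
```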

@@ -264,3 +296,8 @@ class StateStoreValueRowFormatValidationFailure(errorMsg: String)
extends SparkRuntimeException(
errorClass = "STATE_STORE_VALUE_ROW_FORMAT_VALIDATION_FAILURE",
messageParameters = Map("errorMsg" -> errorMsg))

class StateStoreProviderDoesNotSupportFineGrainedReplay(inputClass: String)
extends SparkUnsupportedOperationException(
Contributor

nit: one more space

Contributor Author

Where to insert?

Contributor

before e, it's only one space.

stateStoreProvider.getStore(stateInfo.get.storeVersion)
if (snapshotStartVersion.isDefined) {
if (!stateStoreProvider.isInstanceOf[SupportsFineGrainedReplayFromSnapshot]) {
StateStoreErrors.stateStoreProviderDoesNotSupportFineGrainedReplay(
Contributor

ditto

@@ -796,4 +973,141 @@ abstract class StateDataSourceReadSuite extends StateDataSourceTestBase with Ass
testForSide("right")
}
}

protected def testSnapshotNotFound(): Unit = {
withTempDir(tempDir => {
Contributor

nit: according to the Databricks Scala style, this should be withTempDir { tempDir =>, which saves one level of indentation (curly brace)
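For contrast, the two styles side by side (both compile; the second is the style being requested):

```scala
// Parentheses around a brace-block lambda: works, but adds an extra level.
withTempDir(tempDir => {
  // test body
})

// Preferred: pass the lambda as a single curly-brace block.
withTempDir { tempDir =>
  // test body
}
```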

provider.asInstanceOf[SupportsFineGrainedReplayFromSnapshot]
.replayReadStateFromSnapshot(1, 2)
}
checkError(exc, "CANNOT_LOAD_STATE_STORE.UNCATEGORIZED")
Contributor

It would be nice if we could give users a better error message, e.g. "snapshot file does not exist", but I'm OK with addressing this later.

Contributor Author

Let's address it later along with the changelog-file-not-found exception.

}

protected def testGetReadStoreWithStartVersion(): Unit = {
withTempDir(tempDir => {
Contributor

ditto

}

protected def testSnapshotPartitionId(): Unit = {
withTempDir(tempDir => {
Contributor

ditto

.option(StateSourceOptions.SNAPSHOT_START_BATCH_ID, 0)
.option(
StateSourceOptions.SNAPSHOT_PARTITION_ID,
spark.sessionState.conf.numShufflePartitions)
Contributor

just need to be > 0

Contributor Author

I see, it is because of limit operator.

})
}

// Todo: Should also test against state generated by 3.5
Contributor

Is this a remaining TODO, or does it not need to be done at all? If not, let's remove the golden files for 3.5; I guess it's not intentional to test cross-version compatibility.

checkAnswer(stateSnapshotDf, stateDf)
}

protected def testSnapshotOnLimitState(providerName: String): Unit = {
Contributor

General comment for tests using golden files: please leave a comment showing how you built each golden file (the query you used), so that others can re-build it if needed.

}

/**
* Consturct the state at endVersion from snapshot from snapshotVersion.
Contributor

nit: Construct the state at

@@ -367,6 +368,22 @@ private[sql] class RocksDBStateStoreProvider
private def verify(condition: => Boolean, msg: String): Unit = {
if (!condition) { throw new IllegalStateException(msg) }
}

override def replayStateFromSnapshot(snapshotVersion: Long, endVersion: Long): StateStore = {
Contributor

Can you add a small function comment here ?

errorClass = "CANNOT_LOAD_STATE_STORE.CANNOT_READ_MISSING_SNAPSHOT_FILE",
messageParameters = Map(
"fileToRead" -> fileToRead,
"clazz" -> clazz))
Contributor

is this a common convention for the parameter naming ? this will be visible in the error message that is thrown, correct ?

eason-yuchen-liu (Contributor Author), Jun 28, 2024


protected def testSnapshotOnDeduplicateState(providerName: String): Unit = {
/** The golden files are generated by:
withSQLConf({
Contributor

nit: indent seems odd in these places, but maybe not a big deal for such comments

Contributor Author

Will move one tab right.

}
*/
val resourceUri = this.getClass.getResource(
s"/structured-streaming/checkpoint-version-4.0.0/$providerName/limit/"
Contributor

I thought we were going to run against 3.5.1 and then run the query once to generate the operator metadata. Did we decide against that ?

Contributor

Strictly speaking, the test that a checkpoint with no operator metadata can create operator metadata should have been done in the state metadata testing. If we don't have one, we'd better add one, but there's no need to couple it with this PR.

@anishshri-db (Contributor) left a comment

lgtm - pending some minor comments

@HeartSaVioR (Contributor) left a comment

Only nits and minors. Thanks for the patience!

@@ -900,7 +900,8 @@ private[sql] class HDFSBackedStateStoreProvider extends StateStoreProvider with
*/
override def replayStateFromSnapshot(snapshotVersion: Long, endVersion: Long): StateStore = {
val newMap = replayLoadedMapForStoreFromSnapshot(snapshotVersion, endVersion)
logInfo(log"Retrieved version ${MDC(LogKeys.STATE_STORE_VERSION, snapshotVersion)} to " +
logInfo(log"Retrieved snapshot at version " +
log"${MDC(LogKeys.STATE_STORE_VERSION, snapshotVersion)} and apply delta files to version" +
Contributor

nit: space after version, as the next string does not start with space.

@@ -917,9 +918,10 @@ private[sql] class HDFSBackedStateStoreProvider extends StateStoreProvider with
override def replayReadStateFromSnapshot(snapshotVersion: Long, endVersion: Long):
ReadStateStore = {
val newMap = replayLoadedMapForStoreFromSnapshot(snapshotVersion, endVersion)
logInfo(log"Retrieved version ${MDC(LogKeys.STATE_STORE_VERSION, snapshotVersion)} to " +
logInfo(log"Retrieved snapshot at version " +
log"${MDC(LogKeys.STATE_STORE_VERSION, snapshotVersion)} and apply delta files to version" +
Contributor

nit: same here


@eason-yuchen-liu (Contributor Author)

Thanks for all the careful checks by @HeartSaVioR @anishshri-db @WweiL. This PR is ready to merge.

@HeartSaVioR (Contributor) left a comment

+1

@HeartSaVioR (Contributor)

Thanks! Merging to master.

HeartSaVioR pushed a commit that referenced this pull request Jul 11, 2024
…to State Data Source

### What changes were proposed in this pull request?

In #46944 and #47188, we introduced some new options to the State Data Source. This PR aims to explain these new features in the documentation.

### Why are the changes needed?

It is necessary to reflect the latest change in the documentation website.

### Does this PR introduce _any_ user-facing change?

No.

### How was this patch tested?

The API Doc website can be rendered correctly.

### Was this patch authored or co-authored using generative AI tooling?

No.

Closes #47274 from eason-yuchen-liu/snapshot-doc.

Authored-by: Yuchen Liu <yuchen.liu@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024