[SPARK-48903][SS] Set the RocksDB last snapshot version correctly on remote load #47363
cc @HeartSaVioR - PTAL, thx!
@chaoqin-li1123 - could you PTAL too? Thx
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/RocksDB.scala
LGTM, thanks for making the improvement!
Only a comment for test.
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/RocksDBSuite.scala
+1 pending CI
Thanks! Merged to master.
…remote load

### What changes were proposed in this pull request?
Set the RocksDB last snapshot version correctly on remote load

### Why are the changes needed?
Avoid creating full snapshot on every first batch after restart and also reset a snapshot that is likely no longer valid

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit tests

```
===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.execution.streaming.state.RocksDBSuite, threads: ForkJoinPool.commonPool-worker-6 (daemon=true), ForkJoinPool.commonPool-worker-4 (daemon=true), ForkJoinPool.commonPool-worker-7 (daemon=true), ForkJoinPool.commonPool-worker-5 (daemon=true), ForkJoinPool.commonPool-worker-3 (daemon=true), rpc-boss-3-1 (daemon=true), ForkJoinPool.commonPool-worker-8 (daemon=true), shuffle-boss-6-1 (daemon=true), ForkJoinPool.commonPool-worker-1 (daemon=true), ForkJoinPool.common...
[info] Run completed in 4 minutes, 40 seconds.
[info] Total number of tests run: 176
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 176, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47363 from anishshri-db/task/SPARK-48903.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
What changes were proposed in this pull request?
Set the RocksDB last snapshot version correctly on remote load
Why are the changes needed?
Avoid creating a full snapshot on the first batch after every restart, and also reset a snapshot that is likely no longer valid.
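To illustrate the motivation, here is a minimal sketch of the idea using a toy model. It is not the code changed in this PR and does not use Spark's `RocksDB` class. The names `ToyStateStore`, `loadFromRemote`, `pending` and `minDeltasForSnapshot`, and the rule "snapshot once enough versions have passed since the last snapshot", are assumptions made for illustration only.

```scala
// Hypothetical toy model for illustration only; this is not Spark's RocksDB class.
final class ToyStateStore(remoteSnapshotVersions: Set[Long], minDeltasForSnapshot: Int) {
  private var loadedVersion: Long = 0L
  private var lastSnapshotVersion: Long = 0L
  // A snapshot prepared locally but not yet uploaded (assumed concept).
  private var pendingSnapshot: Option[Long] = None

  def pending: Option[Long] = pendingSnapshot

  /** Simulates loading `version` from remote storage, e.g. after a restart. */
  def loadFromRemote(version: Long): Unit = {
    val candidates = remoteSnapshotVersions.filter(_ <= version)
    val latestRemoteSnapshot = if (candidates.isEmpty) 0L else candidates.max
    loadedVersion = version
    // Record the snapshot version the load was actually based on.
    lastSnapshotVersion = latestRemoteSnapshot
    // Drop any local snapshot left over from before the reload.
    pendingSnapshot = None
  }

  /** Commits one batch; returns true if this commit took a full snapshot. */
  def commit(): Boolean = {
    val newVersion = loadedVersion + 1
    val takeSnapshot = newVersion - lastSnapshotVersion >= minDeltasForSnapshot
    if (takeSnapshot) {
      pendingSnapshot = Some(newVersion)
      lastSnapshotVersion = newVersion
    }
    loadedVersion = newVersion
    takeSnapshot
  }
}
```

In this model, loading version 12 when a remote snapshot exists at version 10 and `minDeltasForSnapshot` is 5 means the first commit (version 13) does not snapshot. If `lastSnapshotVersion` stayed at 0 after the load, that same first commit would always trigger a full snapshot.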
Does this PR introduce any user-facing change?
No
How was this patch tested?
Added unit tests
Was this patch authored or co-authored using generative AI tooling?
No