[SPARK-48772][SS][SQL] State Data Source Change Feed Reader Mode #47188
Commits on Jul 2, 2024
- 1ade442 Squashed commit of the following:

  commit 261c671
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jul 2 13:57:57 2024 -0700

      solve conflict

  commit 39d0b17
  Merge: 9af25f1 c2d59b0
  Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
  Date:   Tue Jul 2 13:45:12 2024 -0700

      rebase to master

  commit c2d59b0
  Merge: 9cf8b25 9af25f1
  Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
  Date:   Tue Jul 2 13:44:50 2024 -0700

      Merge branch 'skipSnapshotAtBatch' into state-cdc

  commit 9af25f1
  Merge: 8fa9ef5 fea930a
  Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
  Date:   Tue Jul 2 13:23:25 2024 -0700

      Merge branch 'apache:master' into skipSnapshotAtBatch

  commit fea930a
  Author: Anish Shrigondekar <anish.shrigondekar@databricks.com>
  Date:   Wed Jul 3 05:21:50 2024 +0900

      [SPARK-48770][SS] Change to read operator metadata once on driver to check if we can find info for numColsPrefixKey used for session window agg queries

      ### What changes were proposed in this pull request?
      Change to read operator metadata once on driver to check if we can find info for numColsPrefixKey used for session window agg queries

      ### Why are the changes needed?
      Avoid reading the operator metadata file multiple times on the executors

      ### Does this PR introduce _any_ user-facing change?
      No

      ### How was this patch tested?
      Existing unit tests

      ```
      ===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.execution.datasources.v2.state.RocksDBStateDataSourceReadSuite, threads: ForkJoinPool.commonPool-worker-6 (daemon=true), ForkJoinPool.commonPool-worker-4 (daemon=true), Idle Worker Monitor for python3 (daemon=true), ForkJoinPool.commonPool-worker-7 (daemon=true), ForkJoinPool.commonPool-worker-5 (daemon=true), ForkJoinPool.commonPool-worker-3 (daemon=true), rpc-boss-3-1 (daemon=true), ForkJoinPool.commonPool-worker-8 (daemon=true), shuffle-boss-6-1 (daemon=tru...
      [info] Run completed in 1 minute, 39 seconds.
      [info] Total number of tests run: 14
      [info] Suites: completed 1, aborted 0
      [info] Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
      [info] All tests passed.
      ```

      ### Was this patch authored or co-authored using generative AI tooling?
      No

      Closes apache#47167 from anishshri-db/task/SPARK-48770.

      Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
      Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>

  commit 8fa9ef5
  Merge: 9dbe295 ee0d306
  Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
  Date:   Tue Jul 2 13:21:01 2024 -0700

      Merge branch 'apache:master' into skipSnapshotAtBatch

  commit 9cf8b25
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jul 2 10:53:53 2024 -0700

      add input error tests

  commit 7354408
  Merge: 6d6d511 9dbe295
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jul 2 10:17:34 2024 -0700

      Merge branch 'skipSnapshotAtBatch' into state-cdc

  commit 9dbe295
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jul 1 21:54:33 2024 -0700

      minor

  commit 6d6d511
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jul 1 15:53:04 2024 -0700

      move StateStoreChangeDataReader to other files and delete it

  commit 104ba9c
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jul 1 15:36:08 2024 -0700

      rename PUT to update

  commit 12298b2
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jul 1 13:09:02 2024 -0700

      minor

  commit 75839ac
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jul 1 13:03:59 2024 -0700

      name all cdc to changeData

  commit ace711c
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jul 1 12:49:07 2024 -0700

      check validity of input to options

  commit 3834cc9
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Fri Jun 28 17:51:16 2024 -0700

      solve format issue

  commit 337785d
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Fri Jun 28 17:07:18 2024 -0700

      address comments from Anish

  commit 15a8316
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Fri Jun 28 16:46:57 2024 -0700

      refactor StateStoreChangeDataReader

  commit b1eb8c4
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Fri Jun 28 15:03:09 2024 -0700

      add integration tests to the new features

  commit 7c6cdad
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 27 16:35:46 2024 -0700

      unify the two traits

  commit cd6a39b
  Merge: 271b98e d140708
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 27 16:22:45 2024 -0700

      Merge branch 'skipSnapshotAtBatch' into state-cdc

  commit d140708
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 27 15:17:06 2024 -0700

      provide the script to regenerate golden files

  commit 4deb63e
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 27 14:22:00 2024 -0700

      throw the exception

  commit 6f1425d
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 27 12:09:54 2024 -0700

      reflect more comments from Jungtaek

  commit 42d952f
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 27 11:11:33 2024 -0700

      rename SupportsFineGrainedReplayFromSnapshot to SupportsFineGrainedReplay

  commit e15213e
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 27 11:05:50 2024 -0700

      rename to startVersion to snapshotVersion to make its function clear

  commit 271b98e
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Wed Jun 26 15:46:33 2024 -0700

      make sure StateStoreChangeData is used everywhere

  commit ff5bff2
  Merge: 6922595 40b6dc6
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Wed Jun 26 15:22:19 2024 -0700

      Merge branch 'skipSnapshotAtBatch' into state-cdc

  commit 40b6dc6
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Wed Jun 26 10:59:17 2024 -0700

      move error to StateStoreErrors

  commit 23639f4
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Wed Jun 26 10:44:22 2024 -0700

      create new error for SupportsFineGrainedReplayFromSnapshot

  commit 97ee3ef
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Wed Jun 26 10:25:57 2024 -0700

      some naming and formatting comments from Anish and Jungtaek

  commit 1a23abb
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 25 14:56:07 2024 -0700

      refactor the code to isolate from current state stores used by streaming queries

  commit 876256e
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 25 12:29:40 2024 -0700

      reflect comments from Jungtaek

  commit ef9b095
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 25 12:08:34 2024 -0700

      create integration test against golden files

  commit 6922595
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jun 24 13:44:19 2024 -0700

      stage

  commit 3ece6f2
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Fri Jun 21 21:22:50 2024 -0700

      resort error-conditions

  commit be30817
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Fri Jun 21 17:30:12 2024 -0700

      Reflect more comments from Anish

  commit cf84d50
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Fri Jun 21 14:02:58 2024 -0700

      support hdfs state store provider

  commit 752cdc7
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 20 17:51:33 2024 -0700

      separate CDCPartitionReader from StatePartitionReader

  commit bd87055
  Merge: 2184396 2eb6646
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 20 17:29:31 2024 -0700

      Merge branch 'skipSnapshotAtBatch' into state-cdc

  commit 2eb6646
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 20 17:10:45 2024 -0700

      also update the name of StateTable

  commit 2184396
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 20 17:03:18 2024 -0700

      hdfs initial implementation

  commit 3f266c1
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jun 17 09:46:07 2024 -0700

      style

  commit fe9cea1
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Fri Jun 14 12:50:21 2024 -0700

      address more comments from Anish

  commit 1870b35
  Merge: 4d4cd70 9eb6c76
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 13 14:25:23 2024 -0700

      Merge branch 'skipSnapshotAtBatch' of https://github.com/eason-yuchen-liu/spark into skipSnapshotAtBatch

  commit 4d4cd70
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 13 14:24:55 2024 -0700

      log StateSourceOptions optionally

  commit 9eb6c76
  Merge: 20e1b9c 08e741b
  Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
  Date:   Thu Jun 13 14:18:50 2024 -0700

      Merge branch 'master' into skipSnapshotAtBatch

  commit 20e1b9c
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 13 14:16:14 2024 -0700

      address comments from Anish & Wei

  commit 4825215
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 13 11:45:55 2024 -0700

      address reviews by Wei partially

  commit 5229152
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Wed Jun 12 11:29:46 2024 -0700

      support reading join states

  commit 61dea35
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 11 13:16:56 2024 -0700

      minor

  commit 1656580
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 11 12:07:06 2024 -0700

      improve doc

  commit 4ebd078
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 11 11:48:30 2024 -0700

      move partition error

  commit dfa712e
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 11 11:42:09 2024 -0700

      clean up and format

  commit aa337c1
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 11 10:22:59 2024 -0700

      add new test on partition not found error

  commit 292ec5d
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jun 10 16:54:38 2024 -0700

      delete useless test files

  commit 1a3d20a
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jun 10 16:52:22 2024 -0700

      make sure test is stable

  commit eddb3c7
  Merge: 9d902d7 5a2f374
  Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
  Date:   Mon Jun 10 11:43:03 2024 -0700

      Merge branch 'apache:master' into skipSnapshotAtBatch

  commit 9d902d7
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Mon Jun 10 11:13:02 2024 -0700

      test directly on the method instead of end to end

  commit 07267b5
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Fri Jun 7 16:43:45 2024 -0700

      allow rocksdb to reconstruct state from a specific checkpoint

  commit 2475173
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Thu Jun 6 10:32:56 2024 -0700

      add test cases for two options in HDFS state store

  commit 7dad0c1
  Merge: 6db0e3d 8a0927c
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 4 15:30:20 2024 -0700

      Merge branch 'skipSnapshotAtBatch' of https://github.com/eason-yuchen-liu/spark into skipSnapshotAtBatch

  commit 6db0e3d
  Author: Yuchen Liu <yuchen.liu@databricks.com>
  Date:   Tue Jun 4 15:28:49 2024 -0700

      initial implementation
- 98bf8ec
- fb890ae
- db45c6f
- 1926e5e Merge branch 'readStateChange' of https://github.com/eason-yuchen-liu/spark into readStateChange
- 24c0351
Commits on Jul 8, 2024
- d4a4b80
- 42552ac
- 24db837
- adde991
- d3ca86c
- 5199c56
Commits on Jul 9, 2024
- ce75133 Use NextIterator as the interface rather than StateStoreChangeDataReader & change the existing StateStoreChangelogReader to correctly implement the interface
- 84dcf15
- 22a086b
- c797d0b
- 5921479 Merge branch 'readStateChange' of https://github.com/eason-yuchen-liu/spark into readStateChange
- e5674cf
- c012e1a
- ff0cd43
Commits on Jul 10, 2024
- 2ad7590
- 43420f6