[SPARK-48772][SS][SQL] State Data Source Change Feed Reader Mode #47188

Closed

Commits on Jul 2, 2024

  1. Squashed commit of the following:

    commit 261c671
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jul 2 13:57:57 2024 -0700
    
        solve conflict
    
    commit 39d0b17
    Merge: 9af25f1 c2d59b0
    Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
    Date:   Tue Jul 2 13:45:12 2024 -0700
    
        rebase to master
    
    commit c2d59b0
    Merge: 9cf8b25 9af25f1
    Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
    Date:   Tue Jul 2 13:44:50 2024 -0700
    
        Merge branch 'skipSnapshotAtBatch' into state-cdc
    
    commit 9af25f1
    Merge: 8fa9ef5 fea930a
    Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
    Date:   Tue Jul 2 13:23:25 2024 -0700
    
        Merge branch 'apache:master' into skipSnapshotAtBatch
    
    commit fea930a
    Author: Anish Shrigondekar <anish.shrigondekar@databricks.com>
    Date:   Wed Jul 3 05:21:50 2024 +0900
    
        [SPARK-48770][SS] Change to read operator metadata once on driver to check if we can find info for numColsPrefixKey used for session window agg queries
    
        ### What changes were proposed in this pull request?
        Change to read operator metadata once on driver to check if we can find info for numColsPrefixKey used for session window agg queries
    
        ### Why are the changes needed?
        Avoid reading the operator metadata file multiple times on the executors
    
        ### Does this PR introduce _any_ user-facing change?
        No
    
        ### How was this patch tested?
        Existing unit tests
    
        ```
        ===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.execution.datasources.v2.state.RocksDBStateDataSourceReadSuite, threads: ForkJoinPool.commonPool-worker-6 (daemon=true), ForkJoinPool.commonPool-worker-4 (daemon=true), Idle Worker Monitor for python3 (daemon=true), ForkJoinPool.commonPool-worker-7 (daemon=true), ForkJoinPool.commonPool-worker-5 (daemon=true), ForkJoinPool.commonPool-worker-3 (daemon=true), rpc-boss-3-1 (daemon=true), ForkJoinPool.commonPool-worker-8 (daemon=true), shuffle-boss-6-1 (daemon=tru...
        [info] Run completed in 1 minute, 39 seconds.
        [info] Total number of tests run: 14
        [info] Suites: completed 1, aborted 0
        [info] Tests: succeeded 14, failed 0, canceled 0, ignored 0, pending 0
        [info] All tests passed.
        ```
    
        ### Was this patch authored or co-authored using generative AI tooling?
        No
    
        Closes apache#47167 from anishshri-db/task/SPARK-48770.
    
        Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
        Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
    
    commit 8fa9ef5
    Merge: 9dbe295 ee0d306
    Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
    Date:   Tue Jul 2 13:21:01 2024 -0700
    
        Merge branch 'apache:master' into skipSnapshotAtBatch
    
    commit 9cf8b25
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jul 2 10:53:53 2024 -0700
    
        add input error tests
    
    commit 7354408
    Merge: 6d6d511 9dbe295
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jul 2 10:17:34 2024 -0700
    
        Merge branch 'skipSnapshotAtBatch' into state-cdc
    
    commit 9dbe295
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jul 1 21:54:33 2024 -0700
    
        minor
    
    commit 6d6d511
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jul 1 15:53:04 2024 -0700
    
        move StateStoreChangeDataReader to other files and delete it
    
    commit 104ba9c
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jul 1 15:36:08 2024 -0700
    
        rename PUT to update
    
    commit 12298b2
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jul 1 13:09:02 2024 -0700
    
        minor
    
    commit 75839ac
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jul 1 13:03:59 2024 -0700
    
        name all cdc to changeData
    
    commit ace711c
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jul 1 12:49:07 2024 -0700
    
        check validity of input to options
    
    commit 3834cc9
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Fri Jun 28 17:51:16 2024 -0700
    
        solve format issue
    
    commit 337785d
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Fri Jun 28 17:07:18 2024 -0700
    
        address comments from Anish
    
    commit 15a8316
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Fri Jun 28 16:46:57 2024 -0700
    
        refactor StateStoreChangeDataReader
    
    commit b1eb8c4
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Fri Jun 28 15:03:09 2024 -0700
    
        add integration tests to the new features
    
    commit 7c6cdad
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 27 16:35:46 2024 -0700
    
        unify the two traits
    
    commit cd6a39b
    Merge: 271b98e d140708
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 27 16:22:45 2024 -0700
    
        Merge branch 'skipSnapshotAtBatch' into state-cdc
    
    commit d140708
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 27 15:17:06 2024 -0700
    
        provide the script to regenerate golden files
    
    commit 4deb63e
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 27 14:22:00 2024 -0700
    
        throw the exception
    
    commit 6f1425d
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 27 12:09:54 2024 -0700
    
        reflect more comments from Jungtaek
    
    commit 42d952f
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 27 11:11:33 2024 -0700
    
        rename SupportsFineGrainedReplayFromSnapshot to SupportsFineGrainedReplay
    
    commit e15213e
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 27 11:05:50 2024 -0700
    
        rename startVersion to snapshotVersion to make its function clear
    
    commit 271b98e
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Wed Jun 26 15:46:33 2024 -0700
    
        make sure StateStoreChangeData is used everywhere
    
    commit ff5bff2
    Merge: 6922595 40b6dc6
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Wed Jun 26 15:22:19 2024 -0700
    
        Merge branch 'skipSnapshotAtBatch' into state-cdc
    
    commit 40b6dc6
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Wed Jun 26 10:59:17 2024 -0700
    
        move error to StateStoreErrors
    
    commit 23639f4
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Wed Jun 26 10:44:22 2024 -0700
    
        create new error for SupportsFineGrainedReplayFromSnapshot
    
    commit 97ee3ef
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Wed Jun 26 10:25:57 2024 -0700
    
        some naming and formatting comments from Anish and Jungtaek
    
    commit 1a23abb
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 25 14:56:07 2024 -0700
    
        refactor the code to isolate from current state stores used by streaming queries
    
    commit 876256e
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 25 12:29:40 2024 -0700
    
        reflect comments from Jungtaek
    
    commit ef9b095
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 25 12:08:34 2024 -0700
    
        create integration test against golden files
    
    commit 6922595
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jun 24 13:44:19 2024 -0700
    
        stage
    
    commit 3ece6f2
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Fri Jun 21 21:22:50 2024 -0700
    
        resort error-conditions
    
    commit be30817
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Fri Jun 21 17:30:12 2024 -0700
    
        Reflect more comments from Anish
    
    commit cf84d50
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Fri Jun 21 14:02:58 2024 -0700
    
        support hdfs state store provider
    
    commit 752cdc7
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 20 17:51:33 2024 -0700
    
        separate CDCPartitionReader from StatePartitionReader
    
    commit bd87055
    Merge: 2184396 2eb6646
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 20 17:29:31 2024 -0700
    
        Merge branch 'skipSnapshotAtBatch' into state-cdc
    
    commit 2eb6646
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 20 17:10:45 2024 -0700
    
        also update the name of StateTable
    
    commit 2184396
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 20 17:03:18 2024 -0700
    
        hdfs initial implementation
    
    commit 3f266c1
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jun 17 09:46:07 2024 -0700
    
        style
    
    commit fe9cea1
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Fri Jun 14 12:50:21 2024 -0700
    
        address more comments from Anish
    
    commit 1870b35
    Merge: 4d4cd70 9eb6c76
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 13 14:25:23 2024 -0700
    
        Merge branch 'skipSnapshotAtBatch' of https://github.com/eason-yuchen-liu/spark into skipSnapshotAtBatch
    
    commit 4d4cd70
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 13 14:24:55 2024 -0700
    
        log StateSourceOptions optionally
    
    commit 9eb6c76
    Merge: 20e1b9c 08e741b
    Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
    Date:   Thu Jun 13 14:18:50 2024 -0700
    
        Merge branch 'master' into skipSnapshotAtBatch
    
    commit 20e1b9c
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 13 14:16:14 2024 -0700
    
        address comments from Anish & Wei
    
    commit 4825215
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 13 11:45:55 2024 -0700
    
        address reviews by Wei partially
    
    commit 5229152
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Wed Jun 12 11:29:46 2024 -0700
    
        support reading join states
    
    commit 61dea35
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 11 13:16:56 2024 -0700
    
        minor
    
    commit 1656580
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 11 12:07:06 2024 -0700
    
        improve doc
    
    commit 4ebd078
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 11 11:48:30 2024 -0700
    
        move partition error
    
    commit dfa712e
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 11 11:42:09 2024 -0700
    
        clean up and format
    
    commit aa337c1
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 11 10:22:59 2024 -0700
    
        add new test on partition not found error
    
    commit 292ec5d
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jun 10 16:54:38 2024 -0700
    
        delete useless test files
    
    commit 1a3d20a
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jun 10 16:52:22 2024 -0700
    
        make sure test is stable
    
    commit eddb3c7
    Merge: 9d902d7 5a2f374
    Author: Yuchen Liu <170372783+eason-yuchen-liu@users.noreply.github.com>
    Date:   Mon Jun 10 11:43:03 2024 -0700
    
        Merge branch 'apache:master' into skipSnapshotAtBatch
    
    commit 9d902d7
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Mon Jun 10 11:13:02 2024 -0700
    
        test directly on the method instead of end to end
    
    commit 07267b5
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Fri Jun 7 16:43:45 2024 -0700
    
        allow rocksdb to reconstruct state from a specific checkpoint
    
    commit 2475173
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Thu Jun 6 10:32:56 2024 -0700
    
        add test cases for two options in HDFS state store
    
    commit 7dad0c1
    Merge: 6db0e3d 8a0927c
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 4 15:30:20 2024 -0700
    
        Merge branch 'skipSnapshotAtBatch' of https://github.com/eason-yuchen-liu/spark into skipSnapshotAtBatch
    
    commit 6db0e3d
    Author: Yuchen Liu <yuchen.liu@databricks.com>
    Date:   Tue Jun 4 15:28:49 2024 -0700
    
        initial implementation
    eason-yuchen-liu committed Jul 2, 2024
    1ade442
  2. 98bf8ec
  3. fb890ae
  4. Add comments

    eason-yuchen-liu committed Jul 2, 2024
    db45c6f
  5. 1926e5e
  6. minor

    eason-yuchen-liu committed Jul 2, 2024
    24c0351

Commits on Jul 8, 2024

  1. d4a4b80
  2. 42552ac
  3. 24db837
  4. minor

    eason-yuchen-liu committed Jul 8, 2024
    adde991
  5. d3ca86c
  6. minor

    eason-yuchen-liu committed Jul 8, 2024
    5199c56

Commits on Jul 9, 2024

  1. Use NextIterator as the interface rather than StateStoreChangeDataReader & change the existing StateStoreChangelogReader to correctly implement the interface

    eason-yuchen-liu committed Jul 9, 2024
    ce75133
  2. more doc

    eason-yuchen-liu committed Jul 9, 2024
    84dcf15
  3. 22a086b
  4. c797d0b
  5. 5921479
  6. e5674cf
  7. pass tests

    eason-yuchen-liu committed Jul 9, 2024
    c012e1a
  8. continue

    eason-yuchen-liu committed Jul 9, 2024
    ff0cd43

Commits on Jul 10, 2024

  1. 2ad7590
  2. continue

    eason-yuchen-liu committed Jul 10, 2024
    43420f6