[SPARK-48891][SS] Refactor StateSchemaCompatibilityChecker to unify all state schema formats #47359

Closed
wants to merge 23 commits into from

@anishshri-db anishshri-db commented Jul 15, 2024

What changes were proposed in this pull request?

Refactor StateSchemaCompatibilityChecker to unify all state schema formats

Why are the changes needed?

This refactoring is needed to integrate upcoming changes to the state data source reader and to schema evolution, and to consolidate the state schema handling:

  • Consolidates all state schema reader/writers in one place
  • Consolidates all validation logic through the same API
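
The two points above can be illustrated with a minimal sketch. It is written in Python for brevity and is not the Spark implementation: the two on-disk formats, every function name, and the rule that compatibility compares column types only are assumptions made for illustration. The idea shown is that each format registers a reader and a writer in one table, and all callers go through a single validation entry point regardless of which format is stored.

```python
import json
from dataclasses import dataclass
from typing import List, Optional, Tuple

# ((column name, type name), ...) -- a deliberately simplified schema model.
Fields = Tuple[Tuple[str, str], ...]


@dataclass(frozen=True)
class StateSchema:
    key: Fields
    value: Fields


def _to_fields(raw) -> Fields:
    # JSON gives back lists; normalize to tuples so schemas compare equal.
    return tuple((name, dtype) for name, dtype in raw)


# Hypothetical format 1: the whole schema as one JSON object.
def _write_v1(s: StateSchema) -> str:
    return json.dumps({"key": s.key, "value": s.value})


def _read_v1(body: str) -> StateSchema:
    raw = json.loads(body)
    return StateSchema(_to_fields(raw["key"]), _to_fields(raw["value"]))


# Hypothetical format 2: key schema and value schema on separate lines.
def _write_v2(s: StateSchema) -> str:
    return json.dumps(s.key) + "\n" + json.dumps(s.value)


def _read_v2(body: str) -> StateSchema:
    key_line, value_line = body.split("\n")
    return StateSchema(_to_fields(json.loads(key_line)),
                       _to_fields(json.loads(value_line)))


# The single place where every format's writer and reader is registered.
FORMATS = {1: (_write_v1, _read_v1), 2: (_write_v2, _read_v2)}


def serialize(schema: StateSchema, version: int) -> str:
    writer, _ = FORMATS[version]
    return f"v{version}\n{writer(schema)}"


def deserialize(text: str) -> StateSchema:
    # The first line names the format version; the rest is the payload.
    header, body = text.split("\n", 1)
    _, reader = FORMATS[int(header[1:])]
    return reader(body)


def check_compatible(stored: StateSchema, new: StateSchema) -> List[str]:
    # Assumed rule for this sketch: column types must match, names may differ.
    errors = []
    for part in ("key", "value"):
        old_types = [t for _, t in getattr(stored, part)]
        new_types = [t for _, t in getattr(new, part)]
        if old_types != new_types:
            errors.append(f"{part} schema changed: {old_types} -> {new_types}")
    return errors


def validate_and_maybe_write(stored_text: Optional[str], new: StateSchema,
                             version: int = 2) -> Tuple[str, List[str]]:
    # Single entry point: write on first run, otherwise validate what is stored.
    if stored_text is None:
        return serialize(new, version), []
    return stored_text, check_compatible(deserialize(stored_text), new)
```

With this shape, supporting another format means adding one entry to `FORMATS`; the validation path does not change.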

Does this PR introduce any user-facing change?

No

How was this patch tested?

Added unit tests

12:38:45.481 WARN org.apache.spark.sql.execution.streaming.state.StateSchemaCompatibilityCheckerSuite:

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.execution.streaming.state.StateSchemaCompatibilityCheckerSuite, threads: rpc-boss-3-1 (daemon=true), ForkJoinPool.commonPool-worker-3 (daemon=true), ForkJoinPool.commonPool-worker-2 (daemon=true), shuffle-boss-6-1 (daemon=true), ForkJoinPool.commonPool-worker-1 (daemon=true) =====
[info] Run completed in 12 seconds, 565 milliseconds.
[info] Total number of tests run: 30
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 30, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.

Was this patch authored or co-authored using generative AI tooling?

No

@anishshri-db anishshri-db changed the title from "[SPARK-48891] Refactor StateSchemaCompatibilityChecker to unify all state schema formats" to "[DO-NOT-MERGE][SPARK-48891] Refactor StateSchemaCompatibilityChecker to unify all state schema formats" Jul 15, 2024
@anishshri-db anishshri-db marked this pull request as draft July 15, 2024 20:29
@anishshri-db anishshri-db changed the title from "[DO-NOT-MERGE][SPARK-48891] Refactor StateSchemaCompatibilityChecker to unify all state schema formats" to "[SPARK-48891][SS] Refactor StateSchemaCompatibilityChecker to unify all state schema formats" Jul 18, 2024
@anishshri-db anishshri-db marked this pull request as ready for review July 18, 2024 20:04
@anishshri-db

cc - @ericm-db @jingz-db - PTAL, thx !

@ericm-db ericm-db left a comment

LGTM

@anishshri-db

@HeartSaVioR - could you PTAL ? Thx !

@HeartSaVioR HeartSaVioR left a comment

+1

@HeartSaVioR

Thanks! Merging to master.

jingz-db pushed a commit to jingz-db/spark that referenced this pull request Jul 22, 2024
…ll state schema formats

### What changes were proposed in this pull request?
Refactor StateSchemaCompatibilityChecker to unify all state schema formats

### Why are the changes needed?
Needed to integrate future changes around state data source reader and schema evolution and consolidate these changes

- Consolidates all state schema reader/writers in one place
- Consolidates all validation logic through the same API

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Added unit tests

```
12:38:45.481 WARN org.apache.spark.sql.execution.streaming.state.StateSchemaCompatibilityCheckerSuite:

===== POSSIBLE THREAD LEAK IN SUITE o.a.s.sql.execution.streaming.state.StateSchemaCompatibilityCheckerSuite, threads: rpc-boss-3-1 (daemon=true), ForkJoinPool.commonPool-worker-3 (daemon=true), ForkJoinPool.commonPool-worker-2 (daemon=true), shuffle-boss-6-1 (daemon=true), ForkJoinPool.commonPool-worker-1 (daemon=true) =====
[info] Run completed in 12 seconds, 565 milliseconds.
[info] Total number of tests run: 30
[info] Suites: completed 1, aborted 0
[info] Tests: succeeded 30, failed 0, canceled 0, ignored 0, pending 0
[info] All tests passed.
```

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47359 from anishshri-db/task/SPARK-48891.

Authored-by: Anish Shrigondekar <anish.shrigondekar@databricks.com>
Signed-off-by: Jungtaek Lim <kabhwan.opensource@gmail.com>
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024
…ll state schema formats
