[SPARK-48760][SQL] Introduce ALTER TABLE ... CLUSTER BY SQL syntax to change clustering columns #47156

zedtang · 2024-07-01T01:35:19Z

What changes were proposed in this pull request?

Introduce ALTER TABLE ... CLUSTER BY SQL syntax to change the clustering columns:

ALTER TABLE tbl CLUSTER BY (a, b);  -- update clustering columns to a and b
ALTER TABLE tbl CLUSTER BY NONE;  -- remove clustering columns

This change updates the clustering columns that catalogs use. Clustering columns are stored in:

* CatalogTable's `PROP_CLUSTERING_COLUMNS` for the session catalog
* Table's `partitioning` transform array for a V2 catalog

This is consistent with CREATE TABLE CLUSTER BY (#42577).

Why are the changes needed?

Provides a way to update the clustering columns.

Does this PR introduce any user-facing change?

Yes, it introduces new SQL syntax and a new keyword NONE.

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No

zedtang · 2024-07-01T01:36:38Z

@cloud-fan, @imback82, this PR is ready for review, thanks!

cloud-fan · 2024-07-03T03:31:22Z

thanks, merging to master!

…yntax-ddl-alter-table.md`

### What changes were proposed in this pull request?
This PR follows up #47156 and aims to:
- add `CLUSTER BY` to the doc `sql-ref-syntax-ddl-alter-table.md`
- move parser tests from `o.a.s.s.c.p.DDLParserSuite` to `AlterTableClusterByParserSuite`
- use `checkError` to check exceptions in `o.a.s.s.e.c.AlterTableClusterBySuiteBase`

### Why are the changes needed?
- Enable the doc `sql-ref-syntax-ddl-alter-table.md` to cover the new syntax `ALTER TABLE ... CLUSTER BY ...`.
- Align with other similar tests, e.g. AlterTableRename*

### Does this PR introduce _any_ user-facing change?
Yes. End users can now find an explanation of `CLUSTER BY` in the doc `sql-ref-syntax-ddl-alter-table.md`.

### How was this patch tested?
Updated UT.

### Was this patch authored or co-authored using generative AI tooling?
No.

Closes #47254 from panbingkun/SPARK-48760_FOLLOWUP.

Authored-by: panbingkun <panbingkun@baidu.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

… change clustering columns

### What changes were proposed in this pull request?
Introduce ALTER TABLE ... CLUSTER BY SQL syntax to change the clustering columns:

```sql
ALTER TABLE tbl CLUSTER BY (a, b);  -- update clustering columns to a and b
ALTER TABLE tbl CLUSTER BY NONE;  -- remove clustering columns
```

This change updates the clustering columns for catalogs to utilize. Clustering columns are maintained in:
* CatalogTable's `PROP_CLUSTERING_COLUMNS` for session catalog
* Table's `partitioning` transform array for V2 catalog

which is consistent with CREATE TABLE CLUSTER BY (apache#42577).

### Why are the changes needed?
Provides a way to update the clustering columns.

### Does this PR introduce _any_ user-facing change?
Yes, it introduces new SQL syntax and a new keyword NONE.

### How was this patch tested?
New unit tests.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes apache#47156 from zedtang/alter-table-cluster-by.

Lead-authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Co-authored-by: Wenchen Fan <cloud0fan@gmail.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>

### What changes were proposed in this pull request?
#47156 introduced a bug in `CatalogV2Util.applyClusterByChanges`: it removes the existing `ClusterByTransform` first, regardless of whether there is a `ClusterBy` table change. As a result, any table change removes the clustering columns from the table. This PR fixes the bug by removing the `ClusterByTransform` only when there is a `ClusterBy` table change.

### Why are the changes needed?

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
Amended an existing test to catch this bug.

### Was this patch authored or co-authored using generative AI tooling?
No

Closes #47288 from zedtang/fix-apply-cluster-by-changes.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
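To make the bug described in that follow-up easier to picture, here is a minimal toy model in Python. It is not Spark's code: the class names `ClusterByTransform`, `ClusterBy`, and `SetProperty` are simplified stand-ins, the functions `apply_buggy` and `apply_fixed` are hypothetical, and using the last `ClusterBy` change when several are present is an arbitrary choice made for this sketch.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class ClusterByTransform:
    """Stand-in for the clustering entry in a table's partitioning array."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class ClusterBy:
    """Stand-in for a table change that sets new clustering columns."""
    columns: Tuple[str, ...]


@dataclass(frozen=True)
class SetProperty:
    """Stand-in for any table change unrelated to clustering."""
    key: str
    value: str


def apply_buggy(partitioning: List[object], changes: List[object]) -> List[object]:
    # Bug being modeled: the existing clustering transform is dropped
    # unconditionally, before checking whether a ClusterBy change exists.
    result = [t for t in partitioning if not isinstance(t, ClusterByTransform)]
    cluster_changes = [c for c in changes if isinstance(c, ClusterBy)]
    if cluster_changes:
        result.append(ClusterByTransform(cluster_changes[-1].columns))
    return result


def apply_fixed(partitioning: List[object], changes: List[object]) -> List[object]:
    cluster_changes = [c for c in changes if isinstance(c, ClusterBy)]
    if not cluster_changes:
        # No clustering change requested: leave the partitioning untouched.
        return list(partitioning)
    # Only now replace the existing clustering transform.
    result = [t for t in partitioning if not isinstance(t, ClusterByTransform)]
    result.append(ClusterByTransform(cluster_changes[-1].columns))
    return result


table = [ClusterByTransform(("a", "b"))]
unrelated = [SetProperty("owner", "someone")]

lost = apply_buggy(table, unrelated)   # clustering transform disappears
kept = apply_fixed(table, unrelated)   # clustering transform survives
updated = apply_fixed(table, [ClusterBy(("c",))])
```

In this model, an unrelated change such as setting a property wipes the clustering transform under `apply_buggy`, while `apply_fixed` touches it only when a `ClusterBy` change is actually present.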

initial commit

11d0875

github-actions bot added the SQL label Jul 1, 2024

zedtang changed the title ~~[SPARK-48760] Introduce ALTER TABLE CLUSTER BY SQL syntax to change clustering columns~~ [SPARK-48760][SQL] Introduce ALTER TABLE CLUSTER BY SQL syntax to change clustering columns Jul 1, 2024

zedtang added 2 commits June 30, 2024 22:57

fix keywords.sql

4b45101

fix SQLKeywordSuite

7c92bc1

github-actions bot added the DOCS label Jul 1, 2024

zedtang changed the title ~~[SPARK-48760][SQL] Introduce ALTER TABLE CLUSTER BY SQL syntax to change clustering columns~~ [SPARK-48760][SQL] Introduce ALTER TABLE ... CLUSTER BY SQL syntax to change clustering columns Jul 1, 2024

cloud-fan approved these changes Jul 2, 2024

cloud-fan reviewed Jul 2, 2024

sql/catalyst/src/main/scala/org/apache/spark/sql/connector/catalog/CatalogV2Util.scala (outdated, resolved)

cloud-fan and others added 2 commits July 2, 2024 14:05

Update sql/catalyst/src/main/scala/org/apache/spark/sql/connector/cat…

9f552b1

…alog/CatalogV2Util.scala

fix ThriftServerWithSparkContextInHttpSuite

48aef17

cloud-fan closed this in b8cc91c Jul 3, 2024

panbingkun mentioned this pull request Jul 8, 2024

[SPARK-48760][SQL][DOCS][FOLLOWUP] Add CLUSTER BY to doc sql-ref-syntax-ddl-alter-table.md #47254

Closed

zedtang mentioned this pull request Jul 10, 2024

[SPARK-48760][SQL] Fix CatalogV2Util.applyClusterByChanges #47288

Closed

zedtang deleted the alter-table-cluster-by branch July 11, 2024 16:06
