[SPARK-45787][SQL] Support Catalog.listColumns for clustering columns #47451

zedtang · 2024-07-22T19:39:08Z

What changes were proposed in this pull request?

Support listColumns API for clustering columns.

Why are the changes needed?

The listColumns API should report clustering columns, just as it already reports partition and bucket columns.

Does this PR introduce any user-facing change?

Yes, listColumns will now show an additional field isCluster to indicate whether the column is a clustering column.
Old output for spark.catalog.listColumns:

+----+-----------+--------+--------+-----------+--------+
|name|description|dataType|nullable|isPartition|isBucket|
+----+-----------+--------+--------+-----------+--------+
|   a|       null|     int|    true|      false|   false|
|   b|       null|  string|    true|      false|   false|
|   c|       null|     int|    true|      false|   false|
|   d|       null|  string|    true|      false|   false|
+----+-----------+--------+--------+-----------+--------+

New output:

+----+-----------+--------+--------+-----------+--------+---------+
|name|description|dataType|nullable|isPartition|isBucket|isCluster|
+----+-----------+--------+--------+-----------+--------+---------+
|   a|       null|     int|    true|      false|   false|    false|
|   b|       null|  string|    true|      false|   false|    false|
|   c|       null|     int|    true|      false|   false|    false|
|   d|       null|  string|    true|      false|   false|    false|
+----+-----------+--------+--------+-----------+--------+---------+
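As a rough illustration of how a caller might consume the new field, the sketch below picks out the clustering columns from a list of column entries. It is not code from this PR: `ColumnInfo` is a hypothetical stand-in that mirrors the fields in the new output above, `clustering_column_names` is an illustrative helper, and the sample rows (including the `True` flag on column `a`) are made up for the example.

```python
from typing import Iterable, List, NamedTuple, Optional


class ColumnInfo(NamedTuple):
    """Hypothetical stand-in mirroring the fields of the new output table."""
    name: str
    description: Optional[str]
    dataType: str
    nullable: bool
    isPartition: bool
    isBucket: bool
    isCluster: bool


def clustering_column_names(columns: Iterable) -> List[str]:
    """Return the names of columns whose isCluster flag is set.

    getattr with a default keeps the helper usable on entries that
    predate the isCluster field (they are treated as non-clustering).
    """
    return [c.name for c in columns if getattr(c, "isCluster", False)]


# Made-up sample rows; with a live session one would presumably pass
# the result of spark.catalog.listColumns(<table name>) instead.
sample = [
    ColumnInfo("a", None, "int", True, False, False, True),
    ColumnInfo("b", None, "string", True, False, False, False),
]
print(clustering_column_names(sample))
```

Reading the flag through `getattr` is a design choice for this sketch only, so the same helper tolerates entries in the old six-field shape.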

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

zedtang · 2024-07-22T19:44:41Z

This PR depends on #47301

zedtang · 2024-07-25T05:01:25Z

Hi @cloud-fan, this PR is ready for review, thanks

cloud-fan · 2024-07-26T01:43:37Z

thanks, merging to master!

github-actions bot added SQL BUILD PYTHON R CONNECT labels Jul 22, 2024

zedtang force-pushed the list-clustering-columns branch from d583e11 to 91a00b6 July 22, 2024 19:42

github-actions bot removed the BUILD label Jul 22, 2024

zedtang force-pushed the list-clustering-columns branch from 91a00b6 to d583e11 July 22, 2024 19:44

github-actions bot added the BUILD label Jul 22, 2024

zedtang mentioned this pull request Jul 22, 2024

[SPARK-48762][SQL] Introduce clusterBy DataFrameWriter API for Python #47452

Closed

support clustering columns in listColumns API

13a661b

zedtang force-pushed the list-clustering-columns branch from d583e11 to 13a661b July 25, 2024 05:00

github-actions bot removed the BUILD label Jul 25, 2024

fix

460a77f

cloud-fan approved these changes Jul 25, 2024

fix tests

7f1f0a4

cloud-fan closed this in e73ede7 Jul 26, 2024

zedtang deleted the list-clustering-columns branch July 26, 2024 02:00
