[SPARK-45787][SQL] Support Catalog.listColumns for clustering columns #47451

Closed
wants to merge 3 commits into from


@zedtang (Contributor) commented Jul 22, 2024

What changes were proposed in this pull request?

Support clustering columns in the Catalog.listColumns API.

Why are the changes needed?

The listColumns API already reports whether each column is a partition or bucket column; it should report clustering columns in the same way.

Does this PR introduce any user-facing change?

Yes, listColumns will now show an additional field isCluster to indicate whether the column is a clustering column.
Old output for spark.catalog.listColumns:

+----+-----------+--------+--------+-----------+--------+
|name|description|dataType|nullable|isPartition|isBucket|
+----+-----------+--------+--------+-----------+--------+
|   a|       null|     int|    true|      false|   false|
|   b|       null|  string|    true|      false|   false|
|   c|       null|     int|    true|      false|   false|
|   d|       null|  string|    true|      false|   false|
+----+-----------+--------+--------+-----------+--------+

New output:

+----+-----------+--------+--------+-----------+--------+---------+
|name|description|dataType|nullable|isPartition|isBucket|isCluster|
+----+-----------+--------+--------+-----------+--------+---------+
|   a|       null|     int|    true|      false|   false|    false|
|   b|       null|  string|    true|      false|   false|    false|
|   c|       null|     int|    true|      false|   false|    false|
|   d|       null|  string|    true|      false|   false|    false|
+----+-----------+--------+--------+-----------+--------+---------+
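To make the new flag concrete, here is a minimal, standalone Python sketch of the idea, assuming each column's flags are derived by checking its name against the table's partition, bucket and clustering column lists. `ColumnInfo`, `list_columns` and `render` are hypothetical names for illustration only; this is not Spark's implementation and does not call `spark.catalog.listColumns`. The `render` helper only approximates the boxed layout printed by `show()`.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple

# Hypothetical, simplified stand-in for one catalog column entry.
# Field names mirror the columns in the listColumns output above.
@dataclass(frozen=True)
class ColumnInfo:
    name: str
    description: Optional[str]
    dataType: str
    nullable: bool
    isPartition: bool
    isBucket: bool
    isCluster: bool  # the flag added by this change

FIELDS = ["name", "description", "dataType", "nullable",
          "isPartition", "isBucket", "isCluster"]


def list_columns(
    schema: Sequence[Tuple[str, str, bool]],
    partition_cols: Sequence[str] = (),
    bucket_cols: Sequence[str] = (),
    cluster_cols: Sequence[str] = (),
) -> List[ColumnInfo]:
    """Flag each (name, type, nullable) field by membership in the
    table's partition, bucket and clustering column lists."""
    part, bucket, cluster = set(partition_cols), set(bucket_cols), set(cluster_cols)
    return [
        ColumnInfo(name, None, data_type, nullable,
                   isPartition=name in part,
                   isBucket=name in bucket,
                   isCluster=name in cluster)
        for name, data_type, nullable in schema
    ]


def _fmt(value) -> str:
    # Print missing values as "null" and booleans in lowercase.
    if value is None:
        return "null"
    if isinstance(value, bool):
        return "true" if value else "false"
    return str(value)


def render(cols: List[ColumnInfo]) -> str:
    """Render rows as a right-aligned boxed table."""
    rows = [[_fmt(getattr(c, f)) for f in FIELDS] for c in cols]
    widths = [max([len(f)] + [len(r[i]) for r in rows])
              for i, f in enumerate(FIELDS)]
    sep = "+" + "+".join("-" * w for w in widths) + "+"

    def line(cells):
        return "|" + "|".join(c.rjust(w) for c, w in zip(cells, widths)) + "|"

    return "\n".join([sep, line(FIELDS), sep] + [line(r) for r in rows] + [sep])


if __name__ == "__main__":
    schema = [("a", "int", True), ("b", "string", True),
              ("c", "int", True), ("d", "string", True)]
    print(render(list_columns(schema)))
    print(render(list_columns(schema, cluster_cols=["a"])))
```

With no clustering columns, the first `print` produces the same table as the new output above; passing `cluster_cols=["a"]` flips `isCluster` to `true` for column `a` only.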

How was this patch tested?

New unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@zedtang (Contributor, Author) commented Jul 22, 2024

This PR depends on #47301

@zedtang (Contributor, Author) commented Jul 25, 2024

Hi @cloud-fan, this PR is ready for review, thanks.

@cloud-fan (Contributor) commented
thanks, merging to master!

@cloud-fan cloud-fan closed this in e73ede7 Jul 26, 2024
@zedtang zedtang deleted the list-clustering-columns branch July 26, 2024 02:00
ilicmarkodb pushed a commit to ilicmarkodb/spark that referenced this pull request Jul 29, 2024
Closes apache#47451 from zedtang/list-clustering-columns.

Authored-by: Jiaheng Tang <jiaheng.tang@databricks.com>
Signed-off-by: Wenchen Fan <wenchen@databricks.com>
fusheng-rd pushed a commit to fusheng-rd/spark that referenced this pull request Aug 6, 2024
attilapiros pushed a commit to attilapiros/spark that referenced this pull request Oct 4, 2024