Ensure that bucketing and sort column names correspond to table column names #16796
f2b874b to e4f3034
List<String> bucketColumnNames = storageDescriptor.getBucketCols().stream()
        // Ensure that the names used for the bucket columns are the same as the names used for the table columns
        .map(bucketColumnName -> dataColumns.stream().filter(column -> column.getName().equalsIgnoreCase(bucketColumnName))
                .findFirst()
there should be exactly one such column
also, a linear search over the column list is OK provided that there are only a few bucketing columns, which is a reasonable assumption (otherwise we would build a set of data column names). add a code comment.
however, i wonder whether we have to do this validation here at all
i think it should be sufficient to lowercase the column names.
storageDescriptor.getBucketCols().stream()
        .map(name -> name.toLowerCase(ENGLISH))
        .collect(toImmutableList())
I followed your suggestion and lowercased the bucketing and sorting column names so that they match the data column names.
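The two approaches discussed in this thread can be sketched side by side. This is only an illustration using plain JDK collections, not the code merged in this PR: the class `BucketColumnNames` and its methods are hypothetical, and column names are passed as strings rather than as `Column` objects. The `resolve` variant builds a map of data column names up front, which is the alternative to a linear search mentioned above.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Trino.
final class BucketColumnNames
{
    private BucketColumnNames() {}

    // Variant 1: lowercase the names reported by the metastore.
    // Assumes the data column names are already lowercase.
    static List<String> lowercase(List<String> bucketColumnNames)
    {
        List<String> result = new ArrayList<>();
        for (String name : bucketColumnNames) {
            result.add(name.toLowerCase(Locale.ENGLISH));
        }
        return List.copyOf(result);
    }

    // Variant 2: map each bucket column to the exact data column name,
    // matching case-insensitively and requiring exactly one match.
    static List<String> resolve(List<String> bucketColumnNames, List<String> dataColumnNames)
    {
        Map<String, String> byLowercaseName = new HashMap<>();
        for (String dataColumnName : dataColumnNames) {
            String key = dataColumnName.toLowerCase(Locale.ENGLISH);
            if (byLowercaseName.put(key, dataColumnName) != null) {
                throw new IllegalArgumentException("Ambiguous data column name: " + dataColumnName);
            }
        }
        List<String> result = new ArrayList<>();
        for (String name : bucketColumnNames) {
            String match = byLowercaseName.get(name.toLowerCase(Locale.ENGLISH));
            if (match == null) {
                throw new IllegalArgumentException("No data column matches bucket column: " + name);
            }
            result.add(match);
        }
        return List.copyOf(result);
    }
}
```

Variant 1 is shorter but silently accepts a bucket column that does not exist in the table; variant 2 fails fast in that case at the cost of needing the data columns as an extra argument.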
@@ -52,7 +53,7 @@ public HiveBucketProperty(
        this.sortedBy = ImmutableList.copyOf(requireNonNull(sortedBy, "sortedBy is null"));
    }

-    public static Optional<HiveBucketProperty> fromStorageDescriptor(Map<String, String> tableParameters, StorageDescriptor storageDescriptor, String tablePartitionName)
+    public static Optional<HiveBucketProperty> fromStorageDescriptor(Map<String, String> tableParameters, StorageDescriptor storageDescriptor, String tablePartitionName, List<Column> dataColumns)
is it legal to bucket on a partitioning column? (i know it doesn't make sense)
plugin/trino-hive/src/main/java/io/trino/plugin/hive/HiveBucketProperty.java
CI hit #14637
In the metastore, the bucketing and sorting column names can differ in case from their corresponding table column names. Even when the metastore delivers a table with such inconsistencies, this change makes Trino lowercase the bucketing and sort column names so that they correspond to the data column names.
e4f3034 to 7e043a7
CI hit #12818
Description
In the metastore, the bucketing and sorting column names can differ in case from their corresponding table column names. Even when the metastore delivers a table with such inconsistencies, this change makes Trino use exactly the same bucketing and sort column names as the corresponding data column names.
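The kind of mismatch described here can be illustrated with a small, self-contained sketch. It is not Trino code: the class `BucketColumnCaseCheck` and its methods are hypothetical, the subset check merely stands in for any case-sensitive comparison of column names, and the data column names are assumed to be lowercase.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of the case mismatch; not part of Trino.
final class BucketColumnCaseCheck
{
    private BucketColumnCaseCheck() {}

    // Case-sensitive check that every bucket column is a known data column.
    static boolean allBucketColumnsKnown(List<String> bucketColumnNames, Set<String> dataColumnNames)
    {
        return dataColumnNames.containsAll(bucketColumnNames);
    }

    // Lowercase the names as delivered by the metastore.
    static List<String> normalize(List<String> names)
    {
        List<String> result = new ArrayList<>();
        for (String name : names) {
            result.add(name.toLowerCase(Locale.ENGLISH));
        }
        return result;
    }

    public static void main(String[] args)
    {
        Set<String> dataColumns = Set.of("id", "segment_id");
        List<String> fromMetastore = List.of("SEGMENT_ID");

        // The name from the metastore does not match as delivered: prints false
        System.out.println(allBucketColumnsKnown(fromMetastore, dataColumns));
        // After lowercasing it matches the data column: prints true
        System.out.println(allBucketColumnsKnown(normalize(fromMetastore), dataColumns));
    }
}
```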
Additional context and related issues
Reproduction scenario
Connect to jdbc:hive2://localhost:10213/default (Spark)
The inconsistency in case between the data column segment_id and the bucketing column SEGMENT_ID was causing an issue in Trino.
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(x) Release notes are required, with the following suggested text: