Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Handle missing StorageDescriptor in Hive Glue tables #17655

Merged
merged 3 commits into from
May 18, 2022

Conversation

v-jizhang
Copy link
Contributor

Cherry-pick of trinodb/trino#11092

Fixes #17643

Co-authored-by: Alex jo.alex2144@gmail.com

Test plan - Test added

== RELEASE NOTES ==
Hive Changes
* Handle missing StorageDescriptor in Hive Glue tables

@rohanpednekar rohanpednekar requested a review from beinan April 15, 2022 19:36
@prithvip
Copy link
Contributor

@ChunxuTang could you take a look and review this?

1 similar comment
@prithvip
Copy link
Contributor

@ChunxuTang could you take a look and review this?

@ChunxuTang
Copy link
Member

@prithvip Sure! Glad to help. Will take a look today.

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @v-jizhang, thanks a lot for your nice work!
I just left some minor suggestions. Feel free to have a look.
Additionally, as the PR was sent around two weeks ago, when you push new changes, could you also merge the current presto master branch to avoid potential merge conflicts? Thanks!

Comment on lines 514 to 528
public static boolean isDeltaLakeTable(Map<String, String> tableParameters)
{
return tableParameters.containsKey(SPARK_TABLE_PROVIDER_KEY)
&& tableParameters.get(SPARK_TABLE_PROVIDER_KEY).toLowerCase(ENGLISH).equals(DELTA_LAKE_PROVIDER);
}

public static boolean isIcebergTable(Table table)
{
return isIcebergTable(table.getParameters());
}

public static boolean isIcebergTable(Map<String, String> tableParameters)
{
return ICEBERG_TABLE_TYPE_VALUE.equalsIgnoreCase(tableParameters.get(ICEBERG_TABLE_TYPE_NAME));
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any there any specific reasons that you want to keep these functions in the FileHiveMetastore class? Looks like they are some helper functions. In that case, we can put them into the HiveUtil class.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HiveUtil is in presto-hive module and if put them there, to avoid circular dependency, they cannot be referenced in presto-hive-metastore.

@@ -106,6 +106,7 @@
import static com.google.common.base.MoreObjects.toStringHelper;
import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.collect.ImmutableList.toImmutableList;
import static java.util.Locale.ENGLISH;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think most changes in this file could be moved to HiveUtil, as the changes are mostly related to some helper functions.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

HiveUtil is in presto-hive and because presto-hive depends on presto-hive-metastore, presto-hive-metastore can't use anything in HiveUtil

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks for your clarification!
In that case, maybe we could move the utility functions to MetastoreUtil? Most added functions here look to be utility functions that are independent of FileHiveMetastore. What do you think?

.withDatabaseName(table.getSchemaName())
.withName(table.getTableName());
AWSGlueAsync glueClient = AWSGlueAsyncClientBuilder.defaultClient();
try (Transaction transaction = newTransaction()) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this Transaction transaction = newTransaction() necessary? Looks like it's not used.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're right. It's not used.

@v-jizhang v-jizhang requested a review from a team as a code owner April 30, 2022 02:03
@v-jizhang
Copy link
Contributor Author

@ChunxuTang @rohanpednekar: There is fix trinodb/trino#12130. I'd like to port it in this PR also.

Thanks

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @v-jizhang, thanks a lot for your work!
Sorry for my late reply. Let some minor suggestions~

BTW, I noticed that trinodb/trino#12130 also contains two new functions testIcebergTableNonNullStorageDescriptor and testDeltaTableNonNullStorageDescriptor in the TestGlueToTrinoConverter. Would you mind also cherry-picking them here?

@@ -106,6 +106,7 @@
import static com.google.common.base.MoreObjects.toStringHelper;
import static com.google.common.base.Preconditions.checkArgument;
import static com.google.common.collect.ImmutableList.toImmutableList;
import static java.util.Locale.ENGLISH;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Gotcha. Thanks for your clarification!
In that case, maybe we could move the utility functions to MetastoreUtil? Most added functions here look to be utility functions that are independent of FileHiveMetastore. What do you think?

@v-jizhang v-jizhang force-pushed the missing-sd branch 2 times, most recently from 478e040 to f7832f8 Compare May 16, 2022 19:25
@v-jizhang
Copy link
Contributor Author

v-jizhang commented May 16, 2022

Hi @v-jizhang, thanks a lot for your work! Sorry for my late reply. Let some minor suggestions~

BTW, I noticed that trinodb/trino#12130 also contains two new functions testIcebergTableNonNullStorageDescriptor and testDeltaTableNonNullStorageDescriptor in the TestGlueToTrinoConverter. Would you mind also cherry-picking them here?

@ChunxuTang Refactored utility functions and added tests. Thanks.

Copy link
Member

@ChunxuTang ChunxuTang left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@v-jizhang LGTM! Thanks for your work! BTW, I noticed that there's a testing failure. Could you re-run the tests to pass all of them?

Hi @beinan, would you like to also take a quick look at this PR?

v-jizhang and others added 3 commits May 17, 2022 09:32
Cherry-pick of trinodb/trino#11092

Fixes prestodb#17643

Co-authored-by: Alex <jo.alex2144@gmail.com>
Cherry-pick of trinodb/trino#12130

Some table loaders (e.g. Iceberg FlinkSink) will update/add Glue
storage descriptors even while writing iceberg and delta format tables.
Although these storage descriptors are not technically needed by
delta/iceberg, it is not incompatible. Instead of assuming that a Glue
storage descriptor excludes delta/iceberg usage, we should instead check
directly for the table_type metadata flag to determine whether or not a
table is delta/iceberg.

Co-authored-by: Eric Hwang <eric.hwang@starburstdata.com>
Cherry-pick of trinodb/trino#12164

Delta Lake connector expects that the metastore storage descriptor be
fully translated if it exists. It only needs to return a dummy storage
value if no storage descriptor has been configured.

Co-authored-by: Eric Hwang <eric.hwang@starburstdata.com>
Copy link
Member

@beinan beinan left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm, thx!

@beinan beinan merged commit 9a73930 into prestodb:master May 18, 2022
@highker highker mentioned this pull request Jul 6, 2022
7 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Missing StorageDescriptor in Hive Glue tables
4 participants