-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Respect hive.metastore.thrift.delete-files-on-drop
config property for dropping partitions
#13822
Conversation
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java
Outdated
Show resolved
Hide resolved
plugin/trino-hive/src/main/java/io/trino/plugin/hive/metastore/thrift/ThriftHiveMetastore.java
Outdated
Show resolved
Hide resolved
The Hive connector doc needs an update if we're going to do this. Specifically:
|
I don't think we need to update Lines 107 to 112 in bcfe05f
|
f7ff00b
to
577addc
Compare
import static java.nio.charset.StandardCharsets.UTF_8; | ||
import static java.util.Objects.requireNonNull; | ||
|
||
public class TestHiveThriftMetastoreWithS3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Using normal test environment because somehow I can't reproduce the issue on product test environment.
577addc
to
eceb235
Compare
client.dropPartition(databaseName, tableName, parts, deleteData); | ||
String partitionLocation = partition.getSd().getLocation(); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we wrap the file deletion in a separate try/catch?
I'm just thinking, if the partition is dropped from the metastore but the file system doesn't allow us to delete the files for some reason, the query should still succeed right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel users should disable it in such cases. This feature can be controlled by config property, so exception looks better in my opinion.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it depends on what the behavior is for a following select
statement. If I run a drop partition and it fails, the partition should still show up in the next query. If it doesn't show up, the drop should succeed. Does that make sense?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was missing that subsequent deleteDirRecursive
already has try-catch block in the method. Do you think it still requires additional change?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're totally right. This is fine
There are several CI failures. |
Added empty commit to retrigger the build after #14323 merged. |
6af2c70
to
db0758b
Compare
CI hit #14441 |
.setBucketName(writableBucket) | ||
.setPopulateTpchData(false) | ||
.setThriftMetastoreConfig(new ThriftMetastoreConfig().setDeleteFilesOnDrop(true)) | ||
.setHiveProperties(ImmutableMap.of("hive.allow-register-partition-procedure", "true")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
redundant?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's not redundant. ThriftMetastoreConfig.setDeleteFilesOnDrop
is equals to hive.metastore.thrift.delete-files-on-drop
not hive.allow-register-partition-procedure
.
@@ -255,7 +262,9 @@ public DistributedQueryRunner build() | |||
queryRunner.createCatalog(HIVE_CATALOG, "hive", hiveProperties); | |||
queryRunner.createCatalog(HIVE_BUCKETED_CATALOG, "hive", hiveBucketedProperties); | |||
|
|||
populateData(queryRunner, metastore); | |||
if (populateTpchData) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
else validate initialTables is empty, since it's ignored otherwise
however, a better way for this would be to call the parameter createTpchSchemas
then createTpchSchemas(false) would skip schema cretion and initialTables(empty) (default) would skip tables creation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Renamed to createTpchSchemas
.
@@ -136,7 +144,9 @@ public DistributedQueryRunner build() | |||
requireNonNull(s3SecretKey, "s3SecretKey is null"); | |||
requireNonNull(bucketName, "bucketName is null"); | |||
String lowerCaseS3Endpoint = s3Endpoint.toLowerCase(Locale.ENGLISH); | |||
checkArgument(lowerCaseS3Endpoint.startsWith("http://") || lowerCaseS3Endpoint.startsWith("https://"), "Expected http URI for S3 endpoint; got %s", s3Endpoint); | |||
checkArgument( | |||
lowerCaseS3Endpoint.matches("s3.*\\.amazonaws\\.com") || lowerCaseS3Endpoint.startsWith("http://") || lowerCaseS3Endpoint.startsWith("https://"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
are we using s3.*.amazonaws.com
without https://
? should we add it explicitly? in secrets?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pushed another commit to add the prefix.
|
||
// Recreating a table throws "Target directory for table 'xxx' already exists" in real S3 (not MinIO) | ||
// when 'hive.metastore.thrift.delete-files-on-drop' config property is false | ||
assertUpdate("CREATE TABLE " + tableName + "(col int)"); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe it would make sense to check that the table in fact is empty?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Included additional assertions.
db0758b
to
3407144
Compare
Addressed comments. |
LGTM, but the build is red |
Failed jobs succeeded in #14557 (within the repo). I will take a look these failures and file issues if needed. |
@ebyhr Is there a similar property available for glue metastore. Upon drop table command, the table meta information is deleted from glue but the underlying data remains in azure. Anything you can suggest to allow dropping of data? |
Description
Respect
hive.metastore.thrift.delete-files-on-drop
config property for dropping partitionsFixes #13545 Fixes #13821
Documentation
(x) Sufficient documentation is included in this PR.
Release notes
(x) Release notes entries required with the following suggested text: