Ignore empty folder placeholder files on S3 #4552
I don’t think we need config for this.
beb927f to 9d4fa82
Thanks, removed the config.
@rohangarg messaged me on Slack suggesting that we move this change to https://github.com/prestosql/presto/blob/master/presto-hive/src/main/java/io/prestosql/plugin/hive/util/HiveFileIterator.java#L86. Do you have a preference, @electrum?
presto-hive/src/test/java/io/prestosql/plugin/hive/s3/MockAmazonS3.java
9d4fa82 to 0098476
One benefit of the split generation change approach could be that it handles custom
Not electrum here, but I think it would be a good idea not to That way it can be up to the caller to decide whether they want to filter those out or not (maybe via Hadoop's PathFilter API), but doing this directly for now in HiveFileIterator like you mentioned makes sense.
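To illustrate the "let the caller decide" idea, here is a minimal sketch in plain Java. It is not code from this PR: `CallerFilteredListing`, `list`, `ACCEPT_ALL`, and `SKIP_FOLDER_MARKERS` are hypothetical names, `java.util.function.Predicate<String>` stands in for Hadoop's `PathFilter` (whose single method is `boolean accept(Path path)`), and the `_$folder$` suffix is an assumed marker naming convention.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch: the listing itself returns everything,
// and the caller passes in the filter it wants applied.
final class CallerFilteredListing
{
    // Caller wants every object, including folder markers.
    static final Predicate<String> ACCEPT_ALL = key -> true;

    // Caller wants to drop folder markers.
    // Assumption: marker keys end with "_$folder$".
    static final Predicate<String> SKIP_FOLDER_MARKERS = key -> !key.endsWith("_$folder$");

    private CallerFilteredListing() {}

    // Returns the keys accepted by the caller-supplied filter, in listing order.
    static List<String> list(List<String> allKeys, Predicate<String> accept)
    {
        List<String> result = new ArrayList<>();
        for (String key : allKeys) {
            if (accept.test(key)) {
                result.add(key);
            }
        }
        return result;
    }
}
```

With this shape, a split-generation caller could pass `SKIP_FOLDER_MARKERS` while other callers keep `ACCEPT_ALL`, so the listing code itself stays neutral.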
I propose a naming change.
@@ -591,6 +592,11 @@ private static boolean isGlacierObject(S3ObjectSummary object)
         return Glacier.toString().equals(object.getStorageClass());
     }

+    private static boolean isHadoopEmptyFolderObject(S3ObjectSummary object)
-    private static boolean isHadoopEmptyFolderObject(S3ObjectSummary object)
+    private static boolean isHadoopFolderMarker(S3ObjectSummary object)
@@ -39,6 +39,7 @@
     private GetObjectMetadataRequest getObjectMetadataRequest;
     private CannedAccessControlList acl;
     private boolean hasGlacierObjects;
+    private boolean hasHadoopEmptyFolderObjects;
-    private boolean hasHadoopEmptyFolderObjects;
+    private boolean hasHadoopFolderMarkers;
@@ -570,6 +570,22 @@ private static void assertSkipGlacierObjects(boolean skipGlacierObjects)
         }
     }

+    @Test
+    public void testSkipHadoopEmptyFolderObjectsEnabled()
Rename appropriately.
@pettyjamesm the
A follow-up: #4566
0098476 to 7ee0e22
Renamed, thanks @findepi
A note on filtering in HiveFileIterator: Hive does not ignore those empty files when running on hdfs://
7ee0e22 to 4f03197
These files are created by the Hadoop file system to indicate empty folders.
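As a rough illustration of what skipping these placeholder files can look like, here is a self-contained sketch that works on plain key strings instead of the AWS SDK's `S3ObjectSummary`. The class name `FolderMarkerSketch` and the helper `skipFolderMarkers` are hypothetical, and the `_$folder$` suffix is an assumption about how the markers are named; the real check in this PR may differ.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical sketch, not the code merged in this PR.
final class FolderMarkerSketch
{
    // Assumption: empty-folder placeholder keys end with this suffix.
    static final String FOLDER_MARKER_SUFFIX = "_$folder$";

    private FolderMarkerSketch() {}

    // True when the object key looks like an empty-folder placeholder.
    static boolean isHadoopFolderMarker(String key)
    {
        return key.endsWith(FOLDER_MARKER_SUFFIX);
    }

    // Returns the listing without placeholder keys, preserving order.
    static List<String> skipFolderMarkers(List<String> keys)
    {
        return keys.stream()
                .filter(key -> !isHadoopFolderMarker(key))
                .collect(Collectors.toList());
    }
}
```

A suffix check keeps the test cheap, since it needs only the key already returned by the listing and no extra metadata request per object.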
4f03197 to c1ec76d
Merged, thanks!