Improve performance of drop table in iceberg connector #15981
lib/trino-filesystem/src/main/java/io/trino/filesystem/fileio/BulkOperationsFileIO.java
    fileSystem.deleteFiles(pathList.build());
}
catch (IOException e) {
    throw new RuntimeException(e);
Throwing `RuntimeException` is different from the interface's expectation. It would be nice to throw `BulkDeletionFailureException` or leave a comment.
`BulkDeletionFailureException` requires the number of files that failed to delete. Currently `fileSystem.deleteFiles` does not give such information. I will prepare a PR for the same.
Force-pushed from d5565c6 to 1f921c2.
public class BulkOperationsFileIo
        extends ForwardingFileIo
I would implement this directly in ForwardingFileIo, not as a subclass.
(that would probably also satisfy https://github.com/trinodb/trino/pull/15981/files#r1097092879, right?)
cc @electrum
lib/trino-filesystem/src/main/java/io/trino/filesystem/fileio/BulkOperationsFileIo.java
    fileSystem.deleteFiles(filesToDelete);
}
catch (IOException e) {
    // TODO find out the number of files that failed to delete and throw BulkDeletionFailureException
What is `org.apache.iceberg.io.BulkDeletionFailureException#numberFailedObjects` needed for?
If we don't need it, we can skip having a TODO.
`SupportsBulkOperations::deleteFiles` throws `BulkDeletionFailureException` exception. `BulkDeletionFailureException` will be constructed by providing an integer value `numberFailedObjects` which means the number of files that failed to delete.
> SupportsBulkOperations::deleteFiles throws BulkDeletionFailureException exception.
I know.
BTW it can also throw any unchecked exception, obviously.
> numberFailedObjects which means the number of files that failed to delete.
I figured this from the name :)
How does the Iceberg lib use that information?
Force-pushed from 17b4302 to 30eff29.
Addressed comments. CI is failing due to a flaky test
@Override
public void deleteFiles(Iterable<String> pathsToDelete)
        throws BulkDeletionFailureException
This isn’t thrown by the method. Should we be using this instead?
We cannot throw `BulkDeletionFailureException` because we cannot fill `BulkDeletionFailureException#numberFailedObjects`.
See #15981 (comment) for more.
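The trade-off described in this reply can be sketched roughly as follows. This is an illustration only, not the PR's code: `SketchFileSystem` and `UncheckedBulkDelete` are hypothetical stand-ins, and the sketch assumes the underlying file system reports a bulk-delete failure only as a checked `IOException`, with no per-file failure count.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.List;

// Hypothetical stand-in for a file system whose bulk delete
// signals failure only through a checked IOException.
interface SketchFileSystem
{
    void deleteFiles(List<String> paths)
            throws IOException;
}

final class UncheckedBulkDelete
{
    private UncheckedBulkDelete() {}

    // The caller-facing method declares no checked exception, so the
    // IOException is wrapped. No failure count is available, which is
    // why a count-carrying exception cannot be built here.
    static void deleteFiles(SketchFileSystem fileSystem, List<String> paths)
    {
        try {
            fileSystem.deleteFiles(paths);
        }
        catch (IOException e) {
            throw new UncheckedIOException("Failed to delete some or all of " + paths.size() + " files", e);
        }
    }
}
```

The original `IOException` stays available as the cause, so callers can still inspect the underlying failure.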
fileSystem.deleteFiles(filesToDelete); | ||
} | ||
catch (IOException e) { | ||
throw new UncheckedIOException("Failed to delete some or all of files: " + filesToDelete, e); |
I don't think including all file paths in the error message is a good idea for large deletes. It may include 1,000 files.
@ebyhr you convinced me, let's not make the message that long. OTOH, it could be problematic if we do not include any paths in the message. It can make identifying problems harder (e.g. which bucket did we try to access?).
I pushed a change to display only the first 5 paths and then an ellipsis (`...`).
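One way the truncated message could be built is sketched below. This is not the code that was pushed: the class name, the constant, and the exact message layout are assumptions, and a `List<String>` is used for simplicity where the real method receives an `Iterable<String>`.

```java
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper: caps the number of paths shown in an error message.
final class DeleteErrorMessages
{
    // Assumed limit, matching the "first 5 paths" mentioned in the comment.
    private static final int MAX_PATHS_IN_MESSAGE = 5;

    private DeleteErrorMessages() {}

    static String failedToDelete(List<String> paths)
    {
        // Append an ellipsis only when some paths were left out.
        String suffix = paths.size() > MAX_PATHS_IN_MESSAGE ? ", ...]" : "]";
        String shown = paths.stream()
                .limit(MAX_PATHS_IN_MESSAGE)
                .collect(Collectors.joining(", ", "[", suffix));
        return "Failed to delete some or all of files: " + shown;
    }
}
```

Keeping a few paths in the message preserves the bucket and prefix for debugging while bounding the message length.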
Force-pushed from b986679 to 659a3b0.
Force-pushed from 659a3b0 to 3310ec6.
Description
While working on Iceberg small-files benchmarking, I noticed that dropping a table is very slow when the table has a very large number of files. The main reason for the slowness is `dropTableData`, which deletes the files concurrently when the `fileIO` is not an instance of `SupportsBulkOperations`. We can delete files in bulk if `fileIO` is an instance of `SupportsBulkOperations`. So this change implements `SupportsBulkOperations` instead of `FileIO` in `ForwardingFileIo.java`.
I tested it on my local machine. Dropping a table that uses the Glue metastore and contains approximately 10K data files took ~26 minutes without this change, whereas with this change it took only ~20 seconds.
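The dispatch described above can be illustrated with a small sketch. The interfaces here are simplified stand-ins for Iceberg's `FileIO` and `SupportsBulkOperations`, not the real types, and `DropTableSketch.deleteAll` is a hypothetical method. The per-file branch runs sequentially for brevity, whereas the description says the real code path deletes concurrently.

```java
import java.util.List;

// Simplified stand-in for Iceberg's FileIO: one call per file.
interface SketchFileIo
{
    void deleteFile(String path);
}

// Simplified stand-in for Iceberg's SupportsBulkOperations: one call for many files.
interface SketchBulkFileIo
        extends SketchFileIo
{
    void deleteFiles(Iterable<String> paths);
}

final class DropTableSketch
{
    private DropTableSketch() {}

    // Deletes all paths and returns the number of delete calls issued,
    // which is what dominates the cost on an object store.
    static int deleteAll(SketchFileIo io, List<String> paths)
    {
        if (io instanceof SketchBulkFileIo) {
            // Bulk-capable implementation: a single call covers every path.
            ((SketchBulkFileIo) io).deleteFiles(paths);
            return 1;
        }
        // Fallback: one call per file.
        int calls = 0;
        for (String path : paths) {
            io.deleteFile(path);
            calls++;
        }
        return calls;
    }
}
```

With roughly 10K files, the fallback branch issues about 10K separate delete calls, while the bulk branch hands the whole list to the file system at once, which is consistent with the difference reported in the test above.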
Release notes
( ) This is not user-visible or docs only and no release notes are required.
( ) Release notes are required, please propose a release note for me.
(X) Release notes are required, with the following suggested text: