-
Notifications
You must be signed in to change notification settings - Fork 3.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Don't delete files when commit to hive metastore fails #12209
Don't delete files when commit to hive metastore fails #12209
Conversation
According to apache/iceberg#2317 it is better not to delete files after commit to metastore failed.
...-iceberg/src/main/java/io/trino/plugin/iceberg/catalog/hms/HiveMetastoreTableOperations.java
Show resolved
Hide resolved
} | ||
catch (RuntimeException e) { | ||
try { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As per apache/iceberg#2317
basically we need to group failures into two categories:
- Failures that are reports from the Catalog that it could not perform the operation for some reason
- Failures where the client has lost contact for some reason (basically everything else)
This implementation never deletes the new metadata file. Is this intentional?
See https://github.com/apache/iceberg/pull/2328/files#diff-e502621d52f86cf0ec3187dda30ac61f6b76efb7b6276bc8d233ccb2c836fb98R482 where the metadata is deleted only in case of FAILURE
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It is intentional, we do have a little different logic than iceberg library so it can't be copied one to one, also it is safer not to ever delete these files.
} | ||
catch (RuntimeException e) { | ||
try { | ||
io().deleteFile(newMetadataLocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This usecase is probably pretty exotic to catch in a real-world test case.
Would it make sense to have a unit test case to simulate such a situation?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I can try to unit test this but in my opinion it wouldn't be very useful, also although code here is pretty simple it would require a lot of mocking and it is only a little part of whole commit process that is initiated in org.apache.iceberg.BaseTransaction#commitTransaction
so testing this small piece wouldn't be meaningful.
} | ||
catch (RuntimeException e) { | ||
try { | ||
io().deleteFile(newMetadataLocation); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
deleteFile(newMetadataLocation)
should still be invoked when any of the getter instructions fails (e.g. fromMetastoreApiTable(thriftMetastore.getTable
)
the left-over should be left behind only when the mutating call fails.
further, we should have a comment saying that we can introduce error categorization to know when such cleanup would be safe after the mutating call first. We don't need to introduce the categorization in this PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Interestingly, GlueIcebergTableOperations
has no delete at all. cc @alexjo2144
obviously, we want to keep the behaviors in sync.
@alexjo2144 intentional?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took them out based on this comment from David #10845 (comment)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think I aired on the side of, not cleaning up is safe and figuring out which situations are safe is hard. So leave it for remove_orphan_files
to do
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@findepi so what should I do? add deletion here and in Glue for the case when getter
fails ? Or be on the super safe side and not to bother with deletion here at all
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's do what Alex did for Glue -- leave behind
According to apache/iceberg#2317
it is better not to delete files after commit to metastore failed.
Fixes: #10462
Description
Related issues, pull requests, and links
Documentation
( ) No documentation is needed.
( ) Sufficient documentation is included in this PR.
( ) Documentation PR is available with #prnumber.
( ) Documentation issue #issuenumber is filed, and can be handled later.
Release notes
( ) No release notes entries required.
( ) Release notes entries required with the following suggested text: