Support drop iceberg table when metadata or snapshot is missing and refactor code for delta-lake #15065

Closed

Conversation

krvikash
Contributor

@krvikash krvikash commented Nov 16, 2022

Description

Fixes #12318

This PR includes the following:

SPI:
Introduced a new SPI method, dropTable, which takes a SchemaTableName. It is the connector's responsibility to implement the method.

Note: deprecation/removal of dropTable with a table handle will be taken care of in a separate PR.

Iceberg connector:

  1. When the latest manifest file does not exist --> fails to drop the table (Fixed)
  2. When the snapshot file does not exist --> drops the table in the metastore, but then fails to delete data and metadata files (Fixed)
  3. When the ManifestList file does not exist --> no fix required, already working
  4. When a data file does not exist --> no fix required, already working

Q: What if the latest manifest file or snapshot file does not exist and the data files referenced by TableMetadata are located in different locations?

Delta-Lake connector:

  1. Uses the new SPI method dropTable with SchemaTableName when the delta log does not exist
  2. Made metadataEntry non-optional in DeltaLakeTableHandle

Non-technical explanation

NA

Release notes

( ) This is not user-visible or docs only and no release notes are required.
(X) Release notes are required, please propose a release note for me.
( ) Release notes are required, with the following suggested text:

# Section
* Fix some things. ({issue}`issuenumber`)

@cla-bot cla-bot bot added the cla-signed label Nov 16, 2022
@krvikash krvikash changed the title Support drop corrupted iceberg table Support drop iceberg table when metadata or snapshot is missing Nov 16, 2022
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table branch from bd6d342 to 604b6c9 Compare November 17, 2022 06:59
@krvikash krvikash marked this pull request as ready for review November 17, 2022 07:01
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table branch from 604b6c9 to 2ab0747 Compare November 17, 2022 07:17
if (!(e.getCause() instanceof FileNotFoundException)) {
throw e;
}
LOG.warn("Failed to load table " + schemaTableName, e);
Contributor

Why is the exception being swallowed?

Contributor Author

Because we do not want to fail when a FileNotFoundException occurs. Instead, we want to drop the table from the metastore and delete the table location directory.
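
The intent described in this reply can be sketched as follows. This is a minimal illustration under stated assumptions, not the code from this PR: `TolerantDropSketch`, `TableLoader`, and the action strings are hypothetical stand-ins, and the real metastore and file system calls are replaced by a list of recorded steps.

```java
import java.io.FileNotFoundException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch; none of these names are Trino or Iceberg APIs.
final class TolerantDropSketch
{
    // Stand-in for loading table metadata; may fail if the metadata file is gone.
    interface TableLoader
    {
        void load(String tableName);
    }

    // Returns the steps that would be performed, so the control flow is easy to inspect.
    static List<String> dropTable(String tableName, TableLoader loader)
    {
        List<String> actions = new ArrayList<>();
        try {
            loader.load(tableName);
            actions.add("validate table can be dropped");
        }
        catch (RuntimeException e) {
            // Only a missing file is tolerated; every other failure still propagates.
            if (!(e.getCause() instanceof FileNotFoundException)) {
                throw e;
            }
            actions.add("skip validation: metadata missing");
        }
        actions.add("drop from metastore");
        actions.add("delete table location");
        return actions;
    }

    public static void main(String[] args)
    {
        TableLoader broken = name -> {
            throw new UncheckedIOException(new FileNotFoundException(name + " metadata file"));
        };
        System.out.println(dropTable("orders", broken));
    }
}
```

The design point is that the catch block narrows what is swallowed: the drop proceeds only for the missing-file case, and unrelated failures are rethrown unchanged.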

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table branch 4 times, most recently from 4234827 to eedddc1 Compare November 17, 2022 13:24
@@ -346,6 +348,12 @@ public IcebergTableHandle getTableHandle(
catch (TableNotFoundException e) {
return null;
}
catch (Exception e) {
if (e.getCause() instanceof FileNotFoundException) {
throw new TrinoException(METADATA_NOT_FOUND, "Metadata not found in metadata location for table " + tableName, e);
Member

I'm not positive this will always be a FileNotFoundException. If you're using S3, for example, it's probably something like an AmazonServiceException with an error code "Specified key does not exist".

Contributor Author

Modified the exception catch to handle S3 exception. Verified GCP and ADLS exceptions as well.
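
One way to make such a check less dependent on a single exception type is to walk the cause chain. The sketch below is an assumption-laden illustration, not the change made in this PR: `MissingFileDetector` is a hypothetical helper, and the message marker is taken from the reviewer's comment above rather than from any SDK contract, so a real implementation would inspect the storage client's own exception types and error codes.

```java
import java.io.FileNotFoundException;
import java.nio.file.NoSuchFileException;
import java.util.Locale;

// Hypothetical helper; not part of Trino, Iceberg, or any cloud SDK.
final class MissingFileDetector
{
    // Assumed marker text, based on the review comment; treat it as illustrative only.
    private static final String MISSING_KEY_MARKER = "specified key does not exist";

    // Bound on the walk so a cyclic cause chain cannot loop forever.
    private static final int MAX_DEPTH = 32;

    static boolean isMissingFile(Throwable failure)
    {
        int depth = 0;
        for (Throwable t = failure; t != null && depth < MAX_DEPTH; t = t.getCause(), depth++) {
            if (t instanceof FileNotFoundException || t instanceof NoSuchFileException) {
                return true;
            }
            String message = t.getMessage();
            if (message != null && message.toLowerCase(Locale.ROOT).contains(MISSING_KEY_MARKER)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args)
    {
        Throwable wrapped = new RuntimeException(new RuntimeException(new FileNotFoundException("v1.metadata.json")));
        System.out.println(isMissingFile(wrapped));
    }
}
```

Matching on message text is fragile, which is why the check is kept in one place where it can be replaced by type-based checks per file system.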

@@ -1399,6 +1407,12 @@ public void dropTable(ConnectorSession session, ConnectorTableHandle tableHandle
catalog.dropTable(session, ((IcebergTableHandle) tableHandle).getSchemaTableName());
}

@Override
public void dropTable(ConnectorSession session, SchemaTableName schemaTableName)
Member

I think we should still validate that the table exists here, and throw a TableNotFoundException if it doesn't.

Contributor Author

The respective catalog call will take care of validating that the table exists.

@ebyhr
Member

ebyhr commented Nov 18, 2022

Could you update the PR title to reflect the Delta Lake connector?

@krvikash krvikash changed the title Support drop iceberg table when metadata or snapshot is missing Support drop iceberg table when metadata or snapshot is missing and refactor code for delta-lake Nov 18, 2022
*
* @throws RuntimeException if the table cannot be dropped
*/
void dropTable(Session session, QualifiedObjectName tableName);
Member

We want this method to replace the old one, void dropTable(Session session, TableHandle tableHandle).
So it would be good to

  • make the ConnectorMetadata change backward compatible
  • apply changes to DropTableTask

Doing this would reveal that the old drop handles redirections whereas the new drop-by-name cannot do that.
The MetadataManager could follow redirects as needed, but I am concerned this is not the right place to add that logic.

Also, decoupling ConnectorMetadata.redirectTable and ConnectorMetadata.dropTable calls makes it impossible to make a non-racy implementation.
I am leaning towards introducing a non-void return type here, so that drop-by-name can either declare success or return a redirection.
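
The backward-compatibility point above can be illustrated with a default interface method. This is a minimal sketch under stated assumptions: `DropByNameSpiSketch` and everything nested in it are hypothetical stand-ins, not the real ConnectorMetadata interface, and the default body shown here is only one possible choice.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the SPI types discussed above; not the real Trino interfaces.
final class DropByNameSpiSketch
{
    record TableName(String schema, String table) {}

    interface ConnectorMetadataSketch
    {
        // New drop-by-name method. Giving it a default body keeps connectors that
        // were written before the method existed compiling without changes.
        default void dropTable(TableName tableName)
        {
            throw new UnsupportedOperationException("This connector does not support dropping tables by name");
        }
    }

    // A connector that predates the new method: no override needed.
    static final class LegacyConnector implements ConnectorMetadataSketch {}

    // A connector that opts in to the new method.
    static final class UpdatedConnector implements ConnectorMetadataSketch
    {
        final List<TableName> dropped = new ArrayList<>();

        @Override
        public void dropTable(TableName tableName)
        {
            dropped.add(tableName);
        }
    }

    public static void main(String[] args)
    {
        UpdatedConnector updated = new UpdatedConnector();
        updated.dropTable(new TableName("sales", "orders"));
        System.out.println(updated.dropped);
    }
}
```

A default method addresses source compatibility only. It does not address the redirection and race concerns raised in this comment.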

Member

@martint wdyt?

Member

This looks unresolved. @krvikash @findinpath do you have a plan here?

Member

perhaps something like this (rough idea, not polished up)

DropResult dropTable(Session session, QualifiedObjectName tableName);
sealed class DropResult
  permits DropSuccess, DropRedirected
class DropSuccess {  .. }  // singleton (can this be an enum?)
record DropRedirected(CatalogSchemaTableName target) {}
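
A compilable variant of this rough idea might look like the sketch below. It is an assumption, not the design adopted by the project: it uses a sealed interface rather than a sealed class, because Java enums and records cannot extend a class, and `TargetTableName` is a hypothetical stand-in for CatalogSchemaTableName. It requires Java 17 or later.

```java
// Hypothetical sketch of a non-void drop result; names other than DropResult,
// DropSuccess and DropRedirected are invented for illustration.
final class DropResultSketch
{
    // Stand-in for the catalog/schema/table name type.
    record TargetTableName(String catalog, String schema, String table) {}

    // Sealed so callers know the full set of outcomes.
    sealed interface DropResult
            permits DropSuccess, DropRedirected {}

    // Singleton success value; an enum works because the parent is an interface.
    enum DropSuccess
            implements DropResult
    {
        INSTANCE
    }

    // The table name resolved to a redirection target instead of being dropped.
    record DropRedirected(TargetTableName target)
            implements DropResult {}

    static String describe(DropResult result)
    {
        if (result instanceof DropRedirected redirected) {
            TargetTableName target = redirected.target();
            return "redirected to " + target.catalog() + "." + target.schema() + "." + target.table();
        }
        return "dropped";
    }

    public static void main(String[] args)
    {
        System.out.println(describe(DropSuccess.INSTANCE));
        System.out.println(describe(new DropRedirected(new TargetTableName("iceberg", "sales", "orders"))));
    }
}
```

With this shape, the caller can follow the redirection and retry against the target, so the redirect lookup and the drop happen in a single connector call.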

@findepi findepi requested review from martint and phd3 November 18, 2022 09:24
BaseTable table = null;
try {
table = (BaseTable) loadTable(session, schemaTableName);
validateTableCanBeDropped(table);
Contributor Author

When the metadata file is not found, we won't be able to get the table properties and validate whether the table can be dropped. How should we handle this?

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table branch 2 times, most recently from 7dc64a3 to 649ccd0 Compare November 18, 2022 23:40
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table branch 2 times, most recently from da95d64 to 62b2221 Compare November 19, 2022 18:35
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table branch 2 times, most recently from 7cc24be to 7375027 Compare December 5, 2022 12:14
try {
redirectionAwareTableHandle = metadata.getRedirectionAwareTableHandle(session, originalTableName);
}
catch (TrinoException e) {
Member

why is this exception-driven?
what potential exceptions are we ignoring here?

Contributor Author

I want to check the exception which fails with the METADATA_NOT_FOUND error.

Contributor

@findinpath findinpath Dec 16, 2022

We should probably rethink the way we obtain the tableName here. We don't actually need the full tableHandle for dropping the table, right?

Swallowing the exception leads us to adding the DropResult, DropRedirected and DropSuccess classes and extra logic in MetadataManager.

I think the approach to solving this issue should be reconsidered.

Member

I want to check the exception which fails with the METADATA_NOT_FOUND error.

metadata.dropTable must support such a case anyway.
What does it currently throw?

Member

@findepi findepi left a comment

"Introduce dropTable SPI method that takes SchemaTableName"

DropTableTask lgtm

please clean up the commit (deprecation, unused class)

@krvikash
Contributor Author

krvikash commented Jan 3, 2023

I have written a doc on the approaches we have taken for this PR: https://docs.google.com/document/d/1uydrtqfj4S5dhyrQ8JqAsW3fqr1ACXmEvj9MYSpJpLw/

CC: @findepi

@findepi
Member

findepi commented Feb 7, 2023

Sorry for not following up earlier on this.
I see the build is almost entirely red. Do you want to fix it before my re-review?

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table branch from bf2bc50 to 2860eee Compare February 8, 2023 07:27
@krvikash
Contributor Author

krvikash commented Feb 8, 2023

Rebased and resolved the conflicts.

@findepi, I am using Approach 3 from https://docs.google.com/document/d/1uydrtqfj4S5dhyrQ8JqAsW3fqr1ACXmEvj9MYSpJpLw/, which is not working, so test cases will fail. We need to finalize which approach we can take to make it work.

@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table branch from 2860eee to f83da0c Compare February 13, 2023 09:13
@krvikash
Contributor Author

Rebased

@krvikash krvikash requested a review from findepi March 6, 2023 08:50
@krvikash krvikash force-pushed the support-drop-corrupted-iceberg-table branch from f83da0c to b07a719 Compare March 6, 2023 09:05
@krvikash
Contributor Author

krvikash commented Mar 6, 2023

Rebased onto master and resolved conflicts.

@github-actions github-actions bot added delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector labels Mar 6, 2023
@findepi findepi force-pushed the support-drop-corrupted-iceberg-table branch from b07a719 to 1856081 Compare March 21, 2023 09:58
@findepi
Member

findepi commented Mar 21, 2023

(rebased resolving a conflict)

@@ -264,9 +264,18 @@ Optional<TableExecuteHandle> getTableHandleForExecute(
* Drops the specified table
*
* @throws RuntimeException if the table cannot be dropped or table handle is no longer valid
*
* @deprecated use {@link #dropTable(Session, QualifiedObjectName)}
Member

This overload is dead code now; there is no need to keep it (as deprecated).

@@ -614,6 +636,32 @@ public void testDeny()
onTrino().executeQuery("DROP TABLE " + icebergTableName);
}

@Test(groups = {HIVE_ICEBERG_REDIRECTIONS, PROFILE_SPECIFIC_TESTS})
public void testDropTableWithMissingMetadataFile()
Member

Would it make sense to have it tested outside of product tests (PTs)?

@krvikash
Contributor Author

krvikash commented Mar 25, 2023

Closing in favor of #16674 and #16651

@krvikash krvikash closed this Mar 25, 2023
@krvikash krvikash deleted the support-drop-corrupted-iceberg-table branch March 25, 2023 07:12
Labels
cla-signed delta-lake Delta Lake connector hive Hive connector iceberg Iceberg connector
Development

Successfully merging this pull request may close these issues.

Iceberg connector can't drop tables if the metadata is missing
5 participants