Iceberg connector can't drop tables if the metadata is missing #12318

Closed
ebyhr opened this issue May 10, 2022 · 2 comments · Fixed by #16674

@ebyhr
Member

ebyhr commented May 10, 2022

trino> DROP TABLE iceberg.tpch.part;
Query 20220510_125406_00031_cyxbf failed: Failed to open input stream for file: /var/folders/8s/dkvf18z55lj_9yxhy1n54sph0000gn/T/TrinoTest16886600552589191001/iceberg_data/tpch/part/metadata/00000-f03ef6ff-f9b3-4350-9de0-24af56431ef2.metadata.json
org.apache.iceberg.exceptions.NotFoundException: Failed to open input stream for file: /var/folders/8s/dkvf18z55lj_9yxhy1n54sph0000gn/T/TrinoTest16886600552589191001/iceberg_data/tpch/part/metadata/00000-f03ef6ff-f9b3-4350-9de0-24af56431ef2.metadata.json
	at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:177)
	at io.trino.plugin.hive.authentication.NoHdfsAuthentication.doAs(NoHdfsAuthentication.java:25)
	at io.trino.plugin.hive.HdfsEnvironment.doAs(HdfsEnvironment.java:97)
	at io.trino.plugin.iceberg.HdfsInputFile.newStream(HdfsInputFile.java:62)
	at org.apache.iceberg.TableMetadataParser.read(TableMetadataParser.java:250)
	at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.lambda$refreshFromMetadataLocation$1(AbstractIcebergTableOperations.java:223)
	at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
	at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:214)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:198)
	at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:190)
	at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.refreshFromMetadataLocation(AbstractIcebergTableOperations.java:222)
	at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.refresh(AbstractIcebergTableOperations.java:127)
	at io.trino.plugin.iceberg.catalog.AbstractIcebergTableOperations.current(AbstractIcebergTableOperations.java:110)
	at io.trino.plugin.iceberg.catalog.hms.TrinoHiveCatalog.lambda$loadTable$8(TrinoHiveCatalog.java:287)
	at java.base/java.util.concurrent.ConcurrentHashMap.computeIfAbsent(ConcurrentHashMap.java:1705)
	at io.trino.plugin.iceberg.catalog.hms.TrinoHiveCatalog.loadTable(TrinoHiveCatalog.java:285)
	at io.trino.plugin.iceberg.IcebergMetadata.getTableHandle(IcebergMetadata.java:261)
	at io.trino.plugin.iceberg.IcebergMetadata.getTableHandle(IcebergMetadata.java:202)
	at io.trino.plugin.base.classloader.ClassLoaderSafeConnectorMetadata.getTableHandle(ClassLoaderSafeConnectorMetadata.java:184)
	at io.trino.metadata.MetadataManager.lambda$getTableHandle$5(MetadataManager.java:281)
	at java.base/java.util.Optional.flatMap(Optional.java:294)
	at io.trino.metadata.MetadataManager.getTableHandle(MetadataManager.java:261)
	at io.trino.metadata.MetadataManager.getRedirectionAwareTableHandle(MetadataManager.java:1486)
	at io.trino.metadata.MetadataManager.getRedirectionAwareTableHandle(MetadataManager.java:1478)
	at io.trino.execution.DropTableTask.execute(DropTableTask.java:79)
	at io.trino.execution.DropTableTask.execute(DropTableTask.java:37)
	at io.trino.execution.DataDefinitionExecution.start(DataDefinitionExecution.java:145)
	at io.trino.execution.SqlQueryManager.createQuery(SqlQueryManager.java:243)
	at io.trino.dispatcher.LocalDispatchQuery.lambda$startExecution$7(LocalDispatchQuery.java:143)
	at io.trino.$gen.Trino_testversion____20220510_124855_3.run(Unknown Source)
	at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
	at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
	at java.base/java.lang.Thread.run(Thread.java:829)
Caused by: java.io.FileNotFoundException: File /var/folders/8s/dkvf18z55lj_9yxhy1n54sph0000gn/T/TrinoTest16886600552589191001/iceberg_data/tpch/part/metadata/00000-f03ef6ff-f9b3-4350-9de0-24af56431ef2.metadata.json does not exist
	at org.apache.hadoop.fs.RawLocalFileSystem.deprecatedGetFileStatus(RawLocalFileSystem.java:666)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileLinkStatusInternal(RawLocalFileSystem.java:987)
	at org.apache.hadoop.fs.RawLocalFileSystem.getFileStatus(RawLocalFileSystem.java:656)
	at org.apache.hadoop.fs.FilterFileSystem.getFileStatus(FilterFileSystem.java:454)
	at org.apache.hadoop.fs.ChecksumFileSystem$ChecksumFSInputChecker.<init>(ChecksumFileSystem.java:146)
	at org.apache.hadoop.fs.ChecksumFileSystem.open(ChecksumFileSystem.java:347)
	at io.trino.plugin.hive.fs.TrinoFileSystemCache$FileSystemWrapper.open(TrinoFileSystemCache.java:307)
	at org.apache.hadoop.fs.FileSystem.open(FileSystem.java:899)
	at org.apache.iceberg.hadoop.HadoopInputFile.newStream(HadoopInputFile.java:175)
	... 32 more
@findepi
Member

findepi commented Nov 14, 2022

We had a similar problem in Delta Lake, and we solved it by making some table handle fields Optional. This isn't ideal, since we now have a bunch of "fakely" optional fields just for the sake of the DROP TABLE flow.

The DROP TABLE flow calls getTableHandle to ensure the dropped relation is indeed a table (and not a view or a materialized view). This is nice, but we want DROP TABLE to also work on things that "look like a table" but cannot be accessed (e.g. because the metadata file is missing).

Proposed solution

  • introduce dropTable SPI method that takes SchemaTableName
    • default implementation will call getTableHandle and ensure backwards compatibility
    • it will be the implementing connector's responsibility to ensure the dropped relation isn't a view / MV
  • use it in Delta Lake (making some table handle fields non-optional, AFAIR)
    • the DeltaMetadata should still validate the dropped relation looks like a Delta table (has appropriate properties), but should succeed also when the on-disk state is missing
  • use it in Iceberg
    • it should still validate the dropped relation looks like an Iceberg table
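
A rough sketch of the proposal above, not the actual Trino SPI. Every type here (Metadata, TableHandle, CatalogEntry, HandleBasedMetadata, IcebergLikeMetadata) is a hypothetical stand-in, a plain String plays the role of SchemaTableName, and a set of paths plays the role of the file system. It only illustrates the shape: a name-based dropTable whose default goes through getTableHandle, and a connector override that validates catalog-level information without reading the metadata file.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Optional;
import java.util.Set;

// All names below are hypothetical stand-ins, not the real Trino SPI.
final class DropTableSketch {
    enum RelationKind { TABLE, VIEW }

    // What the catalog (metastore) knows without touching storage.
    static final class CatalogEntry {
        final RelationKind kind;
        final String tableType;        // e.g. "iceberg"
        final String metadataLocation; // path of the metadata file

        CatalogEntry(RelationKind kind, String tableType, String metadataLocation) {
            this.kind = kind;
            this.tableType = tableType;
            this.metadataLocation = metadataLocation;
        }
    }

    static final class TableHandle {
        final String tableName;

        TableHandle(String tableName) {
            this.tableName = tableName;
        }
    }

    interface Metadata {
        Optional<TableHandle> getTableHandle(String tableName);

        void dropTable(TableHandle handle);

        // Proposed name-based entry point. The default keeps today's behavior
        // by resolving a handle first, so existing connectors are unaffected.
        default void dropTable(String tableName) {
            TableHandle handle = getTableHandle(tableName)
                    .orElseThrow(() -> new IllegalArgumentException("Table not found: " + tableName));
            dropTable(handle);
        }
    }

    // Connector that relies on the default: the drop needs a loadable handle.
    static class HandleBasedMetadata implements Metadata {
        final Map<String, CatalogEntry> catalog = new HashMap<>();
        final Set<String> existingFiles = new HashSet<>();

        @Override
        public Optional<TableHandle> getTableHandle(String tableName) {
            CatalogEntry entry = catalog.get(tableName);
            if (entry == null || entry.kind != RelationKind.TABLE) {
                return Optional.empty();
            }
            if (!existingFiles.contains(entry.metadataLocation)) {
                // Mirrors the reported failure: the metadata file is gone.
                throw new IllegalStateException(
                        "Failed to open input stream for file: " + entry.metadataLocation);
            }
            return Optional.of(new TableHandle(tableName));
        }

        @Override
        public void dropTable(TableHandle handle) {
            CatalogEntry entry = catalog.remove(handle.tableName);
            existingFiles.remove(entry.metadataLocation);
        }
    }

    // Connector that overrides the name-based method and validates only
    // catalog-level information, so missing on-disk state does not block it.
    static final class IcebergLikeMetadata extends HandleBasedMetadata {
        @Override
        public void dropTable(String tableName) {
            CatalogEntry entry = catalog.get(tableName);
            if (entry == null) {
                throw new IllegalArgumentException("Table not found: " + tableName);
            }
            if (entry.kind != RelationKind.TABLE) {
                throw new IllegalArgumentException("Not a table: " + tableName);
            }
            if (!"iceberg".equals(entry.tableType)) {
                throw new IllegalArgumentException("Not an Iceberg table: " + tableName);
            }
            catalog.remove(tableName);
            existingFiles.remove(entry.metadataLocation); // no-op if already missing
        }
    }

    public static void main(String[] args) {
        IcebergLikeMetadata metadata = new IcebergLikeMetadata();
        // The metadata path is never added to existingFiles, so the file is "missing".
        metadata.catalog.put("tpch.part",
                new CatalogEntry(RelationKind.TABLE, "iceberg", "part/metadata/00000.metadata.json"));
        metadata.dropTable("tpch.part");
        System.out.println("still in catalog: " + metadata.catalog.containsKey("tpch.part"));
    }
}
```

With HandleBasedMetadata the same call fails in getTableHandle, which is the behavior reported in this issue. The override moves the "is it a table, and is it ours" check into the connector, as the second sub-bullet of the proposal requires.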

@findepi
Member

findepi commented Nov 14, 2022

cc @electrum
