I was running a Flink job and hit the following problem:
2021-05-14 05:20:21.121 WARN org.apache.flink.runtime.taskmanager.Task - IcebergFilesCommitter -> Sink: IcebergSink iceberg_zjyprc_hadoop.dw_business.dwd_ord_ord_df (1/1) (b4bf4cbece2ea2c9096b6230c5dea49a) switched from RUNNING to FAILED.
org.apache.iceberg.exceptions.NotFoundException: Failed to open input stream for file: hdfs://zjyprc-hadoop/user/h_data_platform/datalake/dw_business.db/dwd_ord_ord_df/metadata/00181-e5157483-2778-4fdd-8625-14877aa94557.metadata.json
I found there was a commit failure:
2021-05-13 08:56:10.362 INFO org.apache.flink.runtime.checkpoint.CheckpointCoordinator - Triggering checkpoint 195 (type=CHECKPOINT) @ 1620867370361 for job 32d73761e49c566d48ca1f0ba5e38883.
2021-05-13 09:01:43.832 INFO org.apache.flink.runtime.executiongraph.ExecutionGraph - IcebergFilesCommitter -> Sink: IcebergSink iceberg_zjyprc_hadoop.dw_business.ods_xmshop_xm_order_v4 (1/1) (71cfb3cf676056207c0c7958e283dc71) switched from RUNNING to FAILED on org.apache.flink.runtime.jobmaster.slotpool.SingleLogicalSlot@340127ef.
java.lang.RuntimeException: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.iceberg.relocated.com.google.common.base.Throwables.propagate(Throwables.java:241)
at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:80)
at org.apache.iceberg.hive.HiveTableOperations.lambda$persistTable$5(HiveTableOperations.java:310)
at org.apache.iceberg.hive.ClientPoolImpl.run(ClientPoolImpl.java:55)
at org.apache.iceberg.hive.CachedClientPool.run(CachedClientPool.java:76)
at org.apache.iceberg.hive.HiveTableOperations.persistTable(HiveTableOperations.java:306)
at org.apache.iceberg.hive.HiveTableOperations.doCommit(HiveTableOperations.java:222)
at org.apache.iceberg.BaseMetastoreTableOperations.commit(BaseMetastoreTableOperations.java:118)
at org.apache.iceberg.SnapshotProducer.lambda$commit$2(SnapshotProducer.java:300)
at org.apache.iceberg.util.Tasks$Builder.runTaskWithRetry(Tasks.java:404)
at org.apache.iceberg.util.Tasks$Builder.runSingleThreaded(Tasks.java:213)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:197)
at org.apache.iceberg.util.Tasks$Builder.run(Tasks.java:189)
at org.apache.iceberg.SnapshotProducer.commit(SnapshotProducer.java:282)
at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitOperation(IcebergFilesCommitter.java:308)
at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitDeltaTxn(IcebergFilesCommitter.java:295)
at org.apache.iceberg.flink.sink.IcebergFilesCommitter.commitUpToCheckpoint(IcebergFilesCommitter.java:219)
at org.apache.iceberg.flink.sink.IcebergFilesCommitter.notifyCheckpointComplete(IcebergFilesCommitter.java:189)
at org.apache.flink.streaming.runtime.tasks.StreamOperatorWrapper.notifyCheckpointComplete(StreamOperatorWrapper.java:107)
at org.apache.flink.streaming.runtime.tasks.SubtaskCheckpointCoordinatorImpl.notifyCheckpointComplete(SubtaskCheckpointCoordinatorImpl.java:283)
at org.apache.flink.streaming.runtime.tasks.StreamTask.notifyCheckpointComplete(StreamTask.java:958)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointCompleteAsync$7(StreamTask.java:929)
at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$notifyCheckpointOperation$9(StreamTask.java:945)
at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:47)
at org.apache.flink.streaming.runtime.tasks.mailbox.Mail.run(Mail.java:78)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.processMail(MailboxProcessor.java:282)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxStep(MailboxProcessor.java:190)
at org.apache.flink.streaming.runtime.tasks.mailbox.MailboxProcessor.runMailboxLoop(MailboxProcessor.java:181)
at org.apache.flink.streaming.runtime.tasks.StreamTask.runMailboxLoop(StreamTask.java:558)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:530)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:721)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:546)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:129)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.thrift.transport.TSaslTransport.readLength(TSaslTransport.java:376)
at org.apache.thrift.transport.TSaslTransport.readFrame(TSaslTransport.java:453)
at org.apache.thrift.transport.TSaslTransport.read(TSaslTransport.java:435)
at org.apache.thrift.transport.TSaslClientTransport.read(TSaslClientTransport.java:37)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:86)
at org.apache.iceberg.shaded.org.apache.hadoop.hive.thrift.TFilterTransport.readAll(TFilterTransport.java:62)
at org.apache.thrift.protocol.TBinaryProtocol.readAll(TBinaryProtocol.java:429)
at org.apache.thrift.protocol.TBinaryProtocol.readI32(TBinaryProtocol.java:318)
at org.apache.thrift.protocol.TBinaryProtocol.readMessageBegin(TBinaryProtocol.java:219)
at org.apache.thrift.TServiceClient.receiveBase(TServiceClient.java:69)
at org.apache.iceberg.shaded.org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.recv_alter_table_with_environment_context(ThriftHiveMetastore.java:1375)
at org.apache.iceberg.shaded.org.apache.hadoop.hive.metastore.api.ThriftHiveMetastore$Client.alter_table_with_environment_context(ThriftHiveMetastore.java:1359)
at org.apache.iceberg.shaded.org.apache.hadoop.hive.metastore.HiveMetaStoreClient.alter_table(HiveMetaStoreClient.java:370)
at sun.reflect.GeneratedMethodAccessor51.invoke(Unknown Source)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.apache.iceberg.common.DynMethods$UnboundMethod.invokeChecked(DynMethods.java:65)
at org.apache.iceberg.common.DynMethods$UnboundMethod.invoke(DynMethods.java:77)
... 31 more
Caused by: java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.socketRead(SocketInputStream.java:116)
at java.net.SocketInputStream.read(SocketInputStream.java:171)
at java.net.SocketInputStream.read(SocketInputStream.java:141)
at java.io.BufferedInputStream.fill(BufferedInputStream.java:246)
at java.io.BufferedInputStream.read1(BufferedInputStream.java:286)
at java.io.BufferedInputStream.read(BufferedInputStream.java:345)
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:127)
... 50 more
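I think this happened because the commit actually succeeded in the Hive Metastore after the client had already seen the failure (the `alter_table` call timed out on read). The table would then point to a metadata file that was presumably deleted when the failed commit was cleaned up, which would explain the NotFoundException above.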
@RussellSpitzer I have merged #2317 and #2328. In my case, the status check returns FAILURE, not UNKNOWN. I think this can be solved with more retries; in #2596, I tried to make the check-status settings configurable through TableProperties.
Maybe it could always return UNKNOWN when a commit fails, or return FAILURE but not delete the files.
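To make the idea above concrete, here is a minimal sketch of a commit status check that retries before giving a verdict and only allows cleanup on a definite FAILURE. It is not the Iceberg implementation: `CommitStatusSketch`, `checkCommitStatus`, `safeToDeleteNewMetadata`, and the `Supplier` standing in for "read the table's current metadata location from the metastore" are all hypothetical names chosen for illustration.

```java
import java.util.function.Supplier;

// Hypothetical sketch; none of these names are Iceberg APIs.
class CommitStatusSketch {
    enum CommitStatus { SUCCESS, FAILURE, UNKNOWN }

    // Re-reads the current metadata location up to maxAttempts times.
    // currentLocation is an assumed stand-in for a metastore lookup.
    static CommitStatus checkCommitStatus(
            Supplier<String> currentLocation, String newLocation, int maxAttempts) {
        boolean gotAnswer = false;
        for (int attempt = 0; attempt < maxAttempts; attempt++) {
            try {
                String current = currentLocation.get();
                if (newLocation.equals(current)) {
                    // The metastore points at our metadata file: the commit landed.
                    return CommitStatus.SUCCESS;
                }
                gotAnswer = true; // the lookup worked, but our file is not current (yet)
            } catch (RuntimeException e) {
                // The lookup itself failed (for example, another timeout); try again.
            }
        }
        // If no lookup ever succeeded, we cannot tell what happened.
        return gotAnswer ? CommitStatus.FAILURE : CommitStatus.UNKNOWN;
    }

    // Only a definite FAILURE allows deleting the newly written metadata file.
    static boolean safeToDeleteNewMetadata(CommitStatus status) {
        return status == CommitStatus.FAILURE;
    }
}
```

With a single attempt, a commit that lands in the metastore shortly after the timeout is still reported as FAILURE, which is the situation described in this issue. More attempts (ideally with a delay between them, omitted here) narrow that window but do not close it, so the more conservative variant is to never delete the files after a timeout.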