Connection reset by peer while writing to hdfs #31

Open
StrongestNumber9 opened this issue Jul 8, 2024 · 0 comments

Describe the bug

From cfe_39 logs:

java[17409]: java.io.IOException: Connection reset by peer
java[17409]:         at sun.nio.ch.FileDispatcherImpl.read0(Native Method) ~[?:1.8.0_412]
java[17409]:         at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39) ~[?:1.8.0_412]
java[17409]:         at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223) ~[?:1.8.0_412]
java[17409]:         at sun.nio.ch.IOUtil.read(IOUtil.java:197) ~[?:1.8.0_412]
java[17409]:         at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379) ~[?:1.8.0_412]
java[17409]:         at org.apache.hadoop.net.SocketInputStream$Reader.performIO(SocketInputStream.java:57) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:141) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118) ~[cfe_39.jar:0.2.0]
java[17409]:         at java.io.FilterInputStream.read(FilterInputStream.java:83) ~[?:1.8.0_412]
java[17409]:         at java.io.FilterInputStream.read(FilterInputStream.java:83) ~[?:1.8.0_412]
java[17409]:         at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:519) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1811) [cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1728) [cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:713) [cfe_39.jar:0.2.0]
java[17409]: 17:19:18.487 [Thread-7] WARN  org.apache.hadoop.hdfs.DataStreamer - Abandoning BP-1857759457-XXXXX-1708423446635:blk_1075894953_2154131
java[17409]: 17:19:18.492 [Thread-7] WARN  org.apache.hadoop.hdfs.DataStreamer - Excluding datanode DatanodeInfoWithStorage[XXXXXX:9004,DS-dd4c87a8-8cd5-4c39-a777-b5e459e20f23,DISK]
java[17409]: 17:19:18.501 [Thread-7] WARN  org.apache.hadoop.hdfs.DataStreamer - Exception in createBlockOutputStream blk_1075894954_2154132

Also, once all three datanodes have been excluded:

java[17409]: org.apache.hadoop.ipc.RemoteException: File /tmp/test/example/20.10369256 could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
java[17409]:         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
java[17409]:         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
java[17409]:         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
java[17409]:         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
java[17409]:         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
java[17409]:         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
java[17409]:         at java.security.AccessController.doPrivileged(Native Method)
java[17409]:         at javax.security.auth.Subject.doAs(Subject.java:422)
java[17409]:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
java[17409]:         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
java[17409]:         at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1567) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.ipc.Client.call(Client.java:1513) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.ipc.Client.call(Client.java:1410) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:258) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Invoker.invoke(ProtobufRpcEngine2.java:139) ~[cfe_39.jar:0.2.0]
java[17409]:         at com.sun.proxy.$Proxy26.addBlock(Unknown Source) ~[?:?]
java[17409]:         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:531) ~[cfe_39.jar:0.2.0]
java[17409]:         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_412]
java[17409]:         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_412]
java[17409]:         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_412]
java[17409]:         at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_412]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:433) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:166) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:158) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:96) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:362) ~[cfe_39.jar:0.2.0]
java[17409]:         at com.sun.proxy.$Proxy27.addBlock(Unknown Source) ~[?:?]
java[17409]:         at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1088) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1915) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1717) ~[cfe_39.jar:0.2.0]
java[17409]:         at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:713) [cfe_39.jar:0.2.0]
java[17409]: Exception in thread "example1" java.lang.RuntimeException: org.apache.hadoop.ipc.RemoteException(java.io.IOException): File /tmp/test/example/20.10369256 could only be written to 0 of the 1 minReplication nodes. There are 3 datanode(s) running and 3 node(s) are excluded in this operation.
java[17409]:         at org.apache.hadoop.hdfs.server.blockmanagement.BlockManager.chooseTarget4NewBlock(BlockManager.java:2350)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.chooseTargetForNewBlock(FSDirWriteFileOp.java:294)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2989)
java[17409]:         at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:912)
java[17409]:         at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:595)
java[17409]:         at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:621)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:589)
java[17409]:         at org.apache.hadoop.ipc.ProtobufRpcEngine2$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine2.java:573)
java[17409]:         at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1227)
java[17409]:         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1094)
java[17409]:         at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:1017)
java[17409]:         at java.security.AccessController.doPrivileged(Native Method)
java[17409]:         at javax.security.auth.Subject.doAs(Subject.java:422)
java[17409]:         at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1899)
java[17409]:         at org.apache.hadoop.ipc.Server$Handler.run(Server.java:3048)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.HDFSWrite.commit(HDFSWrite.java:182)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.DatabaseOutput.accept(DatabaseOutput.java:333)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.DatabaseOutput.accept(DatabaseOutput.java:71)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.KafkaReader.read(KafkaReader.java:95)
java[17409]:         at com.teragrep.cfe_39.consumers.kafka.ReadCoordinator.run(ReadCoordinator.java:133)
java[17409]:         at java.lang.Thread.run(Thread.java:750)

From datanode logs:

hdfs[2572896]: 2024-07-08 17:19:18,482 INFO datanode.DataNode: Failed to read expected SASL data transfer protection handshake from client at /XXXXX:46818. Perhaps the client is running an older version of Hadoop which does not support SASL data transfer protection
hdfs[2572896]: org.apache.hadoop.hdfs.protocol.datatransfer.sasl.InvalidMagicNumberException: Received 1c508e instead of deadbeef from client.
hdfs[2572896]:         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.doSaslHandshake(SaslDataTransferServer.java:374)
hdfs[2572896]:         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.getSaslStreams(SaslDataTransferServer.java:308)
hdfs[2572896]:         at org.apache.hadoop.hdfs.protocol.datatransfer.sasl.SaslDataTransferServer.receive(SaslDataTransferServer.java:135)
hdfs[2572896]:         at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:236)
hdfs[2572896]:         at java.lang.Thread.run(Thread.java:750)
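
The `deadbeef` the datanode expects is the SASL data transfer magic number, which a SASL-aware client sends before the handshake. A client that does not have `dfs.data.transfer.protection` set sends plain data transfer protocol bytes instead, so the datanode drops the connection and the client sees the "Connection reset by peer" above. A minimal sketch of a client configuration that should satisfy such a cluster; the `privacy` value is an assumption and must mirror whatever QOP the cluster's hdfs-site.xml enforces:

```java
import org.apache.hadoop.conf.Configuration;

public class ClientSaslConfig {
    public static Configuration clientConf() {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        // Assumed value: must match the datanodes' setting
        // ("authentication", "integrity", or "privacy").
        conf.set("dfs.data.transfer.protection", "privacy");
        return conf;
    }
}
```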

Expected behavior

Writes to HDFS complete successfully.

How to reproduce

QA environment with kerberized HDFS.
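
A minimal write that should exercise the same code path, for reference. The principal, keytab, and namenode URI below are placeholders, and the commented-out protection property is what appears to be missing on the client side:

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.security.UserGroupInformation;

public class HdfsWriteRepro {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("hadoop.security.authentication", "kerberos");
        // Reproduces the failure when this is absent or does not match
        // the cluster; the datanode then resets the connection on write.
        // conf.set("dfs.data.transfer.protection", "privacy");

        UserGroupInformation.setConfiguration(conf);
        // Hypothetical principal and keytab path.
        UserGroupInformation.loginUserFromKeytab(
                "cfe_39@EXAMPLE.COM", "/etc/security/keytabs/cfe_39.keytab");

        try (FileSystem fs = FileSystem.get(URI.create("hdfs://namenode:8020"), conf);
             FSDataOutputStream out = fs.create(new Path("/tmp/test/example/repro"))) {
            out.writeBytes("test record\n");
        }
    }
}
```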

Software version

0.2.0 beta

@StrongestNumber9 StrongestNumber9 added the bug Something isn't working label Jul 8, 2024
@StrongestNumber9 StrongestNumber9 changed the title Connection reset by peer whil Connection reset by peer while writing to hdfs Jul 8, 2024