A Roundup of Common HDFS Client Errors

Author: xudong1991 | Published 2021-09-02 09:28

    Overview

    Using the HDFS client involves two kinds of interaction:

    1. RPC requests to the NameNode
    2. IO reads and writes against DataNodes

    Whichever path is involved, an exception generally does not fail the job: both paths have retry mechanisms, and in practice it is actually quite hard for a job to fail outright. In real-world use, the RPC exchange between the client and the NN rarely produces errors; most errors show up in the IO exchange with the DNs. This article summarizes the common DN IO errors.

    Common client IO errors

    1. While writing, the pipeline cannot be set up for one reason or another. The client abandons the faulty DN, requests a replacement from the NN, and rebuilds the pipeline. A few typical cases:

      1. The first DN in the pipeline is down, so pipeline setup fails. Since this DN is directly connected to the client, the client can see the concrete cause:

      21/02/22 15:34:23 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741830_1006
      java.net.ConnectException: Connection refused
      at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
      at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
      at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
      at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
      at org.apache.hadoop.hdfs.DataStreamer.createSocketForPipeline(DataStreamer.java:254)
      at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1740)
      at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
      at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
      21/02/22 15:34:23 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741830_1006
      21/02/22 15:34:23 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]

      2. The first DN in the pipeline is overloaded, so pipeline setup fails. Log:

      21/02/22 16:03:12 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741842_1019
      java.io.EOFException: Unexpected EOF while trying to read response from server
      at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
      at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
      at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
      at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
      21/02/22 16:03:12 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741842_1019
      21/02/22 16:03:12 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]

      3. Some other DN in the pipeline has a problem (down or overloaded), so pipeline setup fails. These DNs are not directly connected to the client, so the client usually cannot see the concrete cause, only the IP of the failing DN:

      21/02/22 15:51:21 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073741835_1012
      java.io.IOException: Got error, status=ERROR, status message , ack with firstBadLink as 192.168.202.12:9003
      at org.apache.hadoop.hdfs.protocol.datatransfer.DataTransferProtoUtil.checkBlockOpStatus(DataTransferProtoUtil.java:121)
      at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1792)
      at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
      at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
      21/02/22 15:51:21 WARN hdfs.DataStreamer: Abandoning BP-239523849-192.168.202.11-1613727437316:blk_1073741835_1012
      21/02/22 15:51:21 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.12:9003,DS-b76f5779-927e-4f8c-b4fe-9db592ecadfa,DISK]

      4. For some reason (e.g. DN IO concurrency so high that lock contention becomes severe), pipeline setup times out (default 75s with 3 replicas). Log:

      21/06/17 15:51:28 INFO hdfs.DataStreamer: Exception in createBlockOutputStream blk_1073742830_2006
      java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.13:56994 remote=/192.168.202.13:9003]
      at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
      at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
      at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
      at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
      at java.io.FilterInputStream.read(FilterInputStream.java:83)
      at java.io.FilterInputStream.read(FilterInputStream.java:83)
      at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
      at org.apache.hadoop.hdfs.DataStreamer.createBlockOutputStream(DataStreamer.java:1776)
      at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1694)
      at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
      21/06/17 15:51:28 WARN hdfs.DataStreamer: Abandoning BP-358940719-192.168.202.11-1623894544733:blk_1073742830_2006
      21/06/17 15:51:28 WARN hdfs.DataStreamer: Excluding datanode DatanodeInfoWithStorage[192.168.202.13:50010,DS-5bfd7a2e-9963-40b0-9f5d-50ffecde15c1,DISK]
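    The 75s figure is not arbitrary. As a rule of thumb (a sketch based on assumed client defaults: a 60s base socket timeout from `dfs.client.socket-timeout`, plus a 5s extension per pipeline node, mirroring the client's `READ_TIMEOUT` / `READ_TIMEOUT_EXTENSION` constants), the pipeline wait timeout grows with the replica count:

```python
# Sketch of the pipeline socket-timeout rule. Assumed defaults:
# dfs.client.socket-timeout = 60000 ms, per-node extension = 5000 ms.
READ_TIMEOUT_MS = 60_000
READ_TIMEOUT_EXTENSION_MS = 5_000

def pipeline_read_timeout_ms(num_datanodes: int,
                             base_ms: int = READ_TIMEOUT_MS,
                             ext_ms: int = READ_TIMEOUT_EXTENSION_MS) -> int:
    """Socket read timeout used while waiting on a write pipeline."""
    return base_ms + ext_ms * num_datanodes

print(pipeline_read_timeout_ms(3))  # 3 replicas -> 75000 ms, the "75000 millis" above
print(pipeline_read_timeout_ms(2))  # 2 replicas -> 70000 ms
```

    A 3-replica pipeline gives 60s + 3×5s = 75s, matching the log above; a 2-replica write gives 70s, which is the 2-replica threshold mentioned further down.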

    2. While writing, a DN in the pipeline dies abruptly; the client runs one round of error recovery. Log:

    21/02/22 15:47:39 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010
    java.io.EOFException: Unexpected EOF while trying to read response from server
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:461)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
    at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
    21/02/22 15:47:39 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741834_1010 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.
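    Note how the log pins the blame on "datanode 0". The pipeline ack carries one reply status per downstream node, so the client can blame the first node whose reply is not SUCCESS; if the ack cannot be read at all (as with the EOFException above), it blames the directly connected node, index 0. A minimal sketch of that selection logic (hypothetical helper, not the actual Hadoop code):

```python
# Hypothetical sketch: locate the "bad" datanode in a pipeline from the ack
# replies, as when the client logs "datanode N (...) is bad".
SUCCESS = "SUCCESS"

def first_bad_datanode(replies):
    """Return the index of the first non-SUCCESS reply, 0 if no ack arrived
    at all, or -1 if every node acked fine."""
    if not replies:            # ack never arrived -> blame the directly connected DN
        return 0
    for i, status in enumerate(replies):
        if status != SUCCESS:
            return i
    return -1

print(first_bad_datanode([]))                               # 0, as in the log above
print(first_bad_datanode(["SUCCESS", "ERROR", "SUCCESS"]))  # 1
```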

    3. While writing, if the client sends a packet and the ack (the response) takes more than 30s to arrive, it prints a slow-IO warning. Log:

    [2021-06-17 15:22:58,929] WARN Slow ReadProcessor read fields took 37555ms (threshold=30000ms); ack: seqno: 343 reply: SUCCESS reply: SUCCESS reply: SUCCESS downstreamAckTimeNanos: 16503757088 flag: 0 flag: 0 flag: 0, targets: [DatanodeInfoWithStorage[9.10.146.124:9003,DS-cdab7fb8-c6ec-4f6b-8b6a-2a0c92aed6b6,DISK], DatanodeInfoWithStorage[9.10.146.98:9003,DS-346a7f42-4b12-4bac-8e58-8b33d972eb79,DISK], DatanodeInfoWithStorage[9.180.22.26:9003,DS-ad6cbeb4-9ce8-495b-b978-5c7aac66686f,DISK]]
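    This warning is diagnostic only; the write continues. The threshold is assumed to come from `dfs.client.slow.io.warning.threshold.ms` (30000 ms by default). If you monitor for slow pipelines, the measured time and threshold can be pulled out of the message, e.g.:

```python
import re

# Hypothetical helper: extract the measured time and the threshold from a
# "Slow ReadProcessor" warning line so slow pipelines can be alerted on.
LINE = ("WARN Slow ReadProcessor read fields took 37555ms "
        "(threshold=30000ms); ack: seqno: 343")

m = re.search(r"took (\d+)ms \(threshold=(\d+)ms\)", LINE)
took_ms, threshold_ms = int(m.group(1)), int(m.group(2))
print(took_ms, threshold_ms, took_ms > threshold_ms)  # 37555 30000 True
```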

    4. While writing, if the client sends a packet and no ack arrives within 75s (70s when writing with 2 replicas), it errors out on timeout and starts error recovery. Log:

    21/02/22 16:09:35 WARN hdfs.DataStreamer: Exception for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021
    java.net.SocketTimeoutException: 75000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:44868 remote=/192.168.202.11:9003]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PipelineAck.readFields(PipelineAck.java:213)
    at org.apache.hadoop.hdfs.DataStreamer$ResponseProcessor.run(DataStreamer.java:1092)
    21/02/22 16:09:35 WARN hdfs.DataStreamer: Error Recovery for BP-239523849-192.168.202.11-1613727437316:blk_1073741844_1021 in pipeline [DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK], DatanodeInfoWithStorage[192.168.202.13:9003,DS-c211a421-b13d-4d46-9c28-e52426509f8a,DISK], DatanodeInfoWithStorage[192.168.202.14:9003,DS-6424283e-fad1-4b9a-aaed-dc6683e55a4d,DISK]]: datanode 0(DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]) is bad.

    5. When closing a file (or calling hflush/hsync manually), the client flushes all data not yet persisted to the cluster. If the flush takes more than 30s, it prints a slow-IO warning. Log:

    20/12/15 11:22:25 WARN DataStreamer: Slow waitForAckedSeqno took 45747ms (threshold=30000ms). File being written: /stage/interface/TEG/g_teg_common_teg_plan_bigdata/plan/exportBandwidth/origin/company/2020/1215/1059.parquet/_temporary/0/_temporary/attempt_20201215112121_0008_m_000021_514/part-00021-94e67782-be1b-48ae-b736-204624fa498c-c000.snappy.parquet, block: BP-1776336001-100.76.59.150-1482408994930:blk_16194984410_15220615717, Write pipeline datanodes: [DatanodeInfoWithStorage[100.76.29.36:9003,DS-4a301194-a232-46c6-b606-44b15a83ebed,DISK], DatanodeInfoWithStorage[100.76.60.168:9003,DS-24645191-aa52-4643-9c97-213b2a0bb41d,DISK], DatanodeInfoWithStorage[100.76.60.160:9003,DS-27ca6eb7-75b9-47a2-ae9d-de6d720f4d9a,DISK]].

    6. While writing, if the DNs' incremental block reports are too slow, the client cannot get a new block allocated in time; it logs the following and retries:

    21/02/22 16:16:53 INFO hdfs.DFSOutputStream: Exception while adding a block
    org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.hdfs.server.namenode.NotReplicatedYetException): Not replicated yet: /a.COPYING
    at org.apache.hadoop.hdfs.server.namenode.FSDirWriteFileOp.validateAddBlock(FSDirWriteFileOp.java:171)
    at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.getAdditionalBlock(FSNamesystem.java:2572)
    at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.addBlock(NameNodeRpcServer.java:885)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.addBlock(ClientNamenodeProtocolServerSideTranslatorPB.java:540)
    at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:448)
    at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:991)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:863)
    at org.apache.hadoop.ipc.Server$RpcCall.run(Server.java:806)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:2286)
    at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2541)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1520)
    at org.apache.hadoop.ipc.Client.call(Client.java:1466)
    at org.apache.hadoop.ipc.Client.call(Client.java:1376)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:227)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:116)
    at com.sun.proxy.$Proxy10.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolTranslatorPB.addBlock(ClientNamenodeProtocolTranslatorPB.java:472)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invokeMethod(RetryInvocationHandler.java:409)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeMethod(RetryInvocationHandler.java:163)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invoke(RetryInvocationHandler.java:155)
    at org.apache.hadoop.io.retry.RetryInvocationHandler$Call.invokeOnce(RetryInvocationHandler.java:95)
    at org.apache.hadoop.io.retry.RetryInvocationHandler.invoke(RetryInvocationHandler.java:346)
    at com.sun.proxy.$Proxy11.addBlock(Unknown Source)
    at org.apache.hadoop.hdfs.DFSOutputStream.addBlock(DFSOutputStream.java:1074)
    at org.apache.hadoop.hdfs.DataStreamer.locateFollowingBlock(DataStreamer.java:1880)
    at org.apache.hadoop.hdfs.DataStreamer.nextBlockOutputStream(DataStreamer.java:1683)
    at org.apache.hadoop.hdfs.DataStreamer.run(DataStreamer.java:722)
    21/02/22 16:16:53 WARN hdfs.DFSOutputStream: NotReplicatedYetException sleeping /a.COPYING retries left 4
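    The last line shows the retry budget counting down ("retries left 4"). A sketch of this loop, assuming the defaults of 5 retries (as with `dfs.client.block.write.locateFollowingBlock.retries`) and a sleep that starts at 400 ms and doubles on each attempt:

```python
import time

# Sketch of the client's addBlock retry loop. Assumptions: 5 retries and a
# doubling backoff starting at 400 ms, as in DataStreamer.locateFollowingBlock.
class NotReplicatedYetException(Exception):
    pass

def locate_following_block(add_block, retries=5, sleep_ms=400):
    """Call add_block until it succeeds or the retry budget is exhausted."""
    while True:
        try:
            return add_block()
        except NotReplicatedYetException:
            if retries == 0:
                raise
            retries -= 1
            print(f"NotReplicatedYetException sleeping, retries left {retries}")
            time.sleep(sleep_ms / 1000.0)
            sleep_ms *= 2               # back off: 400 ms, 800 ms, 1600 ms, ...

# Simulated NN that succeeds only after the DNs' incremental report lands
# (hypothetical block id).
attempts = {"n": 0}
def fake_add_block():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise NotReplicatedYetException("Not replicated yet: /a.COPYING")
    return "blk_1073741845_1022"

print(locate_following_block(fake_add_block, sleep_ms=1))
```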

    7. While writing, if the DNs' incremental block reports are too slow, the client cannot close the file in time; it logs the following and retries:

    2021-02-22 16:19:23,259 INFO hdfs.DFSClient: Could not complete /a.txt retrying...

    8. While reading, if the target DN is already down, the client logs a connection exception and then tries another DN:

    21/02/22 16:29:33 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
    java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at a.a.TestWrite.main(TestWrite.java:25)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
    21/02/22 16:29:33 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030, add to deadNodes and continue.
    java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:541)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3039)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:814)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:739)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
    at java.io.DataInputStream.read(DataInputStream.java:100)
    at a.a.TestWrite.main(TestWrite.java:25)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
    21/02/22 16:29:34 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741852_1030
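    "add to deadNodes and continue" is the key phrase: the failed DN is remembered for this input stream and skipped when the next replica is chosen. A sketch of that selection (hypothetical helper, not the actual DFSInputStream code):

```python
# Sketch of deadNodes bookkeeping in the reader: once a DN fails, it is
# excluded from replica selection for the rest of this stream.
dead_nodes = set()

def choose_datanode(locations):
    """Pick the first replica location not known to be dead."""
    for dn in locations:
        if dn not in dead_nodes:
            return dn
    raise IOError("No live nodes contain current block")

locations = ["192.168.202.11:9003", "192.168.202.14:9003"]
dead_nodes.add("192.168.202.11:9003")   # the connection-refused DN from the log
print(choose_datanode(locations))       # falls through to 192.168.202.14:9003
```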

    9. While reading, establishing the TCP connection to the target DN times out; the client logs the error and then tries another DN:

    2021-02-25 23:57:11,000 WARN org.apache.hadoop.hdfs.DFSClient: Connection failure: Failed to connect to /9.10.34.27:9003 for file /data/SPARK/part-r-00320.tfr.gz for block BP-1815681714-100.76.60.19-1523177824331:blk_10324215185_9339836899:org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
    org.apache.hadoop.net.ConnectTimeoutException: 60000 millis timeout while waiting for channel to be ready for connect. ch : java.nio.channels.SocketChannel[connection-pending remote=/9.10.34.27:9003]
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:534)
    at org.apache.hadoop.hdfs.DFSClient.newConnectedPeer(DFSClient.java:3450)
    at org.apache.hadoop.hdfs.BlockReaderFactory.nextTcpPeer(BlockReaderFactory.java:777)
    at org.apache.hadoop.hdfs.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:694)
    at org.apache.hadoop.hdfs.BlockReaderFactory.build(BlockReaderFactory.java:355)
    at org.apache.hadoop.hdfs.DFSInputStream.actualGetFromOneDataNode(DFSInputStream.java:1173)
    at org.apache.hadoop.hdfs.DFSInputStream.fetchBlockByteRange(DFSInputStream.java:1094)
    at org.apache.hadoop.hdfs.DFSInputStream.pread(DFSInputStream.java:1449)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:1412)
    at org.apache.hadoop.fs.FSDataInputStream.read(FSDataInputStream.java:89)

    10. While reading, the target DN does not respond while the read channel is being set up (default threshold 60s); the client logs the error and then tries another DN:

    21/02/22 16:52:32 WARN impl.BlockReaderFactory: I/O error constructing remote block reader.
    java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at a.a.TestWrite.main(TestWrite.java:25)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
    21/02/22 16:52:32 WARN hdfs.DFSClient: Failed to connect to /192.168.202.11:9003 for file /a.txt for block BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069, add to deadNodes and continue.
    java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45318 remote=/192.168.202.11:9003]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:118)
    at java.io.FilterInputStream.read(FilterInputStream.java:83)
    at org.apache.hadoop.hdfs.protocolPB.PBHelperClient.vintPrefixed(PBHelperClient.java:459)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.newBlockReader(BlockReaderRemote.java:407)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReader(BlockReaderFactory.java:845)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.getRemoteBlockReaderFromTcp(BlockReaderFactory.java:742)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderFactory.build(BlockReaderFactory.java:384)
    at org.apache.hadoop.hdfs.DFSInputStream.getBlockReader(DFSInputStream.java:642)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:572)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:755)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at a.a.TestWrite.main(TestWrite.java:25)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:240)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:152)
    21/02/22 16:52:32 INFO hdfs.DFSClient: Successfully connected to /192.168.202.14:9003 for BP-239523849-192.168.202.11-1613727437316:blk_1073741891_1069

    11. While reading, data transfer has started but is so slow that it times out (default threshold 60s); the client logs the error and then tries another DN:

    21/02/22 16:44:30 WARN hdfs.DFSClient: Exception while reading from BP-239523849-192.168.202.11-1613727437316:blk_1073741889_1067 of /a.txt from DatanodeInfoWithStorage[192.168.202.11:9003,DS-0ae6e8c8-0f51-4459-b89f-b5f40ea7234d,DISK]
    java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/192.168.202.11:45254 remote=/192.168.202.11:9003]
    at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
    at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.readChannelFully(PacketReceiver.java:256)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doReadFully(PacketReceiver.java:207)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.doRead(PacketReceiver.java:134)
    at org.apache.hadoop.hdfs.protocol.datatransfer.PacketReceiver.receiveNextPacket(PacketReceiver.java:102)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.readNextPacket(BlockReaderRemote.java:183)
    at org.apache.hadoop.hdfs.client.impl.BlockReaderRemote.read(BlockReaderRemote.java:142)
    at org.apache.hadoop.hdfs.ByteArrayStrategy.readFromBlock(ReaderStrategy.java:118)
    at org.apache.hadoop.hdfs.DFSInputStream.readBuffer(DFSInputStream.java:703)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:764)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:828)

    12. While reading, the target block cannot be found on any DN (i.e. a missing block). Error:

    2021-02-22 16:57:59,009 WARN hdfs.DFSClient: No live nodes contain block BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 after checking nodes = [], ignoredNodes = null
    2021-02-22 16:57:59,009 WARN hdfs.DFSClient: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a No live nodes contain current block Block locations: Dead nodes: . Throwing a BlockMissingException
    2021-02-22 16:57:59,010 WARN hdfs.DFSClient: DFS Read
    org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-239523849-192.168.202.11-1613727437316:blk_1073741893_1071 file=/a
    at org.apache.hadoop.hdfs.DFSInputStream.refetchLocations(DFSInputStream.java:1053)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1036)
    at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:1015)
    at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:647)
    at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:926)
    at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:982)
    at java.io.DataInputStream.readFully(DataInputStream.java:195)
    at java.io.DataInputStream.readFully(DataInputStream.java:169)
    at a.a.TestWrite.main(TestWrite.java:23)
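    Before throwing BlockMissingException the client does not give up immediately: it re-fetches block locations from the NameNode a few times (assumed cap of 3, as with `dfs.client.max.block.acquire.failures`), waiting in between. A simplified sketch, with the wait omitted:

```python
# Sketch of the missing-block path: re-fetch locations from the NameNode up to
# a cap before failing. Assumed default: dfs.client.max.block.acquire.failures
# = 3. The real client also sleeps a randomized interval between attempts.
class BlockMissingException(IOError):
    pass

def read_block(fetch_locations, max_acquire_failures=3):
    failures = 0
    while True:
        locations = fetch_locations()   # ask the NN where the block lives
        if locations:
            return locations[0]         # a real client would open a block reader
        failures += 1
        if failures > max_acquire_failures:
            raise BlockMissingException("Could not obtain block")

try:
    read_block(lambda: [])              # every refetch comes back empty
except BlockMissingException as e:
    print("DFS Read failed:", e)        # mirrors the WARN hdfs.DFSClient lines above
```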
