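Fixing "Too many connections" Errors When HiveServer2 Connects to ZooKeeper
Author: 大圆那些事 | This article may be reposted; please credit the original source and the author with a hyperlink.
HiveServer2 supports concurrent access from multiple clients and uses ZooKeeper to manage read/write locks on Hive tables. In a production environment we ran into HiveServer2 failing to connect to ZooKeeper with a "Too many connections" error. This post walks through how we diagnosed and fixed the problem.
Problem description
HiveServer2 could not execute Hive commands, and its log showed the following error: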
data:image/s3,"s3://crabby-images/07ec8/07ec86aa8c1ef3b426b1e9ca8f08483ae2ea4eb4" alt=""
2013-03-22 12:54:43,946 WARN zookeeper.ClientCnxn (ClientCnxn.java:run(1089)) - Session 0x0 for server hostname/***.***.***.***:2181, unexpected error, closing socket connection and attempting reconnect
java.io.IOException: Connection reset by peer
at sun.nio.ch.FileDispatcher.read0(Native Method)
at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:21)
at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:233)
at sun.nio.ch.IOUtil.read(IOUtil.java:200)
at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:236)
at org.apache.zookeeper.ClientCnxnSocketNIO.doIO(ClientCnxnSocketNIO.java:68)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:355)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1068)
data:image/s3,"s3://crabby-images/9a6d8/9a6d8abd989b20ab89e92076d04011979845e2ca" alt=""
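Troubleshooting
1. The HiveServer2 error log reports "Connection reset by peer", meaning the ZooKeeper server closed the connection from its side.
2. We then checked the logs of the ZooKeeper ensemble that HiveServer2 is configured to use (it manages the Hive table read/write locks) and found the following error: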
2013-03-22 12:52:48,938 [myid:] - WARN [NIOServerCxn.Factory:0.0.0.0/0.0.0.0:2181:NIOServerCnxnFactory@193] - Too many connections from /***.***.***.*** - max is 50
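3. Taken together with the HiveServer2 log, this shows that the number of connections from the HiveServer2 host to ZooKeeper had exceeded ZooKeeper's per-client connection limit (50 here). ZooKeeper identifies a client by its IP address, so the limit covers all connections from that machine.
4. We then checked whether all 50 connections really belonged to HiveServer2. They did: every one of them was held by the HiveServer2 process (PID 26871):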
data:image/s3,"s3://crabby-images/10772/10772d8ba8afea7a4f504af86cab85c4fd2bb82f" alt=""
[user@hostname ~]$ sudo netstat -nap | grep 2181
tcp 0 0 ***.***.***.***:58089 ***.***.***.***:2181 ESTABLISHED 26871/java
tcp 0 0 ***.***.***.***:57837 ***.***.***.***:2181 ESTABLISHED 26871/java
tcp 0 0 ***.***.***.***:57853 ***.***.***.***:2181 ESTABLISHED 26871/java
...
(50 in total)
data:image/s3,"s3://crabby-images/1c557/1c5576a68d809d0cc515530d4cf0d5e94023c233" alt=""
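5. Why would HiveServer2 hold so many connections when the actual number of concurrent requests was far lower? We looked to HiveServer2's implementation for clues. Because HiveServer2 is built on Thrift, we suspected that an internally maintained connection pool might be the cause. Checking hive-default.xml, we found that it configures the number of Thrift worker threads by default as shown below. Our guess is that each worker thread maintains its own connection to ZooKeeper; this still needs to be verified at the code level.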
data:image/s3,"s3://crabby-images/2de70/2de70f8db2a18a9450e1abb1469ab93f2f3a3559" alt=""
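<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>5</value>
  <description>Minimum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>100</value>
  <description>Maximum number of Thrift worker threads</description>
</property>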
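The per-process count from step 4 can be scripted instead of counted by hand. The following Python sketch parses `netstat -nap` output, assuming the usual Linux net-tools column layout (Proto, Recv-Q, Send-Q, Local Address, Foreign Address, State, PID/Program name), and counts ESTABLISHED connections whose remote port is 2181, grouped by process. The sample lines and IP addresses are hypothetical stand-ins for the masked output above. Keep in mind that ZooKeeper enforces its limit per client IP, so if several processes on the same host talk to ZooKeeper, their counts add up.

```python
from collections import Counter


def count_zk_connections(netstat_output: str, zk_port: str = "2181") -> Counter:
    """Count ESTABLISHED TCP connections to zk_port, keyed by 'PID/Program name'."""
    counts: Counter = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and non-TCP lines; 'tcp6' also starts with 'tcp'.
        if len(fields) < 7 or not fields[0].startswith("tcp"):
            continue
        foreign, state, owner = fields[4], fields[5], fields[6]
        # Match on the *remote* port, so a local port 2181 is not miscounted.
        if state == "ESTABLISHED" and foreign.rsplit(":", 1)[-1] == zk_port:
            counts[owner] += 1
    return counts


# Hypothetical sample output (IPs and the second process are made up).
SAMPLE = """\
tcp        0      0 10.0.0.5:58089      10.0.0.9:2181       ESTABLISHED 26871/java
tcp        0      0 10.0.0.5:57837      10.0.0.9:2181       ESTABLISHED 26871/java
tcp        0      0 10.0.0.5:57853      10.0.0.9:2181       ESTABLISHED 26871/java
tcp        0      0 10.0.0.5:41022      10.0.0.7:3306       ESTABLISHED 1234/python
"""

MAX_CLIENT_CNXNS = 50  # the limit reported in the ZooKeeper log above

for owner, n in count_zk_connections(SAMPLE).items():
    status = "at the limit" if n >= MAX_CLIENT_CNXNS else "below the limit"
    print(f"{owner}: {n} connection(s) to port 2181, {status} of {MAX_CLIENT_CNXNS}")
```

On a real host you would pipe `sudo netstat -nap` into the function instead of using `SAMPLE`.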
data:image/s3,"s3://crabby-images/b05f7/b05f7a357ceab8cf53181b459b08760bbbbc0dcd" alt=""
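Solution
Method 1:
Reduce the number of HiveServer2 Thrift worker threads in hive-site.xml, which reduces the number of connections HiveServer2 opens to ZooKeeper. The trade-off is that HiveServer2 may be able to handle fewer concurrent requests.
Method 2:
Increase the maxClientCnxns option in ZooKeeper's zoo.cfg to raise the per-client connection limit.
Whichever method you choose, set the values according to your own production workload.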
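Both methods come down to one inequality. The Python sketch below checks it under the article's own unverified assumption that each Thrift worker thread holds one ZooKeeper connection, so the peak from the HiveServer2 host is roughly the maximum worker thread count plus any other ZooKeeper clients on that host. The function names and the `other_cnxns_from_host` parameter are hypothetical. Two ZooKeeper behaviors it relies on: the server refuses a new connection once an IP already holds `maxClientCnxns` connections, and `maxClientCnxns=0` removes the limit.

```python
def zk_limit_is_sufficient(max_worker_threads: int,
                           other_cnxns_from_host: int,
                           max_client_cnxns: int) -> bool:
    """Return True if the estimated peak connection count fits under maxClientCnxns.

    Assumption (unverified, as in the article): one ZooKeeper connection per
    Thrift worker thread.
    """
    if max_client_cnxns == 0:
        # maxClientCnxns=0 disables the per-IP limit in ZooKeeper.
        return True
    peak = max_worker_threads + other_cnxns_from_host
    # The peak may equal the limit; the next connection would be refused.
    return peak <= max_client_cnxns


def max_safe_worker_threads(max_client_cnxns: int, other_cnxns_from_host: int) -> int:
    """Largest max.worker.threads that fits under the limit, same assumption (Method 1)."""
    if max_client_cnxns == 0:
        raise ValueError("maxClientCnxns=0 means no limit; any thread count fits")
    return max(0, max_client_cnxns - other_cnxns_from_host)


# Defaults from the article: 100 worker threads against a limit of 50.
print(zk_limit_is_sufficient(100, 0, 50))
# Tuned values from the article: 200 threads with maxClientCnxns=200.
print(zk_limit_is_sufficient(200, 0, 200))
# Same values plus one hypothetical extra ZooKeeper client on the same host.
print(zk_limit_is_sufficient(200, 1, 200))
```

If the assumption holds, setting both values to 200 leaves no headroom: any other process on the same host that also connects to ZooKeeper could push the total over the limit.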
Related configuration options:
1) In hive-site.xml:
data:image/s3,"s3://crabby-images/cbf06/cbf06d460b772deaf9467e462afcb996928cdc90" alt=""
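<property>
  <name>hive.server2.thrift.min.worker.threads</name>
  <value>10</value>
  <description>Minimum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.server2.thrift.max.worker.threads</name>
  <value>200</value>
  <description>Maximum number of Thrift worker threads</description>
</property>
<property>
  <name>hive.zookeeper.session.timeout</name>
  <value>60000</value>
  <description>Zookeeper client's session timeout. The client is disconnected, and as a result, all locks released, if a heartbeat is not sent in the timeout.</description>
</property>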
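To confirm which values a deployment actually carries, you can read hive-site.xml programmatically. This Python sketch uses the standard library's `xml.etree.ElementTree` and assumes the usual Hadoop-style layout (`<configuration>` containing `<property>` elements with `<name>` and `<value>`). The inline XML mirrors the settings above; the helper name `load_properties` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Mirrors the hive-site.xml settings above (descriptions omitted).
HIVE_SITE = """<configuration>
  <property>
    <name>hive.server2.thrift.min.worker.threads</name>
    <value>10</value>
  </property>
  <property>
    <name>hive.server2.thrift.max.worker.threads</name>
    <value>200</value>
  </property>
  <property>
    <name>hive.zookeeper.session.timeout</name>
    <value>60000</value>
  </property>
</configuration>"""


def load_properties(xml_text: str) -> dict[str, str]:
    """Map each <property>'s <name> to its <value>; entries missing either are skipped."""
    root = ET.fromstring(xml_text)
    props: dict[str, str] = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None and value is not None:
            props[name.strip()] = value.strip()
    return props


props = load_properties(HIVE_SITE)
min_threads = int(props["hive.server2.thrift.min.worker.threads"])
max_threads = int(props["hive.server2.thrift.max.worker.threads"])
if min_threads > max_threads:
    raise ValueError("min.worker.threads must not exceed max.worker.threads")
print(min_threads, max_threads, props["hive.zookeeper.session.timeout"])
```

For a file on disk, `ET.parse(path).getroot()` gives the same root element as `ET.fromstring`.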
data:image/s3,"s3://crabby-images/27587/275870bf080238056dd62a01710fbf6b0179f5b8" alt=""
2) In zoo.cfg:
data:image/s3,"s3://crabby-images/90ba9/90ba9a7ca4d4ec0061ff69efea5ce4144de63ca4" alt=""
# Limits the number of concurrent connections (at the socket level) that a single client, identified by IP address, may make to a single member of the ZooKeeper ensemble
maxClientCnxns=200
# The minimum session timeout in milliseconds that the server will allow the client to negotiate
minSessionTimeout=1000
# The maximum session timeout in milliseconds that the server will allow the client to negotiate
maxSessionTimeout=60000
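The two session timeout options interact with hive.zookeeper.session.timeout: the ZooKeeper server raises a requested timeout below minSessionTimeout to that minimum and caps one above maxSessionTimeout at that maximum. When these options are not set, they default to 2 × tickTime and 20 × tickTime. The Python sketch below parses zoo.cfg-style `key=value` lines and reproduces that clamping. The `tickTime=2000` line and its use as a fallback are assumptions (2000 ms is the value in ZooKeeper's sample config); the function names are hypothetical. With the default tickTime of 2000 and no maxSessionTimeout, Hive's 60000 ms request would presumably be capped at 40000 ms, which may be why maxSessionTimeout is set to 60000 here.

```python
# tickTime is a hypothetical addition; it is not part of the article's snippet.
ZOO_CFG = """\
tickTime=2000
maxClientCnxns=200
minSessionTimeout=1000
maxSessionTimeout=60000
"""


def parse_zoo_cfg(text: str) -> dict[str, str]:
    """Parse key=value lines, skipping blank lines and full-line # comments."""
    cfg: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")
        if sep:
            cfg[key.strip()] = value.strip()
    return cfg


def negotiated_session_timeout(requested_ms: int, cfg: dict[str, str]) -> int:
    """Clamp a client's requested session timeout the way the ZooKeeper server does."""
    tick = int(cfg.get("tickTime", "2000"))  # assumed fallback when tickTime is absent
    low = int(cfg.get("minSessionTimeout", 2 * tick))
    high = int(cfg.get("maxSessionTimeout", 20 * tick))
    timeout = requested_ms
    if timeout < low:   # raised to the minimum first
        timeout = low
    if timeout > high:  # then capped at the maximum
        timeout = high
    return timeout


cfg = parse_zoo_cfg(ZOO_CFG)
# 60000 is hive.zookeeper.session.timeout from hive-site.xml above.
print(negotiated_session_timeout(60000, cfg))
```

ZooKeeper reads zoo.cfg with Java's properties format, where `#` starts a comment only at the beginning of a line, so each comment must sit on its own line as shown above rather than after a value.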
data:image/s3,"s3://crabby-images/a8838/a8838339db7115d017ef9b1093a054565f576683" alt=""