美文网首页
hive on spark Timed out waiting

hive on spark Timed out waiting

作者: Rex_2013 | 来源:发表于2020-08-20 09:23 被阅读0次

在beeline中使用hive on spark ,报错

ERROR : FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection.
INFO  : Completed executing command(queryId=root_20200819100850_49d1303d-4b6a-4ef2-968b-419f5a9dd036); Time taken: 90.058 seconds
INFO  : Concurrency mode is disabled, not creating a lock manager
Error: Error while processing statement: FAILED: Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.spark.SparkTask. java.util.concurrent.ExecutionException: java.util.concurrent.TimeoutException: Timed out waiting for client connection. (state=08S01,code=1)

由于hive程序的是通过yarn 去跑spark的,到Hadoop目录下查看resourcemanager日志

[root@node09 logs]# tail -f  hadoop-root-resourcemanager-node09.log
2020-08-19 10:41:22,695 INFO org.apache.hadoop.yarn.client.api.impl.TimelineConnector: Exception caught by TimelineClientConnectionRetry, will try 1 more time(s).
Message: java.net.ConnectException: Call From null to node09:8188 failed on connection exception: java.net.ConnectException: Connection refused (Connection refused); For more details see:  http://wiki.apach   e.org/hadoop/ConnectionRefused
2020-08-19 10:41:23,696 ERROR org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher: Error when publishing entity [YARN_APPLICATION,application_1597802656468_0003]
java.lang.RuntimeException: Failed to connect to timeline server. Connection retries limit exceeded. The posted timeline event may be missing
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineClientConnectionRetry.retryOn(TimelineConnector.java:357)
        at org.apache.hadoop.yarn.client.api.impl.TimelineConnector$TimelineJerseyRetryFilter.handle(TimelineConnector.java:404)
        at com.sun.jersey.api.client.Client.handle(Client.java:652)
        at com.sun.jersey.api.client.WebResource.handle(WebResource.java:682)
        at com.sun.jersey.api.client.WebResource.access$200(WebResource.java:74)
        at com.sun.jersey.api.client.WebResource$Builder.post(WebResource.java:570)
        at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPostingObject(TimelineWriter.java:157)
        at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:115)
        at org.apache.hadoop.yarn.client.api.impl.TimelineWriter$1.run(TimelineWriter.java:112)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1729)
        at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.doPosting(TimelineWriter.java:112)
        at org.apache.hadoop.yarn.client.api.impl.TimelineWriter.putEntities(TimelineWriter.java:92)
        at org.apache.hadoop.yarn.client.api.impl.TimelineClientImpl.putEntities(TimelineClientImpl.java:177)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.putEntity(TimelineServiceV1Publisher.java:370)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher.access$100(TimelineServiceV1Publisher.java:52)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:395)
        at org.apache.hadoop.yarn.server.resourcemanager.metrics.TimelineServiceV1Publisher$TimelineV1EventHandler.handle(TimelineServiceV1Publisher.java:391)
        at org.apache.hadoop.yarn.event.AsyncDispatcher.dispatch(AsyncDispatcher.java:197)
        at org.apache.hadoop.yarn.event.AsyncDispatcher$1.run(AsyncDispatcher.java:126)
        at java.lang.Thread.run(Thread.java:748)

  • 原因分析: hive on spark 任务的状态一直是running,并且占用的内存资源也不能够释放
  • 问题描述:hive配置成spark引擎,提交任务到yarn,执行SQL 能够正确的返回结果,但是执行完毕,任务的状态一直是running,并且占用的内存资源也不能够释放

问题分析:spark on hive本质是spark-shell.sh,spark-shell.sh会一直占用进程,这样后面提交的hive on spark任务就不需要重复上传spark依赖,加速任务执行速度

  • 解决思路:如需执行mapreduce或者其他类型任务,切换其他队列或者强制结束spark进程

  • 解决步骤:

    • kill掉没有使用的application
[root@node09 logs]# yarn application --kill  application_1597802656468_0002
[root@node09 logs]# yarn application --kill  application_1597802656468_0003
0: jdbc:hive2://node09:10000/gmall>SET mapreduce.job.queuename 队列名;
  • 注意:不论使用的是beeline 还是DBeaver 连接hiveserver2,在spark on hive 如果配置是使用yarn的话。每一种客户端执行都会生成一个application。关闭DBeaver连接 或者关闭beeline。这个application还是会保留的。

参考 https://blog.csdn.net/weixin_43976998/article/details/107395836?utm_medium=distribute.pc_relevant.none-task-blog-baidulandingword-1&spm=1001.2101.3001.4242

相关文章

网友评论

      本文标题:hive on spark Timed out waiting

      本文链接:https://www.haomeiwen.com/subject/ivhrjktx.html