[2019-06-28] YARN service NodeManager failure

Author: 学师大术 | Published 2019-06-21 12:02

Problem description

The YARN service reported a fault. The service management page showed one NodeManager in an abnormal state.

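Analysis

1. Start with the start/stop log. The NodeManager was stopped with stop type HEATH_CHECK_STOP (spelled this way in the log), which means the health check triggered the stop.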

2019-06-19 15:13:29 | INFO  | PID-16052  | start to stop nodemanager | yarn-start-stop.sh
2019-06-19 15:13:29 | INFO  | PID-16052  | stop type: HEATH_CHECK_STOP. | yarn-start-stop.sh
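
To check the stop reason without reading the log by eye, something like the sketch below could be used. It assumes the pipe-delimited layout of the two lines above (timestamp, level, PID, message, script); the function name find_stop_type is hypothetical and not part of any YARN tooling.

```python
def find_stop_type(lines):
    """Return the last 'stop type' value seen in pipe-delimited log lines, or None."""
    stop_type = None
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue  # not a line in the assumed layout
        message = fields[3]
        if message.startswith("stop type:"):
            # Drop the prefix and the trailing period the script prints.
            stop_type = message[len("stop type:"):].strip().rstrip(".")
    return stop_type


sample = [
    "2019-06-19 15:13:29 | INFO  | PID-16052  | start to stop nodemanager | yarn-start-stop.sh",
    "2019-06-19 15:13:29 | INFO  | PID-16052  | stop type: HEATH_CHECK_STOP. | yarn-start-stop.sh",
]
print(find_stop_type(sample))
```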

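2. Next, check the NodeManager runtime log. It consists entirely of "delete app log dir" messages until the process finally receives RECEIVED SIGNAL 15 (SIGTERM) and is killed.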

2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11333211_DEL_1559995833078 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_10625928_DEL_1559995968558 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_12077960_DEL_1560384533291 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11315373_DEL_1559996652333 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11035836_DEL_1559996652333 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11127905_DEL_1559996105413 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11274943_DEL_1559996241710 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1547246203054_1961416_DEL_1550657777851 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1547246203054_2114204_DEL_1550657777851 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_10699626_DEL_1559996379568 | ResourceLocalizationService.java:1474
2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11842934_DEL_1560261203480 | ResourceLocalizationService.java:1474
2019-06-19 15:13:29,650 | ERROR | SIGTERM handler | RECEIVED SIGNAL 15: SIGTERM | LogAdapter.java:69
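
A rough way to quantify this pattern is to count the "delete app log dir" lines and measure how long before SIGTERM the last one was written. The sketch below assumes the log layout shown above (timestamp with comma-separated milliseconds, level, thread, message, source location); summarize_nm_log is a hypothetical helper name.

```python
from datetime import datetime

TS_FORMAT = "%Y-%m-%d %H:%M:%S,%f"


def summarize_nm_log(lines):
    """Return (delete_count, seconds between the last delete line and SIGTERM)."""
    deletes = 0
    last_delete = None
    sigterm = None
    for line in lines:
        fields = [field.strip() for field in line.split("|")]
        if len(fields) < 4:
            continue
        try:
            ts = datetime.strptime(fields[0], TS_FORMAT)
        except ValueError:
            continue  # skip lines without a parsable timestamp
        message = fields[3]
        if message.startswith("delete app log dir"):
            deletes += 1
            last_delete = ts
        elif "RECEIVED SIGNAL 15" in message:
            sigterm = ts
    gap = None
    if last_delete is not None and sigterm is not None:
        gap = (sigterm - last_delete).total_seconds()
    return deletes, gap


sample = [
    "2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11333211_DEL_1559995833078 | ResourceLocalizationService.java:1474",
    "2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_10625928_DEL_1559995968558 | ResourceLocalizationService.java:1474",
    "2019-06-19 15:13:29,650 | ERROR | SIGTERM handler | RECEIVED SIGNAL 15: SIGTERM | LogAdapter.java:69",
]
print(summarize_nm_log(sample))
```

A large delete count together with a gap of only a few seconds would be consistent with the process still working through its cleanup when it was killed.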

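3. Based on the above, the NodeManager was starting normally, but at startup it had to clean up the records of a large number of applications. The health check failed before the cleanup finished, so the process was stopped and restarted.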
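
To get a feel for how old the pending deletions are, the directory names from the log can be decoded. This is only a sketch under an assumption the log does not confirm: that the number after _DEL_ is a Unix timestamp in milliseconds recording when the directory was marked for deletion. The helper name parse_del_name is hypothetical.

```python
import re
from datetime import datetime, timezone

# application_<cluster timestamp>_<sequence>_DEL_<number>, as seen in the log above.
DEL_NAME = re.compile(r"^(application_\d+_\d+)_DEL_(\d+)$")


def parse_del_name(name):
    """Split a '*_DEL_*' name into (application id, UTC datetime), or return None."""
    match = DEL_NAME.match(name)
    if match is None:
        return None
    app_id = match.group(1)
    millis = int(match.group(2))
    # ASSUMPTION: the suffix is epoch milliseconds; sub-second precision is dropped.
    marked_at = datetime.fromtimestamp(millis // 1000, tz=timezone.utc)
    return app_id, marked_at


print(parse_del_name("application_1550654406365_11333211_DEL_1559995833078"))
```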

Solution

1. Manually clear the NodeManager logs first: rm -rf /srv/BigData/hadoop/data*/nm
2. Restart the NodeManager.
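
Because rm -rf with a wildcard is unforgiving, it may be worth listing what the pattern matches before deleting anything. The sketch below is one possible way to do that; clean_nm_dirs and its dry_run flag are hypothetical, and it assumes every match of the pattern is a directory, as the nm path in step 1 suggests.

```python
import glob
import shutil


def clean_nm_dirs(pattern, dry_run=True):
    """List directories matching pattern and remove them unless dry_run is set."""
    targets = sorted(glob.glob(pattern))
    for path in targets:
        if dry_run:
            print("would remove:", path)
        else:
            shutil.rmtree(path)  # raises if path is not a directory
            print("removed:", path)
    return targets


if __name__ == "__main__":
    # Dry run only: prints the matches for the path used in step 1, deletes nothing.
    clean_nm_dirs("/srv/BigData/hadoop/data*/nm")
```

After reviewing the dry-run output, calling the function again with dry_run=False performs the actual removal.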

Article link: https://www.haomeiwen.com/subject/ehkyqctx.html