[2019-06-28] YARN service NodeManager failure


Author: 学师大术 | Published: 2019-06-21 12:02

    Problem description

    The YARN service reported a fault; the service management page showed one NodeManager in an abnormal state.

    Analysis

    1. First, check the startup log: the NodeManager was stopped with stop type HEATH_CHECK_STOP, that is, by the health check.

    2019-06-19 15:13:29 | INFO  | PID-16052  | start to stop nodemanager | yarn-start-stop.sh
    2019-06-19 15:13:29 | INFO  | PID-16052  | stop type: HEATH_CHECK_STOP. | yarn-start-stop.sh
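To confirm the stop reason without reading the file by eye, the start/stop log can be scanned for "stop type" entries. The following is a minimal sketch, assuming every line has the five pipe-separated fields seen in the excerpt above; the names `LogRecord`, `parse_line` and `find_stop_types` are hypothetical helpers, not part of any YARN tooling.

```python
from typing import List, NamedTuple, Optional, Tuple


class LogRecord(NamedTuple):
    timestamp: str
    level: str
    pid: str
    message: str
    source: str


def parse_line(line: str) -> Optional[LogRecord]:
    """Split one 'a | b | c | d | e' log line; return None if it does not fit."""
    parts = [p.strip() for p in line.split("|")]
    if len(parts) != 5:
        return None
    return LogRecord(*parts)


def find_stop_types(lines: List[str]) -> List[Tuple[str, str]]:
    """Return (timestamp, stop type) for every 'stop type: X.' message."""
    prefix = "stop type:"
    found = []
    for line in lines:
        rec = parse_line(line)
        if rec is not None and rec.message.startswith(prefix):
            # Drop the prefix and the trailing period written by the script.
            stop_type = rec.message[len(prefix):].strip().rstrip(".")
            found.append((rec.timestamp, stop_type))
    return found


# The two lines quoted from yarn-start-stop.sh above.
SAMPLE = [
    "2019-06-19 15:13:29 | INFO  | PID-16052  | start to stop nodemanager | yarn-start-stop.sh",
    "2019-06-19 15:13:29 | INFO  | PID-16052  | stop type: HEATH_CHECK_STOP. | yarn-start-stop.sh",
]

print(find_stop_types(SAMPLE))
```

Lines that do not match the five-field layout are skipped rather than treated as errors, since real log files often contain stack traces or wrapped lines.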
    

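    2. Next, check the NodeManager runtime log: it consists entirely of "delete app log dir" entries until the final RECEIVED SIGNAL 15 (SIGTERM), at which point the process was killed.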

    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11333211_DEL_1559995833078 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_10625928_DEL_1559995968558 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_12077960_DEL_1560384533291 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11315373_DEL_1559996652333 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11035836_DEL_1559996652333 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11127905_DEL_1559996105413 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11274943_DEL_1559996241710 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1547246203054_1961416_DEL_1550657777851 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1547246203054_2114204_DEL_1550657777851 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_10699626_DEL_1559996379568 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11842934_DEL_1560261203480 | ResourceLocalizationService.java:1474
    2019-06-19 15:13:29,650 | ERROR | SIGTERM handler | RECEIVED SIGNAL 15: SIGTERM | LogAdapter.java:69
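The pattern in this log (a long run of deletions, then SIGTERM) can be summarized with a short script. This is a rough sketch, not a vendor tool: the function name `summarize` is hypothetical, and it assumes the number after `_DEL_` is the time the directory was marked for deletion, in epoch milliseconds. That reading fits the values in the excerpt but is not confirmed by the source.

```python
import re
from datetime import datetime, timezone

# Matches e.g. "delete app log dir,application_1550654406365_11333211_DEL_1559995833078"
DELETE_RE = re.compile(r"delete app log dir,(application_\d+_\d+)_DEL_(\d+)")


def summarize(lines):
    """Count deletion entries and record when SIGTERM arrived."""
    marked_times = []
    sigterm_at = None
    for line in lines:
        m = DELETE_RE.search(line)
        if m:
            # Assumption: the _DEL_ suffix is epoch milliseconds.
            marked_times.append(
                datetime.fromtimestamp(int(m.group(2)) / 1000, tz=timezone.utc)
            )
        elif "RECEIVED SIGNAL 15" in line:
            # The timestamp is the first pipe-separated field.
            sigterm_at = line.split("|")[0].strip()
    return {
        "delete_count": len(marked_times),
        "oldest_marked": min(marked_times, default=None),
        "sigterm_at": sigterm_at,
    }


# Three lines taken from the excerpt above.
SAMPLE = [
    "2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1550654406365_11333211_DEL_1559995833078 | ResourceLocalizationService.java:1474",
    "2019-06-19 15:13:26,899 | INFO  | main | delete app log dir,application_1547246203054_1961416_DEL_1550657777851 | ResourceLocalizationService.java:1474",
    "2019-06-19 15:13:29,650 | ERROR | SIGTERM handler | RECEIVED SIGNAL 15: SIGTERM | LogAdapter.java:69",
]

print(summarize(SAMPLE))
```

If the epoch-milliseconds assumption holds, `oldest_marked` shows how long the oldest directory has been waiting for cleanup, and `delete_count` gives a sense of how much work the startup has to get through before the health check runs.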
    

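    3. From the above, the NodeManager was starting up normally; it just had to clean up information for a large number of applications during startup. Because the cleanup had not finished, the health check failed and the process was stopped and restarted.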

    Solution

    1. Manually clean up the NodeManager log data first: rm -rf /srv/BigData/hadoop/data*/nm
    2. Restart the NodeManager.
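Before running the `rm -rf` in step 1, it may be worth checking how much is actually pending under each directory. The sketch below only counts and deletes nothing. It assumes that pending entries are directories whose names contain `_DEL_`, as in the log excerpt; the layout underneath `nm` is not described in the source, so the function (a hypothetical helper named `count_pending_deletes`) simply walks the whole tree.

```python
import glob
import os


def count_pending_deletes(pattern):
    """Map each directory matching `pattern` to its number of '_DEL_' subdirectories."""
    counts = {}
    for nm_dir in sorted(glob.glob(pattern)):
        pending = 0
        # Walk the whole tree, since the exact nesting level is an unknown here.
        for _root, dirs, _files in os.walk(nm_dir):
            pending += sum(1 for d in dirs if "_DEL_" in d)
        counts[nm_dir] = pending
    return counts
```

Calling `count_pending_deletes("/srv/BigData/hadoop/data*/nm")` on the affected node returns one count per data disk, which gives a rough idea of how large the cleanup backlog is.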
