Unable to start NodeManager: no

Author: 凡尔Issac | Published: 2018-12-21 20:07

    Recently, on one machine in a production CDH 5.7.2 cluster, the YARN NodeManager failed to start after glibc and Java were upgraded. The error was:

    2018-11-27 07:04:47,023 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
    java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /tmp/libleveldbjni-64-1-2729554051258886702.8: libstdc++.so.6: cannot open shared object file: No such file or directory]
            at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
            at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
            at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
            at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:864)
            at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:195)
            at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:157)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:195)
            at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
    2018-11-27 07:04:47,033 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException
    

    I took a few detours while working on this problem, although in hindsight the fix is fairly simple. The steps were as follows:

    1. Determine java.library.path

    First, confirm the actual value of java.library.path. There are plenty of articles online on how to inspect java.library.path, for example:

            System.out.println("java.library.path: ");
            System.out.println(System.getProperty("java.library.path"));
    

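    However, java.library.path can differ from one process to another. What matters here is the java.library.path of the NodeManager process, so the first step is to understand how CDH launches it. When the NodeManager is started from Cloudera Manager (CM), the launch command is printed:

    [Screenshot (image.png): NodeManager launch command printed by CM]

    At the end of that launch script is the following java launch command: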

    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      YARN_OPTS="$YARN_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
    fi
    exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $YARN_OPTS -classpath "$CLASSPATH" $CLASS "$@"
    

    To find java.library.path, modify the script to print JAVA_LIBRARY_PATH or YARN_OPTS. The modified script:

    if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
      echo "=============zx=================="
      echo "$JAVA_LIBRARY_PATH"
      echo "=============zx=================="
      YARN_OPTS="$YARN_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
    fi
    
    echo "=============zx=================="
    echo "$YARN_OPTS"
    echo "=============zx=================="
    
    exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $YARN_OPTS -classpath "$CLASSPATH" $CLASS "$@"
    
    Restart the NodeManager and check stdout for the output: [Screenshot (image.png): script output in stdout]

    This confirms that java.library.path=/opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/lib/hadoop/lib/native
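
As an alternative to editing the launch script, the same value can in principle be read from the command line of a running JVM. The sketch below is an illustration, not part of the original procedure: the helper name `extract_library_path` is hypothetical, it assumes a Linux `/proc` filesystem, and it mis-splits paths that contain spaces. Because the failing NodeManager exits almost immediately, it is probably more practical to run it against a healthy node that uses the same parcel.

```shell
# Hypothetical helper: reads a JVM command line on stdin (arguments separated
# by NUL bytes, as in /proc/<pid>/cmdline, or by spaces) and prints the value
# of -Djava.library.path. It keeps the last match, on the assumption that a
# later -D option overrides an earlier one.
extract_library_path() {
  tr '\0' '\n' | tr ' ' '\n' \
    | sed -n 's/^-Djava\.library\.path=//p' \
    | tail -n 1
}

# Usage (replace <pid> with the PID of the NodeManager JVM):
#   extract_library_path < /proc/<pid>/cmdline
```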

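    2. Add the leveldbjni library
    Step 1 established java.library.path. The directory contains the following: [Screenshot (image.png): listing of the native library directory]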

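    Compared with other, healthy YARN nodes, nothing is missing from this directory, yet the service still fails to start, which may be related to the glibc upgrade on this node. Since the directory contains no leveldbjni library, the next step is to install one.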
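    Installing the leveldbjni library is done in an unusual way:
    1) Download leveldbjni-all-1.8.jar.
    2) Unpack the jar and, for this system (64-bit CentOS 6.7), locate libleveldbjni.so under META-INF/native/linux64.
    3) Upload libleveldbjni.so to the java.library.path directory from step 1 and restart the NodeManager.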
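
Steps 2) and 3) can be sketched as a single extraction, assuming `unzip` is installed on the node. The function name `extract_leveldbjni` and the `DRY_RUN` switch are hypothetical additions for illustration; the dry-run mode only prints the command, which may be preferable on a production machine before anything is written.

```shell
# Hypothetical helper: extracts the 64-bit Linux libleveldbjni.so from the
# leveldbjni-all jar directly into a destination directory.
#   $1 = path to leveldbjni-all-1.8.jar
#   $2 = destination directory (e.g. the java.library.path directory)
# Set DRY_RUN=1 to print the command instead of running it.
extract_leveldbjni() {
  jar_file=$1
  dest_dir=$2
  member='META-INF/native/linux64/libleveldbjni.so'
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "unzip -j -o $jar_file $member -d $dest_dir"
  else
    # -j drops the directory part of the archive member,
    # -o overwrites an existing file, -d selects the output directory.
    unzip -j -o "$jar_file" "$member" -d "$dest_dir"
  fi
}

# Usage:
#   DRY_RUN=1 extract_leveldbjni leveldbjni-all-1.8.jar "$JAVA_LIBRARY_PATH"
```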
    I expected that to fix it, but the error persisted:

    2018-12-21 10:37:05,393 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
    java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, /usr/lib64/libleveldbjni.so: libstdc++.so.6: cannot open shared object file: No such file or directory, /tmp/libleveldbjni-64-1-4428949605705254708.8: libstdc++.so.6: cannot open shared object file: No such file or directory]
            at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
            at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
            at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
            at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:864)
            at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:195)
            at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:157)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:195)
            at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
            at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
    

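    But wait: this error differs from the first one. "no leveldbjni in java.library.path" is gone, and the loader now reports /usr/lib64/libleveldbjni.so failing on libstdc++.so.6, so the leveldbjni library itself is now being found. What remains is to resolve libstdc++.so.6.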
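
One way to confirm which shared-library dependencies are unresolved is to run `ldd` on the native library and filter for "not found". This is a sketch rather than something the original troubleshooting used; the helper name `missing_deps` is hypothetical, and it only parses `ldd` output passed on stdin.

```shell
# Hypothetical helper: reads ldd output on stdin and prints the name of each
# dependency that the dynamic linker reports as "not found".
missing_deps() {
  awk '/=> not found/ { print $1 }'
}

# Usage:
#   ldd /usr/lib64/libleveldbjni.so | missing_deps
```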

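    3. libstdc++.so.6
    First check whether libstdc++ exists on the system; if it does not, install it with yum install libstdc++. The check showed that the library was already present: [Screenshot (image.png)]
    Since it exists, creating a symlink to it under java.library.path is enough: [Screenshot (image.png)]
    After one more NodeManager restart, it finally started successfully!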
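
The symlink step might look like the sketch below. The exact commands in the original are only available as screenshots, so this is an assumption: the helper name `link_library_into` is hypothetical, and the source path in the usage line is a placeholder for wherever libstdc++.so.6 actually lives on the node.

```shell
# Hypothetical helper: creates a symlink to an existing library inside a
# target directory, keeping the same file name.
#   $1 = absolute path of the existing library
#   $2 = directory that should contain the symlink
link_library_into() {
  src=$1
  dir=$2
  if [ ! -e "$src" ]; then
    echo "library not found: $src" >&2
    return 1
  fi
  # ln -s fails if the link name already exists, so nothing is overwritten.
  ln -s "$src" "$dir/$(basename "$src")"
}

# Usage (placeholder source path; the target is the directory from step 1):
#   link_library_into /path/to/libstdc++.so.6 \
#     /opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/lib/hadoop/lib/native
```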

    Article link: https://www.haomeiwen.com/subject/mspvkqtx.html