Recently, on one machine in our production CDH 5.7.2 cluster, the YARN NodeManager failed to start after glibc and Java were upgraded. The error was:
2018-11-27 07:04:47,023 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, no leveldbjni in java.library.path, /tmp/libleveldbjni-64-1-2729554051258886702.8: libstdc++.so.6: cannot open shared object file: No such file or directory]
at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:864)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:195)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:157)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:195)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
2018-11-27 07:04:47,033 INFO org.apache.hadoop.service.AbstractService: Service NodeManager failed in state STOPPED; cause: java.lang.NullPointerException
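The "Reasons: [...]" list in the exception packs every load attempt into a single line, which makes it easy to overlook the last entry. A minimal sketch for splitting that list into one reason per line follows; the function name split_reasons and the shortened sample line (including its temp-file name) are made up for illustration. Applied to the log above, the last entry already names libstdc++.so.6.

```shell
# Split the "Reasons: [...]" list of the UnsatisfiedLinkError into one reason per line.
# Reads log text from stdin; lines without a Reasons list are ignored.
split_reasons() {
  # Keep only the text between "Reasons: [" and the closing "]", then split on ", ".
  sed -n 's/.*Reasons: \[\(.*\)\].*/\1/p' |
    awk -F', ' '{ for (i = 1; i <= NF; i++) print $i }'
}

# Sample line shortened from the log above (the temp-file name is a placeholder).
sample='java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni in java.library.path, /tmp/libleveldbjni-64-1-0.8: libstdc++.so.6: cannot open shared object file: No such file or directory]'
printf '%s\n' "$sample" | split_reasons
```

In practice the input would be the NodeManager log file rather than a sample string.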
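I took a few detours while working on this, although in hindsight the problem is fairly simple. The steps were as follows:
1. Determine java.library.path
First, confirm the actual value of java.library.path. There are plenty of articles online on how to print it, for example: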
System.out.println("java.library.path: ");
System.out.println(System.getProperty("java.library.path"));
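Keep in mind, though, that java.library.path can differ from one process to another. What we need here is the java.library.path of the NodeManager process, so we first have to understand how CDH starts that process. When the NodeManager is started from CM (Cloudera Manager), the start command is printed, as shown in this screenshot:
[Screenshot: NodeManager start command printed by CM]
At the end of that startup script is the following Java launch command: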
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
YARN_OPTS="$YARN_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $YARN_OPTS -classpath "$CLASSPATH" $CLASS "$@"
To find out java.library.path, it is enough to modify the script so that it prints JAVA_LIBRARY_PATH or YARN_OPTS. The modified script:
if [ "x$JAVA_LIBRARY_PATH" != "x" ]; then
# Debug output: the native library path before it is appended to YARN_OPTS
echo "=============zx=================="
echo "$JAVA_LIBRARY_PATH"
echo "=============zx=================="
YARN_OPTS="$YARN_OPTS -Djava.library.path=$JAVA_LIBRARY_PATH"
fi
# Debug output: the final JVM options, including -Djava.library.path if it was set
echo "=============zx=================="
echo "$YARN_OPTS"
echo "=============zx=================="
exec "$JAVA" -Dproc_$COMMAND $JAVA_HEAP_MAX $YARN_OPTS -classpath "$CLASSPATH" $CLASS "$@"
Restart the NodeManager and check stdout for the output:
[Screenshot: stdout of the NodeManager showing the printed values]
This confirms that java.library.path=/opt/cloudera/parcels/CDH-5.7.2-1.cdh5.7.2.p0.18/lib/hadoop/lib/native
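If editing the startup script is not desirable, the same value can usually be read from the running (or briefly running) JVM's command line. The sketch below assumes a Linux /proc filesystem; the function name library_path_from_cmdline is hypothetical, and the pgrep pattern in the comment assumes the process was started with -Dproc_nodemanager, which the exec line in the script suggests but which I have not verified on this cluster.

```shell
# Print the value of -Djava.library.path from a NUL-separated argument list
# (the format of /proc/<pid>/cmdline on Linux) read from stdin.
# Prints nothing if the option was not passed on the command line.
library_path_from_cmdline() {
  tr '\0' '\n' | sed -n 's/^-Djava\.library\.path=//p'
}

# Usage against a live process (assumption: the JVM carries -Dproc_nodemanager
# and pgrep is installed):
#   pid=$(pgrep -f -- '-Dproc_nodemanager' | head -n 1)
#   library_path_from_cmdline < "/proc/$pid/cmdline"

# Self-contained demonstration with a made-up argument list.
printf 'java\0-Dproc_nodemanager\0-Djava.library.path=/example/native\0' | library_path_from_cmdline
```

This only works while the process exists, so for a NodeManager that dies right after startup the script modification above may still be the more reliable route.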
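2. Add the leveldbjni library
Step 1 established java.library.path. The contents of that directory are:
[Screenshot: listing of the native library directory]
Compared with other, healthy YARN nodes, no files are missing from the directory. Yet the service would not start, which may be related to the glibc upgrade on this node. Since the directory contains no leveldbjni library, let's install one.
The way to install the leveldbjni library is rather interesting:
1) Download leveldbjni-all-1.8.jar.
2) Unpack the jar and, for this system (64-bit CentOS 6.7), locate libleveldbjni.so under META-INF/native/linux64.
3) Copy libleveldbjni.so into the java.library.path directory from step 1, then restart the NodeManager.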
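The three steps above can be sketched as a small shell function. This is only an illustration under some assumptions: unzip is installed, the jar has already been downloaded, and the function name extract_leveldbjni and the example paths are placeholders rather than anything from the original setup.

```shell
# Extract the 64-bit Linux libleveldbjni.so from a leveldbjni-all jar into a directory.
# Arguments: $1 = path to the jar, $2 = destination directory (both chosen by the caller).
extract_leveldbjni() {
  jar_file=$1
  dest_dir=$2
  member='META-INF/native/linux64/libleveldbjni.so'

  [ -f "$jar_file" ] || { echo "jar not found: $jar_file" >&2; return 1; }
  [ -d "$dest_dir" ] || { echo "destination is not a directory: $dest_dir" >&2; return 1; }

  # A jar is a zip archive; -j drops the META-INF/... directories, -o overwrites an old copy.
  unzip -j -o "$jar_file" "$member" -d "$dest_dir"
}

# Example call (placeholder paths):
#   extract_leveldbjni ./leveldbjni-all-1.8.jar /path/to/native/dir
```

Checking both arguments first keeps a typo in the path from silently producing an empty directory.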
I thought that would settle it, but the error persisted:
2018-12-21 10:37:05,393 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
java.lang.UnsatisfiedLinkError: Could not load library. Reasons: [no leveldbjni64-1.8 in java.library.path, no leveldbjni-1.8 in java.library.path, /usr/lib64/libleveldbjni.so: libstdc++.so.6: cannot open shared object file: No such file or directory, /tmp/libleveldbjni-64-1-4428949605705254708.8: libstdc++.so.6: cannot open shared object file: No such file or directory]
at org.fusesource.hawtjni.runtime.Library.doLoad(Library.java:182)
at org.fusesource.hawtjni.runtime.Library.load(Library.java:140)
at org.fusesource.leveldbjni.JniDBFactory.<clinit>(JniDBFactory.java:48)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMLeveldbStateStoreService.initStorage(NMLeveldbStateStoreService.java:864)
at org.apache.hadoop.yarn.server.nodemanager.recovery.NMStateStoreService.serviceInit(NMStateStoreService.java:195)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartRecoveryStore(NodeManager.java:157)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceInit(NodeManager.java:195)
at org.apache.hadoop.service.AbstractService.init(AbstractService.java:163)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:474)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:521)
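Wait, isn't this different from the first error? It no longer reports "no leveldbjni in java.library.path", so the leveldbjni library itself is now being found. What remains is to resolve libstdc++.so.6.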
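One way to check which shared-library dependencies cannot be resolved is to run ldd on the .so and filter for "not found". The sketch below assumes ldd is available on the node; the function name missing_libs and the sample ldd-style output are made up for illustration, while the path in the comment is the one shown in the error above.

```shell
# List shared libraries that ldd reports as "not found", reading ldd output from stdin.
missing_libs() {
  awk '/=> not found/ { print $1 }'
}

# Against the real file (path taken from the error message above):
#   ldd /usr/lib64/libleveldbjni.so | missing_libs

# Demonstration on sample ldd-style output (invented for illustration).
printf '\tlibstdc++.so.6 => not found\n\tlibc.so.6 => /lib64/libc.so.6 (0x00007f0000000000)\n' | missing_libs
```

If libstdc++.so.6 shows up in that list, the dynamic linker cannot find it on this node, which matches the message in the stack trace.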