Standby NameNode error
2020-12-27 10:36:38,662 INFO common.Storage (Storage.java:tryLock(776)) - Lock on /export/hadoop/hdfs/namenode/in_use.lock acquired by nodename 35873@shyt-hadoop-4032.xx.com.cn
2020-12-27 10:36:38,665 WARN namenode.FSNamesystem (FSNamesystem.java:loadFromDisk(726)) - Encountered exception loading fsimage
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:234)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1077)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:724)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:697)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:761)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1001)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:985)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1710)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1778)
2020-12-27 10:36:38,669 INFO mortbay.log (Slf4jLog.java:info(67)) - Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@shyt-hadoop-4032.xx.com.cn:50070
2020-12-27 10:36:38,769 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(211)) - Stopping NameNode metrics system...
2020-12-27 10:36:38,770 INFO impl.MetricsSinkAdapter (MetricsSinkAdapter.java:publishMetricsFromQueue(141)) - timeline thread interrupted.
2020-12-27 10:36:38,772 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:stop(217)) - NameNode metrics system stopped.
2020-12-27 10:36:38,772 INFO impl.MetricsSystemImpl (MetricsSystemImpl.java:shutdown(606)) - NameNode metrics system shutdown complete.
2020-12-27 10:36:38,772 ERROR namenode.NameNode (NameNode.java:main(1783)) - Failed to start namenode.
java.io.IOException: NameNode is not formatted.
at org.apache.hadoop.hdfs.server.namenode.FSImage.recoverTransitionRead(FSImage.java:234)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFSImage(FSNamesystem.java:1077)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.loadFromDisk(FSNamesystem.java:724)
at org.apache.hadoop.hdfs.server.namenode.NameNode.loadNamesystem(NameNode.java:697)
at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:761)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:1001)
at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:985)
at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1710)
at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1778)
2020-12-27 10:36:38,774 INFO util.ExitUtil (ExitUtil.java:terminate(124)) - Exiting with status 1
2020-12-27 10:36:38,775 INFO timeline.HadoopTimelineMetricsSink (AbstractTimelineMetricsSink.java:getCurrentCollectorHost(278)) - No live collector to send metrics to. Metrics to be sent will be discarded. This message will be skipped for the next 20 times.
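Key line: Encountered exception loading fsimage (an exception was raised while loading the fsimage). The underlying cause is java.io.IOException: NameNode is not formatted.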
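To pull this warning and the exception message that follows it out of a NameNode log, a plain grep is enough. This is a minimal sketch that assumes the log path is passed in by the caller; on Ambari-managed clusters NameNode logs are commonly under /var/log/hadoop/hdfs/, but treat that location as an assumption and check your own hadoop-env settings.

```shell
#!/usr/bin/env bash
# Print each "Encountered exception loading fsimage" warning together with
# the line right after it, which carries the actual exception message.
find_fsimage_error() {
  local log_file="$1"   # path to the NameNode log (supplied by the caller)
  grep -A1 "Encountered exception loading fsimage" "$log_file"
}

# Example (hypothetical path):
# find_fsimage_error /var/log/hadoop/hdfs/hadoop-hdfs-namenode-$(hostname).log
```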
Where to look:
The dfs.namenode.name.dir directory (/export/hadoop/hdfs/namenode in the log above).
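In the Hadoop 2.x sources, FSImage.recoverTransitionRead raises "NameNode is not formatted." when none of the configured storage directories is in a formatted state, which in practice usually means current/VERSION is missing. The sketch below checks each directory for that file. It assumes the hdfs CLI is on PATH and uses `hdfs getconf -confKey dfs.namenode.name.dir` only when no directories are passed; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Report, for each NameNode storage directory, whether it looks formatted,
# i.e. whether it contains current/VERSION. Directories can be passed as
# arguments; otherwise dfs.namenode.name.dir is read via `hdfs getconf`.
# Returns 1 if any directory lacks current/VERSION, 2 if none could be found.
# Paths containing spaces are not supported.
check_name_dirs() {
  local dirs d rc=0
  if [ "$#" -gt 0 ]; then
    dirs="$*"
  else
    dirs=$(hdfs getconf -confKey dfs.namenode.name.dir) || return 2
  fi
  # The property is a comma-separated list and may use file:// URIs.
  dirs=$(printf '%s' "$dirs" | tr ',' ' ')
  [ -n "$dirs" ] || { echo "no storage directories to check" >&2; return 2; }
  for d in $dirs; do
    d="${d#file://}"
    if [ -f "$d/current/VERSION" ]; then
      echo "OK       $d"
    else
      echo "MISSING  $d/current/VERSION"
      rc=1
    fi
  done
  return "$rc"
}

# Usage on the failed node:
# check_name_dirs                                  # read from the config
# check_name_dirs /export/hadoop/hdfs/namenode     # or pass paths directly
```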
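Solution:
Option 1 (recommended)
1. Make sure the Active NameNode is working normally. Do not copy any data from the /hadoop/hdfs/namenode directory on the Active NameNode to the Standby NameNode.
2. On the Standby NameNode host, run: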
hdfs namenode -bootstrapStandby
Allows the standby NameNode's storage directories to be bootstrapped by copying the latest namespace snapshot from the active NameNode. This is used when first configuring an HA cluster.
This command restores the Standby NameNode's metadata from the Active NameNode.
3. Start the Standby NameNode through Ambari.
4. Restart the ZKFailoverController through Ambari.
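Steps 1 and 2 can be combined into a guarded sketch that refuses to bootstrap unless the peer NameNode reports itself as active. Assumptions: the service ID passed in (nn1 in the usage line) is hypothetical and must match dfs.ha.namenodes.<nameservice> in hdfs-site.xml, and the script is run as the hdfs user so the rebuilt metadata is owned by hdfs. Starting the NameNode and restarting ZKFC are still done through Ambari.

```shell
#!/usr/bin/env bash
# Run "hdfs namenode -bootstrapStandby" only after confirming that the peer
# NameNode reports itself as active (step 1 of Option 1, then step 2).
# The service ID argument is hypothetical: use the ID of the Active NameNode
# as listed in dfs.ha.namenodes.<nameservice> in hdfs-site.xml.
bootstrap_if_peer_active() {
  local active_nn_id="$1" state
  if ! state=$(hdfs haadmin -getServiceState "$active_nn_id"); then
    echo "cannot query the state of $active_nn_id" >&2
    return 1
  fi
  if [ "$state" != "active" ]; then
    echo "refusing to bootstrap: $active_nn_id is '$state', not 'active'" >&2
    return 1
  fi
  # Copies the latest namespace image from the active NameNode into this
  # node's storage directories; it may prompt before re-formatting a
  # directory that already contains data.
  hdfs namenode -bootstrapStandby
}

# Usage on the Standby NameNode host, as the hdfs user:
# bootstrap_if_peer_active nn1
```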
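Option 2
1. Shut down the entire cluster and confirm that all services have stopped.
2. Copy the current directory to the failed NameNode. The command below copies into the working directory (.), so run it on the failed NameNode from inside its dfs.namenode.name.dir directory: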
scp -r root@xx.xx.40.32:/export/hadoop/hdfs/namenode/current .
3. Fix ownership:
chown -R hdfs:hadoop current
4. Delete the temporary files under /tmp.
5. Restart the cluster.
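Steps 2 and 3 of Option 2 can be sketched as one function run as root on the failed NameNode while the cluster is stopped. It assumes both NameNodes use the same dfs.namenode.name.dir path, as the scp command above does. Moving any existing current directory aside instead of overwriting it is an extra precaution that is not part of the original steps; the function name is hypothetical.

```shell
#!/usr/bin/env bash
# Steps 2 and 3 of Option 2, run as root on the failed NameNode while the
# whole cluster is stopped. Assumes both NameNodes use the same
# dfs.namenode.name.dir path.
restore_name_dir() {
  local source_host="$1" name_dir="$2"
  # Extra precaution, not in the original steps: keep the old directory.
  if [ -e "$name_dir/current" ]; then
    mv "$name_dir/current" "$name_dir/current.bak.$(date +%Y%m%d%H%M%S)" || return 1
  fi
  # Step 2: copy the metadata directory from the healthy NameNode.
  scp -r "root@${source_host}:${name_dir}/current" "$name_dir/" || return 1
  # Step 3: give ownership back to the hdfs user and hadoop group.
  chown -R hdfs:hadoop "$name_dir/current"
}

# Usage (the address is the masked one from the step above):
# restore_name_dir xx.xx.40.32 /export/hadoop/hdfs/namenode
```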