For Hadoop high-availability (HA) cluster setup you can refer to the official documentation: https://hadoop.apache.org/docs/r2.6.5/hadoop-project-dist/hadoop-hdfs/HDFSHighAvailabilityWithQJM.html
Background
The following article is a good introduction to the fundamentals of HA cluster setup (Hadoop 2.x) and is worth studying carefully.
Title: Hadoop High Availability (HA)
Link: https://www.jianshu.com/p/002b405006ec
Author: 小小少年Boy
Server Preparation
The servers used here are the ones built in the previous post, Hadoop Study Notes II: Fully Distributed Setup (Hadoop 1.x) (https://www.jianshu.com/p/e326dc24638a). The server plan for the Hadoop HA cluster (this HA setup uses the Quorum Journal Manager to share edit logs) is shown in the figure below.
I. Passwordless SSH Login
The two NameNodes switch between active and standby, so each must be able to log into the other without a password. In the previous post Node01 was already set up for passwordless login to Node02/Node03/Node04, so we still need to configure Node02 for passwordless login to Node01.
On Node02:
#go to the home directory
cd
#list all files
ll -a
cd .ssh/
#generate id_dsa and id_dsa.pub
ssh-keygen -t dsa -P '' -f ~/.ssh/id_dsa
#append id_dsa.pub to authorized_keys so node02 can log into itself without a password
cat ~/.ssh/id_dsa.pub >> ~/.ssh/authorized_keys
#test passwordless login
ssh node02
#success, exit
exit
Distribute the public key to Node01, then on Node01 append node02.pub to authorized_keys:
#on node02
scp id_dsa.pub node01:`pwd`/node02.pub
#on node01
cat ./node02.pub >> ~/.ssh/authorized_keys
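As a quick check (assuming the key was appended correctly), Node02 should now reach Node01 without a password prompt:
#on node02
ssh node01
exit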
II. Configuration
1. hdfs-site.xml
<property>
<name>dfs.replication</name>
<value>2</value>
</property>
#mycluster is a logical name for one active/standby pair; to define two pairs, separate the names with commas
<property>
<name>dfs.nameservices</name>
<value>mycluster</value>
</property>
#the mycluster nameservice contains two NameNodes with the logical names nn1 and nn2
<property>
<name>dfs.ha.namenodes.mycluster</name>
<value>nn1,nn2</value>
</property>
#map nn1 and nn2 to physical hosts
<property>
<name>dfs.namenode.rpc-address.mycluster.nn1</name>
<value>node01:8020</value>
</property>
<property>
<name>dfs.namenode.rpc-address.mycluster.nn2</name>
<value>node02:8020</value>
</property>
#HTTP addresses for browser access
<property>
<name>dfs.namenode.http-address.mycluster.nn1</name>
<value>node01:50070</value>
</property>
<property>
<name>dfs.namenode.http-address.mycluster.nn2</name>
<value>node02:50070</value>
</property>
#configure the JournalNode quorum that shares the edit log (the record of HDFS client operations)
<property>
<name>dfs.namenode.shared.edits.dir</name>
<value>qjournal://node01:8485;node02:8485;node03:8485/mycluster</value>
</property>
#the trailing /mycluster identifies this cluster's shared edit log on the JournalNodes; the name does not have to match an earlier cluster's
#local directory where each JournalNode stores its data
<property>
<name>dfs.journalnode.edits.dir</name>
<value>/var/sxt/hadoop/ha/jn</value>
</property>
#failover proxy class used by clients (copy it from the official docs as-is)
<property>
<name>dfs.client.failover.proxy.provider.mycluster</name>
<value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
</property>
#once a failure happens the failed node must be fenced immediately to prevent split-brain
#that is, when the active NameNode (ANN) fails, the election mechanism notifies the ZooKeeper quorum, which packages the event and hands it to the standby NameNode (SBNN); the SBNN first forces the ANN out of the active state, then promotes itself
#the SBNN logs into the ANN over SSH and uses the sshfence method to isolate it
#this passwordless login is done with the ★private key★; the second property below tells the fencer where the private key file lives
<property>
<name>dfs.ha.fencing.methods</name>
<value>sshfence</value>
</property>
<property>
<name>dfs.ha.fencing.ssh.private-key-files</name>
<value>/root/.ssh/id_dsa</value>
</property>
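As a sanity check for the sshfence path (a sketch based on the keys set up in section I; the echoed text is only illustrative), verify that the standby can reach the other NameNode with that private key without a password prompt:
#on node02, using the key configured above
ssh -i /root/.ssh/id_dsa node01 "echo fencing ssh ok"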
2. core-site.xml
#previously this pointed at a single master node; now the client looks up the mycluster service name and through it finds the two NameNode entry points
<property>
<name>fs.defaultFS</name>
<value>hdfs://mycluster</value>
</property>
#put all NameNode/DataNode data under the ha directory
<property>
<name>hadoop.tmp.dir</name>
<value>/var/sxt/hadoop/ha</value>
</property>
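Once the cluster is up (section IV), clients address it through the logical nameservice instead of a single host. A minimal sketch (the /tmp path is just an example):
#hdfs://mycluster resolves to whichever NameNode is currently active
hdfs dfs -mkdir -p /tmp/ha-test
hdfs dfs -ls hdfs://mycluster/tmp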
3. Distribute the configuration from Node01 to the other nodes
scp core-site.xml hdfs-site.xml node02:`pwd`
scp core-site.xml hdfs-site.xml node03:`pwd`
scp core-site.xml hdfs-site.xml node04:`pwd`
4. After configuration, start the JournalNode cluster first
At this point the cluster could be brought up, but it would not yet be an HA cluster (the active and standby NameNodes cannot do automatic failover); ZooKeeper still has to be added. The next section covers installing and configuring ZooKeeper, and the cluster is started after that is done.
III. Installing the ZooKeeper Cluster
1. Excerpt from the official documentation
Automatic failover adds two new components to an HDFS deployment: a ZooKeeper quorum, and the ZKFailoverController process (abbreviated as ZKFC).
The implementation of automatic HDFS failover relies on ZooKeeper for the following things:
- Failure detection - each NameNode keeps a session bound to it in ZooKeeper; if a NameNode dies, its session expires and its registration disappears, which notifies the other NameNode that a failover should be triggered.
- Active NameNode election - ZooKeeper provides a simple mechanism to exclusively elect a node as active. If the current active NameNode crashes, another node may take a special exclusive lock in ZooKeeper indicating that it should become the next active.
The ZKFailoverController (ZKFC) is a new component which is a ZooKeeper client which also monitors and manages the state of the NameNode. Each of the machines which runs a NameNode also runs a ZKFC, and that ZKFC is responsible for:
- Health monitoring - the ZKFC pings its local NameNode on a periodic basis with a health-check command. So long as the NameNode responds in a timely fashion with a healthy status, the ZKFC considers the node healthy. If the node has crashed, frozen, or otherwise entered an unhealthy state, the health monitor will mark it as unhealthy.
- ZooKeeper session management - when the local NameNode is healthy, the ZKFC holds a session open in ZooKeeper. If the local NameNode is active, it also holds a special "lock" znode. This lock uses ZooKeeper's support for "ephemeral" nodes; if the session expires, the lock node will be automatically deleted.
- ZooKeeper-based election - if the local NameNode is healthy, and the ZKFC sees that no other node currently holds the lock znode, it will itself try to acquire the lock. If it succeeds, then it has "won the election", and is responsible for running a failover to make its local NameNode active. The failover process is similar to the manual failover described above: first, the previous active is fenced if necessary, and then the local NameNode transitions to active state.
Deploying ZooKeeper
In a typical deployment, ZooKeeper daemons are configured to run on three or five nodes. Since ZooKeeper itself has light resource requirements, it is acceptable to collocate the ZooKeeper nodes on the same hardware as the HDFS NameNode and Standby Node. Many operators choose to deploy the third ZooKeeper process on the same node as the YARN ResourceManager. It is advisable to configure the ZooKeeper nodes to store their data on separate disk drives from the HDFS metadata for best performance and isolation.
The setup of ZooKeeper is out of scope for this document. We will assume that you have set up a ZooKeeper cluster running on three or more nodes, and have verified its correct operation by connecting using the ZK CLI.
Before you begin
Before you begin configuring automatic failover, you should shut down your cluster. It is not currently possible to transition from a manual failover setup to an automatic failover setup while the cluster is running.
2. Configuration
(1) hdfs-site.xml
#first, enable automatic failover
<property>
<name>dfs.ha.automatic-failover.enabled</name>
<value>true</value>
</property>
(2) core-site.xml
#list the ZooKeeper quorum machines
<property>
<name>ha.zookeeper.quorum</name>
<value>node02:2181,node03:2181,node04:2181</value>
</property>
3. Install and start ZooKeeper on Node02/Node03/Node04
(1) Unpack
tar xf zookeeper-3.4.6.tar.gz -C /opt/sxt/
(2) Edit the configuration
cd zookeeper-3.4.6/conf
mv zoo_sample.cfg zoo.cfg
vi zoo.cfg
#change this (ZooKeeper keeps its data in this directory)
dataDir=/var/sxt/zk
#append the following at the end of the file
#ZooKeeper cluster prerequisites: 1. declare the cluster members up front 2. assign every server an id
server.1=node02:2888:3888
server.2=node03:2888:3888
server.3=node04:2888:3888
#2888 is the port for communication between the leader and the followers
#3888 is the port used by leader election after the leader goes down
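Putting it together, the relevant part of zoo.cfg on all three ZooKeeper nodes would look roughly like this (tickTime/initLimit/syncLimit/clientPort are the defaults shipped in zoo_sample.cfg):
tickTime=2000
initLimit=10
syncLimit=5
dataDir=/var/sxt/zk
clientPort=2181
server.1=node02:2888:3888
server.2=node03:2888:3888
server.3=node04:2888:3888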
(3) Distribute ZooKeeper to node03/node04
scp -r zookeeper-3.4.6/ node03:`pwd`
scp -r zookeeper-3.4.6/ node04:`pwd`
(4) On each node, create the data directory and add a myid file inside it
#create the directory on every node
mkdir -p /var/sxt/zk
#write each node's id, matching the server.N entries in zoo.cfg
#on node02
echo 1 > /var/sxt/zk/myid
#on node03
echo 2 > /var/sxt/zk/myid
#on node04
echo 3 > /var/sxt/zk/myid
(5) Set environment variables
vi + /etc/profile
export ZOOKEEPER_HOME=/opt/sxt/zookeeper-3.4.6
PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin:$ZOOKEEPER_HOME/bin
#after editing, copy /etc/profile to node03/node04 and source it on every node
(6) Start ZooKeeper
#run on all three nodes
zkServer.sh start
#check that it started (look for the QuorumPeerMain process)
jps
#check the cluster status
zkServer.sh status
Note: by default ZooKeeper makes the running node with the largest id the leader and the others followers. When starting the nodes one by one (with three servers, at least two must be up before the ensemble forms), the one with the larger id among the first two started becomes the leader and the third one started stays a follower. With a five-node ensemble, the leader is settled once the first three have started, again favouring the largest id.
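On ZooKeeper 3.4.6 the status output looks roughly like the following (the config path depends on your install):
zkServer.sh status
JMX enabled by default
Using config: /opt/sxt/zookeeper-3.4.6/bin/../conf/zoo.cfg
Mode: follower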
IV. Starting the HA Cluster
1. Start ZooKeeper
2. Start the JournalNode cluster (must run before the format)
#start a journalnode on each of node01, node02 and node03
hadoop-daemon.sh start journalnode
#check the process (e.g. 1421 JournalNode)
jps
3. Format the Hadoop cluster
Pick either NameNode, format it, and start it:
hdfs namenode -format
hadoop-daemon.sh start namenode
On the other NameNode, synchronize by copying over the metadata:
hdfs namenode -bootstrapStandby
#(after a successful sync, this NameNode's clusterID matches the other one's; it was pulled over by bootstrapStandby, not distributed by hand)
cat /var/sxt/hadoop/ha/dfs/name/current/VERSION
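A quick way to confirm the two NameNodes agree (a sketch; the path follows from hadoop.tmp.dir above) is to compare the clusterID line on node01 and node02:
#run on both node01 and node02; the clusterID values must be identical
grep clusterID /var/sxt/hadoop/ha/dfs/name/current/VERSION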
4. Format ZKFC
Format ZKFC; this creates a znode in ZooKeeper:
hdfs zkfc -formatZK
#(ha.ActiveStandbyElector: Successfully created /hadoop-ha/mycluster in ZK.)
It is visible from the ZooKeeper client:
zkCli.sh
[zk: localhost:2181(CONNECTED) 1] ls /
[hadoop-ha, zookeeper]
[zk: localhost:2181(CONNECTED) 2] ls /hadoop-ha
[mycluster]
5. Start the cluster
start-dfs.sh
Starting namenodes on [node01 node02]
node01: namenode running as process 3421. Stop it first.
node02: starting namenode, logging to /opt/sxt/hadoop-2.6.5/logs/hadoop-root-namenode-node02.out
node04: starting datanode, logging to /opt/sxt/hadoop-2.6.5/logs/hadoop-root-datanode-node04.out
node03: starting datanode, logging to /opt/sxt/hadoop-2.6.5/logs/hadoop-root-datanode-node03.out
node02: starting datanode, logging to /opt/sxt/hadoop-2.6.5/logs/hadoop-root-datanode-node02.out
Starting journal nodes [node01 node02 node03]
node03: journalnode running as process 1759. Stop it first.
node01: journalnode running as process 3302. Stop it first.
node02: journalnode running as process 2743. Stop it first.
Starting ZK Failover Controllers on NN hosts [node01 node02]
node02: starting zkfc, logging to /opt/sxt/hadoop-2.6.5/logs/hadoop-root-zkfc-node02.out
node01: starting zkfc, logging to /opt/sxt/hadoop-2.6.5/logs/hadoop-root-zkfc-node01.out
Check the ZooKeeper client again:
[zk: localhost:2181(CONNECTED) 9] ls /hadoop-ha/mycluster
[ActiveBreadCrumb, ActiveStandbyElectorLock]
#the data in these znodes records which NameNode is currently active:
[zk: localhost:2181(CONNECTED) 11] get /hadoop-ha/mycluster/ActiveBreadCrumb
View the cluster in a browser:
http://node01:50070
http://node02:50070
6. Testing failover
Stop the NameNode on node01 and watch node02 become active:
hadoop-daemon.sh stop namenode
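Besides the web UI, the state of each NameNode can be checked from the command line (a sketch using the nn1/nn2 ids configured above):
hdfs haadmin -getServiceState nn1
hdfs haadmin -getServiceState nn2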
A problem appeared during my actual test: when the namenode on node01 was stopped, node02 did not become active; only after restarting the namenode on node01 did node02 switch to active.
After looking into the problem, the fix turned out to be as follows:
Checking logs/hadoop-root-zkfc-master.log showed
java.io.FileNotFoundException: /home/exampleuser/.ssh/id_rsa, i.e. the private-key path in the configuration (dfs.ha.fencing.ssh.private-key-files) was written incorrectly; it must point at the real key, /root/.ssh/id_dsa.
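To rule this out up front (a hedged check; adjust the config path to your install layout), confirm that the fencing key configured in hdfs-site.xml actually exists on both NameNodes:
#on node01 and node02
grep -A1 dfs.ha.fencing.ssh.private-key-files $HADOOP_HOME/etc/hadoop/hdfs-site.xml
ls -l /root/.ssh/id_dsa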
7. Shutting down the cluster
#stop the hadoop cluster
stop-dfs.sh
#stop the zookeeper cluster
zkServer.sh stop
On the next startup, start the ZooKeeper cluster first and then the Hadoop cluster; the JournalNodes do not need to be started separately.
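A minimal restart sequence for this layout would look like this (start-dfs.sh brings up the namenodes, datanodes, journalnodes and zkfc processes, as the output in step 5 shows):
#on node02/node03/node04
zkServer.sh start
#on node01
start-dfs.sh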
8. Other issues
If the file system is formatted more than once, the DataNodes may fail to start. Checking the log (/usr/local/hadoop/logs/hadoop-hadoop-datanode-xsh.log) shows the error:
2016-07-17 21:22:14,616 FATAL org.apache.hadoop.hdfs.server.datanode.DataNode: Initialization failed for Block pool <registering> (Datanode Uuid unassigned) service to localhost/127.0.0.1:9000. Exiting.
java.io.IOException: Incompatible clusterIDs in /usr/local/hadoop/tmp/dfs/data: namenode clusterID = CID-fd069c99-8004-47e1-9f67-a619bf4e9b60; datanode clusterID = CID-9a628355-6954-473b-a66c-d34d7c2b3805
Cause
When the file system is formatted, a current/VERSION file is written under the NameNode data directory (the local path configured as dfs.namenode.name.dir); it records a clusterID identifying that formatted NameNode. If the NameNode is formatted repeatedly, the current/VERSION file kept by the DataNodes (under the local path configured as dfs.datanode.data.dir) still holds the clusterID from the first format, so the DataNode and NameNode IDs no longer match.
Fix
Change the clusterID in current/VERSION under the local dfs.datanode.data.dir path so that it matches the NameNode's, then restart.
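A sketch of the fix for this cluster, assuming the default layout under hadoop.tmp.dir (/var/sxt/hadoop/ha), so that the DataNode data directory is /var/sxt/hadoop/ha/dfs/data:
#read the clusterID the namenode is using (on node01)
grep clusterID /var/sxt/hadoop/ha/dfs/name/current/VERSION
#on each datanode, set the same clusterID in its VERSION file, then restart the datanode
vi /var/sxt/hadoop/ha/dfs/data/current/VERSION
hadoop-daemon.sh start datanode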