
Hadoop Fully Distributed HA Cluster with Automatic Failover: Setup Guide

Author: 余长生 | Published 2019-11-05 15:37

    I pieced this together from quite a few blog posts and eventually got a working cluster. This article records the whole process in the hope that it will be useful to others.

    Environment Preparation

    Prepare and configure your own virtual machines first: [VM setup](http://note.youdao.com/noteshare?id=1cd5582fbe7351bb04e45f9e740ecd6f)

    Linux: CentOS-7-x86_64-Minimal-1810
    JDK: 1.8+
    ZooKeeper: zookeeper-3.4.14
    Hadoop: hadoop-2.7.7
    

    Packages used:
    jdk-8u231
    zookeeper-3.4.14
    hadoop-2.7.7

    Remove the JDK that ships with the system and use JDK 1.8 or later:

    rpm -qa | grep java        # list the installed java packages
    # uninstall them with rpm
    rpm -e java-xxx
    rpm -e --nodeps java-xxx   # force uninstall, ignoring dependencies
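
    If several bundled packages show up, a one-liner like the sketch below (run as root) removes them in one pass; the grep pattern is an assumption, so adjust it to match your package names:

    rpm -qa | grep -i java | xargs -r rpm -e --nodeps   # force-remove every installed package whose name contains "java"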
    

    1. Distributed Cluster Plan

    | Node    | IP address   | NameNode (NN) | DataNode (DN) | JournalNode (JN) | ZKFC | ZooKeeper |
    |---------|--------------|---------------|---------------|------------------|------|-----------|
    | hadoop1 | 172.16.0.161 | namenode1     | datanode1     | journalnode      | zkfc | zookeeper |
    | hadoop2 | 172.16.0.162 | namenode2     | datanode2     | journalnode      | zkfc | zookeeper |
    | hadoop3 | 172.16.0.163 | -             | datanode3     | journalnode      | -    | zookeeper |

    2. Network and IP Plan

    2.1 Set the hostname

    Change the hostname, using CentOS 7 as the example:

    vi /etc/hostname
    
    or run on each of the three machines respectively:
    hostnamectl set-hostname hadoop1
    hostnamectl set-hostname hadoop2
    hostnamectl set-hostname hadoop3
    
    Supplement: changing the hostname on CentOS 6
    vi /etc/sysconfig/network  
    edit the existing contents so they read:
    NETWORKING=yes
    HOSTNAME=hadoop1
    
    
    2.2 Update the corresponding /etc/hosts entries
    Check the IP address with: ip addr
    
    vi /etc/hosts and add the host mappings
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    
    
    172.16.0.161 hadoop1   # my VM's IP was already taken, so I actually used 172.16.0.170
    172.16.0.162 hadoop2
    172.16.0.163 hadoop3
    
    Reboot the system: reboot
    
    Restart the network: service network restart
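
    An optional quick check that name resolution works from this node (a minimal sketch; it assumes ping is available on the Minimal install):

    for h in hadoop1 hadoop2 hadoop3; do ping -c 1 $h; done   # each hostname should resolve and answer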
    
    

    3. Disable the Firewall

    CentOS 7 uses firewalld rather than iptables as its default firewall, so disable it like this:
    
    systemctl stop firewalld.service           # stop firewalld
    systemctl disable firewalld.service        # keep firewalld from starting on boot
    
    Supplement: disabling the firewall on CentOS 6
    service iptables status             # check the firewall status
    service iptables stop               # stop the firewall (it comes back after a reboot)
    chkconfig iptables --list           # check whether the firewall is set to start on boot
    chkconfig iptables off              # disable the firewall's autostart
    chkconfig iptables --list           # check again; every runlevel should now show "off"
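
    To confirm firewalld really is down on CentOS 7, something like this (a sketch) can be run on each node:

    systemctl is-active firewalld    # should print "inactive"
    systemctl is-enabled firewalld   # should print "disabled"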
    
    

    4. Set Up Passwordless SSH

    Passwordless SSH must be set up between every pair of hosts, and from each host to itself as well. This can be done as the root user; once it finishes, the public key is at /root/.ssh/id_rsa.pub. Run the same four commands on hadoop2 and hadoop3 too.
    
    [root@hadoop1 ~]# ssh-keygen -t rsa
    [root@hadoop1 ~]# ssh-copy-id hadoop1
    [root@hadoop1 ~]# ssh-copy-id hadoop2
    [root@hadoop1 ~]# ssh-copy-id hadoop3
    
    

    Passwordless login sometimes fails with a permission error; the private key's permissions were probably left too open, and tightening them fixes it.

    Problem: Permissions 0644 for '/root/.ssh/id_rsa' are too open.
    It is required that your private key files are NOT accessible by others.
    This private key will be ignored.
    Load key "/root/.ssh/id_rsa": bad permissions
    root@192.168.108.130's password:
    Permission denied, please try again.
    In my case I ran chmod 600 /root/.ssh/id_rsa and the permissions did change, but rerunning the script still produced the error above, and it hung after I entered the root password.

    • Fix: tighten the permissions on the local /root/.ssh/id_rsa even further
      chmod 400 /root/.ssh/id_rsa
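
    Once the keys have been copied around, a loop like the sketch below verifies that every hop works without a password prompt (hostnames as in the plan above):

    for h in hadoop1 hadoop2 hadoop3; do
      ssh -o BatchMode=yes root@$h hostname   # BatchMode fails instead of prompting, so a missing key shows up as an error
    done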

    5. Install the JDK

    All software is installed under /usr/local/hadoop so it can be managed in one place.

    1. Create the hadoop directory under /usr/local and grant write permissions
      mkdir hadoop
      sudo chmod -R 777 /usr/local/hadoop

    2. Unpack jdk-8u231
      tar -zxvf jdk-8u231-linux-x64.tar.gz

    3. Configure the environment variables: vi /etc/profile

    export JAVA_HOME=/usr/local/hadoop/jdk1.8.0_231
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:$JAVA_HOME/lib:$JRE_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin
    
    4. Reload the profile so the changes take effect
      source /etc/profile
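
    A quick sanity check after sourcing the profile (sketch):

    java -version      # should report version 1.8.0_231
    echo $JAVA_HOME    # should print /usr/local/hadoop/jdk1.8.0_231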

    6. Install ZooKeeper

    1. Unpack ZooKeeper under /usr/local/hadoop/
      tar -zxvf zookeeper-3.4.14.tar.gz
    2. Configure the environment variables: vi /etc/profile
    export ZOOKEEPER_HOME=/usr/local/hadoop/zookeeper-3.4.14/
    export PATH=$PATH:$ZOOKEEPER_HOME/bin
    
    3. Reload the profile so the changes take effect
      source /etc/profile

    4. Set up the ZooKeeper ensemble
      4.1 In /usr/local/hadoop/zookeeper-3.4.14/conf run: cp zoo_sample.cfg zoo.cfg
      4.2 Edit zoo.cfg
      Note: a myid file has to be created in the data directory, with a different id on each node (a per-node sketch follows the config listing below).

    For example, with /data/zookeeper/ as the data directory:
    cd /data/zookeeper
    echo 1 > myid

    # 1, 2 and 3 must match the contents of each node's myid file (dataDir/myid, i.e. /data/zookeeper/myid)
    server.1=hadoop1:2888:3888
    server.2=hadoop2:2888:3888
    server.3=hadoop3:2888:3888
    
    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just
    # example sakes.
    # set these to your own directories
    dataDir=/data/zookeeper
    dataLogDir=/logs/zookeeper
    # the port at which the clients will connect
    clientPort=2181
    # the maximum number of client connections.
    # increase this if you need to handle more clients
    #maxClientCnxns=60
    #
    # Be sure to read the maintenance section of the
    # administrator guide before turning on autopurge.
    #
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
    #
    # The number of snapshots to retain in dataDir
    #autopurge.snapRetainCount=3
    # Purge task interval in hours
    # Set to "0" to disable auto purge feature
    #autopurge.purgeInterval=1
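
    Each node needs its own myid: 1 on hadoop1, 2 on hadoop2, 3 on hadoop3. With passwordless SSH already in place, a sketch like the following writes them from hadoop1:

    i=1
    for h in hadoop1 hadoop2 hadoop3; do
      ssh root@$h "mkdir -p /data/zookeeper && echo $i > /data/zookeeper/myid"
      i=$((i+1))
    done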
    
    

    7. Install Hadoop

    1. Unpack hadoop-2.7.7 under /usr/local/hadoop/
      tar -zxvf hadoop-2.7.7.tar.gz
    2. Configure the environment variables: vi /etc/profile
    export HADOOP_HOME=/usr/local/hadoop/hadoop-2.7.7
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
    
    3. Reload the profile so the changes take effect
      source /etc/profile

    4. Edit the Hadoop configuration files in /usr/local/hadoop/hadoop-2.7.7/etc/hadoop
      4.1 In hadoop-env.sh, set JAVA_HOME to the absolute path

    #export JAVA_HOME=${JAVA_HOME}
    export JAVA_HOME=/usr/local/hadoop/jdk1.8.0_231
    

    Note: when actually building the environment it is best to strip the comments from the configs below to avoid obscure problems; they are kept here only to explain what each property means.
    4.2 core-site.xml

    <configuration>
    <!-- default filesystem: the HDFS nameservice -->
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
      </property>
    <!-- Hadoop temporary/working directory -->
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/tmp</value>
      </property>
    
      <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
      </property>
    <!-- allow proxy access from any host -->
      <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
      </property>
    <!-- allow proxying for users in any group -->
      <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
      </property>
    <!-- ZooKeeper ensemble used for HA -->
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
      </property>
    </configuration>
    

    4.3 hdfs-site.xml

    <configuration>
    <!-- HA configuration -->
    <!-- name the HDFS nameservice "mycluster" -->
      <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
      </property>
    
      <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
      </property>
    <!-- namenode1 RPC address and port -->
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>hadoop1:9000</value>
      </property>
    <!-- namenode2 RPC address and port -->
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>hadoop2:9000</value>
      </property>
    <!-- namenode1 HTTP address and port -->
      <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>hadoop1:50070</value>
      </property>
    <!-- namenode2 HTTP address and port -->
      <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>hadoop2:50070</value>
      </property>
    <!-- enable automatic HA failover for the mycluster nameservice -->
      <property>
        <name>dfs.ha.automatic-failover.enabled.mycluster</name>
        <value>true</value>
      </property>
    <!-- JournalNode (shared edits) configuration -->
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
      <value>qjournal://hadoop1:8485;hadoop2:8485;hadoop3:8485/mycluster</value>
      </property>
      <property>
        <name>dfs.namenode.edits.journal-plugin.qjournal</name>
        <value>org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager</value>
      </property>
    <!-- Fencing: when a failover happens, the standby node runs these methods to make sure the old Active NameNode is really stopped. sshfence logs into the Active node over SSH and uses fuser to find and kill its NameNode process; shell(/bin/true) is a fallback so the failover can still proceed if SSH fencing fails (see the note after this listing). -->
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
    sshfence
    shell(/bin/true)
        </value>
      </property>
    <!-- SSH private key used by sshfence -->
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
      </property>
    <!-- where the JournalNodes store their edits -->
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop/ha/jn</value>
      </property>
      <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
      </property>
    <!-- client-side class that determines which NameNode is Active and handles failover -->
      <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
    
      <property>
        <name>dfs.namenode.name.dir.restore</name>
        <value>true</value>
      </property>
    
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/dfsdata/name</value>
      </property>
    
      <property>
        <name>dfs.blocksize</name>
        <value>67108864</value>
      </property>
    
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/dfsdata/data</value>
      </property>
    
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    <!-- enable WebHDFS so HDFS can be browsed over HTTP -->
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
    </configuration>
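
    One practical note on the sshfence setting above: it relies on the fuser command being present on the node that gets fenced, and the CentOS 7 Minimal image does not ship it by default (it comes with the psmisc package, to the best of my knowledge). Installing it on every node avoids fencing failures:

    yum install -y psmisc   # provides fuser, which sshfence invokes on the node being fenced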
    

    4.4 mapred-site.xml

    Copy mapred-site.xml.template to mapred-site.xml:
    cp mapred-site.xml.template mapred-site.xml

    <configuration>
      <property>
        <name>mapreduce.framework.name</name>
        <value>yarn</value>
      </property>
    </configuration>
    

    4.5 yarn-site.xml

    <configuration>
    <!-- how long to wait before reconnecting after losing contact with the ResourceManager -->
      <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
      </property>
    <!-- enable ResourceManager HA (default: false) -->
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
    <!-- logical IDs of the two ResourceManagers -->
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>hadoop1</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>hadoop2</value>
      </property>
    <!-- enable automatic ResourceManager failover -->
      <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
        <!-- Set this to rm1 on hadoop1 and to rm2 on hadoop2. Note: it is common to copy the finished config to the other machines, but this one property must be changed on the second YARN machine and left out on the remaining machines (a sketch for adjusting it per node follows this file). -->
      <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm1</value>
      </property>
    <!-- enable ResourceManager state recovery after a failure -->
      <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
      </property>
    <!-- class used to persist ResourceManager state -->
      <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
      </property>
    <!-- ZooKeeper ensemble address -->
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>hadoop1:2181,hadoop2:2181,hadoop3:2181</value>
      </property>
    <!-- how long the ApplicationMaster waits between attempts to reconnect to the ResourceManager -->
      <property>
        <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
        <value>5000</value>
      </property>
    <!-- cluster ID -->
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>mycluster</value>
      </property>
    <!-- RM RPC address and port for rm1 -->
      <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>hadoop1:8132</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>hadoop1:8130</value>
      </property>
    
        <!-- RM web UI address:port -->
      <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>hadoop1:8088</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>hadoop1:8131</value>
      </property>
    
    <!-- RM admin interface address:port -->
      <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>hadoop1:8033</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.ha.admin.address.rm1</name>
        <value>hadoop1:23142</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>hadoop2:8132</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>hadoop2:8130</value>
      </property>
    
    
      <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>hadoop2:8088</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>hadoop2:8131</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>hadoop2:8033</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.ha.admin.address.rm2</name>
        <value>hadoop2:23142</value>
      </property>
    
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
    
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce.shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
    
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/data/hadoop/dfsdata/yarn/local</value>
      </property>
    
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/data/hadoop/dfsdata/logs</value>
      </property>
    
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
        <description>memory available to containers on each node, in MB</description>
      </property>
    
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>256</value>
        <description>minimum memory a single container can request (default 1024 MB)</description>
      </property>
    
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>512</value>
        <description>maximum memory a single container can request (default 8192 MB)</description>
      </property>
    
      <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:8042</value>
      </property>
    </configuration>
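
    Since yarn.resourcemanager.ha.id must differ per node, one option (a sketch, assuming the file has already been copied to the other hosts as in section 8) is to patch it after distribution:

    # on hadoop2: point the local ResourceManager id at rm2
    sed -i 's#<value>rm1</value>#<value>rm2</value>#' /usr/local/hadoop/hadoop-2.7.7/etc/hadoop/yarn-site.xml
    # on hadoop3: remove the yarn.resourcemanager.ha.id property block entirely (edit the file by hand)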
    

    4.6 Edit the slaves file

    vi slaves
    Add the three hosts to it:

    hadoop1
    hadoop2
    hadoop3
    
    

    8. Create the Required Directories on Every Node and Distribute the Hadoop Files to the Other Nodes

    1. Create the directories and grant read/write permissions (they are needed on every node; see the sketch after the scp commands below)
    mkdir -p /data/zookeeper
    mkdir -p /logs/zookeeper
    mkdir -p /data/hadoop/dfsdata/name
    mkdir -p /data/hadoop/dfsdata/data
    mkdir -p /data/hadoop/dfsdata/logs
    mkdir -p /data/hadoop/dfsdata/yarn/local
    mkdir -p /data/hadoop/ha/jn
    
    sudo chmod -R 777 /data
    
    2. Distribute the files: from the /usr/local directory run
    scp -r hadoop root@hadoop2:/usr/local
    scp -r hadoop root@hadoop3:/usr/local
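
    scp only copies the software; the /data and /logs directories above still have to exist on hadoop2 and hadoop3. A sketch that creates them remotely from hadoop1 (the extra scp of /etc/profile is only needed if, as assumed here, the environment variables were set on hadoop1 alone):

    for h in hadoop2 hadoop3; do
      ssh root@$h "mkdir -p /data/zookeeper /logs/zookeeper /data/hadoop/dfsdata/{name,data,logs} /data/hadoop/dfsdata/yarn/local /data/hadoop/ha/jn && chmod -R 777 /data /logs"
      scp /etc/profile root@$h:/etc/profile   # optional: copy the environment variables as well
    done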
    

    9. Start the Cluster

    1. Start the ZooKeeper ensemble, one node after another
    [root@hadoop1 zookeeper-3.4.14]# ./bin/zkServer.sh start
    
    [root@hadoop2 zookeeper-3.4.14]# ./bin/zkServer.sh start
    
    [root@hadoop3 zookeeper-3.4.14]# ./bin/zkServer.sh start
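
    After all three are up, each node should report its role, one leader and two followers; a quick check (sketch):

    [root@hadoop1 zookeeper-3.4.14]# ./bin/zkServer.sh status   # run on each node; expect "Mode: leader" or "Mode: follower"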
    
    
    2. Initialize Hadoop; the following steps are run from /usr/local/hadoop/hadoop-2.7.7
      2.1 Format the HA state in ZooKeeper (on hadoop1)
    ./bin/hdfs zkfc -formatZK
    

    2.2 Start the JournalNode daemons on hadoop1, hadoop2 and hadoop3

    [root@hadoop1 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start journalnode
    
    [root@hadoop2 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start journalnode
    
    [root@hadoop3 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start journalnode
    

    2.3 Format the NameNode on namenode1 (hadoop1)

    [root@hadoop1 hadoop-2.7.7]# ./bin/hadoop namenode -format
    

    2.4 Start the DataNodes on hadoop1, hadoop2 and hadoop3

    [root@hadoop1 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start datanode
    [root@hadoop2 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start datanode
    [root@hadoop3 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start datanode
    

    2.5 Start the NameNodes
    2.5.1 namenode1

    [root@hadoop1 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start namenode
    

    2.5.2 namenode2

    [root@hadoop2 hadoop-2.7.7]# ./bin/hdfs namenode -bootstrapStandby
    [root@hadoop2 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start namenode
    

    At this point namenode1 and namenode2 are both in Standby state; check them in the web UIs:
    http://172.16.0.170:50070/dfshealth.html#tab-overview


    http://172.16.0.164:50070/dfshealth.html#tab-overview


    2.6 Start the ZKFC daemons

    Run the following on both namenode1 and namenode2:
    [root@hadoop1 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start zkfc
    [root@hadoop2 hadoop-2.7.7]# ./sbin/hadoop-daemon.sh start zkfc
    

    Once the ZKFC daemons are running, an Active node is automatically elected between namenode1 and namenode2.
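
    The state can also be confirmed from the command line (sketch; nn1 and nn2 are the NameNode IDs defined in hdfs-site.xml):

    [root@hadoop1 hadoop-2.7.7]# ./bin/hdfs haadmin -getServiceState nn1
    [root@hadoop1 hadoop-2.7.7]# ./bin/hdfs haadmin -getServiceState nn2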



    10. Verification

    1. Create hello.txt under /data and copy it into HDFS
    [root@hadoop1 hadoop-2.7.7]# cd /data
    [root@hadoop1 hadoop-2.7.7]# echo hello world > hello.txt
    [root@hadoop1 hadoop-2.7.7]# cd /usr/local/hadoop/hadoop-2.7.7
    [root@hadoop1 hadoop-2.7.7]# ./bin/hdfs dfs -mkdir /test
    [root@hadoop1 hadoop-2.7.7]# ./bin/hdfs dfs -put /data/hello.txt /test
    [root@hadoop1 hadoop-2.7.7]# ./bin/hdfs dfs -cat /test/hello.txt
    
    
    2. Test automatic HA failover: kill the Active NameNode
      [root@hadoop1 hadoop-2.7.7]# jps
      [root@hadoop1 hadoop-2.7.7]# kill -9 pid   # pid of the NameNode process

    Check the node states through the web UI:
    http://172.16.0.170:50070/dfshealth.html#tab-overview
    is no longer reachable; the other NameNode should have taken over as Active.
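
    Or check from the shell on the surviving NameNode (sketch):

    [root@hadoop2 hadoop-2.7.7]# ./bin/hdfs haadmin -getServiceState nn2   # should now report "active"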

