Deploying a Hadoop High-Availability Cluster

Author: 断水流大师兄vs魔鬼筋肉人 | Published 2023-03-21 13:33

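    Prerequisites:

    1. Passwordless SSH is set up between all three hosts. SSH to each host once beforehand, because the first connection asks you to confirm the host key (answer "yes"); this includes each host connecting to itself.
    Test hosts:
    test-39     NameNode (active)
    test-40     NameNode (standby)
    test-41     no NameNode (JournalNode, DataNode and NodeManager only)
    2. Local host name resolution: vim /etc/hosts
    10.10.10.39 test-39
    10.10.10.40 test-40
    10.10.10.41 test-41
    3. Java environment variables (add to /etc/profile, then reload it):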
    export JAVA_HOME=/usr/local/jdk1.8.0_121
    export CLASSPATH=.:${JAVA_HOME}/lib/dt.jar:${JAVA_HOME}/lib/tools.jar
    export PATH=${JAVA_HOME}/bin:$PATH
    source /etc/profile
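
Before moving on, the two connectivity prerequisites can be checked with a small script. This is only a sketch: the function names `missing_hosts` and `check_ssh` are made up for this example, and it assumes `awk` and the OpenSSH client are installed. `BatchMode=yes` makes `ssh` fail instead of prompting, so a host that still needs a password or a first-time host-key confirmation shows up as a failure.

```shell
#!/usr/bin/env bash
# Print every name from "$@" (after the file argument) that has no entry
# in the given hosts file. Commented-out lines are ignored.
missing_hosts() {
  local hosts_file="$1"; shift
  local name
  for name in "$@"; do
    if ! awk -v n="$name" '
        /^[ \t]*#/ { next }
        { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
        END { exit (found ? 0 : 1) }' "$hosts_file"; then
      echo "$name"
    fi
  done
}

# Try a non-interactive SSH login to each host and report the result.
check_ssh() {
  local host
  for host in "$@"; do
    if ssh -o BatchMode=yes -o ConnectTimeout=5 "$host" true; then
      echo "ok   $host"
    else
      echo "FAIL $host"
    fi
  done
}
```

Typical use on each of the three hosts would be `missing_hosts /etc/hosts test-39 test-40 test-41` followed by `check_ssh test-39 test-40 test-41`; both should report no problems before the deployment starts.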
    

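    Deployment

    Run the following on all three hosts:

    1. Create the directory: mkdir -p /data/hadoop, then extract the package into it: tar -xvf hadoop-2.7.3.tar.gz -C /data/hadoop --strip-components=1 (so the files land directly in /data/hadoop, which is used as HADOOP_HOME)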
    2. Set the environment variables: vim /etc/profile
    export HADOOP_HOME=/data/hadoop
    export PATH=$PATH:$HADOOP_HOME/bin
    export PATH=$PATH:$HADOOP_HOME/sbin
    source /etc/profile
    hadoop version  (to verify)
    
    3. Edit the configuration files (located in /data/hadoop/etc/hadoop)
    1) hadoop-env.sh
    export JAVA_HOME=/usr/local/jdk1.8.0_121
    export HADOOP_PID_DIR=/data/hadoop/tmp
    2) hdfs-site.xml (create the data directory first: mkdir -p /data/hadoop/dfsdata)
    <configuration>
    <!-- HA configuration -->
    <!-- Name the HDFS nameservice "mycluster" -->
      <property>
        <name>dfs.nameservices</name>
        <value>mycluster</value>
      </property>
      <property>
        <name>dfs.ha.namenodes.mycluster</name>
        <value>nn1,nn2</value>
      </property>
    <!-- NameNode 1 RPC address -->
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn1</name>
        <value>test-39:9000</value>
      </property>
    <!-- NameNode 2 RPC address -->
      <property>
        <name>dfs.namenode.rpc-address.mycluster.nn2</name>
        <value>test-40:9000</value>
      </property>
    <!-- NameNode 1 HTTP address -->
      <property>
        <name>dfs.namenode.http-address.mycluster.nn1</name>
        <value>test-39:50070</value>
      </property>
    <!-- NameNode 2 HTTP address -->
      <property>
        <name>dfs.namenode.http-address.mycluster.nn2</name>
        <value>test-40:50070</value>
      </property>
    <!-- Enable automatic HA failover -->
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
    <!-- JournalNode configuration: shared edits directory -->
      <property>
        <name>dfs.namenode.shared.edits.dir</name>
        <value>qjournal://test-39:8485;test-40:8485;test-41:8485/mycluster</value>
      </property>
      <property>
        <name>dfs.namenode.edits.journal-plugin.qjournal</name>
        <value>org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager</value>
      </property>
    <!-- During a failover, the standby node runs a series of methods to kill the unhealthy NameNode service on the former active node; this is called fencing.
    sshfence connects over SSH and uses the fuser command to find the NameNode service on the active node and kill it -->
      <property>
        <name>dfs.ha.fencing.methods</name>
        <value>
    sshfence
    shell(/bin/true)
        </value>
      </property>
    <!-- SSH private key used by sshfence -->
      <property>
        <name>dfs.ha.fencing.ssh.private-key-files</name>
        <value>/root/.ssh/id_rsa</value>
      </property>
    <!-- Directory where the JournalNode stores its edit files -->
      <property>
        <name>dfs.journalnode.edits.dir</name>
        <value>/data/hadoop/ha/jn</value>
      </property>
      <property>
        <name>dfs.permissions.enabled</name>
        <value>false</value>
      </property>
    <!-- Class that implements client-side failover -->
      <property>
        <name>dfs.client.failover.proxy.provider.mycluster</name>
        <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
      </property>
      <property>
        <name>dfs.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir.restore</name>
        <value>true</value>
      </property>
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/dfsdata/name</value>
      </property>
      <property>
        <name>dfs.blocksize</name>
        <value>67108864</value>
      </property>
      <property>
        <name>dfs.datanode.data.dir</name>
        <value>file:///data/hadoop/dfsdata/data</value>
      </property>
      <property>
        <name>dfs.replication</name>
        <value>3</value>
      </property>
    <!-- Allow HDFS directories to be accessed over the web (WebHDFS) -->
      <property>
        <name>dfs.webhdfs.enabled</name>
        <value>true</value>
      </property>
    </configuration>
    3) mapred-site.xml
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    4) slaves
    test-39
    test-40
    test-41
    5) yarn-site.xml
    <configuration>
    <!-- Interval before reconnecting to a ResourceManager after the connection is lost (ms) -->
      <property>
        <name>yarn.resourcemanager.connect.retry-interval.ms</name>
        <value>2000</value>
      </property>
    <!-- Enable ResourceManager HA (default: false) -->
      <property>
        <name>yarn.resourcemanager.ha.enabled</name>
        <value>true</value>
      </property>
    <!-- Logical IDs of the ResourceManagers -->
      <property>
        <name>yarn.resourcemanager.ha.rm-ids</name>
        <value>rm1,rm2</value>
      </property>
      <property>
        <name>yarn.resourcemanager.hostname.rm1</name>
        <value>test-39</value>
      </property>
    
      <property>
        <name>yarn.resourcemanager.hostname.rm2</name>
        <value>test-40</value>
      </property>
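    <!-- Enable automatic ResourceManager failover -->
      <property>
        <name>yarn.resourcemanager.ha.automatic-failover.enabled</name>
        <value>true</value>
      </property>
        <!-- Set rm1 on test-39 and rm2 on test-40.
    Note: if you copy the finished configuration files to the other hosts, this value must be changed on the other ResourceManager host; hosts that do not run a ResourceManager should not set this property -->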
      <property>
        <name>yarn.resourcemanager.ha.id</name>
        <value>rm1</value>
      </property>
    <!-- Enable automatic ResourceManager state recovery -->
      <property>
        <name>yarn.resourcemanager.recovery.enabled</name>
        <value>true</value>
      </property>
    <!-- Class used for persistent state storage -->
      <property>
        <name>yarn.resourcemanager.store.class</name>
        <value>org.apache.hadoop.yarn.server.resourcemanager.recovery.ZKRMStateStore</value>
      </property>
    <!-- ZooKeeper ensemble addresses -->
      <property>
        <name>yarn.resourcemanager.zk-address</name>
        <value>test-39:2181,test-40:2181,test-41:2181</value>
      </property>
    <!-- Wait time before reconnecting to the scheduler after the connection is lost (ms) -->
      <property>
        <name>yarn.app.mapreduce.am.scheduler.connection.wait.interval-ms</name>
        <value>5000</value>
      </property>
    <!-- Cluster ID -->
      <property>
        <name>yarn.resourcemanager.cluster-id</name>
        <value>mycluster</value>
      </property>
    <!-- Service addresses of rm1 -->
      <property>
        <name>yarn.resourcemanager.address.rm1</name>
        <value>test-39:8132</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm1</name>
        <value>test-39:8130</value>
      </property>
        <!-- ResourceManager web UI address:port -->
      <property>
        <name>yarn.resourcemanager.webapp.address.rm1</name>
        <value>test-39:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm1</name>
        <value>test-39:8131</value>
      </property>
    <!-- ResourceManager admin address:port -->
      <property>
        <name>yarn.resourcemanager.admin.address.rm1</name>
        <value>test-39:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.admin.address.rm1</name>
        <value>test-39:23142</value>
      </property>
      <property>
        <name>yarn.resourcemanager.address.rm2</name>
        <value>test-40:8132</value>
      </property>
      <property>
        <name>yarn.resourcemanager.scheduler.address.rm2</name>
        <value>test-40:8130</value>
      </property>
      <property>
        <name>yarn.resourcemanager.webapp.address.rm2</name>
        <value>test-40:8088</value>
      </property>
      <property>
        <name>yarn.resourcemanager.resource-tracker.address.rm2</name>
        <value>test-40:8131</value>
      </property>
      <property>
        <name>yarn.resourcemanager.admin.address.rm2</name>
        <value>test-40:8033</value>
      </property>
      <property>
        <name>yarn.resourcemanager.ha.admin.address.rm2</name>
        <value>test-40:23142</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services</name>
        <value>mapreduce_shuffle</value>
      </property>
      <property>
        <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
        <value>org.apache.hadoop.mapred.ShuffleHandler</value>
      </property>
      <property>
        <name>yarn.nodemanager.local-dirs</name>
        <value>/data/hadoop/dfsdata/yarn/local</value>
      </property>
      <property>
        <name>yarn.nodemanager.log-dirs</name>
        <value>/data/hadoop/dfsdata/logs</value>
      </property>
      <property>
        <name>yarn.nodemanager.resource.memory-mb</name>
        <value>1024</value>
        <description>Memory available on each node, in MB</description>
      </property>
      <property>
        <name>yarn.scheduler.minimum-allocation-mb</name>
        <value>258</value>
        <description>Minimum memory a single task can request; default 1024 MB</description>
      </property>
      <property>
        <name>yarn.scheduler.maximum-allocation-mb</name>
        <value>512</value>
        <description>Maximum memory a single task can request; default 8192 MB</description>
      </property>
      <property>
        <name>yarn.nodemanager.webapp.address</name>
        <value>0.0.0.0:8042</value>
      </property>
    </configuration>
    6) core-site.xml (create the temp directory first: mkdir -p /data/hadoop/tmp)
    <configuration>
    <!-- Default file system: the HA nameservice -->
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://mycluster</value>
      </property>
    <!-- Base directory for Hadoop temporary files -->
      <property>
        <name>hadoop.tmp.dir</name>
        <value>/data/hadoop/tmp</value>
      </property>
      <property>
        <name>io.file.buffer.size</name>
        <value>4096</value>
      </property>
    <!-- Allow the proxy user hduser to connect from any host -->
      <property>
        <name>hadoop.proxyuser.hduser.hosts</name>
        <value>*</value>
      </property>
    <!-- Allow the proxy user hduser to act for users in any group -->
      <property>
        <name>hadoop.proxyuser.hduser.groups</name>
        <value>*</value>
      </property>
    <!-- ZooKeeper ensemble addresses -->
      <property>
        <name>ha.zookeeper.quorum</name>
        <value>test-39:2181,test-40:2181,test-41:2181</value>
      </property>
    </configuration>
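
With this many hand-edited properties, a quick read-back helps catch misspelled keys before anything is started. The sketch below is an illustration only: `get_prop`, `check_ha_namenodes` and `split_quorum` are hypothetical helper names, and the parser assumes each `<name>` and `<value>` sits on a single line as in the listings above (so it does not handle the multi-line `dfs.ha.fencing.methods` value). Once Hadoop is on the PATH, `hdfs getconf -confKey <key>` is the more authoritative way to read an effective value.

```shell
#!/usr/bin/env bash
# Print the <value> that follows <name>KEY</name> in a Hadoop *-site.xml file.
get_prop() {
  awk -v key="$1" '
    /<name>/ {
      line = $0
      sub(/.*<name>[ \t]*/, "", line)
      sub(/[ \t]*<\/name>.*/, "", line)
      hit = (line == key)
      next
    }
    hit && /<value>/ {
      line = $0
      sub(/.*<value>[ \t]*/, "", line)
      sub(/[ \t]*<\/value>.*/, "", line)
      print line
      exit
    }' "$2"
}

# Check that every NameNode ID of the nameservice has an RPC address.
check_ha_namenodes() {
  local file="$1" ns nn addr status=0
  ns=$(get_prop dfs.nameservices "$file")
  if [ -z "$ns" ]; then
    echo "dfs.nameservices is not set"
    return 1
  fi
  for nn in $(get_prop "dfs.ha.namenodes.$ns" "$file" | tr ',' ' '); do
    addr=$(get_prop "dfs.namenode.rpc-address.$ns.$nn" "$file")
    if [ -n "$addr" ]; then
      echo "$nn -> $addr"
    else
      echo "$nn -> MISSING rpc-address"
      status=1
    fi
  done
  return $status
}

# Split a quorum string such as "a:2181,b:2181" into "host port" lines.
split_quorum() {
  echo "$1" | tr ',' '\n' | while IFS=: read -r host port; do
    if [ -n "$host" ]; then
      echo "$host $port"
    fi
  done
}
```

For example, `check_ha_namenodes /data/hadoop/etc/hadoop/hdfs-site.xml` should list both nn1 and nn2 with their addresses, and `split_quorum "$(get_prop ha.zookeeper.quorum /data/hadoop/etc/hadoop/core-site.xml)"` yields the ZooKeeper endpoints that must be reachable before ZKFC is formatted.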
    

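    The configuration is essentially identical on all three hosts; the only difference is the yarn.resourcemanager.ha.id value in yarn-site.xml:


    test-39 rm1
    test-40 rm2
    test-41 property commented out
    Once the configuration is done, scp it to the other two hosts (remember to adjust yarn-site.xml)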
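
The per-host change is easy to forget after copying the files, so it can be scripted. A minimal sketch, assuming a `sed` that accepts `-i.bak` (GNU and BSD both do); `set_rm_id` is a name invented for this example. It only rewrites the `<value>` inside the `yarn.resourcemanager.ha.id` property and leaves a `.bak` copy next to the file. On test-41 the property still has to be commented out by hand.

```shell
#!/usr/bin/env bash
# Set the value of yarn.resourcemanager.ha.id in a yarn-site.xml file.
# Usage: set_rm_id rm2 /data/hadoop/etc/hadoop/yarn-site.xml
set_rm_id() {
  local id="$1" file="$2"
  # Restrict the substitution to the lines between the matching <name>
  # and the next </property>, so other properties are left untouched.
  sed -i.bak "/<name>yarn\.resourcemanager\.ha\.id<\/name>/,/<\/property>/ s|<value>[^<]*</value>|<value>${id}</value>|" "$file"
}
```

After copying the configuration to test-40, running `set_rm_id rm2 /data/hadoop/etc/hadoop/yarn-site.xml` there would switch the ID from rm1 to rm2.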

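    Startup (commands are run from /data/hadoop):

    1) Start the JournalNode on each of the three hosts (it keeps the edit log in sync between the NameNodes)
    ./sbin/hadoop-daemon.sh start journalnode
    [root@test-39 hadoop]# jps
    74371 JournalNode
    2) Format the NameNode on test-39 (active)
    hdfs namenode -format
    3) Format ZKFC: this registers ZKFC with ZooKeeper by creating the /hadoop-ha znode in the ZooKeeper ensemble
    ./bin/hdfs zkfc -formatZK
    4) Start the services from the primary node test-39:
    start-dfs.sh
    start-yarn.sh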
    [root@test-39 hadoop]# jps
    74371 JournalNode
    74563 DFSZKFailoverController
    76146 NameNode
    74996 ResourceManager
    75116 NodeManager
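
The order of these four steps matters, so it can help to write them down as one script. The following is a sketch under assumptions, not a tested installer: `run_on` and `start_ha_cluster` are hypothetical names, `DRY_RUN` is a variable introduced here, and the host names and paths are the ones used in this article. By default it only prints the commands. Formatting is a one-time action for a new cluster; it erases existing NameNode metadata and may ask for confirmation when run for real.

```shell
#!/usr/bin/env bash
# Print (DRY_RUN=1, the default) or execute one command on a remote host.
run_on() {
  local host="$1"; shift
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "[$host] $*"
  else
    ssh "$host" "cd /data/hadoop && $*"
  fi
}

start_ha_cluster() {
  local h
  # 1) JournalNodes first, on every host
  for h in test-39 test-40 test-41; do
    run_on "$h" ./sbin/hadoop-daemon.sh start journalnode
  done
  # 2) and 3) one-time formatting, on the first NameNode only
  run_on test-39 ./bin/hdfs namenode -format
  run_on test-39 ./bin/hdfs zkfc -formatZK
  # 4) start HDFS and YARN from the first NameNode
  run_on test-39 ./sbin/start-dfs.sh
  run_on test-39 ./sbin/start-yarn.sh
}
```

Calling `start_ha_cluster` prints the seven commands in order for review; setting `DRY_RUN=0` would run them over SSH.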
    
    

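    After startup, check with jps:
    All three hosts run NodeManager (provides and isolates resources)
    test-39 and test-40 run DFSZKFailoverController (monitoring and active/standby switchover)
    test-39 and test-40 run NameNode (arbiter and manager of the metadata)
    test-39 and test-40 run ResourceManager (the scheduler that allocates resources)
    If an expected process is missing, start it manually:

    ./sbin/yarn-daemon.sh start resourcemanager
    ./sbin/yarn-daemon.sh start nodemanager
    ./sbin/hadoop-daemon.sh start namenode     (on the active node this can be started directly)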
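
Comparing the jps output against the expected list by eye gets tedious across three hosts. A small helper can do it; this is a sketch, and `missing_daemons` is a name made up here. It reads `jps` output on standard input and prints each expected daemon that does not appear.

```shell
#!/usr/bin/env bash
# Read `jps` output on stdin; print every expected daemon (given as
# arguments) that is not among the running Java processes.
missing_daemons() {
  local running name
  running=$(awk '{ print $2 }')
  for name in "$@"; do
    echo "$running" | grep -qx "$name" || echo "$name"
  done
}
```

On test-39 and test-40 one would run `jps | missing_daemons JournalNode NameNode DFSZKFailoverController ResourceManager NodeManager`, and on test-41 `jps | missing_daemons JournalNode NodeManager`. Empty output means nothing is missing.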
    

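    If the NameNode on the standby node test-40 did not start, the metadata has to be copied over before it can start

    First copy it from test-39: scp -r dfsdata/*    test-40:/data/hadoop/dfsdata/
    This is the directory configured in hdfs-site.xml:
      <property>
        <name>dfs.namenode.name.dir</name>
        <value>file:///data/hadoop/dfsdata/name</value>
      </property>
    Important: the contents of this directory must be identical on both NameNode hosts
    Then start it:
    ./sbin/hadoop-daemon.sh start namenode
    Verify: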
    [root@test-39 hadoop]# hdfs haadmin -getServiceState nn1
    active
    [root@test-39 hadoop]# hdfs haadmin -getServiceState nn2
    standby
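
The same check can be wrapped so that it fails loudly when the pair is not in a one-active, one-standby state. A sketch only: `ha_state_ok` and `check_ha` are hypothetical names, while `hdfs haadmin -getServiceState` is the command used above with the nn1 and nn2 IDs from hdfs-site.xml.

```shell
#!/usr/bin/env bash
# Succeed only if exactly one of the two states is "active" and the
# other is "standby".
ha_state_ok() {
  case "$1,$2" in
    active,standby|standby,active) return 0 ;;
    *) return 1 ;;
  esac
}

# Query both NameNodes and report the result.
check_ha() {
  local s1 s2
  s1=$(hdfs haadmin -getServiceState nn1 2>/dev/null)
  s2=$(hdfs haadmin -getServiceState nn2 2>/dev/null)
  if ha_state_ok "$s1" "$s2"; then
    echo "HA pair healthy: nn1=$s1 nn2=$s2"
  else
    echo "HA pair needs attention: nn1=${s1:-unreachable} nn2=${s2:-unreachable}"
    return 1
  fi
}
```

As an alternative to copying the metadata directory with scp, Hadoop also provides `hdfs namenode -bootstrapStandby`, which is run on the standby host while the active NameNode is up and pulls the current metadata from it.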
    

    Then start the DataNode manually on each of the three hosts (this step is simple)
    ./sbin/hadoop-daemon.sh start datanode
    Open http://10.10.10.39:50070 and confirm that the nodes have joined



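    Deployment notes:

    1. I copied the Hadoop package from a development environment, so /data/hadoop/dfsdata/ already contained data. The primary node had been formatted, so it was fine, but the other two nodes reported all kinds of errors on startup
    First stop the DataNode: ./sbin/hadoop-daemon.sh stop datanode
    rm -rf dfsdata/*
    Then start it again: ./sbin/hadoop-daemon.sh start datanode (the data storage node)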
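
Because `rm -rf dfsdata/*` depends on the current directory, a guarded version is safer to keep around. This is a sketch, with `clear_dir` being a name invented for this example; it refuses a few obviously dangerous targets (the list of refused paths is an assumption based on the layout used in this article) and otherwise deletes everything inside the directory while keeping the directory itself.

```shell
#!/usr/bin/env bash
# Delete everything inside directory $1, refusing dangerous targets.
clear_dir() {
  local dir="${1%/}"
  case "$dir" in
    ""|/data|/data/hadoop)
      echo "refusing to clear '$1'" >&2
      return 1
      ;;
  esac
  if [ ! -d "$dir" ]; then
    echo "not a directory: $dir" >&2
    return 1
  fi
  # -mindepth 1 keeps the directory itself; ${dir:?} aborts if it is empty
  find "${dir:?}" -mindepth 1 -delete
}
```

On an affected node the cleanup step would then be `clear_dir /data/hadoop/dfsdata`, run between stopping and restarting the DataNode.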
