Hadoop Study Notes 4: Setting Up a YARN-Based MapReduce Cluster

Author: 开发者连小超 | Published 2019-11-30 10:56

    This setup of a YARN-based MapReduce cluster follows the official documentation: https://hadoop.apache.org/docs/r2.6.5/hadoop-yarn/hadoop-yarn-site/ResourceManagerHA.html

    Background

    MapReduce
    For MapReduce fundamentals, see: https://blog.csdn.net/luzhensmart/article/details/90202313
    Yarn
    For YARN fundamentals, see: https://www.jianshu.com/p/3f406cf438be

    Server Preparation

    This setup builds on the server environment from the previous article, [Hadoop Study Notes 3: High-Availability Cluster Setup (Hadoop 2.x)] https://www.jianshu.com/p/666ff9bbf784. The node layout for the YARN-based MapReduce cluster is shown in the figure below.

    [Figure: node layout for the YARN-based MapReduce cluster]

    I. Passwordless SSH Login

    The two ResourceManager nodes play the same kind of role as the NameNodes in the HA cluster: the active and standby may need to fail over to each other, so the two nodes must be able to log in to each other without a password. As shown in the figure above, passwordless SSH needs to be configured between node03 and node04.


    [Figure: ResourceManager HA overview]

    On node03, in the .ssh directory:

    ssh-keygen -t dsa -P '' -f ./id_dsa
    cat id_dsa.pub >> authorized_keys
    scp id_dsa.pub node04:`pwd`/node03.pub
    

    On node04, in the .ssh directory:

    cat node03.pub >> authorized_keys
    ssh-keygen -t dsa -P '' -f ./id_dsa
    cat id_dsa.pub >> authorized_keys
    scp id_dsa.pub node03:`pwd`/node04.pub
    

    Back on node03, in the .ssh directory:

    cat node04.pub >> authorized_keys
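
    To confirm the key exchange worked, log in from each node to the other; both directions should connect without a password prompt (a quick check, assuming the hostnames resolve as in the cluster plan):

    # On node03: should print node04's hostname without asking for a password
    ssh node04 hostname
    # On node04: should print node03's hostname without asking for a password
    ssh node03 hostname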
    

    II. Configuration

    1.mapred-site.xml

    # Rename the template file
    mv mapred-site.xml.template mapred-site.xml 
    
    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
    </configuration>
    # See the Single Node Cluster page of the official docs for this YARN setting
    
    

    2.yarn-site.xml

    # Let YARN manage the MapReduce shuffle phase
     <property>
       <name>yarn.nodemanager.aux-services</name>
       <value>mapreduce_shuffle</value>
     </property>
    # See the Single Node Cluster page of the official docs
    
    # Minimal ResourceManager HA configuration given in the official docs (all of the following properties also go inside yarn-site.xml's <configuration> element)
     <property>
       <name>yarn.resourcemanager.ha.enabled</name>
       <value>true</value>
     </property>
     <property>
       <name>yarn.resourcemanager.cluster-id</name>
       <value>cluster1</value>
     </property>
     <property>
       <name>yarn.resourcemanager.ha.rm-ids</name>
       <value>rm1,rm2</value>
     </property>
     <property>
       <name>yarn.resourcemanager.hostname.rm1</name>
       <value>node03</value>
     </property>
     <property>
       <name>yarn.resourcemanager.hostname.rm2</name>
       <value>node04</value>
     </property>
     <property>
       <name>yarn.resourcemanager.zk-address</name>
       <value>node02:2181,node03:2181,node04:2181</value>
     </property>
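
    XML typos in these files are easy to miss; before distributing them, a quick well-formedness check can help (a sketch, assuming xmllint from libxml2 is installed):

    # No output means both files parse as well-formed XML
    xmllint --noout mapred-site.xml yarn-site.xml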
    

    3. Distribute the two files to node02, node03, and node04

    scp mapred-site.xml yarn-site.xml node02:`pwd`
    scp mapred-site.xml yarn-site.xml node03:`pwd`
    scp mapred-site.xml yarn-site.xml node04:`pwd`
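
    The three scp commands can also be written as one small loop that does the same thing (run from the Hadoop configuration directory):

    for n in node02 node03 node04; do scp mapred-site.xml yarn-site.xml $n:`pwd`; done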
    

    III. Starting the Cluster

    1. Start ZooKeeper

    # Run on node02, node03, and node04
    zkServer.sh start
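
    To confirm the ensemble formed correctly, check each server's role; one node should report leader and the other two follower:

    # Run on node02, node03, and node04
    zkServer.sh status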
    

    2. Start HDFS

    # Run on node01
    start-dfs.sh
    # Note: avoid the start-all.sh script
    # If nn1 and nn2 did not come up, start the NameNode manually on node01 and node02:
    hadoop-daemon.sh start namenode
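
    Running jps on each node gives a quick view of which daemons came up; in this layout the NameNodes (and, typically, the DFSZKFailoverControllers) live on node01 and node02. The NameNode HA state can also be queried, using the nn1/nn2 service IDs from the HA setup:

    # Run on each node to see which daemons started
    jps
    # One NameNode should report active, the other standby
    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2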
    

    3. Start YARN

    # On node01, start the NodeManagers
     start-yarn.sh
    # On node03 and node04, start the ResourceManager
    yarn-daemon.sh start resourcemanager
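
    Once both ResourceManagers are running, you can check which one is active and which is standby (rm1 and rm2 are the IDs defined in yarn-site.xml above) and confirm that the NodeManagers have registered:

    # One should report "active", the other "standby"
    yarn rmadmin -getServiceState rm1
    yarn rmadmin -getServiceState rm2
    # List the NodeManagers registered with the active ResourceManager
    yarn node -list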
    

    Open in a browser: node03:8088 and node04:8088

    IV. Stopping the Cluster

    # Run on node01
    stop-dfs.sh 
    
    # On node01, stop the NodeManagers
    stop-yarn.sh
    
    # On node03 and node04, stop the ResourceManager
    yarn-daemon.sh stop resourcemanager 
    
    # On node02, node03, and node04, stop ZooKeeper
    zkServer.sh stop
    

    V. Testing the Cluster

    # test.txt was uploaded in [Hadoop Study Notes 2: Fully Distributed Setup (Hadoop 1.x)]
    hadoop jar hadoop-mapreduce-examples-2.6.5.jar wordcount test.txt /wordcount
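
    If test.txt is not yet in HDFS, upload it first; also note that the output directory (/wordcount here) must not already exist, otherwise the job submission fails. A small sketch, assuming a local file named test.txt:

    # Upload a local test.txt into the current user's HDFS home directory (skip if already uploaded)
    hdfs dfs -put test.txt test.txt
    # If re-running the job, remove the old output directory first
    hdfs dfs -rm -r /wordcount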
    

    The job's progress can be followed in a browser at:
    http://node04:8088/cluster

    19/12/01 04:40:46 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
    19/12/01 04:40:47 INFO input.FileInputFormat: Total input paths to process : 1
    19/12/01 04:40:47 INFO mapreduce.JobSubmitter: number of splits:2
    19/12/01 04:40:49 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1575143975107_0004
    19/12/01 04:40:49 INFO impl.YarnClientImpl: Submitted application application_1575143975107_0004
    19/12/01 04:40:50 INFO mapreduce.Job: The url to track the job: http://node04:8088/proxy/application_1575143975107_0004/
    19/12/01 04:40:50 INFO mapreduce.Job: Running job: job_1575143975107_0004
    19/12/01 04:41:49 INFO mapreduce.Job: Job job_1575143975107_0004 running in uber mode : false
    19/12/01 04:41:49 INFO mapreduce.Job:  map 0% reduce 0%
    19/12/01 04:43:30 INFO mapreduce.Job:  map 33% reduce 0%
    19/12/01 04:43:31 INFO mapreduce.Job:  map 50% reduce 0%
    19/12/01 04:46:44 INFO mapreduce.Job:  map 50% reduce 17%
    19/12/01 04:46:46 INFO mapreduce.Job:  map 100% reduce 17%
    19/12/01 04:46:50 INFO mapreduce.Job:  map 100% reduce 43%
    19/12/01 04:46:53 INFO mapreduce.Job:  map 100% reduce 100%
    19/12/01 04:46:55 INFO mapreduce.Job: Job job_1575143975107_0004 completed successfully
    

    View the Results

    hdfs dfs -cat /wordcount/part-r-00000
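
    Listing the output directory should also show a _SUCCESS marker next to the reduce output file(s):

    hdfs dfs -ls /wordcount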
    
