Setting Up a Hadoop HA Cluster with QJM


Author: 凉城不暖少年心 | Published 2018-06-22 14:26

    I. Background

    Before Hadoop 2.0.0, the NameNode was a single point of failure in an HDFS cluster. Each cluster had only one NameNode, and if that process or machine failed, the whole HDFS cluster was unavailable until the NameNode was restarted or brought up on another machine.

    Two situations in particular affect cluster availability:
    1. An unplanned event, such as a machine crash, leaves the cluster unavailable until the NameNode is restarted.
    2. Planned maintenance, such as a software or hardware upgrade, requires taking the NameNode offline and makes the cluster unavailable for a short period.

    The HDFS High Availability feature addresses these problems by running two redundant NameNodes in the same cluster, one Active and one Passive, as a hot standby pair. This allows a fast failover to the other NameNode when a machine crashes, and a graceful, administrator-initiated failover during planned maintenance.

    II. Architecture

    In a typical HA cluster, two separate machines are configured as NameNodes. At any point in time, exactly one of them is in the Active state while the other is in the Standby state. The Active NameNode handles all client operations in the cluster, while the Standby acts as a slave that maintains enough state to provide a fast failover when needed.

    To keep the Standby node in sync with the Active node, both nodes communicate with a group of separate daemons called "JournalNodes" (JNs). When the Active node modifies the namespace, it logs the edit to a majority of these JNs. The Standby node reads the edits from the JNs and constantly watches for changes, applying them to its own namespace. When a failover occurs, the Standby makes sure it has read all of the edits from the JournalNodes before promoting itself to Active. This guarantees that the namespace is fully synchronized at the moment of failover.

    To support a fast failover, the Standby node also needs up-to-date information about the blocks in the cluster. To achieve this, all DataNodes are configured with the addresses of both NameNodes, and send heartbeats and block reports to both.

    It is vital for correct operation of an HA cluster that only one NameNode is Active at a time. Otherwise the namespaces of the two NameNodes would quickly diverge, risking data loss or other incorrect results. To ensure this property and prevent the so-called "split-brain scenario", the JournalNodes only ever allow a single NameNode to write to them at a time. During a failover, the NameNode that becomes Active (the former Standby) takes over the responsibility of writing to the JournalNodes, which effectively prevents the other NameNode from continuing to act as Active and lets the new Active node proceed safely with the failover.

    III. Hardware Resources

    To deploy an HA cluster, you should prepare the following:

    1. NameNode machines: the machines running the Active and Standby NameNodes should have hardware equivalent to each other, and equivalent to what would be used in a non-HA cluster.

    2. JournalNode machines: the machines running the JournalNode daemons. The JournalNode daemon is fairly lightweight, so it can be collocated with other Hadoop daemons, for example the NameNodes, the JobTracker, or the YARN ResourceManager. There must be at least three JournalNode daemons, because an edit log modification only counts as successful once it has been written to a majority of the JNs; with three, the system keeps working if a single machine fails. You may run more than three JournalNodes, but to improve fault tolerance you should run an odd number, which makes it easier to form a majority. With N JournalNodes the system can tolerate at most (N-1)/2 failures and keep working; for example, 3 JournalNodes tolerate 1 failure, and 5 tolerate 2.

    Note that in an HA cluster the Standby NameNode also performs checkpoints of the namespace state, so it is unnecessary to run a Secondary NameNode, CheckpointNode, or BackupNode. In fact, doing so is an error and is not allowed.

    IV. Deployment

    1. Cluster Plan

    Hostname  IP              Installed software      Processes
    work1     192.168.162.11  jdk, hadoop             NameNode, DFSZKFailoverController
    work2     192.168.162.12  jdk, hadoop             NameNode, DFSZKFailoverController
    work3     192.168.162.13  jdk, hadoop             ResourceManager
    work4     192.168.162.14  jdk, hadoop             ResourceManager
    work5     192.168.162.15  jdk, hadoop, zookeeper  QuorumPeerMain, JournalNode, NodeManager, DataNode
    work6     192.168.162.16  jdk, hadoop, zookeeper  QuorumPeerMain, JournalNode, NodeManager, DataNode
    work7     192.168.162.17  jdk, hadoop, zookeeper  QuorumPeerMain, JournalNode, NodeManager, DataNode

    2. Software Versions

    Software   Version
    jdk        jdk1.8.0_172
    hadoop     hadoop-2.6.0
    zookeeper  zookeeper-3.4.5

    3. Installation Steps

    Note: all operations are performed as the root user, and all software is installed under /usr/local/src.

    3.1 Configure /etc/hosts

    Edit the hosts file on every node in the cluster:
    vim /etc/hosts
    and add the IP-to-hostname mappings.
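
    Based on the cluster plan above, the entries on every node would look like this:

    192.168.162.11 work1
    192.168.162.12 work2
    192.168.162.13 work3
    192.168.162.14 work4
    192.168.162.15 work5
    192.168.162.16 work6
    192.168.162.17 work7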

    3.2 Configure SSH

    3.2.1 Generate SSH keys

    Run the key-generation command on every node:
    ssh-keygen
    and press Enter at each prompt to accept the defaults.
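
    To skip the prompts entirely, a non-interactive form (assuming an RSA key with an empty passphrase is acceptable) is:

    ssh-keygen -t rsa -N "" -f ~/.ssh/id_rsa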

    3.2.2 Configure passwordless login between servers

    Run the following on every node:
    ssh-copy-id work1
    ssh-copy-id work2
    ssh-copy-id work3
    ssh-copy-id work4
    ssh-copy-id work5
    ssh-copy-id work6
    ssh-copy-id work7

    3.3 Install ZooKeeper

    Install ZooKeeper on work5, work6, and work7.

    3.3.1 Unpack

    tar -zxvf zookeeper-3.4.5.tar.gz

    3.3.2 Configure ZooKeeper

    Enter the configuration directory zookeeper-3.4.5/conf/ and copy the sample configuration:
    cp zoo_sample.cfg zoo.cfg
    Then edit the configuration file:
    vim zoo.cfg
    1. Change dataDir to a directory created in advance; this article uses
    dataDir=/usr/local/src/zookeeper-3.4.5
    2. Append the following at the bottom of the file:
    server.0=work5:2888:3888
    server.1=work6:2888:3888
    server.2=work7:2888:3888
    About the format server.A=B:C:D:
    A is the server id (see 3.3.3); B is the server IP address or hostname; C is the port followers use to communicate with the leader; D is the port used during leader election.
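
    Putting these pieces together, the resulting zoo.cfg is roughly the following sketch (tickTime, initLimit, syncLimit, and clientPort are the defaults carried over from zoo_sample.cfg):

    tickTime=2000
    initLimit=10
    syncLimit=5
    dataDir=/usr/local/src/zookeeper-3.4.5
    clientPort=2181
    server.0=work5:2888:3888
    server.1=work6:2888:3888
    server.2=work7:2888:3888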

    3.3.3 Configure myid

    Go to the dataDir directory configured in 3.3.2 and write the server id into a file named myid:
    echo 0 >> myid
    The value written depends on the server.N entries from 3.3.2. In this article server.0 corresponds to work5, so the command above is run on work5:
    echo 0 >> myid
    Likewise, run the following on work6 and work7, respectively:
    echo 1 >> myid
    echo 2 >> myid
    With this, the ZooKeeper cluster setup is complete.

    3.4 Install Hadoop

    Hadoop must be installed on every node. You can install and configure it on work1 first and then copy it to the other nodes.

    3.4.1 Unpack

    tar -zxvf hadoop-2.6.0.tar.gz

    3.4.2 Create data directories

    Before configuring Hadoop, create the directories that will hold its data. This article creates a tmp directory under the Hadoop installation directory, with two subdirectories, dfs and journal. Under dfs, create the subdirectories name and data, which hold the data produced by the NameNode and the DataNode respectively; the journal directory holds the data produced by the HA (JournalNode) configuration.
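
    With the install path used throughout this article, the directories can be created as follows:

    mkdir -p /usr/local/src/hadoop-2.6.0/tmp/dfs/name
    mkdir -p /usr/local/src/hadoop-2.6.0/tmp/dfs/data
    mkdir -p /usr/local/src/hadoop-2.6.0/tmp/journal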

    3.4.3 Configure Hadoop

    Enter Hadoop's configuration directory, etc/hadoop under the Hadoop installation directory.
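
    With the install path used in this article, that is:

    cd /usr/local/src/hadoop-2.6.0/etc/hadoop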

    1. Configure core-site.xml
    <configuration>
            <property>
                    <!-- Specify the HDFS nameservice -->
                    <name>fs.defaultFS</name>
                    <value>hdfs://ns/</value>
            </property>
            <property>
                    <!-- Size of read/write buffer used in SequenceFiles -->
                    <name>io.file.buffer.size</name>
                    <value>131072</value>
            </property>
            <property>
                    <!-- Base directory for Hadoop temporary data -->
                    <name>hadoop.tmp.dir</name>
                    <value>/usr/local/src/hadoop-2.6.0/tmp</value>
            </property>
            <!-- ZooKeeper quorum addresses -->
            <property>
                    <name>ha.zookeeper.quorum</name>
                    <value>work5:2181,work6:2181,work7:2181</value>
            </property>
    </configuration>
    
    2. Configure hdfs-site.xml
    <configuration>
        <property>
            <!-- Set the HDFS nameservice to ns; must match core-site.xml -->
            <name>dfs.nameservices</name>
            <value>ns</value>
        </property>
        <!-- The nameservice ns has two NameNodes: nn1 and nn2 -->
        <property>
            <name>dfs.ha.namenodes.ns</name>
            <value>nn1,nn2</value>
        </property>
        <!-- RPC address of nn1 -->
        <property>
            <name>dfs.namenode.rpc-address.ns.nn1</name>
            <value>work1:9000</value>
        </property>
        <!-- RPC address of nn2 -->
        <property>
            <name>dfs.namenode.rpc-address.ns.nn2</name>
            <value>work2:9000</value>
        </property>
        <!-- HTTP address of nn1 -->
        <property>
            <name>dfs.namenode.http-address.ns.nn1</name>
            <value>work1:50070</value>
        </property>
        <!-- HTTP address of nn2 -->
        <property>
            <name>dfs.namenode.http-address.ns.nn2</name>
            <value>work2:50070</value>
        </property>
        <!-- Where the NameNode's shared edit log is stored on the JournalNodes -->
        <property>
            <name>dfs.namenode.shared.edits.dir</name>
            <value>qjournal://work5:8485;work6:8485;work7:8485/ns</value>
        </property>
        <!-- Local directory where the JournalNodes store their data -->
        <property> 
            <name>dfs.journalnode.edits.dir</name>
            <value>/usr/local/src/hadoop-2.6.0/tmp/journal</value>
        </property>
        <!-- Enable automatic failover for the NameNode -->
        <property>
            <name>dfs.ha.automatic-failover.enabled</name>
            <value>true</value>
        </property>
        <!-- Proxy provider that clients use to find the Active NameNode -->
        <property>
            <name>dfs.client.failover.proxy.provider.ns</name>
            <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!-- Fencing methods; separate multiple methods with newlines, one method per line -->
        <property>
            <name>dfs.ha.fencing.methods</name>
            <value>
                sshfence
                shell(/bin/true)
            </value>
        </property>
        <!-- The sshfence method requires passwordless SSH -->
        <property>
            <name>dfs.ha.fencing.ssh.private-key-files</name>
            <value>/root/.ssh/id_rsa</value>
        </property>
        <!-- Connection timeout for the sshfence method -->
        <property>
            <name>dfs.ha.fencing.ssh.connect-timeout</name>
            <value>30000</value>
        </property>
        <!-- NameNode data directory -->
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/usr/local/src/hadoop-2.6.0/tmp/dfs/name</value>
        </property>
        <!-- DataNode data directory -->
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/usr/local/src/hadoop-2.6.0/tmp/dfs/data</value>
        </property>
        <!-- Number of block replicas -->
        <property>
            <name>dfs.replication</name>
            <value>3</value>
        </property>
    </configuration>
    
    3. Configure mapred-site.xml
      First copy the template:
      cp mapred-site.xml.template mapred-site.xml
    <configuration>
            <property>
                    <name>mapreduce.framework.name</name>
                    <value>yarn</value>
            </property>
    </configuration>
    
    4. Configure yarn-site.xml
    <configuration>
        <property>
            <!-- Enable ResourceManager HA -->
            <name>yarn.resourcemanager.ha.enabled</name>
            <value>true</value>
        </property>
        <property>
            <!-- Cluster id for the ResourceManager pair -->
            <name>yarn.resourcemanager.cluster-id</name>
            <value>yrc</value>
        </property>
        <property>
            <!-- Logical ids of the ResourceManagers -->
            <name>yarn.resourcemanager.ha.rm-ids</name>
            <value>rm1,rm2</value>
        </property>
        <!-- Hostname of each ResourceManager -->
        <property>
            <name>yarn.resourcemanager.hostname.rm1</name>
            <value>work3</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname.rm2</name>
            <value>work4</value>
        </property>
        <!-- ZooKeeper quorum addresses -->
        <property>
            <name>yarn.resourcemanager.zk-address</name>
            <value>work5:2181,work6:2181,work7:2181</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
    
    5. Configure slaves
      Edit the slaves file and add the DataNode hosts; this article adds work5, work6, and work7.
    work5
    work6
    work7
    

    3.4.4 Sync to the other nodes

    Copy the configured Hadoop installation to the other nodes:
    scp -r /usr/local/src/hadoop-2.6.0 root@work2:/usr/local/src/
    scp -r /usr/local/src/hadoop-2.6.0 root@work3:/usr/local/src/
    scp -r /usr/local/src/hadoop-2.6.0 root@work4:/usr/local/src/
    scp -r /usr/local/src/hadoop-2.6.0 root@work5:/usr/local/src/
    scp -r /usr/local/src/hadoop-2.6.0 root@work6:/usr/local/src/
    scp -r /usr/local/src/hadoop-2.6.0 root@work7:/usr/local/src/

    3.5 Configure environment variables

    The Hadoop configuration file hadoop-env.sh relies on the JDK environment variable, so JAVA_HOME must be configured; the other environment variables simply make it possible to run the relevant scripts from any directory.
    vim ~/.bashrc
    Append the following to the end of the file on work1, work2, work3, and work4:

    export JAVA_HOME=/usr/local/src/jdk1.8.0_172
    export HADOOP_HOME=/usr/local/src/hadoop-2.6.0
    export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    

    Append the following to the end of the file on work5, work6, and work7:

    export JAVA_HOME=/usr/local/src/jdk1.8.0_172
    export ZOOKEEPER_HOME=/usr/local/src/zookeeper-3.4.5
    export CLASSPATH=$CLASSPATH:$JAVA_HOME/lib
    export PATH=$PATH:$JAVA_HOME/bin:$ZOOKEEPER_HOME/bin
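
    Then reload the shell configuration on each node so the variables take effect in the current session:

    source ~/.bashrc

    If Hadoop still cannot locate the JDK, JAVA_HOME can also be set explicitly in etc/hadoop/hadoop-env.sh.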
    

    V. Starting the Cluster

    Follow the steps below in strict order.

    1. Start ZooKeeper

    Run the start command on work5, work6, and work7:

    zkServer.sh start
    

    Once all three have started, check the ZooKeeper process status:

    zkServer.sh status
    

    The output should look like this:

    JMX enabled by default
    Using config: /usr/local/src/zookeeper-3.4.5/bin/../conf/zoo.cfg
    Mode: follower
    

    Two of the nodes report Mode: follower and the remaining one reports Mode: leader; that means the ensemble is healthy.
    You can also check the process with the jps command:

    2610 QuorumPeerMain
    

    2. Start the JournalNode daemons

    Run the start command on work5, work6, and work7:

    hadoop-daemon.sh start journalnode
    

    After they start, confirm the process is running with jps:

    2677 JournalNode
    

    3. Format HDFS

    On work1, run:

    hdfs namenode -format
    

    Formatting generates files under the directory configured by hadoop.tmp.dir in core-site.xml; you can inspect that directory afterwards.
    Because work2 acts as the standby NameNode, the NameNode metadata generated on work1 must be copied to work2.
    You can copy it over manually, as follows:

    scp -r /usr/local/src/hadoop-2.6.0/tmp/ root@work2:/usr/local/src/hadoop-2.6.0/
    

    Alternatively, after completing step 4, start the NameNode on work1:

    hadoop-daemon.sh start namenode
    

    and then run the following on work2:

    hdfs namenode -bootstrapStandby
    

    4. Format ZKFC

    On work1, run:

    hdfs zkfc -formatZK
    

    5. Start HDFS

    Start HDFS from work1:

    start-dfs.sh
    

    The startup output looks like this:

    18/06/22 02:46:24 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [work1 work2]
    work2: starting namenode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-namenode-work2.out
    work1: starting namenode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-namenode-work1.out
    work7: starting datanode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-datanode-work7.out
    work6: starting datanode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-datanode-work6.out
    work5: starting datanode, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-datanode-work5.out
    Starting journal nodes [work5 work6 work7]
    work5: journalnode running as process 2677. Stop it first.
    work7: journalnode running as process 2688. Stop it first.
    work6: journalnode running as process 2671. Stop it first.
    18/06/22 02:46:42 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting ZK Failover Controllers on NN hosts [work1 work2]
    work1: starting zkfc, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-zkfc-work1.out
    work2: starting zkfc, logging to /usr/local/src/hadoop-2.6.0/logs/hadoop-root-zkfc-work2.out
    

    Running jps on work1, work2, work5, work6, and work7 shows the following, respectively:

    2752 NameNode
    5096 Jps
    3039 DFSZKFailoverController
    
    11233 Jps
    2725 DFSZKFailoverController
    2623 NameNode
    
     30418 Jps
    2610 QuorumPeerMain
    2677 JournalNode
    2793 DataNode
    
    30297 Jps
    2781 DataNode
    2591 QuorumPeerMain
    2671 JournalNode
    
    2688 JournalNode
    30291 Jps
    2615 QuorumPeerMain
    2798 DataNode
    

    On work1 and work2, the NameNode and DFSZKFailoverController processes are now running.
    On work5, work6, and work7, the DataNode processes are now running.

    6. Start YARN

    Start YARN on work3:

    start-yarn.sh
    

    Startup output:

    starting yarn daemons
    starting resourcemanager, logging to /usr/local/src/hadoop-2.6.0/logs/yarn-root-resourcemanager-work3.out
    work5: starting nodemanager, logging to /usr/local/src/hadoop-2.6.0/logs/yarn-root-nodemanager-work5.out
    work7: starting nodemanager, logging to /usr/local/src/hadoop-2.6.0/logs/yarn-root-nodemanager-work7.out
    work6: starting nodemanager, logging to /usr/local/src/hadoop-2.6.0/logs/yarn-root-nodemanager-work6.out
    

    Running jps on work3, work5, work6, and work7 shows the following, respectively:

    2551 ResourceManager
    2616 Jps
    
    30418 Jps
    2610 QuorumPeerMain
    2677 JournalNode
    2919 NodeManager
    2793 DataNode
    
    30297 Jps
    2907 NodeManager
    2781 DataNode
    2591 QuorumPeerMain
    2671 JournalNode
    
    2688 JournalNode
    30291 Jps
    2615 QuorumPeerMain
    2924 NodeManager
    2798 DataNode
    

    The ResourceManager process is now running on work3, and the NodeManager processes are running on work5, work6, and work7.
    Next, start the standby ResourceManager on work4:

    yarn-daemon.sh start resourcemanager
    

    Startup output:

    starting resourcemanager, logging to /usr/local/src/hadoop-2.6.0-ha/logs/yarn-root-resourcemanager-work4.out
    

    Check the process with jps:

    2881 Jps
    2851 ResourceManager
    

    VI. Verifying the Cluster

    1. Access via browser

    Open the NameNode on work1:
    http://192.168.162.11:50070
    The page shows:

    'work1:9000' (active)
    

    Open the NameNode on work2:
    http://192.168.162.12:50070
    The page shows:

    'work2:9000' (standby)
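
    The NameNode HA states can also be checked from the command line, using the NameNode ids nn1 and nn2 configured in hdfs-site.xml:

    hdfs haadmin -getServiceState nn1
    hdfs haadmin -getServiceState nn2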
    

    Open the ResourceManager on work3:
    http://192.168.162.13:8088/cluster/cluster
    The page shows:

    Cluster ID: 1529681241532
    ResourceManager state:  STARTED
    ResourceManager HA state:   active
    ResourceManager RMStateStore:   org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore
    ResourceManager started on: 22-Jun-2018 08:27:21
    ResourceManager version:    2.6.0 from e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1 by jenkins source checksum 7e1415f8c555842b6118a192d86f5e8 on 2014-11-13T21:17Z
    Hadoop version: 2.6.0 from e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1 by jenkins source checksum 18e43357c8f927c0695f1e9522859d6a on 2014-11-13T21:10Z
    

    Open the ResourceManager on work4:
    http://192.168.162.14:8088/cluster/cluster
    The page shows:

    Cluster ID: 1529681248588
    ResourceManager state:  STARTED
    ResourceManager HA state:   standby
    ResourceManager RMStateStore:   org.apache.hadoop.yarn.server.resourcemanager.recovery.NullRMStateStore
    ResourceManager started on: 22-Jun-2018 08:27:28
    ResourceManager version:    2.6.0 from e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1 by jenkins source checksum 7e1415f8c555842b6118a192d86f5e8 on 2014-11-13T21:17Z
    Hadoop version: 2.6.0 from e3496499ecb8d220fba99dc5ed4c99c8f9e33bb1 by jenkins source checksum 18e43357c8f927c0695f1e9522859d6a on 2014-11-13T21:10Z
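
    Similarly, the ResourceManager HA states can be checked from the command line, using the ids rm1 and rm2 configured in yarn-site.xml:

    yarn rmadmin -getServiceState rm1
    yarn rmadmin -getServiceState rm2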
    

    2. Verify HDFS HA

    (1) Upload a file

    On work1, run:

    hadoop fs -put /etc/profile /profile
    

    Check that it exists:

    hadoop fs -ls /
    
    Found 1 items
    -rw-r--r--   3 root supergroup       1796 2018-06-22 08:36 /profile
    

    (2) Kill the active NameNode

    Check the processes on work1 with jps:

    2752 NameNode
    10254 Jps
    3039 DFSZKFailoverController
    

    Kill the NameNode process:

    kill -9 2752
    

    (3) Check the NameNodes on work1 and work2

    Open the NameNode on work1:
    http://192.168.162.11:50070/
    It is no longer reachable.
    Open the NameNode on work2:
    http://192.168.162.12:50070/
    work2's NameNode now shows the active state:

    'work2:9000' (active)
    

    (4) Check that the uploaded file still exists

    On work2, run:

    hadoop fs -ls /
    

    The output:

    Found 1 items
    -rw-r--r--   3 root supergroup       1796 2018-06-22 08:36 /profile
    

    The previously uploaded file is still there.

    (5) Restart the NameNode on work1

    hadoop-daemon.sh start namenode
    

    Open the NameNode on work1:
    http://192.168.162.11:50070/
    The page shows:

    'work1:9000' (standby)
    

    Verification complete!

    3. Verify YARN HA

    (1) Run an example job

    Building on the verification above, run the example job on work2:

    hadoop jar /usr/local/src/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /profile /out
    

    Job log:

    18/06/22 08:45:25 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/06/22 08:45:28 INFO input.FileInputFormat: Total input paths to process : 1
    18/06/22 08:45:28 INFO mapreduce.JobSubmitter: number of splits:1
    18/06/22 08:45:29 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1529681241532_0001
    18/06/22 08:45:30 INFO impl.YarnClientImpl: Submitted application application_1529681241532_0001
    18/06/22 08:45:30 INFO mapreduce.Job: The url to track the job: http://work3:8088/proxy/application_1529681241532_0001/
    18/06/22 08:45:30 INFO mapreduce.Job: Running job: job_1529681241532_0001
    18/06/22 08:45:48 INFO mapreduce.Job: Job job_1529681241532_0001 running in uber mode : false
    18/06/22 08:45:48 INFO mapreduce.Job:  map 0% reduce 0%
    18/06/22 08:46:02 INFO mapreduce.Job:  map 100% reduce 0%
    18/06/22 08:46:14 INFO mapreduce.Job:  map 100% reduce 100%
    18/06/22 08:46:14 INFO mapreduce.Job: Job job_1529681241532_0001 completed successfully
    18/06/22 08:46:14 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=2058
                    FILE: Number of bytes written=220233
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=1878
                    HDFS: Number of bytes written=1429
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=12274
                    Total time spent by all reduces in occupied slots (ms)=6928
                    Total time spent by all map tasks (ms)=12274
                    Total time spent by all reduce tasks (ms)=6928
                    Total vcore-seconds taken by all map tasks=12274
                    Total vcore-seconds taken by all reduce tasks=6928
                    Total megabyte-seconds taken by all map tasks=12568576
                    Total megabyte-seconds taken by all reduce tasks=7094272
            Map-Reduce Framework
                    Map input records=78
                    Map output records=258
                    Map output bytes=2573
                    Map output materialized bytes=2058
                    Input split bytes=82
                    Combine input records=258
                    Combine output records=156
                    Reduce input groups=156
                    Reduce shuffle bytes=2058
                    Reduce input records=156
                    Reduce output records=156
                    Spilled Records=312
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=218
                    CPU time spent (ms)=1750
                    Physical memory (bytes) snapshot=271466496
                    Virtual memory (bytes) snapshot=4126367744
                    Total committed heap usage (bytes)=138362880
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=1796
            File Output Format Counters 
                    Bytes Written=1429
    

    (2) Check the result

    hadoop fs -ls /out
    

    The output shows:

    Found 2 items
    -rw-r--r--   3 root supergroup          0 2018-06-22 08:46 /out/_SUCCESS
    -rw-r--r--   3 root supergroup       1429 2018-06-22 08:46 /out/part-r-00000
    

    The job succeeded!

    (3) Kill the ResourceManager on work3 and rerun the job

    10131 Jps
    8890 ResourceManager
    
    kill -9 8890
    

    Rerun the jar on work2; since the /out directory already exists, use a different output directory, /out1:

    hadoop jar /usr/local/src/hadoop-2.6.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.6.0.jar wordcount /profile /out1
    

    Job log:

    18/06/22 09:34:37 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    18/06/22 09:34:38 INFO client.ConfiguredRMFailoverProxyProvider: Failing over to rm2
    18/06/22 09:34:40 INFO input.FileInputFormat: Total input paths to process : 1
    18/06/22 09:34:40 INFO mapreduce.JobSubmitter: number of splits:1
    18/06/22 09:34:40 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1529685157948_0001
    18/06/22 09:34:41 INFO impl.YarnClientImpl: Submitted application application_1529685157948_0001
    18/06/22 09:34:41 INFO mapreduce.Job: The url to track the job: http://work4:8088/proxy/application_1529685157948_0001/
    18/06/22 09:34:41 INFO mapreduce.Job: Running job: job_1529685157948_0001
    18/06/22 09:35:01 INFO mapreduce.Job: Job job_1529685157948_0001 running in uber mode : false
    18/06/22 09:35:01 INFO mapreduce.Job:  map 0% reduce 0%
    18/06/22 09:35:13 INFO mapreduce.Job:  map 100% reduce 0%
    18/06/22 09:35:27 INFO mapreduce.Job:  map 100% reduce 100%
    18/06/22 09:35:28 INFO mapreduce.Job: Job job_1529685157948_0001 completed successfully
    18/06/22 09:35:28 INFO mapreduce.Job: Counters: 49
            File System Counters
                    FILE: Number of bytes read=2058
                    FILE: Number of bytes written=220235
                    FILE: Number of read operations=0
                    FILE: Number of large read operations=0
                    FILE: Number of write operations=0
                    HDFS: Number of bytes read=1878
                    HDFS: Number of bytes written=1429
                    HDFS: Number of read operations=6
                    HDFS: Number of large read operations=0
                    HDFS: Number of write operations=2
            Job Counters 
                    Launched map tasks=1
                    Launched reduce tasks=1
                    Data-local map tasks=1
                    Total time spent by all maps in occupied slots (ms)=10069
                    Total time spent by all reduces in occupied slots (ms)=9950
                    Total time spent by all map tasks (ms)=10069
                    Total time spent by all reduce tasks (ms)=9950
                    Total vcore-seconds taken by all map tasks=10069
                    Total vcore-seconds taken by all reduce tasks=9950
                    Total megabyte-seconds taken by all map tasks=10310656
                    Total megabyte-seconds taken by all reduce tasks=10188800
            Map-Reduce Framework
                    Map input records=78
                    Map output records=258
                    Map output bytes=2573
                    Map output materialized bytes=2058
                    Input split bytes=82
                    Combine input records=258
                    Combine output records=156
                    Reduce input groups=156
                    Reduce shuffle bytes=2058
                    Reduce input records=156
                    Reduce output records=156
                    Spilled Records=312
                    Shuffled Maps =1
                    Failed Shuffles=0
                    Merged Map outputs=1
                    GC time elapsed (ms)=257
                    CPU time spent (ms)=2260
                    Physical memory (bytes) snapshot=258924544
                    Virtual memory (bytes) snapshot=4125798400
                    Total committed heap usage (bytes)=136077312
            Shuffle Errors
                    BAD_ID=0
                    CONNECTION=0
                    IO_ERROR=0
                    WRONG_LENGTH=0
                    WRONG_MAP=0
                    WRONG_REDUCE=0
            File Input Format Counters 
                    Bytes Read=1796
            File Output Format Counters 
                    Bytes Written=1429
    

    The job still succeeds, and the log shows that the client failed over to the second ResourceManager (rm2).
    That completes the setup.
