Hadoop 3.1.0 and ZooKeeper 3.4.10 on three hosts

Author: 大道至简非简 | Published 2018-05-12 14:27

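    The hosts run Ubuntu at 192.168.1.141, 192.168.1.142 and 192.168.1.143, in a one-master, two-slave layout.
    The machines are ARM Linux boards that cost a little over 100 yuan each, and surprisingly the cluster actually runs on them.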

    I. Environment preparation

    1. Unify the host names

    Master: 192.168.1.141
    Slaves: 192.168.1.142, 192.168.1.143
    Edit /etc/hosts on every host:

    # master host
    192.168.1.141     hadoop01
    # slave nodes
    192.168.1.142     hadoop02
    192.168.1.143     hadoop03
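Because the same three entries have to be added on every host, it can help to make the edit repeatable. The following is a minimal sketch, not part of the original setup: `add_host_entry` is a hypothetical helper that appends a mapping only when the host name is not already present in the file.

```shell
#!/usr/bin/env bash
# Append "ip name" to a hosts file unless that host name is already mapped.
# Arguments: hosts file, IP address, host name. (Helper name is hypothetical.)
add_host_entry() {
    local file="$1" ip="$2" name="$3"
    # Look for the name as a whole field on any non-comment line.
    if awk -v n="$name" '
            $0 !~ /^[ \t]*#/ { for (i = 2; i <= NF; i++) if ($i == n) found = 1 }
            END { exit !found }
        ' "$file" 2>/dev/null; then
        return 0
    fi
    printf '%s\t%s\n' "$ip" "$name" >> "$file"
}
```

A typical call would be `add_host_entry /etc/hosts 192.168.1.142 hadoop02`, run as root on each machine; running it twice leaves a single entry.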
    

    2. Set up passwordless SSH from the master to the slaves

    Create ~/.ssh on the slave machines, then generate a key pair on the master:
    
    
    root@OrangePi:/# ssh-keygen -t rsa 
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa): 
    Created directory '/root/.ssh'.
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    SHA256:eTjQhVzHIjWIAmP603tQYIf1/D+tSPDlrRD0D8bBEWY root@OrangePi
    The key's randomart image is:
    +---[RSA 2048]----+
    |  +.oooo ==.E.   |
    | o ooo.+=..*..   |
    |.    .o +...o    |
    | . . . . = o .   |
    |  o o   S + *    |
    |   . o   = * =   |
    |    . .   + + +  |
    |     .   . o +   |
    |          . o    |
    +----[SHA256]-----+
    root@OrangePi:/# 
    
    root@OrangePi:/# cd root
    root@OrangePi:~#  cd .ssh
    root@OrangePi:~/.ssh# cat id_rsa.pub >>authorized_keys
    Copy the file over SSH to hadoop02 and hadoop03:
    root@OrangePi:~/.ssh# scp authorized_keys root@hadoop02:/root/.ssh/authorized_keys
    root@hadoop02's password: 
    authorized_keys                                           100%  790     0.8KB/s   00:00    
    

    Test the passwordless login:

    root@OrangePi:~/.ssh# ssh hadoop02
    Welcome to Ubuntu 16.04.1 LTS (GNU/Linux 3.10.65 aarch64)
    
    
    Remember to run this on the slave machines:
    sudo chmod 600 ~/.ssh/authorized_keys
    

    Make all hosts trust one another:

    scp ~/.ssh/authorized_keys hadoop01:/root/.ssh/authorized_keys
    scp ~/.ssh/authorized_keys hadoop02:/root/.ssh/authorized_keys
    scp ~/.ssh/authorized_keys hadoop03:/root/.ssh/authorized_keys
    

    3. Install and start ntp on every host

    # sudo apt-get install ntp
    # service ntp start
    

    4. Install the JDK

    sudo add-apt-repository ppa:webupd8team/java
    sudo apt-get update
    sudo apt-get install oracle-java8-installer
    
    root@OrangePi:/# java -version
    java version "1.8.0_171"
    Java(TM) SE Runtime Environment (build 1.8.0_171-b11)
    Java HotSpot(TM) 64-Bit Server VM (build 25.171-b11, mixed mode)
    
    

    With this installation method, the JDK home is /usr/lib/jvm/java-8-oracle.
    Add the following to /etc/profile:

    export JAVA_HOME=/usr/lib/jvm/java-8-oracle 
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH
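A mistyped variable name in an export line expands silently to an empty string, so a quick check after editing /etc/profile is worthwhile. The sketch below is only an illustration; `check_java_home` is a hypothetical helper that tests whether a directory contains an executable `bin/java`.

```shell
#!/usr/bin/env bash
# Return 0 if the given directory looks like a usable JDK home, 1 otherwise.
# (Helper name is hypothetical; it only checks for an executable bin/java.)
check_java_home() {
    local home="$1"
    if [ -x "$home/bin/java" ]; then
        echo "OK: $home/bin/java is executable"
        return 0
    fi
    echo "ERROR: no executable found at $home/bin/java" >&2
    return 1
}
```

After `source /etc/profile`, calling `check_java_home "$JAVA_HOME"` on each host would catch an unset or wrong value before Hadoop does.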
    

    II. Hadoop cluster installation


    http://hadoop.apache.org/

    1. Create the directories

    root@OrangePi:~# mkdir /home/data
    root@OrangePi:~# mkdir /home/data/hdfs
    root@OrangePi:~# cd /home/data/hdfs
    root@OrangePi:/home/data/hdfs# mkdir name
    root@OrangePi:/home/data/hdfs# mkdir data
    root@OrangePi:/home/data/hdfs# mkdir tmp
    root@OrangePi:/home/data/hdfs# sudo chmod -R 777 /home/data
    
    

    Run on the slave machines:

    mkdir /home/data
    mkdir /home/data/hdfs
    cd /home/data/hdfs
    mkdir name
    mkdir data
    mkdir tmp
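The same directory layout can be created in one step with `mkdir -p`, which also makes the command safe to repeat. This is a small sketch; `make_hdfs_dirs` is a hypothetical wrapper and the base path is passed in as an argument.

```shell
#!/usr/bin/env bash
# Create the name, data and tmp directories under <base>/hdfs in one call.
# mkdir -p creates missing parents and does not fail if a directory exists.
make_hdfs_dirs() {
    local base="$1"
    mkdir -p "$base/hdfs/name" "$base/hdfs/data" "$base/hdfs/tmp"
}
```

For the layout used here the call would be `make_hdfs_dirs /home/data`.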
    

    Configure /etc/profile:

    export JAVA_HOME=/usr/lib/jvm/java-8-oracle 
    export JRE_HOME=$JAVA_HOME/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=${JAVA_HOME}/bin:$PATH
    
    export HADOOP_HOME=/home/hadoop-3.1.0
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    
    
    export HADOOP_COMMON_HOME=$HADOOP_HOME 
    export HADOOP_HDFS_HOME=$HADOOP_HOME 
    export HADOOP_MAPRED_HOME=$HADOOP_HOME
    export HADOOP_YARN_HOME=$HADOOP_HOME 
    
    export HADOOP_INSTALL=$HADOOP_HOME 
    export HADOOP_COMMON_LIB_NATIVE_DIR=$HADOOP_HOME/lib/native 
    export HADOOP_CONF_DIR=$HADOOP_HOME 
    export HADOOP_PREFIX=$HADOOP_HOME 
    export HADOOP_LIBEXEC_DIR=$HADOOP_HOME/libexec 
    export JAVA_LIBRARY_PATH=$HADOOP_HOME/lib/native:$JAVA_LIBRARY_PATH 
    export HADOOP_CONF_DIR=$HADOOP_PREFIX/etc/hadoop
    
    export HDFS_DATANODE_USER=root
    export HDFS_DATANODE_SECURE_USER=root
    export HDFS_SECONDARYNAMENODE_USER=root
    export HDFS_NAMENODE_USER=root
    
    

    Reload the profile so the settings take effect:
    source /etc/profile

    2. Install and configure Hadoop

    http://hadoop.apache.org/releases.html

    cd /home/
    mkdir hadoop
    wget http://mirror.bit.edu.cn/apache/hadoop/common/hadoop-3.1.0/hadoop-3.1.0.tar.gz
    tar zxvf hadoop-3.1.0.tar.gz -C /home/
    
    

    3. Configure core-site.xml

    /home/hadoop-3.1.0/etc/hadoop/core-site.xml

    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop01:9000</value>
        </property>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/home/data/hdfs/tmp</value>
        </property>
    </configuration>
    

    4. Configure hdfs-site.xml

    The basic settings include the replication factor and the data directories.

    <configuration>
     
        <property>
            <name>dfs.replication</name>
            <value>2</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>/home/data/hdfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>/home/data/hdfs/data</value>
        </property>
    </configuration>
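Since a misspelled property name is ignored without any error, it can be useful to read a value back from the file and compare it across nodes. Hadoop's own `hdfs getconf -confKey <name>` does this once the installation is on the PATH; the sketch below is a dependency-free alternative under the assumption that each `<name>` and `<value>` sits on its own line, as in the files shown here. `get_property` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the value of one property from a Hadoop *-site.xml file.
# Assumes <name> and <value> are each on a single line (a simplification).
# Arguments: XML file, property name. Prints nothing if the name is absent.
get_property() {
    awk -v key="$2" '
        index($0, "<name>" key "</name>") { found = 1; next }
        found && /<value>/ {
            sub(/.*<value>/, "")      # drop everything up to the opening tag
            sub(/<\/value>.*/, "")    # drop the closing tag and the rest
            print
            exit
        }
    ' "$1"
}
```

For example, `get_property /home/hadoop-3.1.0/etc/hadoop/hdfs-site.xml dfs.replication` would print the configured replication factor, and an empty result points to a missing or misspelled name.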
    
    

    5. Configure yarn-site.xml

    <configuration>
    
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop01</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
    </configuration>
    
    

    6. Configure mapred-site.xml

    <configuration>
        <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.application.classpath</name>
            <value>
                /home/hadoop-3.1.0/etc/hadoop,
                /home/hadoop-3.1.0/share/hadoop/common/*,
                /home/hadoop-3.1.0/share/hadoop/common/lib/*,
                /home/hadoop-3.1.0/share/hadoop/hdfs/*,
                /home/hadoop-3.1.0/share/hadoop/hdfs/lib/*,
                /home/hadoop-3.1.0/share/hadoop/mapreduce/*,
                /home/hadoop-3.1.0/share/hadoop/mapreduce/lib/*,
                /home/hadoop-3.1.0/share/hadoop/yarn/*,
                /home/hadoop-3.1.0/share/hadoop/yarn/lib/*
            </value>
        </property>
    </configuration>
    
    

    7. Configure the slave (worker) list

    etc/hadoop/workers

    hadoop01
    hadoop02
    hadoop03
    
    
    

    8. Configure JAVA_HOME (set it to your actual Java home)

    etc/hadoop/hadoop-env.sh

    # The java implementation to use. By default, this environment
    # variable is REQUIRED on ALL platforms except OS X!
    export JAVA_HOME=/usr/lib/jvm/java-8-oracle
    
    

    9. Copy the configuration to the slaves

    cd /home
    scp -r  hadoop-3.1.0  hadoop02:/home/
    scp -r  hadoop-3.1.0  hadoop03:/home/
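With more workers, typing one scp line per host stops scaling. As a sketch only, the copy commands can be generated from the workers file itself. `plan_sync` is a hypothetical helper that prints the commands instead of running them, so the output can be reviewed first.

```shell
#!/usr/bin/env bash
# Print one scp command per host in the workers file, skipping the local host.
# Arguments: workers file, local host name, directory to copy, destination parent.
# (Helper name is hypothetical; nothing is copied until the output is executed.)
plan_sync() {
    local workers="$1" self="$2" src="$3" dest="$4" host
    while read -r host || [ -n "$host" ]; do
        # Skip blank lines, comment lines and the machine we are copying from.
        case "$host" in
            ''|'#'*) continue ;;
        esac
        if [ "$host" = "$self" ]; then
            continue
        fi
        echo "scp -r $src $host:$dest"
    done < "$workers"
}
```

Running `plan_sync /home/hadoop-3.1.0/etc/hadoop/workers hadoop01 /home/hadoop-3.1.0 /home/` prints the two commands shown above; piping the output to `sh` would execute them.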
    
    

    10. Configure PATH

    /etc/profile

    export HADOOP_HOME=/home/hadoop-3.1.0
    export PATH=$PATH:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    

    source /etc/profile

    III. Start and run the Hadoop cluster (run on the master)

    1. Start the namenode

    Format the HDFS file system:

    #hadoop namenode -format
    
    root@Hadoop01:~# ps -ef | grep hadoop
    root      3047  2756  0 10:06 pts/0    00:00:00 grep --color=auto hadoop
    
    

    Now start the namenode daemon:

    # hadoop-daemon.sh start namenode
    
    

    2. Start the datanode

    hdfs --daemon start namenode
    
    hdfs --daemon start datanode
    
    yarn --daemon start resourcemanager
    
    yarn --daemon start nodemanager
    
    root@Hadoop01:/home# jps
    5104 ResourceManager
    5351 NodeManager
    5000 DataNode
    5375 Jps
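Reading a jps listing by eye makes it easy to miss a daemon that did not come up. The following sketch compares the listing against a list of expected names; `missing_daemons` is a hypothetical helper and the expected names are whatever the caller passes in.

```shell
#!/usr/bin/env bash
# Print every expected daemon name that does not appear in the given jps output.
# First argument: jps output; remaining arguments: daemon names to look for.
missing_daemons() {
    local jps_output="$1" name
    shift
    for name in "$@"; do
        # jps prints "<pid> <name>", so compare the second field exactly.
        if ! printf '%s\n' "$jps_output" |
                awk -v d="$name" '$2 == d { ok = 1 } END { exit !ok }'; then
            echo "$name"
        fi
    done
}
```

Called as `missing_daemons "$(jps)" NameNode DataNode ResourceManager NodeManager`, it prints nothing when all four are running. Applied to the listing above, it would report NameNode, which is a cue to look at the namenode log.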
    
    
    

    3. One-step start and stop (confirmed working)

    start-all.sh
    stop-all.sh
    

    Related web pages and ports:

    http://192.168.1.141:8088/cluster/nodes

    http://192.168.1.141:9870/dfshealth.html#tab-overview

    4. Verify with a sample job

    Create test.txt under /home with the following content:

    hello word china chinese korea
    groupby
    
    Create the directory:
    hadoop fs -mkdir /input
    #hadoop fs -put test.txt /input
    List the directory:
    hadoop fs -ls /
    
    Found 1 items
    drwxr-xr-x   - root supergroup          0 2018-05-11 06:47 /input
    

    Delete the output directory if it already exists, because the job fails when it does:
    hadoop fs -rm -r /output

    
    #hadoop jar /home/hadoop-3.1.0/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.1.0.jar  wordcount /input /output
    
    
    
    
        Map-Reduce Framework
            Map input records=2
            Map output records=6
            Map output bytes=63
            Map output materialized bytes=81
            Input split bytes=100
            Combine input records=6
            Combine output records=6
            Reduce input groups=6
            Reduce shuffle bytes=81
            Reduce input records=6
            Reduce output records=6
            Spilled Records=12
            Shuffled Maps =1
            Failed Shuffles=0
            Merged Map outputs=1
            GC time elapsed (ms)=1088
            CPU time spent (ms)=4840
            Physical memory (bytes) snapshot=326569984
            Virtual memory (bytes) snapshot=3757453312
            Total committed heap usage (bytes)=144109568
            Peak Map Physical memory (bytes)=210546688
            Peak Map Virtual memory (bytes)=2002776064
            Peak Reduce Physical memory (bytes)=116023296
            Peak Reduce Virtual memory (bytes)=1754677248
        Shuffle Errors
            BAD_ID=0
            CONNECTION=0
            IO_ERROR=0
            WRONG_LENGTH=0
            WRONG_MAP=0
            WRONG_REDUCE=0
        File Input Format Counters 
            Bytes Read=38
        File Output Format Counters 
            Bytes Written=51
    
    

    View the result:

    root@Hadoop01:/home#  hadoop fs -ls /output
    WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
    2018-05-11 13:31:47,807 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Found 2 items
    -rw-r--r--   2 root supergroup          0 2018-05-11 13:30 /output/_SUCCESS
    -rw-r--r--   2 root supergroup         51 2018-05-11 13:30 /output/part-r-00000
    
    

    The word count result:

    root@Hadoop01:/home# hadoop fs -cat /output/part-r-00000
    WARNING: HADOOP_PREFIX has been replaced by HADOOP_HOME. Using value of HADOOP_PREFIX.
    2018-05-11 13:32:48,377 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    china   1
    chinese 1
    groupby 1
    hello   1
    korea   1
    word    1
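For a file this small, the job's output can be cross-checked locally with standard text tools. This is a sketch of the same counting logic, not of how MapReduce computes it; `local_wordcount` is a hypothetical helper that splits on whitespace and prints tab-separated word and count pairs in sorted order.

```shell
#!/usr/bin/env bash
# Count whitespace-separated words in a local file and print "word<TAB>count",
# sorted by word in byte order.
local_wordcount() {
    tr -s '[:space:]' '\n' < "$1" |     # one word per line
        sed '/^$/d' |                   # drop any empty line
        LC_ALL=C sort |
        uniq -c |
        awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Running `local_wordcount /home/test.txt` on the sample file should list the same six words, each with a count of 1.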
    
    

    The default block size for each file is 128 MB.

    5. Fix jobs that exceed the node's memory

    mapred-site.xml

        <property>
          <name>mapreduce.map.memory.mb</name>
          <value>512</value>
        </property>
        <property>
          <name>mapreduce.map.java.opts</name>
          <value>-Xmx512M</value>
        </property>
        <property>
          <name>mapreduce.reduce.memory.mb</name>
          <value>512</value>
        </property>
        <property>
          <name>mapreduce.reduce.java.opts</name>
          <value>-Xmx256M</value>
        </property>
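The `memory.mb` values size the YARN container, while `java.opts` sizes the JVM heap inside it. A JVM also needs memory outside the heap (thread stacks, metaspace, native buffers), so a heap equal to the full container leaves no headroom. The sketch below computes a heap value from a container size using an 80% ratio; that ratio is an assumed rule of thumb for illustration, not a value from this setup, and `heap_for_container` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Suggest a JVM heap size in MB for a container of the given size in MB.
# The 80% ratio is an assumption for illustration; tune it for your workload.
heap_for_container() {
    local container_mb="$1"
    echo $(( container_mb * 80 / 100 ))   # integer arithmetic, rounds down
}
```

For a 512 MB container this gives 409, which would correspond to `-Xmx409M`.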
    

    6. Fix Hadoop time not matching the system time

    # cat hadoop-env.sh
    .........
    export HADOOP_OPTS="$HADOOP_OPTS -Duser.timezone=GMT+08"
    .........
    
    # cat yarn-env.sh
    ......... 
    YARN_OPTS="$YARN_OPTS -Duser.timezone=GMT+08"
    .........
    

    If HBase is involved, set its time zone as well:

    # cat hbase-env.sh
    .........
    export TZ="Asia/Shanghai"
    .........
    
    

    IV. Install the ZooKeeper cluster

    1. Download and install ZooKeeper 3.4.10

    wget http://mirror.bit.edu.cn/apache/zookeeper/zookeeper-3.4.10/zookeeper-3.4.10.tar.gz
    tar zxvf zookeeper-3.4.10.tar.gz

    2. Configuration file

    mkdir /home/zookeeper-3.4.10/data
    mkdir -p /home/zookeeper-3.4.10/datalog
    cd /home/zookeeper-3.4.10/conf
    Copy the sample configuration file:
    cp zoo_sample.cfg zoo.cfg
    

    Contents of the configuration file:

    # The number of milliseconds of each tick
    tickTime=2000
    # The number of ticks that the initial 
    # synchronization phase can take
    initLimit=10
    # The number of ticks that can pass between 
    # sending a request and getting an acknowledgement
    syncLimit=5
    # the directory where the snapshot is stored.
    # do not use /tmp for storage, /tmp here is just 
    # example sakes.
    dataDir=/home/zookeeper-3.4.10/data
    dataLogDir=/home/zookeeper-3.4.10/datalog
    # the port at which the clients will connect
    clientPort=2181
    # the maximum number of client connections.
    # increase this if you need to handle more clients
    #maxClientCnxns=60
    #
    # Be sure to read the maintenance section of the 
    # administrator guide before turning on autopurge.
    #
    # http://zookeeper.apache.org/doc/current/zookeeperAdmin.html#sc_maintenance
    #
    # The number of snapshots to retain in dataDir
    #autopurge.snapRetainCount=3
    # Purge task interval in hours
    # Set to "0" to disable auto purge feature
    #autopurge.purgeInterval=1
    server.0=hadoop01:2888:3888
    server.1=hadoop02:2888:3888
    server.2=hadoop03:2888:3888
    

    3. Create the myid file

    Create a myid file in ZooKeeper's data directory. Its content is 0 on the master and 1 and 2 on the other two hosts.
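The number in each myid file has to match the `server.N` line for that host in zoo.cfg, so it can be derived from the configuration rather than typed by hand. The sketch below assumes short host names without dots, as used here; `myid_for_host` is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Print the id N from the "server.N=<host>:..." line that matches the host.
# Assumes host names contain no dots. Prints nothing if the host is not listed.
# Arguments: zoo.cfg path, host name.
myid_for_host() {
    local cfg="$1" host="$2"
    awk -F'[.=:]' -v h="$host" '
        { sub(/^[ \t]+/, "") }                      # tolerate indented lines
        $1 == "server" && $3 == h { print $2; exit }
    ' "$cfg"
}
```

On hadoop02, `myid_for_host /home/zookeeper-3.4.10/conf/zoo.cfg hadoop02 > /home/zookeeper-3.4.10/data/myid` would write 1 into the file.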

    4. Copy ZooKeeper to the slaves (remember to change myid after copying)

    scp -r  zookeeper-3.4.10  hadoop02:/home/
    scp -r  zookeeper-3.4.10  hadoop03:/home/
    

    5. Configure the profile file on each host

    Add to /etc/profile:

    export ZOOKEEPER_HOME=/home/zookeeper-3.4.10
    export PATH=$PATH:$ZOOKEEPER_HOME/bin:$ZOOKEEPER_HOME/conf
    

    Remember to run source /etc/profile so the change takes effect.

    V. Start the ZooKeeper cluster

    1. Start ZooKeeper on each host

    root@Hadoop01:/home# zkServer.sh start
    ZooKeeper JMX enabled by default
    Using config: /home/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Starting zookeeper ... STARTED
    root@Hadoop01:/home# jps
    7105 DataNode
    6982 NameNode
    7272 SecondaryNameNode
    7580 ResourceManager
    8860 QuorumPeerMain
    8878 Jps
    7695 NodeManager
    root@Hadoop01:/home# 
    
    
    

    Hosts 1 and 3 come up as followers; host 2 comes up as the leader.

    root@Hadoop03:~#  zkServer.sh status
    ZooKeeper JMX enabled by default
    Using config: /home/zookeeper-3.4.10/bin/../conf/zoo.cfg
    Mode: follower
    root@Hadoop03:~# 
    
    

    Stop command:

    zkServer.sh stop
    

    VI. Configure Hadoop to use ZooKeeper

    1. Create the journal directory on every host

      mkdir  /home/data/journal
    

    2. Edit core-site.xml

         <!-- Set the HDFS nameservice to ns -->
         <property>
              <name>fs.defaultFS</name>
              <value>hdfs://ns</value>
         </property>
         <!-- Directory for Hadoop temporary data -->
         <property>
              <name>hadoop.tmp.dir</name>
              <value>/home/data/hdfs/tmp</value>
         </property>
    
         <property>
              <name>io.file.buffer.size</name>
              <value>4096</value>
         </property>
         <!-- ZooKeeper addresses -->
         <property>
              <name>ha.zookeeper.quorum</name>
              <value>hadoop01:2181,hadoop02:2181,hadoop03:2181</value>
         </property>
    

    3. Edit hdfs-site.xml

    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <!-- Set the HDFS nameservice to ns; it must match the value in core-site.xml -->
        <property>
            <name>dfs.nameservices</name>
            <value>ns</value>
        </property>
        <!-- ns has two NameNodes, nn1 and nn2 -->
        <property>
           <name>dfs.ha.namenodes.ns</name>
           <value>nn1,nn2</value>
        </property>
        <!-- RPC address of nn1 -->
        <property>
           <name>dfs.namenode.rpc-address.ns.nn1</name>
           <value>hadoop01:9820</value>
        </property>
        <!-- HTTP address of nn1 -->
        <property>
            <name>dfs.namenode.http-address.ns.nn1</name>
            <value>hadoop01:9870</value>
        </property>
        <!-- RPC address of nn2 -->
        <property>
            <name>dfs.namenode.rpc-address.ns.nn2</name>
            <value>hadoop02:9820</value>
        </property>
        <!-- HTTP address of nn2 -->
        <property>
            <name>dfs.namenode.http-address.ns.nn2</name>
            <value>hadoop02:9870</value>
        </property>
        <!-- Where the NameNode metadata (edit log) is stored on the JournalNodes -->
        <property>
             <name>dfs.namenode.shared.edits.dir</name>
             <value>qjournal://hadoop01;hadoop02;hadoop03/ns</value>
        </property>
        <!-- Local disk location where the JournalNode stores its data -->
        <property>
              <name>dfs.journalnode.edits.dir</name>
              <value>/home/data/journal</value>
        </property>
        <!-- Enable automatic failover when a NameNode fails -->
        <property>
              <name>dfs.ha.automatic-failover.enabled</name>
              <value>true</value>
        </property>
        <!-- Implementation class used by clients for automatic failover -->
        <property>
                <name>dfs.client.failover.proxy.provider.ns</name>
                <value>org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider</value>
        </property>
        <!-- Fencing methods. If SSH uses the default port 22, the value can simply be sshfence; otherwise give the user and port, e.g. (hadoop:22022) -->
        <property>
                 <name>dfs.ha.fencing.methods</name>
                 <!-- <value>sshfence</value> -->
                     <value>
                        sshfence
                        shell(/bin/true)
                    </value>
        </property>
        <!-- The fencing method requires passwordless SSH login -->
        <property>
                <name>dfs.ha.fencing.ssh.private-key-files</name>
                <value>/root/.ssh/id_rsa</value>
        </property>
    
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:/home/data/hdfs/name</value>
        </property>
    
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:/home/data/hdfs/data</value>
        </property>
    
        <property>
           <name>dfs.replication</name>
           <value>2</value>
        </property>
        <!-- Enable WebHDFS (REST API) on the NameNodes and DataNodes; optional -->
        <property>
           <name>dfs.webhdfs.enabled</name>
           <value>true</value>
        </property>
    </configuration>
    
    

    Sync the files to the other hosts:

    scp -r  /home/hadoop-3.1.0/etc/hadoop  hadoop02:/home/hadoop-3.1.0/etc
    scp -r  /home/hadoop-3.1.0/etc/hadoop  hadoop03:/home/hadoop-3.1.0/etc
    

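    4. First start

    1. Start ZooKeeper on every node first:
    zkServer.sh start
    2. On one of the namenode hosts, create the HA namespace in ZooKeeper:
    hdfs zkfc -formatZK
    3. On every journalnode host, start the journalnode:
    hdfs --daemon start journalnode
    4. On the primary namenode, format the namenode and journalnode directories:
    hdfs namenode -format ns
    5. On the primary namenode, start the namenode process:
    hdfs --daemon start namenode
    6. On the standby namenode, run the two commands below. The first formats the standby's directories and copies the metadata over from the primary namenode, without formatting the journalnode directories again. The second starts the standby namenode process.
    hdfs namenode -bootstrapStandby
    hdfs --daemon start namenode
    7. On both namenode hosts, start the failover controller:
    hdfs --daemon start zkfc
    8. On every datanode host, start the datanode:
    hadoop-daemon.sh start datanode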
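The order of these steps matters, and each one runs on a different set of hosts. As a sketch, the sequence can be written down as a dry-run plan that only prints `host: command` lines. The role assignments are assumptions read off the configuration in this article (hadoop01 as nn1, hadoop02 as nn2, all three hosts running ZooKeeper, JournalNode and DataNode); the function and variable names are hypothetical.

```shell
#!/usr/bin/env bash
# Dry-run sketch of the first-start order: prints "host: command", runs nothing.
# Role assignments are assumptions taken from the configuration files above.
ACTIVE_NN="hadoop01"                       # nn1 in hdfs-site.xml
STANDBY_NN="hadoop02"                      # nn2 in hdfs-site.xml
ALL_NODES="hadoop01 hadoop02 hadoop03"     # ZooKeeper, JournalNode, DataNode hosts

step() { printf '%s: %s\n' "$1" "$2"; }

first_start_plan() {
    local h
    for h in $ALL_NODES; do step "$h" "zkServer.sh start"; done                  # step 1
    step "$ACTIVE_NN" "hdfs zkfc -formatZK"                                      # step 2
    for h in $ALL_NODES; do step "$h" "hdfs --daemon start journalnode"; done    # step 3
    step "$ACTIVE_NN" "hdfs namenode -format ns"                                 # step 4
    step "$ACTIVE_NN" "hdfs --daemon start namenode"                             # step 5
    step "$STANDBY_NN" "hdfs namenode -bootstrapStandby"                         # step 6
    step "$STANDBY_NN" "hdfs --daemon start namenode"
    for h in "$ACTIVE_NN" "$STANDBY_NN"; do
        step "$h" "hdfs --daemon start zkfc"                                     # step 7
    done
    for h in $ALL_NODES; do step "$h" "hadoop-daemon.sh start datanode"; done    # step 8
}
```

Calling `first_start_plan` prints the whole sequence for review. Keeping it as a plan rather than executing it over SSH is deliberate, since the format steps are destructive and should only ever run once.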

    http://192.168.1.142:9870/dfshealth.html#tab-overview

    http://192.168.1.141:9870/dfshealth.html#tab-overview

    For routine operation afterwards, the following is enough:
    start-all.sh
    stop-all.sh

    5. Failure test

    On hadoop02:

    root@Hadoop02:~# jps
    3410 QuorumPeerMain
    5636 DFSZKFailoverController
    5765 NodeManager
    5367 DataNode
    5287 NameNode
    5498 JournalNode
    5979 Jps
    

    Kill the namenode:

    root@Hadoop02:~# kill -9 5287
    

    Go back and check whether the standby namenode has become active; if it has, automatic failover succeeded.
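Besides the web pages, the state of each namenode can be queried from the command line with `hdfs haadmin -getServiceState nn1` (and `nn2`), which prints `active` or `standby`. The sketch below wraps the comparison only; `exactly_one_active` is a hypothetical helper that takes the two reported states as strings, so a namenode that is down can be passed as an empty string.

```shell
#!/usr/bin/env bash
# Succeed only if exactly one of the two reported NameNode states is "active".
# Arguments: state of nn1, state of nn2 (empty string if a node did not answer).
exactly_one_active() {
    local active=0 state
    for state in "$1" "$2"; do
        if [ "$state" = "active" ]; then
            active=$(( active + 1 ))
        fi
    done
    [ "$active" -eq 1 ]
}

# Example use on a cluster node (not executed here):
#   exactly_one_active "$(hdfs haadmin -getServiceState nn1 2>/dev/null)" \
#                      "$(hdfs haadmin -getServiceState nn2 2>/dev/null)" \
#       && echo "failover OK"
```

After the kill above, a healthy result is nn1 reporting `active` and nn2 reporting nothing until its namenode is started again.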


    That completes the installation. From installing the operating system to getting everything running took 2.5 days.

      Reader comments

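      • 凌峦: <property> <name>dfs.namenode.shared.edits.dir</name> <value>qjournal://hadoop01;hadoop02;hadoop03/ns</value> </property> <!-- Local disk location where the JournalNode stores its data --> <property> <name>dfs.journalnode.edits.dir</name> <value>/home/data/journal</value> </property> Can these two settings point to an NFS-mounted directory? That way a copy of the data would also exist remotely, for reliability.
        大道至简非简: @凌峦 The number of data copies is determined by this value in hdfs-site.xml:
        <property>
        <name>dfs.replication</name>
        <value>2</value>
        </property>
        I do not recommend Hadoop 3.1, because it does not support HBase 2.0 at present. HBase paired with Hadoop 2.8.3 is what the official site recommends.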

      Article link: https://www.haomeiwen.com/subject/snxfdftx.html