(3) Hadoop Cluster Environment Setup (Fully Distributed)


By 小猪Harry | Published 2018-09-10 23:32

    Clone three hosts and rename them hadoop01, hadoop02, and hadoop03 respectively:

    [root@hadoop01 ~]# hostname
    hadoop01
    [root@hadoop01 ~]# cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=hadoop01
    [root@hadoop01 ~]# vi /etc/sysconfig/network
    [root@hadoop01 ~]# cat /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=hadoop02
    [root@hadoop01 ~]# reboot
    

    Map the hostnames to IP addresses in /etc/hosts on all three machines:

    [root@hadoop01 ~]# vi /etc/hosts
    [root@hadoop01 ~]# cat /etc/hosts
    127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
    ::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
    192.168.216.135     hadoop01
    192.168.216.136     hadoop02
    192.168.216.137     hadoop03
    
    Server role planning:

    hadoop01         hadoop02           hadoop03
    --------         --------           --------
    NameNode
    DataNode         DataNode           DataNode
    NodeManager      NodeManager        NodeManager
    HistoryServer    ResourceManager    SecondaryNameNode

    1. Install a fresh Hadoop on the first machine
    To keep this cluster separate from the pseudo-distributed Hadoop installed earlier, stop all Hadoop services on the first machine, then install a second Hadoop under a new directory, /opt/modules/app. The approach is to extract and configure Hadoop on the first machine, then distribute it to the other two machines.

    2. Extract the Hadoop archive
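    The article does not show the extraction command. A minimal sketch, assuming the release tarball hadoop-2.5.0.tar.gz has already been downloaded to /opt/modules (the filename and download location are assumptions):

```shell
# Extract the Hadoop 2.5.0 release into /opt/modules/app (paths assumed)
mkdir -p /opt/modules/app
tar -zxf /opt/modules/hadoop-2.5.0.tar.gz -C /opt/modules/app
# Optional: a version-free symlink, so later commands can use /opt/modules/app/hadoop
ln -s /opt/modules/app/hadoop-2.5.0 /opt/modules/app/hadoop
```

    Later commands in the article reference /opt/modules/app/hadoop, which is consistent with such a symlink or with renaming the extracted directory.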

    3. Configure the JDK path for Hadoop: set JAVA_HOME in the hadoop-env.sh, mapred-env.sh, and yarn-env.sh files
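    For example, each of those three files gets an explicit JAVA_HOME (the JDK path below is an assumption; use the actual path on your machines):

```shell
# In etc/hadoop/hadoop-env.sh, mapred-env.sh, and yarn-env.sh,
# replace the default JAVA_HOME line with an explicit path (example path assumed):
export JAVA_HOME=/opt/modules/jdk1.7.0_67
```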

    4. Configure core-site.xml

    [root@hadoop01 hadoop]# vi core-site.xml 
    [root@hadoop01 hadoop]# cat core-site.xml 
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
       <name>fs.defaultFS</name>
       <value>hdfs://hadoop01:8020</value>
     </property>
     <property>
       <name>hadoop.tmp.dir</name>
       <value>/opt/modules/app/hadoop-2.5.0/data/tmp</value>
     </property>
    </configuration>
    [root@hadoop01 hadoop]# 
    

    fs.defaultFS is the address of the NameNode.

    hadoop.tmp.dir is Hadoop's temporary directory; by default, the NameNode's and DataNodes' data files are stored in subdirectories under it. Make sure this directory exists; if it does not, create it first.
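    A sketch of creating the directory configured above, matching the value in core-site.xml:

```shell
# Create the hadoop.tmp.dir configured in core-site.xml before starting HDFS
mkdir -p /opt/modules/app/hadoop-2.5.0/data/tmp
```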

    5. Configure hdfs-site.xml

    [root@hadoop01 hadoop]# vi hdfs-site.xml 
    [root@hadoop01 hadoop]# cat hdfs-site.xml 
    <?xml version="1.0" encoding="UTF-8"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
       <name>dfs.namenode.secondary.http-address</name>
       <value>hadoop03:50090</value>
     </property>
    </configuration>
    

    dfs.namenode.secondary.http-address sets the HTTP address and port of the SecondaryNameNode; per the plan, hadoop03 serves as the SecondaryNameNode.

    6. Configure slaves

    [root@hadoop01 hadoop]# vi /opt/modules/app/hadoop/etc/hadoop/slaves 
    [root@hadoop01 hadoop]# cat /opt/modules/app/hadoop/etc/hadoop/slaves 
    hadoop01
    hadoop02
    hadoop03
    

    The slaves file lists the hosts that run as DataNodes in HDFS.

    7. Configure yarn-site.xml

    [root@hadoop01 hadoop]# vi yarn-site.xml 
    [root@hadoop01 hadoop]# cat yarn-site.xml 
    <?xml version="1.0"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    <configuration>
    
    <!-- Site specific YARN configuration properties -->
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop02</value>
        </property>
        <property>
            <name>yarn.log-aggregation-enable</name>
            <value>true</value>
        </property>
        <property>
            <name>yarn.log-aggregation.retain-seconds</name>
            <value>106800</value>
        </property>
    
    </configuration>
    

    Per the plan, yarn.resourcemanager.hostname points the ResourceManager at hadoop02.

    yarn.log-aggregation-enable toggles the log-aggregation feature.

    yarn.log-aggregation.retain-seconds sets how long aggregated logs are retained on HDFS.

    8. Configure mapred-site.xml

    [root@hadoop01 hadoop]# vi mapred-site.xml
    [root@hadoop01 hadoop]# cat mapred-site.xml
    <?xml version="1.0"?>
    <?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
    <!--
      Licensed under the Apache License, Version 2.0 (the "License");
      you may not use this file except in compliance with the License.
      You may obtain a copy of the License at
    
        http://www.apache.org/licenses/LICENSE-2.0
    
      Unless required by applicable law or agreed to in writing, software
      distributed under the License is distributed on an "AS IS" BASIS,
      WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
      See the License for the specific language governing permissions and
      limitations under the License. See accompanying LICENSE file.
    -->
    
    <!-- Put site-specific property overrides in this file. -->
    
    <configuration>
    <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.address</name>
            <value>hadoop01:10020</value>
        </property>
        <property>
            <name>mapreduce.jobhistory.webapp.address</name>
            <value>hadoop01:19888</value>
        </property>
    
    </configuration>
    

    mapreduce.framework.name makes MapReduce jobs run on YARN.

    mapreduce.jobhistory.address places the MapReduce history server on hadoop01.

    mapreduce.jobhistory.webapp.address sets the address and port of the history server's web UI.

    9. Set up passwordless SSH

    The machines in a Hadoop cluster access one another over SSH; entering a password for every connection is impractical, so configure passwordless SSH between all of the machines.

    a. Generate a key pair on hadoop01

    [root@hadoop01 hadoop]# ssh-keygen -t rsa
    
    Generating public/private rsa key pair.
    Enter file in which to save the key (/root/.ssh/id_rsa): 
    Created directory '/root/.ssh'.
    Enter passphrase (empty for no passphrase): 
    Enter same passphrase again: 
    Your identification has been saved in /root/.ssh/id_rsa.
    Your public key has been saved in /root/.ssh/id_rsa.pub.
    The key fingerprint is:
    6c:c6:80:64:00:ec:ab:b0:94:21:71:2e:a8:8b:c2:40 root@hadoop01
    The key's randomart image is:
    +--[ RSA 2048]----+
    |o...o            |
    |...o .           |
    |o+  . .          |
    |+E.    +         |
    |+.+     S        |
    |++     o         |
    |Bo               |
    |*.               |
    |.                |
    +-----------------+
    

    Press Enter at every prompt to accept the defaults; the public key file (id_rsa.pub) and private key file (id_rsa) are then written to the .ssh directory under the current user's home directory.

    b. Distribute the public key

    [root@hadoop01 hadoop]# yum install -y openssh-clients
    
    [root@hadoop01 hadoop]# ssh-copy-id hadoop01
    The authenticity of host 'hadoop01 (192.168.216.135)' can't be established.
    RSA key fingerprint is bd:5c:85:99:82:b4:b9:9d:92:fa:35:48:63:e1:5c:ce.
    Are you sure you want to continue connecting (yes/no)? yes
    Warning: Permanently added 'hadoop01,192.168.216.135' (RSA) to the list of known hosts.
    root@hadoop01's password: 
    Now try logging into the machine, with "ssh 'hadoop01'", and check in:
    
      .ssh/authorized_keys
    
    to make sure we haven't added extra keys that you weren't expecting.
    
    [root@hadoop01 hadoop]# ssh-copy-id hadoop02
    [root@hadoop01 hadoop]# ssh-copy-id hadoop03
    

    Likewise, generate key pairs on hadoop02 and hadoop03, and distribute each machine's public key to all three machines.
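    The step above can be sketched as follows, run once on hadoop02 and once on hadoop03. The -N '' and -f flags accept the key-generation defaults non-interactively; that is a convenience assumption, not what the article's transcript shows:

```shell
# On hadoop02 (repeat on hadoop03): generate a key pair non-interactively,
# then copy the public key to every node, including this one.
ssh-keygen -t rsa -N '' -f ~/.ssh/id_rsa
for host in hadoop01 hadoop02 hadoop03; do
    ssh-copy-id "$host"
done
```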

    Distribute the Hadoop files

    1. First, create the directory that will hold Hadoop on the other two machines

    [root@hadoop02 ~]# mkdir -p /opt/modules/app
    [root@hadoop03 ~]# mkdir -p /opt/modules/app
    

    2. Distribute via scp
    The share/doc directory under the Hadoop root holds the Hadoop documentation and is quite large; deleting it before distributing saves disk space and speeds up the copy.

    [root@hadoop01 hadoop]# du -sh /opt/modules/app/hadoop/share/doc
    [root@hadoop01 hadoop]# rm -rf /opt/modules/app/hadoop/share/doc/
    [root@hadoop01 hadoop]# scp -r /opt/modules/app/hadoop/ hadoop02:/opt/modules/app
    [root@hadoop01 hadoop]# scp -r /opt/modules/app/hadoop/ hadoop03:/opt/modules/app
    

    3. Format the NameNode
    Run the format on the NameNode machine:

    [root@hadoop01 hadoop]# /opt/modules/app/hadoop/bin/hdfs namenode -format
    

    To reformat the NameNode, first delete all files under the original NameNode and DataNode directories, or errors will follow. These directories are set by hadoop.tmp.dir in core-site.xml (and, if configured explicitly, by dfs.namenode.name.dir and dfs.datanode.data.dir in hdfs-site.xml).

    Each format creates a new cluster ID by default and writes it into the NameNode's and DataNodes' VERSION files (located under dfs/name/current and dfs/data/current). If the old directories are not removed, the NameNode's VERSION file ends up with the new cluster ID while the DataNodes keep the old one, and the mismatch causes errors.

    Alternatively, pass the old cluster ID as a parameter when formatting.
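    A sketch of that alternative, assuming the old cluster ID is read from a DataNode's VERSION file first (the CID-... value below is a placeholder, and the data path assumes the hadoop.tmp.dir configured earlier):

```shell
# Look up the existing cluster ID on a DataNode...
grep clusterID /opt/modules/app/hadoop-2.5.0/data/tmp/dfs/data/current/VERSION
# ...then reuse it when reformatting the NameNode:
/opt/modules/app/hadoop/bin/hdfs namenode -format -clusterid CID-xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
```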

    Start the cluster
    [root@hadoop01 sbin]# /opt/modules/app/hadoop/sbin/start-dfs.sh
    18/09/11 07:07:01 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    Starting namenodes on [hadoop01]
    hadoop01: starting namenode, logging to /opt/modules/app/hadoop/logs/hadoop-root-namenode-hadoop01.out
    hadoop03: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop03.out
    hadoop02: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop02.out
    hadoop01: starting datanode, logging to /opt/modules/app/hadoop/logs/hadoop-root-datanode-hadoop01.out
    Starting secondary namenodes [hadoop03]
    hadoop03: starting secondarynamenode, logging to /opt/modules/app/hadoop/logs/hadoop-root-secondarynamenode-hadoop03.out
    18/09/11 07:07:21 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    [root@hadoop01 sbin]# 
    
    
    
    [root@hadoop01 sbin]# jps
    3185 Jps
    2849 NameNode
    2974 DataNode
    [root@hadoop02 ~]# jps
    2305 Jps
    2227 DataNode
    [root@hadoop03 ~]# jps
    2390 Jps
    2312 SecondaryNameNode
    2217 DataNode
    
    Start YARN
    [root@hadoop01 sbin]# /opt/modules/app/hadoop/sbin/start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-resourcemanager-hadoop01.out
    hadoop02: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop02.out
    hadoop03: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop03.out
    hadoop01: starting nodemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-nodemanager-hadoop01.out
    [root@hadoop01 sbin]# jps
    3473 Jps
    3329 NodeManager
    2849 NameNode
    2974 DataNode
    [root@hadoop01 sbin]# 
    
    [root@hadoop02 ~]# jps
    2337 NodeManager
    2227 DataNode
    2456 Jps
    [root@hadoop02 ~]# 
    
    [root@hadoop03 ~]# jps
    2547 Jps
    2312 SecondaryNameNode
    2217 DataNode
    2428 NodeManager
    [root@hadoop03 ~]# 
    

    Start the ResourceManager on hadoop02, where the plan places it:

    [root@hadoop02 ~]# /opt/modules/app/hadoop/sbin/yarn-daemon.sh start resourcemanager
    starting resourcemanager, logging to /opt/modules/app/hadoop/logs/yarn-root-resourcemanager-hadoop02.out
    [root@hadoop02 ~]# jps
    2337 NodeManager
    2227 DataNode
    2708 Jps
    2484 ResourceManager
    [root@hadoop02 ~]# 
    
    Start the history server

    The plan here runs the MapReduce job-history service on hadoop03, so start it on hadoop03. (Note that mapred-site.xml above points mapreduce.jobhistory.address at hadoop01; the host configured there should match the machine where the history server actually runs.)

    [root@hadoop03 ~]# /opt/modules/app/hadoop/sbin/mr-jobhistory-daemon.sh start historyserver
    starting historyserver, logging to /opt/modules/app/hadoop/logs/mapred-root-historyserver-hadoop03.out
    [root@hadoop03 ~]# jps
    2312 SecondaryNameNode
    2217 DataNode
    2602 JobHistoryServer
    2428 NodeManager
    2639 Jps
    [root@hadoop03 ~]# 
    

    Configure the hosts file on Windows
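    So that the hostnames below resolve in a browser on Windows, add the cluster's entries to C:\Windows\System32\drivers\etc\hosts, using the same IPs as the /etc/hosts set up earlier:

```
192.168.216.135     hadoop01
192.168.216.136     hadoop02
192.168.216.137     hadoop03
```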

    View the HDFS web UI

    hadoop01:50070

    View the YARN web UI

    hadoop02:8088

    Test a job

    We use Hadoop's bundled wordcount example to test running MapReduce on the cluster.

    1. Prepare the MapReduce input file wc.input

    [hadoop@bigdata-senior01 modules]$ cat /opt/data/wc.input
    hadoop mapreduce hive
    hbase spark storm
    sqoop hadoop hive
    spark hadoop
    
    

    2. Create the input directory /input on HDFS

    [hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -mkdir /input
    
    

    3. Upload wc.input to HDFS

    [hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -put /opt/data/wc.input /input/wc.input
    
    

    4. Run Hadoop's bundled MapReduce demo

    [hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/yarn jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.5.0.jar wordcount /input/wc.input /output
    
    

    5. Inspect the output files

    [hadoop@bigdata-senior01 hadoop-2.5.0]$ bin/hdfs dfs -ls /output
    Found 2 items
    -rw-r--r--   3 hadoop supergroup          0 2016-07-14 16:36 /output/_SUCCESS
    -rw-r--r--   3 hadoop supergroup         60 2016-07-14 16:36 /output/part-r-00000
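    To see the word counts themselves, print the result file. Given the wc.input above, the expected counts are sketched in the comments below (wordcount emits keys in sorted order; the real output is tab-separated):

```shell
bin/hdfs dfs -cat /output/part-r-00000
# hadoop     3
# hbase      1
# hive       2
# mapreduce  1
# spark      2
# sqoop      1
# storm      1
```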
    
    

    Source: https://www.haomeiwen.com/subject/irregftx.html