Hadoop Chapter 4: Fully Distributed Mode (development focus)


Author: 张磊_e325 | Published 2019-10-20 23:08

    Hadoop cluster setup

    1. Prepare three machines (firewall disabled, static IPs, hostnames set)

    2. Distribute Java and Hadoop quickly

    Copy the /opt/module directory and the /etc/profile configuration file to hadoop102, hadoop103, and hadoop104.
    Distributing /etc/profile requires the root account.

    [atguigu@hadoop101 opt]$ scp -r hadoop101:/opt/module hadoop102:/opt/
    [atguigu@hadoop101 opt]$ scp -r hadoop101:/opt/module hadoop103:/opt/
    [atguigu@hadoop101 opt]$ scp -r hadoop101:/opt/module hadoop104:/opt/
    [atguigu@hadoop101 opt]$ su - root
    [root@hadoop101 ~]$ scp -r hadoop101:/etc/profile hadoop102:/etc/
    [root@hadoop101 ~]$ scp -r hadoop101:/etc/profile hadoop103:/etc/
    [root@hadoop101 ~]$ scp -r hadoop101:/etc/profile hadoop104:/etc/
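
    The six scp commands above differ only in the target host. A small helper avoids copy-paste mistakes; this is a sketch that deliberately only prints the commands (drop the echo to actually copy):

```shell
# Hypothetical helper: print the scp command for each target host.
# Dry run by design -- remove the `echo` to really distribute.
distribute() {
    src=$1; dest=$2; shift 2
    for host in "$@"; do
        echo "scp -r $src $host:$dest"
    done
}

distribute hadoop101:/opt/module /opt/ hadoop102 hadoop103 hadoop104
# scp -r hadoop101:/opt/module hadoop102:/opt/
# scp -r hadoop101:/opt/module hadoop103:/opt/
# scp -r hadoop101:/opt/module hadoop104:/opt/
```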
    

    Reload the profile so the Java environment variables take effect:
    [atguigu@hadoop102 ~]$ source /etc/profile
    [atguigu@hadoop103 ~]$ source /etc/profile
    [atguigu@hadoop104 ~]$ source /etc/profile

    3. Cluster plan (minimal layout)

            hadoop102             hadoop103                      hadoop104
    HDFS    NameNode, DataNode    DataNode                       SecondaryNameNode, DataNode
    YARN    NodeManager           ResourceManager, NodeManager   NodeManager

    4. Edit the configuration files

    4.1 Core configuration file: core-site.xml

    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/core-site.xml

    <!-- Address of the HDFS NameNode -->
    <property>
            <name>fs.defaultFS</name>
          <value>hdfs://hadoop102:9000</value>
    </property>
    
    <!-- Base directory for Hadoop's runtime files -->
    <property>
            <name>hadoop.tmp.dir</name>
            <value>/opt/module/hadoop-2.7.2/data/tmp</value>
    </property>
    

    4.2 HDFS configuration files

    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/hadoop-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_144
    

    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/hdfs-site.xml

    <!-- HDFS replication factor -->
    <property>
            <name>dfs.replication</name>
            <value>3</value>
    </property>
    <!-- Host of the SecondaryNameNode -->
    <property>
          <name>dfs.namenode.secondary.http-address</name>
          <value>hadoop104:50090</value>
    </property>
    

    4.3 YARN configuration files

    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/yarn-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_144
    

    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/yarn-site.xml

    <!-- How reducers fetch intermediate data -->
    <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
    </property>
    <!-- Host of the YARN ResourceManager -->
    <property>
            <name>yarn.resourcemanager.hostname</name>
            <value>hadoop103</value>
    </property>
    

    4.4 MapReduce configuration files

    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/mapred-env.sh

    export JAVA_HOME=/opt/module/jdk1.8.0_144
    

    [atguigu@hadoop102 hadoop-2.7.2]$ cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/mapred-site.xml

    <!-- Run MapReduce on YARN -->
    <property>
            <name>mapreduce.framework.name</name>
            <value>yarn</value>
    </property>
    

    5. Distribute the configuration files

    [atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop/ hadoop103:/opt/module/hadoop-2.7.2/etc/
    [atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop/ hadoop104:/opt/module/hadoop-2.7.2/etc/
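
    After distributing, it is worth confirming the copies actually match. A minimal sketch using POSIX cksum; the ssh line in the comment shows how the remote side would be fetched on a real cluster (hostnames and paths as in the text):

```shell
# Compare two files by checksum; exit status 0 means identical contents.
# On the cluster the remote checksum would come over ssh, e.g. (hypothetical):
#   ssh hadoop103 "cksum < /opt/module/hadoop-2.7.2/etc/hadoop/core-site.xml"
files_match() {
    [ "$(cksum < "$1")" = "$(cksum < "$2")" ]
}

printf 'same\n' > /tmp/a.xml
printf 'same\n' > /tmp/b.xml
files_match /tmp/a.xml /tmp/b.xml && echo "configs match"
# configs match
```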
    

    6. Starting the daemons one by one

    6.0 If the cluster is starting for the first time, format the NameNode (only once: reformatting generates a new cluster ID that the existing DataNode data directories will not match)

    [atguigu@hadoop102 hadoop-2.7.2]$ hdfs namenode -format

    6.1 Start the NameNode on hadoop102

    [atguigu@hadoop102 hadoop-2.7.2]$ hadoop-daemon.sh start namenode

    6.2 Start a DataNode on each node

    [atguigu@hadoop102 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
    [atguigu@hadoop102 hadoop-2.7.2]$ jps
    15761 Jps
    15609 NameNode
    15690 DataNode
    
    [atguigu@hadoop103 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
    [atguigu@hadoop103 hadoop-2.7.2]$ jps
    15250 DataNode
    15321 Jps
    
    [atguigu@hadoop104 hadoop-2.7.2]$ hadoop-daemon.sh start datanode
    [atguigu@hadoop104 hadoop-2.7.2]$ jps
    15253 DataNode
    15324 Jps
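
    The jps listings above can also be checked mechanically. A sketch of a helper that verifies the expected daemons appear in jps output; the sample is inlined from the hadoop102 listing so it can be tried without a cluster:

```shell
# Check that every expected daemon name appears in a jps listing.
daemons_ok() {
    jps_out=$1; shift
    for d in "$@"; do
        echo "$jps_out" | grep -qw "$d" || return 1
    done
}

sample="15761 Jps
15609 NameNode
15690 DataNode"
daemons_ok "$sample" NameNode DataNode && echo "hadoop102 OK"
# hadoop102 OK
```

On a live node the first argument would be `"$(jps)"` instead of the inlined sample.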
    

    6.3 Verify via the HDFS web UI

    Open http://hadoop102:50070 in a browser; the Datanodes tab should list all three DataNodes.

    7. Configure passwordless SSH

    Run the following on each of the three machines:

    $ cd ~/.ssh
    $ ssh-keygen -t rsa
    $ ssh-copy-id hadoop102
    $ ssh-copy-id hadoop103
    $ ssh-copy-id hadoop104
    

    Verify from each of the three machines; if no login asks for a password, the setup succeeded:

    $ ssh hadoop102
    $ exit
    $ ssh hadoop103
    $ exit
    $ ssh hadoop104
    $ exit
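
    The nine logins above can be collapsed into a loop. `-o BatchMode=yes` makes ssh fail instead of prompting, so any host still requiring a password surfaces as an error. This sketch is a dry run: the echo prints the commands instead of running them.

```shell
# Dry run: print the verification command for each host.
# Remove the `echo` to actually test the logins.
for host in hadoop102 hadoop103 hadoop104; do
    echo ssh -o BatchMode=yes "$host" hostname
done
# ssh -o BatchMode=yes hadoop102 hostname
# ssh -o BatchMode=yes hadoop103 hostname
# ssh -o BatchMode=yes hadoop104 hostname
```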
    

    8. Start the cluster as a whole

    8.1 List the DataNodes

    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/slaves

    hadoop102
    hadoop103
    hadoop104
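
    The slaves file is read one hostname per line, so it pays to generate it rather than type it. A sketch; /tmp stands in for etc/hadoop here so the snippet is safe to try anywhere:

```shell
# Generate the slaves file from a host list: one hostname per line,
# no stray whitespace. On the cluster the target is etc/hadoop/slaves.
hosts="hadoop102 hadoop103 hadoop104"
printf '%s\n' $hosts > /tmp/slaves   # unquoted on purpose: one word per line
cat /tmp/slaves
# hadoop102
# hadoop103
# hadoop104
```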
    

    Distribute the file to the other two virtual machines:

    [atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop/slaves hadoop103:/opt/module/hadoop-2.7.2/etc/hadoop/
    [atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop/slaves hadoop104:/opt/module/hadoop-2.7.2/etc/hadoop/
    

    8.2 群起hdfs

    在hadoop102(namenode节点)上启动hdfs
    [atguigu@hadoop102 hadoop-2.7.2]$ start-dfs.sh
    在hadoop103(sourceManager节点)上启动yarn
    [atguigu@hadoop103 hadoop-2.7.2]$ start-yarn.sh

    8.3 Check the web UIs

    HDFS:  http://hadoop102:50070
    YARN:  http://hadoop103:8088

    9. Run the wordcount example

    [atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -put wcinput /
    [atguigu@hadoop102 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wcinput /wcoutput
    [atguigu@hadoop102 hadoop-2.7.2]$ hadoop fs -cat /wcoutput/*
    atguigu 2
    hadoop  3
    hdfs    1
    mapreduce   1
    yarn    1
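
    The job splits each line on whitespace and counts word occurrences. On a small input the same numbers can be reproduced locally with standard tools, which makes a quick sanity check; the sample text below is made up to match the counts above:

```shell
# Local equivalent of the wordcount example: one word per line,
# count duplicates, print "word count" pairs in sorted order.
printf 'hadoop yarn\nhadoop mapreduce\nhadoop hdfs atguigu atguigu\n' \
    | tr -s ' ' '\n' | sort | uniq -c | awk '{print $2, $1}'
# atguigu 2
# hadoop 3
# hdfs 1
# mapreduce 1
# yarn 1
```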
    


    10. Enable log aggregation

    In a large cluster, when a YARN job fails it is hard to tell which node is at fault. Log aggregation collects the logs from every node onto the history server, where they can be browsed through the YARN web UI.

    10.0 Stop the Hadoop cluster

    [atguigu@hadoop103 hadoop-2.7.2]$ stop-yarn.sh
    [atguigu@hadoop102 hadoop-2.7.2]$ stop-dfs.sh
    

    10.1 Configure the history server

    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/mapred-site.xml
    Add the following to mapred-site.xml:

    <!-- History server RPC address -->
    <property>
        <name>mapreduce.jobhistory.address</name>
        <value>hadoop104:10020</value>
    </property>
    <!-- History server web UI address -->
    <property>
        <name>mapreduce.jobhistory.webapp.address</name>
        <value>hadoop104:19888</value>
    </property>
    

    [atguigu@hadoop102 hadoop-2.7.2]$ vi etc/hadoop/yarn-site.xml
    Add the following to yarn-site.xml:

    <!-- Enable log aggregation -->
    <property>
        <name>yarn.log-aggregation-enable</name>
        <value>true</value>
    </property>
    <!-- Retain aggregated logs for 7 days (value in seconds) -->
    <property>
        <name>yarn.log-aggregation.retain-seconds</name>
        <value>604800</value>
    </property>
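
    The retention value is specified in seconds; 7 days works out to exactly the value above:

```shell
# 7 days expressed in seconds, matching yarn.log-aggregation.retain-seconds.
echo $((7 * 24 * 60 * 60))
# 604800
```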
    

    10.2 Sync the configuration files

    [atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop hadoop103:/opt/module/hadoop-2.7.2/etc
    [atguigu@hadoop102 hadoop-2.7.2]$ scp -r hadoop102:/opt/module/hadoop-2.7.2/etc/hadoop hadoop104:/opt/module/hadoop-2.7.2/etc
    

    10.3 Restart the Hadoop cluster and start the history server

    On hadoop102 (the NameNode), start HDFS:
    [atguigu@hadoop102 hadoop-2.7.2]$ start-dfs.sh
    On hadoop103 (the ResourceManager), start YARN:
    [atguigu@hadoop103 hadoop-2.7.2]$ start-yarn.sh
    On hadoop104, start the history server:
    [atguigu@hadoop104 hadoop-2.7.2]$ mr-jobhistory-daemon.sh start historyserver
    jps on hadoop104 now shows the JobHistoryServer process:

    [atguigu@hadoop104 hadoop-2.7.2]$ jps
    8097 DataNode
    8276 NodeManager
    8424 JobHistoryServer
    8472 Jps
    8201 SecondaryNameNode
    

    10.4 Re-run wordcount and check the job history and logs

    [atguigu@hadoop102 hadoop-2.7.2]$ hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.2.jar wordcount /wcinput /historytest
    

    The log page still would not open at first; it turned out the hosts file on the client PC had not been configured with the cluster hostnames. No cutting corners!
