Hadoop Installation and Configuration

Author: 64d1c00cca37 | Published 2017-05-30 17:50

    Required tools

    Hadoop deployment preparation: a local VMware virtual machine running Ubuntu 16.04
    Java version: jdk-8u73-linux-x64.tar.gz
    Hadoop version: hadoop-2.8.0.tar.gz

    For the VMware virtual machine, NAT networking is recommended, because the Hadoop configuration becomes invalid if the machine's IP address changes afterwards (a static-address sketch is given under Network configuration below).

    Network configuration
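
    To keep the VM's address from changing under NAT, a static IP can be configured. A minimal sketch for Ubuntu 16.04's /etc/network/interfaces; the interface name (ens33), the address and the gateway are assumptions and must be adjusted to your own NAT subnet:

    # /etc/network/interfaces  (assumed interface name and addresses)
    auto ens33
    iface ens33 inet static
        address 192.168.125.141
        netmask 255.255.255.0
        gateway 192.168.125.2
        dns-nameservers 192.168.125.2

    Afterwards restart networking (systemctl restart networking) or simply reboot the VM.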

    Pseudo-distributed installation and configuration

    Host-related configuration

    1. First install Ubuntu with VMware; all of the following operations are performed as root. Because this is a pseudo-distributed (single-node) setup, change the hostname:
    Edit the hostname file: vim /etc/hostname, setting its content to:

    hadoop-alone
    

    2. Register the new hostname in /etc/hosts; if this step is skipped, the Hadoop services will report errors on startup. Edit the file with vim /etc/hosts, and after saving, reboot the host (for example with the reboot command) so the change takes effect.

    127.0.0.1       localhost
    127.0.1.1       localhost
    127.0.0.1       hadoop-alone
    
    (Screenshot: hosts file)

    Passwordless SSH login configuration

    • Remove any existing ssh configuration: rm -r ~/.ssh
    • Generate a new ssh key: ssh-keygen -t rsa
    • Register the public key for the local machine (a quick login check follows below): cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys
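
    A minimal sketch of checking that key-based login really works; if a password is still requested, tightening the permissions on ~/.ssh usually helps:

    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys
    ssh hadoop-alone      # should log in without asking for a password
    exit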

    Pseudo-distributed configuration

    Hadoop configuration

    1. First install and configure the JDK by extracting the JDK archive into /usr/local:

    tar -xzvf /srv/ftp/jdk-8u73-linux-x64.tar.gz -C /usr/local/
    mv /usr/local/jdk1.8.0_73  /usr/local/jdk
    

    2. Configure the JDK environment variables by editing /etc/profile: vim /etc/profile

    export JAVA_HOME=/usr/local/jdk
    export PATH=$PATH:$JAVA_HOME/bin
    

    After editing, save and exit, then run source /etc/profile so the configuration takes effect immediately.
    You can then verify the installed JDK (see the check below):

    (Screenshot: JDK version information)
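
    For example, once /etc/profile has been sourced, the installation can be checked from any shell:

    java -version      # should report version "1.8.0_73"
    which java         # should resolve to /usr/local/jdk/bin/java
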
    3. Configure Hadoop:
    tar xzvf /srv/ftp/hadoop-2.8.0.tar.gz -C /usr/local/
    mv /usr/local/hadoop-2.8.0/ /usr/local/hadoop
    

    4. Modify /etc/profile again; open it with vim /etc/profile and configure it as follows:

    export JAVA_HOME=/usr/local/jdk
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    

    After saving and exiting, make the configuration take effect immediately: source /etc/profile

    5. Hadoop depends on the JDK (JAVA_HOME). Although JAVA_HOME is already defined in /etc/profile, Hadoop frequently fails to pick it up from there, so it is recommended to set the JDK explicitly in Hadoop's own environment file:
    vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh
    Set JAVA_HOME manually:

    export JAVA_HOME=/usr/local/jdk
    
    (Screenshot: hadoop-env.sh)

    6. Hadoop ships with a ready-made test program whose main function is a word count.

    • To run it, first create an input path containing the plain text file(s) to be counted:
      mkdir -p /usr/test/hadoop/input
    • Copy README.txt from the Hadoop installation directory into that directory:
      cp /usr/local/hadoop/README.txt /usr/test/hadoop/input/

    7. The word count can then be run with one of the example programs shipped with Hadoop.

    • Jar: /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar
    • Program class: org.apache.hadoop.examples.WordCount (selected through the driver name wordcount)
    • Hadoop programs must be launched through the hadoop launcher rather than plain java:
    hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar wordcount /usr/test/hadoop/input /usr/test/hadoop/output
    

    When the job completes, inspect the output directory configured above: cat /usr/test/hadoop/output/part-r-00000
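
    Note that the example refuses to run again while the output directory still exists, so remove it before re-running; a minimal sketch:

    rm -r /usr/test/hadoop/output
    hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.8.0.jar wordcount /usr/test/hadoop/input /usr/test/hadoop/output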

    Pseudo-distributed setup

    1. All of Hadoop's configuration files live in: /usr/local/hadoop/etc/hadoop
    2. Modify core-site.xml, the core configuration file of the whole Hadoop installation:

    • Official documentation: http://hadoop.apache.org/docs/r2.8.0/hadoop-project-dist/hadoop-common/SingleCluster.html

    Note: by default Hadoop uses the system /tmp directory, which is emptied on every reboot; if you do not configure a dedicated temporary directory now, Hadoop will fail to start after the next reboot.

    • Create a directory for temporary data: mkdir -p /usr/data/hadoop/tmp
    • Edit core-site.xml: vim /usr/local/hadoop/etc/hadoop/core-site.xml
    <configuration>
        <property>
            <name>hadoop.tmp.dir</name>
            <value>/usr/data/hadoop/tmp</value>
            <description>Abase for other temporary directories.</description>
        </property>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://hadoop-alone:9000</value>
        </property>
    </configuration>
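
    A quick way to confirm that the new core-site.xml is actually being picked up (a minimal check; hdfs getconf prints the effective configuration):

    hdfs getconf -confKey fs.defaultFS        # expect hdfs://hadoop-alone:9000
    hdfs getconf -confKey hadoop.tmp.dir      # expect /usr/data/hadoop/tmp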
    

    3. Modify hdfs-site.xml, whose main function is to configure HDFS distributed storage.

    Note: if the network environment of this Hadoop installation changes, the two directories below must be cleared, otherwise Hadoop will not start (a cleanup sketch follows after the configuration block).

    • Create the storage path for the namenode process: mkdir -p /usr/data/hadoop/dfs/name
    • Create the storage path for the datanode process: mkdir -p /usr/data/hadoop/dfs/data
    • Edit the configuration file: vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml
    <configuration>
        <property>
            <name>dfs.replication</name>
            <value>1</value>
        </property>
        <property>
            <name>dfs.namenode.name.dir</name>
            <value>file:///usr/data/hadoop/dfs/name</value>
        </property>
        <property>
            <name>dfs.datanode.data.dir</name>
            <value>file:///usr/data/hadoop/dfs/data</value>
        </property>
    <!--
        <property>
            <name>dfs.namenode.http-address</name>
            <value>hadoop-alone:50070</value>
        </property>
        <property>
            <name>dfs.namenode.secondary.http-address</name>
            <value>hadoop-alone:50090</value>
        </property>
    -->
        <property>
            <name>dfs.permissions</name>
            <value>false</value>
        </property>
    </configuration>
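
    As noted above, if the network environment changes, the name and data directories have to be emptied and HDFS re-formatted before the daemons will start again. A minimal sketch of that cleanup (it wipes all HDFS data, so only do this on a throw-away setup):

    /usr/local/hadoop/sbin/stop-dfs.sh        # stop the HDFS daemons if they are running
    rm -rf /usr/data/hadoop/dfs/name/* /usr/data/hadoop/dfs/data/* /usr/data/hadoop/tmp/*
    hdfs namenode -format                     # recreate the namenode metadata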
    

    4. Modify yarn-site.xml, which configures the YARN analysis layer.
    Edit the configuration file: vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

    <configuration>
        <property>
            <name>yarn.resourcemanager.admin.address</name>
            <value>hadoop-alone:8033</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services</name>
            <value>mapreduce_shuffle</value>
        </property>
        <property>
            <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name>
            <value>org.apache.hadoop.mapred.ShuffleHandler</value>
        </property>
        <property>
            <name>yarn.resourcemanager.resource-tracker.address</name>
            <value>hadoop-alone:8025</value>
        </property>
        <property>
            <name>yarn.resourcemanager.scheduler.address</name>
            <value>hadoop-alone:8030</value>
        </property>
        <property>
            <name>yarn.resourcemanager.address</name>
            <value>hadoop-alone:8050</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.address</name>
            <value>hadoop-alone:8088</value>
        </property>
        <property>
            <name>yarn.resourcemanager.webapp.https.address</name>
            <value>hadoop-alone:8090</value>
        </property>
    </configuration>
    

    5. Because this is a single-host pseudo-distributed setup, the slave-node configuration file also needs to be modified.
    Edit the configuration file: vim /usr/local/hadoop/etc/hadoop/slaves

    hadoop-alone
    

    6. Data will now be stored under the /usr/data/hadoop/dfs/{name,data} directories, so before they can be used they must be formatted: hdfs namenode -format

    INFO util.ExitUtil: Exiting with status 0
    

    A status of 0 means the program finished without any errors and the configuration succeeded.
    7. Start the Hadoop processes (this all-in-one command is not recommended; see the alternative below): /usr/local/hadoop/sbin/start-all.sh
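
    start-all.sh is deprecated in Hadoop 2.x; the usual alternative is to start HDFS and YARN separately:

    /usr/local/hadoop/sbin/start-dfs.sh       # NameNode, DataNode, SecondaryNameNode
    /usr/local/hadoop/sbin/start-yarn.sh      # ResourceManager, NodeManager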

    (Screenshot: startup output)
    After startup, check the running processes with jps (see the example below).
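
    On a working pseudo-distributed node the listing should contain roughly the following daemons (the pids are only illustrative):

    root@hadoop-alone:~# jps
    4401 NameNode
    4562 DataNode
    4779 SecondaryNameNode
    4941 ResourceManager
    5063 NodeManager
    5390 Jps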
    8. Access the web interface from a browser; the NameNode UI listens on the default port 50070: http://hadoop-alone:50070
    (Screenshot: pseudo-distributed web UI)

    Hadoop distributed cluster configuration

    (Diagram: Hadoop cluster)

    Environment preparation

    1. Prepare the hosts for the Hadoop cluster. Seven hosts are used in this walkthrough:
    [1] hadoop-namenode host: dedicated to the NameNode process;
    [1] hadoop-secondarynamenode host: runs the SecondaryNameNode process;
    [3 + 1] hadoop-datanode-* hosts: handle data storage (DataNode) and data analysis (NodeManager);
    [1] hadoop-resourcemanager host: runs the ResourceManager process.

    2. [hadoop-namenode] Modify the hosts file.

    hadoop-namenode 192.168.125.141 runs the NameNode process
    hadoop-resourcemanager 192.168.125.142 runs the ResourceManager process
    hadoop-secondarynamenode 192.168.125.143 runs the SecondaryNameNode process
    hadoop-datanode-a 192.168.125.144 runs the DataNode and NodeManager processes
    hadoop-datanode-b 192.168.125.145 runs the DataNode and NodeManager processes
    hadoop-datanode-c 192.168.125.146 runs the DataNode and NodeManager processes
    hadoop-datanode-back 192.168.125.147 [dynamically added host] runs the DataNode and NodeManager processes

    The hosts file should read as follows:

    127.0.0.1       localhost
    127.0.1.1       localhost
    192.168.125.141         hadoop-namenode
    192.168.125.142         hadoop-resourcemanager
    192.168.125.143         hadoop-secondarynamenode
    192.168.125.144         hadoop-datanode-a
    192.168.125.145         hadoop-datanode-b
    192.168.125.146         hadoop-datanode-c
    192.168.125.147         hadoop-datanode-back
    

    After the change, reboot the host so the configuration takes effect.

    3. [hadoop-namenode] Configure passwordless SSH login.
    Remove any existing ssh configuration: rm -r ~/.ssh
    Generate a new ssh key: ssh-keygen -t rsa
    Configure passwordless login to the local machine: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    Send the local ssh key to all of the other hosts:

    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-resourcemanager
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-secondarynamenode
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-datanode-a
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-datanode-b
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-datanode-c
    ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-datanode-back
    

    4. [hadoop-namenode] The hosts file must be identical on every host, so copy /etc/hosts from the hadoop-namenode node to all of the other hosts:

    Copy the local hosts file to the other hosts, then reboot them:

    To hadoop-resourcemanager: scp /etc/hosts hadoop-resourcemanager:/etc/
    To hadoop-secondarynamenode: scp /etc/hosts hadoop-secondarynamenode:/etc/
    To hadoop-datanode-a: scp /etc/hosts hadoop-datanode-a:/etc/
    To hadoop-datanode-b: scp /etc/hosts hadoop-datanode-b:/etc/
    To hadoop-datanode-c: scp /etc/hosts hadoop-datanode-c:/etc/

    Configuring the NameNode node

    1. [hadoop-namenode] Upload the Hadoop distribution to the host, then extract it:
    tar xzvf /srv/ftp/hadoop-2.8.0.tar.gz -C /usr/local/
    2. [hadoop-namenode] Rename the extracted hadoop directory:
    mv /usr/local/hadoop-2.8.0/ /usr/local/hadoop
    3. [hadoop-namenode] To keep later steps lighter, it is recommended to delete the doc directory (take care not to delete the wrong one):
    rm -r /usr/local/hadoop/share/doc/
    4. [hadoop-namenode] Edit /etc/profile once more.
    Open the configuration file: vim /etc/profile
    Set the following:

    export JAVA_HOME=/usr/local/jdk
    export HADOOP_HOME=/usr/local/hadoop
    export PATH=$PATH:$JAVA_HOME/bin:$HADOOP_HOME/bin:$HADOOP_HOME/sbin
    

    Make the configuration take effect immediately: source /etc/profile
    5. [hadoop-namenode] Send the profile configuration to the other hosts:

    To hadoop-resourcemanager: scp /etc/profile hadoop-resourcemanager:/etc/
    To hadoop-secondarynamenode: scp /etc/profile hadoop-secondarynamenode:/etc/
    To hadoop-datanode-a: scp /etc/profile hadoop-datanode-a:/etc/
    To hadoop-datanode-b: scp /etc/profile hadoop-datanode-b:/etc/
    To hadoop-datanode-c: scp /etc/profile hadoop-datanode-c:/etc/
    To hadoop-datanode-back: scp /etc/profile hadoop-datanode-back:/etc/

    6. [hadoop-*] On every host, make the profile take effect: source /etc/profile

    7. [hadoop-namenode] In this layout the DataNode and NodeManager run on the same hosts, so edit the slaves file and list those hostnames: vim /usr/local/hadoop/etc/hadoop/slaves

    hadoop-datanode-a
    hadoop-datanode-b
    hadoop-datanode-c
    

    8. [hadoop-namenode] Edit the Hadoop runtime environment file: vim /usr/local/hadoop/etc/hadoop/hadoop-env.sh

    export JAVA_HOME=/usr/local/jdk
    

    9. [hadoop-namenode] Create the Hadoop storage directories:

    • Create the first-level directories, tmp and dfs: mkdir -p /usr/data/hadoop/{tmp,dfs}
    • Create the name and data subdirectories under dfs: mkdir -p /usr/data/hadoop/dfs/{name,data}

    10. [hadoop-namenode] Edit core-site.xml: vim /usr/local/hadoop/etc/hadoop/core-site.xml

    <configuration> 
        <property> 
            <name>hadoop.tmp.dir</name> 
            <value>/usr/data/hadoop/tmp</value> 
        </property> 
        <property> 
            <name>fs.defaultFS</name> 
            <value>hdfs://hadoop-namenode:9000</value> 
        </property> 
    </configuration>
    

    11. [hadoop-namenode] Edit hdfs-site.xml: vim /usr/local/hadoop/etc/hadoop/hdfs-site.xml

    <configuration> 
        <property> 
            <name>dfs.replication</name> 
            <value>3</value> 
        </property> 
        <property> 
            <name>dfs.namenode.name.dir</name> 
            <value>file:///usr/data/hadoop/dfs/name</value> 
        </property> 
        <property> 
            <name>dfs.datanode.data.dir</name> 
            <value>file:///usr/data/hadoop/dfs/data</value> 
        </property> 
        <property>
            <name>dfs.namenode.secondary.http-address</name> 
            <value>hadoop-secondarynamenode:50090</value> 
        </property> 
        <property> 
            <name>dfs.permissions</name> 
            <value>false</value> 
        </property> 
    </configuration>
    

    With several DataNode hosts available, the replication factor can be set higher (3 here).

    12. [hadoop-namenode] Edit yarn-site.xml: vim /usr/local/hadoop/etc/hadoop/yarn-site.xml

    <configuration> 
        <property> 
            <name>yarn.resourcemanager.admin.address</name> 
            <value>hadoop-resourcemanager:8033</value> 
        </property> 
        <property> 
            <name>yarn.nodemanager.aux-services</name> 
            <value>mapreduce_shuffle</value> 
        </property> 
        <property> 
            <name>yarn.nodemanager.aux-services.mapreduce_shuffle.class</name> 
            <value>org.apache.hadoop.mapred.ShuffleHandler</value> 
        </property> 
        <property> 
            <name>yarn.resourcemanager.resource-tracker.address</name> 
            <value>hadoop-resourcemanager:8025</value> 
        </property> 
        <property> 
            <name>yarn.resourcemanager.scheduler.address</name> 
            <value>hadoop-resourcemanager:8030</value> 
        </property> 
        <property> 
            <name>yarn.resourcemanager.address</name> 
            <value>hadoop-resourcemanager:8050</value> 
        </property> 
        <property> 
            <name>yarn.resourcemanager.webapp.address</name> 
            <value>hadoop-resourcemanager:8088</value> 
        </property> 
        <property>
            <name>yarn.resourcemanager.webapp.https.address</name> 
            <value>hadoop-resourcemanager:8090</value> 
        </property> 
    </configuration>
    

    13. [hadoop-namenode] The basic Hadoop environment is now in place, but a cluster raises one more issue: pid files. By default every daemon writes its pid under /tmp, which is periodically cleared, so the pid directories should be moved.
    Edit hadoop-daemon.sh: vim /usr/local/hadoop/sbin/hadoop-daemon.sh

    HADOOP_PID_DIR=/usr/data/hadoop/pids
    

    Edit yarn-daemon.sh: vim /usr/local/hadoop/sbin/yarn-daemon.sh

    YARN_PID_DIR=/usr/data/hadoop/yarn-pids
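
    To be on the safe side, the pid directories can be created up front; a minimal sketch (they should exist on every node that runs daemons):

    mkdir -p /usr/data/hadoop/pids /usr/data/hadoop/yarn-pids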
    

    14. [hadoop-namenode] Send the configured Hadoop installation to the other hosts:

    To hadoop-resourcemanager: scp -r /usr/local/hadoop/ hadoop-resourcemanager:/usr/local/
    To hadoop-secondarynamenode: scp -r /usr/local/hadoop/ hadoop-secondarynamenode:/usr/local/
    To hadoop-datanode-a: scp -r /usr/local/hadoop/ hadoop-datanode-a:/usr/local/
    To hadoop-datanode-b: scp -r /usr/local/hadoop/ hadoop-datanode-b:/usr/local/
    To hadoop-datanode-c: scp -r /usr/local/hadoop/ hadoop-datanode-c:/usr/local/
    To hadoop-datanode-back: scp -r /usr/local/hadoop/ hadoop-datanode-back:/usr/local/

    Configuring the other nodes

    1. [hadoop-* hosts] Create the storage paths:

    mkdir -p /usr/data/hadoop/{tmp,dfs}
    mkdir -p /usr/data/hadoop/dfs/{name,data}
    

    2. [hadoop-namenode] Format the namenode: hdfs namenode -format

    3. [hadoop-namenode] Start the DFS processes: /usr/local/hadoop/sbin/start-dfs.sh

    root@hadoop-namenode:~# /usr/local/hadoop/sbin/start-dfs.sh
    Starting namenodes on [hadoop-namenode]
    hadoop-namenode: starting namenode, logging to /usr/local/hadoop/logs/hadoop-root-namenode-hadoop-namenode.out
    hadoop-datanode-c: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-doop-datanode-c.out
    hadoop-datanode-a: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop-datanode-a.out
    hadoop-datanode-b: starting datanode, logging to /usr/local/hadoop/logs/hadoop-root-datanode-hadoop-datanode-b.out
    Starting secondary namenodes [hadoop-secondarynamenode]
    hadoop-secondarynamenode: starting secondarynamenode, logging to /usr/local/hadoop/logs/hadoop-root-secondarynamenode-hadoop-secondarynamenode.out
    root@hadoop-namenode:~# 
    

    Check the startup result:

    root@hadoop-namenode:~# jps
    5824 Jps
    5588 NameNode
    
    root@hadoop-secondarynamenode:~# jps
    5545 SecondaryNameNode
    5595 Jps
    
    root@hadoop-datanode-a:~# jps
    5147 Jps
    5070 DataNode
    
    root@hadoop-datanode-b:~# jps
    5098 Jps
    5021 DataNode
    
    root@hadoop-datanode-c:~# jps
    5100 Jps
    5022 DataNode
    
    

    4. [hadoop-resourcemanager] Generate an SSH key:
    Remove any existing ssh configuration: rm -r ~/.ssh
    Generate a new ssh key: ssh-keygen -t rsa
    Configure passwordless login to the local machine: cat ~/.ssh/id_rsa.pub >> ~/.ssh/authorized_keys

    Send the local ssh key to all of the slave hosts:
    To hadoop-datanode-a: ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-datanode-a
    To hadoop-datanode-b: ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-datanode-b
    To hadoop-datanode-c: ssh-copy-id -i ~/.ssh/id_rsa.pub hadoop-datanode-c

    5. [hadoop-resourcemanager] Start the YARN processes: /usr/local/hadoop/sbin/start-yarn.sh

    root@doop-resourcemanager:~# /usr/local/hadoop/sbin/start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /usr/local/hadoop/logs/yarn-root-resourcemanager-doop-resourcemanager.out
    hadoop-datanode-c: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-doop-datanode-c.out
    hadoop-datanode-a: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop-datanode-a.out
    hadoop-datanode-b: starting nodemanager, logging to /usr/local/hadoop/logs/yarn-root-nodemanager-hadoop-datanode-b.out
    root@doop-resourcemanager:~# 
    

    6. [hadoop-namenode] Open the web interface in a browser.
    On the machine that runs the browser, add the NameNode entry to its hosts file:

    192.168.125.141 hadoop-namenode
    
    • Address: http://hadoop-namenode:50070 (a command-line check follows below)
    (Screenshot: successful configuration)
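
    The YARN web UI configured in yarn-site.xml can be checked in the same way. From any machine whose hosts file knows the cluster names, a minimal command-line check (assuming curl is installed):

    curl -s -o /dev/null -w "%{http_code}\n" http://hadoop-namenode:50070         # NameNode UI, expect 200
    curl -s -o /dev/null -w "%{http_code}\n" http://hadoop-resourcemanager:8088   # ResourceManager UI, expect 200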

    Upload a test file to the HDFS root directory:

    hadoop fs -put /srv/ftp/apache-tomcat-9.0.0.M10.tar.gz /
    
    (Screenshots: file upload and file details)
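
    A quick confirmation that the upload worked; with dfs.replication set to 3, fsck should also report three replicas per block:

    hadoop fs -ls /
    hdfs fsck /apache-tomcat-9.0.0.M10.tar.gz -files -blocks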

    Dynamically adding a DataNode

    1. [hadoop-namenode] Modify the hosts file (this entry was already added on the namenode host above; if not, it must be added now): vim /etc/hosts

    192.168.125.147 hadoop-datanode-back
    

    2. [hadoop-datanode-back] Modify the hosts file: vim /etc/hosts

    127.0.0.1       localhost
    127.0.1.1       localhost
    192.168.125.141         hadoop-namenode
    192.168.125.142         hadoop-resourcemanager
    192.168.125.147         hadoop-datanode-back
    

    Note: when the datanode-back host is added it must register itself with the namenode host (as a DataNode) and with the resourcemanager host (as a NodeManager), so both of those entries are required.

    3. [hadoop-datanode-back] Start the two processes individually (a verification sketch follows below):
    Start the datanode process: /usr/local/hadoop/sbin/hadoop-daemon.sh start datanode
    Start the nodemanager process: /usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager
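
    Whether the new node has joined the cluster can be checked from the master hosts; a minimal check:

    hdfs dfsadmin -report | grep hadoop-datanode-back      # on hadoop-namenode: the new DataNode should be listed
    yarn node -list                                        # on hadoop-resourcemanager: the new NodeManager should appear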

    (Screenshot: result of dynamic expansion)
