Setting Up a Hadoop Environment

Author: 鹅鹅鹅_ · Published 2019-01-01 11:41

    I. Create the Hadoop User


    Create a hadoop user whose home directory is /home/hadoop and grant it sudo privileges (a sketch for granting sudo follows the commands below). Once the user is created, log in as hadoop:

    useradd -m hadoop -s /bin/bash
    passwd hadoop
    su - hadoop
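
    The commands above do not actually grant sudo. A minimal sketch, assuming a RHEL/CentOS-style system where the wheel group is enabled in /etc/sudoers, run as root:

    # add hadoop to the wheel group, which sudoers typically grants sudo to
    usermod -aG wheel hadoop
    # or add an explicit rule with visudo:
    #   hadoop ALL=(ALL) ALL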
    

    II. Install the JDK and Hadoop


    1. Java environment

    If the system already has Java 1.7+, there is no need to install a JDK; otherwise install one. To check whether a JDK is present, and which version, run:

    [hadoop@localhost hadoop-2.7.3]$ java -version
    java version "1.7.0_79"
    OpenJDK Runtime Environment (rhel-2.5.5.4.el6-x86_64 u79-b14)
    OpenJDK 64-Bit Server VM (build 24.79-b02, mixed mode)
    [hadoop@localhost hadoop-2.7.3]$
    

    Output like the above means that Java 1.7+ is already installed on the system.
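
    If no suitable JDK is found, one way to install one on the RHEL/CentOS 6-style system shown here (an assumption; adjust the package manager and package names for your distribution) is:

    # install OpenJDK 7, including the development tools
    sudo yum install -y java-1.7.0-openjdk java-1.7.0-openjdk-devel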

    2. Downloading Hadoop

    Download a binary package from the official site: http://hadoop.apache.org/releases.html#Download. Be sure to download the binary package (around 200 MB), not the source (src) package; the source would have to be compiled by hand, which is a hassle.
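
    If you prefer the command line, one way to fetch it (assuming the Apache archive mirror, which keeps old releases) is:

    # download the 2.7.3 binary tarball
    wget https://archive.apache.org/dist/hadoop/common/hadoop-2.7.3/hadoop-2.7.3.tar.gz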
    Once the download finishes, extract it into the hadoop user's home directory:

    tar xfv hadoop-2.7.3.tar.gz
    

    Take a look at what is in the directory:

    [hadoop@localhost ~]$ cd hadoop-2.7.3/
    [hadoop@localhost hadoop-2.7.3]$ ls
    bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
    [hadoop@localhost hadoop-2.7.3]$
    

    The directory layout is clean and self-explanatory.

    3. Basic Hadoop configuration

    To check that the hadoop command runs at all, Hadoop first needs to be told where Java lives.
    Edit the file etc/hadoop/hadoop-env.sh inside the extracted Hadoop directory:

    export JAVA_HOME=${JAVA_HOME}
    

    So if the system already had JAVA_HOME configured, Hadoop would simply use it. But JAVA_HOME is not set on this system, which is why the hadoop command cannot run:

    [hadoop@localhost hadoop-2.7.3]$ ls
    bin  etc  include  lib  libexec  LICENSE.txt  NOTICE.txt  README.txt  sbin  share
    [hadoop@localhost hadoop-2.7.3]$ echo $JAVA_HOME
    
    [hadoop@localhost hadoop-2.7.3]$ ./bin/hadoop
    Error: JAVA_HOME is not set and could not be found.
    [hadoop@localhost hadoop-2.7.3]$ 
    [hadoop@localhost hadoop-2.7.3]$ source ./etc/hadoop/hadoop-env.sh 
    [hadoop@localhost hadoop-2.7.3]$ ./bin/hadoop
    Error: JAVA_HOME is not set and could not be found.
    [hadoop@localhost hadoop-2.7.3]$ 
    

    OK, so the next task is to determine the correct value of JAVA_HOME on this system:

    #First run whereis java to locate the java executable
    [hadoop@localhost hadoop-2.7.3]$ whereis java
    java: /usr/bin/java /etc/java /usr/lib/java /usr/lib64/java /usr/share/java /usr/share/man/man1/java.1.gz
    #So the java executable is under /usr/bin and, as expected, it is a symlink pointing to /etc/alternatives/java
    [hadoop@localhost hadoop-2.7.3]$ ll /usr/bin/java
    lrwxrwxrwx. 1 root root 22 11月 30 22:55 /usr/bin/java -> /etc/alternatives/java
    #Again as expected, /etc/alternatives/java links to /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java; judging from the path, /usr/lib/jvm/jre-1.7.0-openjdk.x86_64 looks like it could be JAVA_HOME
    [hadoop@localhost hadoop-2.7.3]$ ll /etc/alternatives/java
    lrwxrwxrwx. 1 root root 46 11月 30 22:55 /etc/alternatives/java -> /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/java
    #Let's see what is under /usr/lib/jvm/jre-1.7.0-openjdk.x86_64
    [hadoop@localhost hadoop-2.7.3]$ ls /usr/lib/jvm/jre-1.7.0-openjdk.x86_64
    bin  lib
    [hadoop@localhost hadoop-2.7.3]$ ls /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/lib/
    accessibility.properties  charsets.jar              ext                   jsse.jar              meta-index            rhino.jar         zi
    amd64                     classlist                 flavormap.properties  jvm.hprof.txt         net.properties        rt.jar
    applet                    cmm                       images                logging.properties    psfontj2d.properties  security
    audio                     content-types.properties  jce.jar               management            psfont.properties.ja  sound.properties
    calendars.properties      currency.data             jexec                 management-agent.jar  resources.jar         tz.properties
    [hadoop@localhost hadoop-2.7.3]$ ls /usr/lib/jvm/jre-1.7.0-openjdk.x86_64/bin/
    java  keytool  orbd  pack200  policytool  rmid  rmiregistry  servertool  tnameserv  unpack200
    #There is no jre directory here as we would expect from a full JDK, so look at /usr/lib/jvm/ instead
    [hadoop@localhost hadoop-2.7.3]$ ls /usr/lib/jvm/
    java                                java-1.8.0                                     jre-1.6.0-openjdk.x86_64
    java-1.5.0-gcj-1.5.0.0              java-1.8.0-openjdk                             jre-1.7.0
    java-1.6.0                          java-1.8.0-openjdk-1.8.0.45-35.b13.el6.x86_64  jre-1.7.0-openjdk.x86_64
    java-1.6.0-openjdk-1.6.0.35.x86_64  java-openjdk                                   jre-1.8.0
    java-1.6.0-openjdk.x86_64           jre                                            jre-1.8.0-openjdk
    java-1.7.0                          jre-1.5.0                                      jre-1.8.0-openjdk-1.8.0.45-35.b13.el6.x86_64
    java-1.7.0-openjdk-1.7.0.79.x86_64  jre-1.5.0-gcj                                  jre-gcj
    java-1.7.0-openjdk.x86_64           jre-1.6.0                                      jre-openjdk
    #Since java -version reports 1.7, look at the java-1.7.0-openjdk-1.7.0.79.x86_64 directory
    [hadoop@localhost hadoop-2.7.3]$ ls /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64/
    ASSEMBLY_EXCEPTION  bin  include  jre  lib  LICENSE  tapset  THIRD_PARTY_README
    [hadoop@localhost hadoop-2.7.3]$ ls /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64/jre/
    bin  lib
    [hadoop@localhost hadoop-2.7.3]$ ls /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64/jre/lib/
    accessibility.properties  charsets.jar              ext                   jsse.jar              meta-index            rhino.jar         zi
    amd64                     classlist                 flavormap.properties  jvm.hprof.txt         net.properties        rt.jar
    applet                    cmm                       images                logging.properties    psfontj2d.properties  security
    audio                     content-types.properties  jce.jar               management            psfont.properties.ja  sound.properties
    calendars.properties      currency.data             jexec                 management-agent.jar  resources.jar         tz.properties
    [hadoop@localhost hadoop-2.7.3]$ ls /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64/jre/bin
    java  keytool  orbd  pack200  policytool  rmid  rmiregistry  servertool  tnameserv  unpack200
    #That is exactly the jre directory we were looking for, so /usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64 is almost certainly the JAVA_HOME we want
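
    As a shortcut, the symlink chain can usually be resolved in one step (a quick sketch; trim the trailing bin/java, and a jre component if present, to get the candidate JAVA_HOME):

    # resolve every symlink down to the real java binary
    readlink -f $(which java)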
    

    Finally, set the JAVA_HOME variable in hadoop-env.sh:

    [hadoop@localhost hadoop-2.7.3]$ vim etc/hadoop/hadoop-env.sh 
    [hadoop@localhost hadoop-2.7.3]$ ./bin/hadoop
    Error: JAVA_HOME is not set and could not be found.
    [hadoop@localhost hadoop-2.7.3]$ source etc/hadoop/hadoop-env.sh 
    [hadoop@localhost hadoop-2.7.3]$ ./bin/hadoop
    Usage: hadoop [--config confdir] [COMMAND | CLASSNAME]
      CLASSNAME            run the class named CLASSNAME
     or
      where COMMAND is one of:
      fs                   run a generic filesystem user client
      version              print the version
      jar <jar>            run a jar file
                           note: please use "yarn jar" to launch
                                 YARN applications, not this command.
      checknative [-a|-h]  check native hadoop and compression libraries availability
      distcp <srcurl> <desturl> copy file or directories recursively
      archive -archiveName NAME -p <parent path> <src>* <dest> create a hadoop archive
      classpath            prints the class path needed to get the
      credential           interact with credential providers
                           Hadoop jar and the required libraries
      daemonlog            get/set the log level for each daemon
      trace                view and modify Hadoop tracing settings
    
    Most commands print help when invoked w/o parameters.
    [hadoop@localhost hadoop-2.7.3]$ 
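
    (For reference, the edit made to etc/hadoop/hadoop-env.sh above was presumably a hard-coded export of the path located earlier, along these lines; it is superseded by the /etc/profile approach below:)

    # in etc/hadoop/hadoop-env.sh
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64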
    
    

    As you can see, the hadoop command now prints its help output successfully. For easier maintenance and a cleaner separation of concerns, though, it is better to configure the Java environment in the system-wide shell profile and leave hadoop-env.sh with the JAVA_HOME=${JAVA_HOME} form.
    Edit /etc/profile and add the following lines:

    export HADOOP_HOME=/home/hadoop/hadoop-2.7.3
    export JAVA_HOME=/usr/lib/jvm/java-1.7.0-openjdk-1.7.0.79.x86_64
    export JRE_HOME=${JAVA_HOME}/jre
    export CLASSPATH=.:${JAVA_HOME}/lib:${JRE_HOME}/lib
    export PATH=$PATH:${JAVA_HOME}/bin:${JRE_HOME}/bin:${HADOOP_HOME}/bin:${HADOOP_HOME}/sbin
    

    Run one of the following so it takes effect:

    source /etc/profile
    or
    . /etc/profile
    

    III. Test the Official Demo


    Test the official demo with the following commands:

    [hadoop@localhost hadoop-2.7.3]$cd `echo $HADOOP_HOME` # change to the Hadoop install directory
    [hadoop@localhost hadoop-2.7.3]$mkdir ./input
    #copy some sample input (the bundled config files) into ./input; otherwise grep has nothing to match
    [hadoop@localhost hadoop-2.7.3]$cp ./etc/hadoop/*.xml ./input
    [hadoop@localhost hadoop-2.7.3]$./bin/hadoop jar ./share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep ./input ./output 'dfs[a-z.]+'
    #view the output
    [hadoop@localhost hadoop-2.7.3]$ cat output/part-r-00000 
    1   dfsadmin
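
    Note that MapReduce refuses to run if the output directory already exists, so to re-run the example, remove it first:

    # delete the previous output before re-running the job
    rm -r ./output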
    

    IV. Setting Up a Pseudo-Distributed Hadoop Environment


    What is pseudo-distributed mode? In pseudo-distributed mode Hadoop simulates a cluster on a single machine: each daemon runs as a separate process, but they all live on the same host, so it is not truly distributed. The configuration is almost identical to a real distributed setup; the only difference is that everything runs on one machine, i.e. the namenode and the datanode are the same host.
    Two files need to be configured, core-site.xml and hdfs-site.xml, both located under ${HADOOP_HOME}/etc/hadoop/.
    core-site.xml (note that fs.default.name is the older, deprecated name for fs.defaultFS; both still work in 2.7.x):

    <configuration>
        <property>
         <name>hadoop.tmp.dir</name>
         <value>file:/home/hadoop/tmp</value>
         <description>Abase for other temporary directories.</description>
       </property>
       <property>
         <name>fs.default.name</name>
         <value>hdfs://localhost:9000</value>
       </property>
    </configuration> 
    

    hdfs-site.xml is configured as follows:

    <configuration>
       <property>
         <name>dfs.replication</name>
         <value>1</value>
       </property>
       <property>
         <name>dfs.namenode.name.dir</name>
         <value>file:/home/hadoop/tmp/dfs/name</value>
       </property>
       <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:/home/hadoop/tmp/dfs/data</value>
       </property>    
    </configuration>
    
    

    Once the configuration is done, run the format command so that HDFS formats the specified directories:

    #Since the Hadoop environment variables are set, the hdfs command can actually be run from any directory
    [hadoop@localhost hadoop-2.7.3]$ hdfs namenode -format
    

    If it succeeds, the output contains a line like the following:

    17/03/18 04:15:49 INFO util.ExitUtil: Exiting with status 0
    

    The corresponding files have also appeared under the HDFS directory specified in the configuration:

    [hadoop@localhost ~]$ ls
    hadoop-2.7.3  tmp
    [hadoop@localhost ~]$ ls tmp/
    dfs
    [hadoop@localhost ~]$ ls tmp/dfs/
    name
    [hadoop@localhost ~]$ ls tmp/dfs/name/
    current
    [hadoop@localhost ~]$ ls tmp/dfs/name/current/
    fsimage_0000000000000000000  fsimage_0000000000000000000.md5  seen_txid  VERSION
    [hadoop@localhost ~]$ pwd
    /home/hadoop
    [hadoop@localhost ~]$ 
    
    

    V. Start HDFS


    The script that starts HDFS lives in the sbin folder of the Hadoop directory; since that folder was added to the PATH earlier, the command can simply be run directly:

    [hadoop@localhost ~]$ start-dfs.sh 
    

    Running the script prompts for a login password, so it is better to set up passwordless SSH first:

    [hadoop@localhost ~]$ ssh-keygen -t rsa
    ...
    [hadoop@localhost ~]$ cp ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
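
    Depending on the system, sshd also insists on strict permissions before it will accept key-based logins; if the scripts still prompt for a password, tighten them:

    # sshd requires ~/.ssh and authorized_keys not to be group/world writable
    chmod 700 ~/.ssh
    chmod 600 ~/.ssh/authorized_keys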
    

    Then run the script again:

    [hadoop@localhost ~]$ start-dfs.sh 
    Starting namenodes on [localhost]
    localhost: starting namenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-namenode-localhost.localdomain.out
    localhost: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-localhost.localdomain.out
    Starting secondary namenodes [0.0.0.0]
    0.0.0.0: starting secondarynamenode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-secondarynamenode-localhost.localdomain.out
    [hadoop@localhost ~]$ 
    

    Verify that HDFS started successfully with the jps command:

    [hadoop@localhost ~]$ jps
    110998 Jps
    110865 SecondaryNameNode
    110143 NameNode
    110292 DataNode
    [hadoop@localhost ~]$ 
    

    How is HDFS shut down? Run the corresponding script:

    [hadoop@localhost ~]$ stop-dfs.sh 
    Stopping namenodes on [localhost]
    localhost: stopping namenode
    localhost: stopping datanode
    Stopping secondary namenodes [0.0.0.0]
    0.0.0.0: stopping secondarynamenode
    [hadoop@localhost ~]$ 
    
    

    Finally, Hadoop's status can be checked in a browser through the NameNode web UI at http://<ip>:50070.
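
    On a headless machine, a quick sanity check of the same page (assuming curl is installed) is:

    # the NameNode web UI listens on port 50070 in Hadoop 2.x
    curl -s http://localhost:50070/ | head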

    VI. Run an Example in Pseudo-Distributed Mode


    Section IV above ran the test demo against files on the local filesystem only. Now that the pseudo-distributed environment is in place, let's run a demo against data stored in HDFS:

    • First sort out the data source, i.e. as in the demo of section IV, create a directory to serve as the input:
    #create an input directory in the HDFS filesystem
    [hadoop@localhost ~]$ hdfs dfs -mkdir /input
    #list the directories in the HDFS filesystem
    [hadoop@localhost ~]$ hdfs dfs -ls /
    Found 1 items
    drwxr-xr-x   - hadoop supergroup          0 2017-03-18 05:05 /input
    
    
    • Then upload the local data into HDFS
    #upload local files into the HDFS filesystem
    [hadoop@localhost ~]$ hdfs dfs -put `echo $HADOOP_HOME`/etc/hadoop/*.xml /input
    
    
    • Check the uploaded files
    [hadoop@localhost ~]$ hdfs dfs -ls /input
    Found 8 items
    -rw-r--r--   1 hadoop supergroup       4436 2017-03-18 05:09 /input/capacity-scheduler.xml
    -rw-r--r--   1 hadoop supergroup       1055 2017-03-18 05:09 /input/core-site.xml
    -rw-r--r--   1 hadoop supergroup       9683 2017-03-18 05:09 /input/hadoop-policy.xml
    -rw-r--r--   1 hadoop supergroup       1173 2017-03-18 05:09 /input/hdfs-site.xml
    -rw-r--r--   1 hadoop supergroup        620 2017-03-18 05:09 /input/httpfs-site.xml
    -rw-r--r--   1 hadoop supergroup       3518 2017-03-18 05:09 /input/kms-acls.xml
    -rw-r--r--   1 hadoop supergroup       5511 2017-03-18 05:09 /input/kms-site.xml
    -rw-r--r--   1 hadoop supergroup        690 2017-03-18 05:09 /input/yarn-site.xml
    
    • Run the same demo as before, this time against HDFS
    #the command is easy to read: it runs a grep across the Hadoop cluster, as a MapReduce job rather than a single local process
    [hadoop@localhost ~]$ hadoop jar /home/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar grep /input /output 'dfs[a-z.]+'
    
    • Check the results
    [hadoop@localhost ~]$ hdfs dfs -ls /output
    Found 2 items
    -rw-r--r--   1 hadoop supergroup          0 2017-03-18 05:16 /output/_SUCCESS
    -rw-r--r--   1 hadoop supergroup         77 2017-03-18 05:16 /output/part-r-00000
    [hadoop@localhost ~]$ hdfs dfs -cat /output/part-r-00000
    1   dfsadmin
    1   dfs.replication
    1   dfs.namenode.name.dir
    1   dfs.datanode.data.dir
    [hadoop@localhost ~]$ 
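
    • As with the local demo, the job fails if /output already exists, so remove it before re-running:
    #delete the previous output directory in HDFS
    hdfs dfs -rm -r /output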
    
    • You can also run the official pi example directly:
    [hadoop@master ~]$ hadoop jar /home/hadoop/hadoop-2.7.3/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.3.jar pi 10 100
    
    

    VII. Installing a Hadoop Cluster


    • The machine installed and configured above serves as the master; then deploy two slaves, slave01 and slave02, following similar steps.

    To recap, the deployment steps are:

    First create the hadoop user
    Then extract or copy the Hadoop installation into the hadoop user's home directory (or, if you prefer, somewhere like /usr/local/etc); a sketch of copying it to the slaves follows this list
    Configure the Java and Hadoop environments
    Finally create the HDFS filesystem; if everything was copied from an already configured machine, there is almost nothing left to create
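
    A minimal sketch of pushing the installation from the master out to the slaves (assuming the hadoop user already exists on slave01 and slave02):

    # copy the extracted Hadoop tree into each slave's home directory
    scp -r ~/hadoop-2.7.3 slave01:~/
    scp -r ~/hadoop-2.7.3 slave02:~/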
    
    • Configure the master
      To tell master and slaves apart easily, change the master's hostname to "master" by editing /etc/sysconfig/network and replacing the old name with 'master':
    [hadoop@localhost ~]$ sudo vim /etc/sysconfig/network
    NETWORKING=yes
    HOSTNAME=master
    [hadoop@localhost ~]$ sudo /etc/init.d/network restart
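
    On this RHEL/CentOS 6-style system the name in /etc/sysconfig/network only takes effect after a reboot or a network service restart; to apply it to the running session right away, it can also be set by hand (an optional extra step):

    # set the transient hostname for the current boot
    sudo hostname master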
    
    • On the master and on every slave, add IP-to-hostname mappings for the master and all slaves to /etc/hosts:
    [hadoop@localhost ~]$ sudo vim /etc/hosts
    10.10.18.236 master
    10.10.18.221 slave01
    10.10.19.231 slave02
    
    
    • Verify that the hosts can reach each other
      From the master, ping every slave:
    [hadoop@localhost hadoop-2.7.3]$ ping slave01
    PING slave01 (10.10.18.221) 56(84) bytes of data.
    64 bytes from slave01 (10.10.18.221): icmp_seq=1 ttl=64 time=0.208 ms
    64 bytes from slave01 (10.10.18.221): icmp_seq=2 ttl=64 time=0.135 ms
    [hadoop@localhost hadoop-2.7.3]$ ping slave02
    PING slave02 (10.10.19.231) 56(84) bytes of data.
    64 bytes from slave02 (10.10.19.231): icmp_seq=1 ttl=64 time=0.141 ms
    64 bytes from slave02 (10.10.19.231): icmp_seq=2 ttl=64 time=0.158 ms
    
    
    • Set up passwordless SSH from the master to the slave nodes
      This lets the master log in to each slave over SSH without a password
    #copy the public key generated on the master to slave01 and slave02
    [hadoop@localhost hadoop-2.7.3]$ scp ~/.ssh/id_rsa.pub slave01:~/.ssh/authorized_keys
    [hadoop@localhost hadoop-2.7.3]$ scp ~/.ssh/id_rsa.pub slave02:~/.ssh/authorized_keys
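
    Copying straight onto authorized_keys like this overwrites any keys already on the slave and assumes ~/.ssh exists there. Where ssh-copy-id is available, it is the gentler option because it creates the directory and appends instead:

    # append the master's public key on each slave
    ssh-copy-id hadoop@slave01
    ssh-copy-id hadoop@slave02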
    

    VIII. Configure the Cluster / Distributed Environment


    • Configuring the distributed cluster requires editing the following files:
      slaves: the slaves file lists the datanode hostnames, one per line. It defaults to localhost, which is why in the pseudo-distributed setup the node acted as both namenode and datanode. In a distributed setup you can keep localhost or delete it so that the master acts purely as the namenode.
      With two slaves, the file looks like this:
    #only the slaves file on the master node needs to be edited here
    [hadoop@localhost hadoop-2.7.3]$ vim etc/hadoop/slaves 
    slave01
    slave02
    

    core-site.xml (note that the namenode address here must be changed to master; the configuration files will later be copied out to the slaves):

    <configuration>
        <property>
         <name>hadoop.tmp.dir</name>  <!-- base directory for Hadoop's temporary files -->
         <value>file:/home/hadoop/tmp</value>
         <description>Abase for other temporary directories.</description>
       </property>
       <property>
         <name>fs.default.name</name> <!-- the namenode URI -->
         <value>hdfs://master:9000</value>
       </property>
    </configuration>
    
    

    hdfs-site.xml:

     <configuration>
       <property>
         <name>dfs.replication</name> <!-- replication factor (set to the number of datanodes here) -->
         <value>2</value>
       </property>
       <property>
         <name>dfs.namenode.name.dir</name> <!-- where the namenode stores its metadata -->
         <value>file:/home/hadoop/tmp/dfs/name</value>
       </property>
       <property>
         <name>dfs.datanode.data.dir</name>
         <value>file:/home/hadoop/tmp/dfs/data</value>
       </property>
       <property>
         <name>dfs.namenode.secondary.http-address</name>
         <value>master:50090</value>
       </property>
    </configuration>    
    
    

    mapred-site.xml: this file ships only as a template, so first copy it under its proper name:

    cp etc/hadoop/mapred-site.xml.template etc/hadoop/mapred-site.xml
    [hadoop@localhost hadoop-2.7.3]$ vim etc/hadoop/mapred-site.xml
    
     <configuration>
       <property>
         <name>mapreduce.framework.name</name>
         <value>yarn</value>
       </property>
       <property>
         <name>mapreduce.jobhistory.address</name>
         <value>master:10020</value>
       </property>
       <property>
         <name>mapreduce.jobhistory.webapp.address</name>
         <value>master:19888</value>
       </property>
    </configuration>  
    
    

    yarn-site.xml:

     <configuration>
     <!-- Site specific YARN configuration properties -->
       <property>
         <name>yarn.resourcemanager.hostname</name>
         <value>master</value>
       </property>
       <property>
         <name>yarn.nodemanager.aux-services</name>
         <value>mapreduce_shuffle</value>
       </property>
    </configuration>
    
    

    For the remaining options in these files, see the official documentation. Once everything is configured, and because pseudo-distributed mode was run earlier, it is best to delete the old temporary files before switching to cluster mode, then push the configuration out to the slaves:

    [hadoop@localhost ~]$ ls
    hadoop-2.7.3  tmp
    [hadoop@localhost ~]$ rm -rf tmp/
    [hadoop@localhost ~]$ cd hadoop-2.7.3
    [hadoop@localhost hadoop-2.7.3]$ scp -r etc slave01:~/hadoop-2.7.3/
    [hadoop@localhost hadoop-2.7.3]$ scp -r etc slave02:~/hadoop-2.7.3/
    
    
    

    In other words, the configured ${HADOOP_HOME}/etc/hadoop folder on the master is copied to every node, overwriting the configuration shipped with the Hadoop installs on the slaves (which is what the scp commands above do).

    • When all of the above is done, the first start requires formatting the namenode on the master node:
    hdfs namenode -format # only needed on first run, not afterwards
    
    • Hadoop can now be started; start it on the master node:
    [hadoop@localhost ~]$ start-dfs.sh
    Starting namenodes on [master]
    The authenticity of host 'master (10.10.18.236)' can't be established.
    RSA key fingerprint is 90:18:2d:a7:a7:0d:d9:b7:9e:1d:d3:59:17:fa:ea:3f.
    Are you sure you want to continue connecting (yes/no)? yes
    master: Warning: Permanently added 'master' (RSA) to the list of known hosts.
    master: namenode running as process 111555. Stop it first.
    slave02: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-localhost.localdomain.out
    slave01: starting datanode, logging to /home/hadoop/hadoop-2.7.3/logs/hadoop-hadoop-datanode-localhost.localdomain.out
    localhost: datanode running as process 111711. Stop it first.
    Starting secondary namenodes [master]
    master: secondarynamenode running as process 111895. Stop it first.
    [hadoop@localhost ~]$ start-yarn.sh
    starting yarn daemons
    starting resourcemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-resourcemanager-localhost.localdomain.out
    slave02: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
    slave01: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
    localhost: starting nodemanager, logging to /home/hadoop/hadoop-2.7.3/logs/yarn-hadoop-nodemanager-localhost.localdomain.out
    [hadoop@localhost ~]$ mr-jobhistory-daemon.sh start historyserver
    starting historyserver, logging to /home/hadoop/hadoop-2.7.3/logs/mapred-hadoop-historyserver-localhost.localdomain.out
    [hadoop@localhost ~]$ 
    
    
    • Use jps on the master and on each slave to check the post-startup state:
    #master
    [hadoop@localhost ~]$ jps
    116498 Jps
    111555 NameNode
    116440 JobHistoryServer
    111895 SecondaryNameNode
    111711 DataNode
    116134 NodeManager
    116010 ResourceManager
    #slave01
    [hadoop@localhost ~]$ jps
    6360 DataNode
    6495 NodeManager
    6665 Jps
    [hadoop@localhost ~]$ 
    #slave02
    [hadoop@localhost ~]$ jps
    12177 NodeManager
    12027 DataNode
    12361 Jps
    [hadoop@localhost ~]$ 
    
    
    • A missing process of any kind indicates a problem. In addition, run hdfs dfsadmin -report -live on the master node to check that the datanodes started correctly; if Live datanodes is not 0, the cluster started successfully. With the two datanodes configured here, the report shows 2 datanodes:
    [hadoop@localhost ~]$ hdfs dfsadmin -report -live
    Configured Capacity: 5197119221760 (4.73 TB)
    Present Capacity: 4102764367872 (3.73 TB)
    DFS Remaining: 4102764318720 (3.73 TB)
    DFS Used: 49152 (48 KB)
    DFS Used%: 0.00%
    Under replicated blocks: 0
    Blocks with corrupt replicas: 0
    Missing blocks: 0
    Missing blocks (with replication factor 1): 0
    
    -------------------------------------------------
    Live datanodes (2):
    
    Name: 10.10.19.231:50010 (slave02)
    Hostname: localhost
    Decommission Status : Normal
    Configured Capacity: 3484317450240 (3.17 TB)
    DFS Used: 24576 (24 KB)
    Non DFS Used: 212999049216 (198.37 GB)
    DFS Remaining: 3271318376448 (2.98 TB)
    DFS Used%: 0.00%
    DFS Remaining%: 93.89%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Sat Mar 18 20:28:54 CST 2017
    
    
    Name: 10.10.18.221:50010 (slave01)
    Hostname: localhost
    Decommission Status : Normal
    Configured Capacity: 1712801771520 (1.56 TB)
    DFS Used: 24576 (24 KB)
    Non DFS Used: 881355804672 (820.83 GB)
    DFS Remaining: 831445942272 (774.34 GB)
    DFS Used%: 0.00%
    DFS Remaining%: 48.54%
    Configured Cache Capacity: 0 (0 B)
    Cache Used: 0 (0 B)
    Cache Remaining: 0 (0 B)
    Cache Used%: 100.00%
    Cache Remaining%: 0.00%
    Xceivers: 1
    Last contact: Sat Mar 18 20:28:54 CST 2017
    
    

    At first, I ran into the following problem:

    [hadoop@localhost hadoop-2.7.3]$ hdfs dfsadmin -report -live
    report: Call From java.net.UnknownHostException: localhost.localdomain: localhost.localdomain to master:9000 failed on connection exception: java.net.ConnectException: Connection refused; For more details see:  http://wiki.apache.org/hadoop/ConnectionRefused
    
    

    It turned out that the network service had not been restarted after changing the master's hostname, so the new name had not taken effect. I restarted the network service, reformatted HDFS, and also synchronized the clocks across the machines (a sketch of the time sync follows).
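
    For the clock synchronization, one simple one-shot approach (a sketch; it assumes ntpdate is installed and the nodes can reach a public NTP pool) is:

    # run on every node to sync the clock against a public NTP server
    sudo ntpdate pool.ntp.org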

    • To verify distributed HDFS, create a test directory in HDFS; if the directory is visible from every node, the cluster is working
    #master
    [hadoop@localhost ~]$ hdfs dfs -mkdir /test
    [hadoop@localhost ~]$ hdfs dfs -ls /
    Found 2 items
    drwxr-xr-x   - hadoop supergroup          0 2017-03-18 20:33 /test
    drwxrwx---   - hadoop supergroup          0 2017-03-18 20:27 /tmp
    #slave01
    [hadoop@localhost ~]$ hdfs dfs -ls /
    Found 2 items
    drwxr-xr-x   - hadoop supergroup          0 2017-03-18 20:33 /test
    drwxrwx---   - hadoop supergroup          0 2017-03-18 20:27 /tmp
    [hadoop@localhost ~]$ 
    #slave02
    [hadoop@localhost ~]$ hdfs dfs -ls /
    Found 2 items
    drwxr-xr-x   - hadoop supergroup          0 2017-03-18 20:33 /test
    drwxrwx---   - hadoop supergroup          0 2017-03-18 20:27 /tmp
    [hadoop@localhost ~]$ 
    
    
